Boosting grape bunch detection in RGB-D images using zero-shot annotation with Segment Anything and GroundingDINO

Devanna, R.P., Reina, G., Cheein, F.A.A. and Milella, A. (2024) Boosting grape bunch detection in RGB-D images using zero-shot annotation with Segment Anything and GroundingDINO. Computers and Electronics in Agriculture, 229. ISSN 0168-1699

Fernando Auat Cheein Boosting grape bunch detection in RGB-D images VoR OCR UPLOAD.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Recent advances in artificial intelligence, particularly in object recognition and segmentation, provide unprecedented opportunities for precision agriculture. This work investigates the use of state-of-the-art AI models, namely Meta’s Segment Anything (SAM) and GroundingDINO, for the task of grape cluster detection in vineyards. Three methods aimed at enhancing the instance segmentation process are proposed: (i) SAM-Refine (SAM-R), which uses SAM to refine a previously proposed depth-based clustering approach, referred to as DepthSeg; (ii) SAM-Segmentation (SAM-S), which integrates SAM with a pre-trained semantic segmentation model to improve cluster separation; and (iii) AutoSAM-Dino (ASD), which eliminates the need for manual labeling and transfer learning through the combined use of GroundingDINO and SAM. Both object counting and pixel-level segmentation accuracy are analyzed against a manually labeled ground truth. Mean Average Precision (mAP), Intersection over Union (IoU), precision, and recall are calculated to assess system performance. Compared to the original DepthSeg algorithm, SAM-R slightly improves object counting (mAP: +0.5%) and excels in pixel-level segmentation (IoU: +17.0%). SAM-S, despite a decrease in mAP, improves segmentation accuracy (IoU: +13.9%, Precision: +9.2%, Recall: +11.7%). Similarly, ASD, although with a lower mAP, shows a significant accuracy enhancement (IoU: +7.8%, Precision: +4.2%, Recall: +4.9%). Additionally, in terms of labor effort, the instance segmentation techniques require much less time for training than manual labeling does.
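
The pixel-level metrics named in the abstract (IoU, precision, recall) can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function name `pixel_metrics` and the toy masks are hypothetical, and it assumes that a predicted mask and a ground-truth mask are compared pixel by pixel as binary values. How the paper matches predicted instances to ground-truth instances, or computes mAP, is not described in this record and is not covered here.

```python
def pixel_metrics(pred, truth):
    """Return (iou, precision, recall) for two equally sized binary masks.

    Masks are lists of rows; any truthy value counts as a grape pixel.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # predicted and labeled as grape
            elif p:
                fp += 1  # predicted as grape, labeled as background
            elif t:
                fn += 1  # labeled as grape, missed by the prediction
    union = tp + fp + fn
    # Guard against empty masks to avoid division by zero.
    iou = tp / union if union else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return iou, precision, recall


# Toy 3x4 masks (hypothetical data, not taken from the paper).
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

iou, precision, recall = pixel_metrics(pred, truth)
print(f"IoU={iou:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three pixels overlap, one predicted pixel falls outside the label, and one labeled pixel is missed, so IoU is 3/5 while precision and recall are both 3/4.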

Item Type: Article
Keywords: Grape bunch detection, Instance segmentation, Zero-shot networks, Precision agriculture, Agriculture robotics
Divisions: Engineering
Depositing User: Miss Anna Cope
Date Deposited: 21 Jan 2025 11:53
Last Modified: 21 Jan 2025 11:53
URI: https://hau.repository.guildhe.ac.uk/id/eprint/18166
