Background & Summary
In computer vision, object detection is a key task that enables automated systems to understand and interact with their environment[1],[2]. Similarly, in chemical laboratory operations, apparatus detection is essential for efficient resource management and optimal equipment usage across various sectors. Many institutions struggle with underused equipment due to neglect and mismanagement, making systematic tracking and monitoring important for efficient utilization[3]. In areas such as pharmaceutical analysis[4], apparatus detection supports the effective use of instruments, helping to develop practical skills and readiness for real-world applications[5]. In industrial research, detecting equipment functionality facilitates the integration of portable, flexible devices with existing setups, enhancing precision, cost-effectiveness, and efficiency without requiring extensive modifications[6],[7]. Recently, object detection has become increasingly important in automating laboratory operations, particularly in the chemical sciences[8]. With the growth of deep learning methods, there is an increasing need for large, diverse, and high-quality datasets to train and evaluate these algorithms[9]. However, object detection in chemical laboratories presents unique challenges, such as transparent materials, overlapping equipment, and complex backgrounds[10].
Various datasets have been proposed to support automated lab operations. For example, the CViG dataset was developed to detect chemistry glassware in real time and under varying lab conditions[11]. Another study introduced a dataset focused on improving lab safety by detecting personal protective equipment (PPE) compliance in real time, reducing the need for manual safety checks[12]. A different dataset includes the hands of experimenters interacting with the apparatus, which helps simulate actual lab working conditions more accurately[13]. Other studies have investigated the automatic identification of lab equipment to enhance safety and adaptability in changing lab environments[14], analyzed the importance of accurate object localization for robotic systems[15], and addressed the challenges of recognizing transparent vessels and overlapping tools[16]. Additional research has shown that pairing AI-driven vision systems with lab robots helps detect missing pipette tips and incorrect liquid levels in real time, making lab work more accurate and reliable[17]. Another study analyzes how combining object detection with action recognition can provide a deeper understanding of lab tasks[18]. Object detection has also been used to monitor safety risks and assist with automated experiment documentation[19].
We present a dataset consisting of 4,599 images captured in actual laboratory environments. The dataset contains annotations for 25 commonly used chemical laboratory apparatuses, including Beaker, Conical Flask, Funnel, Glass Rod, and Pipette. Unlike earlier datasets, our images cover a wide range of practical conditions, such as occlusion, overlapping tools, varied lighting, different viewing angles, and partial visibility caused by complex environments, as illustrated in the figures. These characteristics make our dataset highly suitable for training deep learning models that must perform well in non-ideal, real-world scenarios. The data are split into training, validation, and test sets to support robust model development and benchmarking. Our dataset directly addresses several challenges found in previous research studies:
Limited class diversity and real-world variability: Many existing datasets focus on only a few types of glassware or tools; the related datasets are summarized in Table 1. Our dataset introduces 25 classes captured in varied environments to reflect real chemical lab usage.
Inadequate support for downstream lab applications: Object detection plays a crucial role in enabling functions such as inventory tracking, automated experiment recording, and real-time safety alerts[20]. Our dataset is built to support these applications by providing diverse, annotated images that reflect realistic laboratory settings. Prior studies have emphasized the significance of identifying and localizing lab apparatus to improve robotic adaptability and enhance experimental safety[15]. Additionally, accurate apparatus detection in complex environments has been shown to improve safety monitoring and experiment documentation[19].
This work presents one of the most diverse and realistic public datasets for chemical laboratory equipment detection. It supports a wide range of research directions, including equipment tracking, experiment documentation, lab safety, and human-object interaction modeling in scientific environments.
Methods
Dataset creation: acquisition, annotation, and preprocessing workflow
Acquisition equipment
In developing a dataset for automated laboratory operations, incorporating images from devices with varying camera specifications is essential to ensure diverse data collection[21]. As detailed in Table 2, using cameras with different resolutions, focal lengths, and sensor qualities captures distinct visual details of lab equipment and environments. This diversity enhances the dataset's robustness, enabling models to generalize more effectively across a wide range of real-world setups and conditions.
Data collection
Data for this study were gathered through fieldwork conducted in controlled laboratory settings at two locations in Dhaka, Bangladesh: the UIU Bio-Chemical Laboratory at United International University and Dhaka Imperial College. To capture a diverse range of image qualities and perspectives, four different smartphone cameras were used. This multi-device approach aimed to replicate the varying conditions under which chemical apparatuses may appear in real-world laboratory environments, enhancing the dataset’s realism and generalizability.
Dataset annotation and labeling
The dataset was annotated using Roboflow, an online platform that streamlines image labeling for object detection tasks. Each image was annotated with bounding boxes[22], i.e., rectangular boxes drawn around objects to mark their presence within the image. Each bounding box is defined by four parameters: width (bw) and height (bh), representing the dimensions of the box; class (c), denoting the object category; and center coordinates (bx, by), indicating the box's position within the image. Bounding box regression, which refines these coordinates, is a widely adopted technique in object detection, enabling precise localization and classification of objects. Figure 1 illustrates the bounding box concept, while Fig. 2 shows the annotation process in Roboflow. Each pair of images (A–E and A′–E′) shows the original lab setup and the corresponding annotated image, with bounding boxes highlighting different types of laboratory equipment.
Fig. 1
Image annotation with bounding box regression.
Fig. 2
Visualization of bounding box annotations for lab apparatus.
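The released annotations use this center-based convention (see Usage Notes). As a minimal sketch, assuming the standard YOLO-style ".txt" label format with coordinates normalized to the image size and hypothetical file names, the following snippet converts the (c, bx, by, bw, bh) entries to pixel-space corner coordinates:

```python
# Minimal sketch: parse a YOLO-style label file and convert the normalized
# (class, bx, by, bw, bh) entries to pixel-space corner coordinates.
# File name and image size below are illustrative placeholders.
from pathlib import Path

def load_boxes(label_path: Path, img_w: int, img_h: int):
    """Return a list of (class_id, x_min, y_min, x_max, y_max) tuples."""
    boxes = []
    for line in label_path.read_text().splitlines():
        if not line.strip():
            continue
        c, bx, by, bw, bh = line.split()
        bx, by, bw, bh = map(float, (bx, by, bw, bh))
        x_min = (bx - bw / 2) * img_w
        y_min = (by - bh / 2) * img_h
        x_max = (bx + bw / 2) * img_w
        y_max = (by + bh / 2) * img_h
        boxes.append((int(c), x_min, y_min, x_max, y_max))
    return boxes

# Example (hypothetical file name, 640 x 640 image):
# boxes = load_boxes(Path("train/labels/beaker_0001.txt"), 640, 640)
```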
Data preprocessing
When creating the dataset version in Roboflow, two preprocessing steps were applied to maintain consistency and improve model accuracy. First, the "Auto-Orient" step ensures that all images are correctly aligned by applying any rotation recorded in the EXIF data. Then, the "Resize" step scales all images to 640 × 640 pixels, making them uniform and easier for models to process efficiently. These adjustments help models learn better and perform more accurately in complex, real-world scenarios.
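As an illustration, the same two steps can be reproduced outside Roboflow with the Pillow library. The folder names below are placeholders, and the resize is a simple stretch to 640 × 640 rather than letterboxing:

```python
# Minimal sketch of the two preprocessing steps described above, reproduced
# with Pillow. Folder names are illustrative placeholders.
from pathlib import Path
from PIL import Image, ImageOps

SRC = Path("raw_images")       # assumed input folder
DST = Path("preprocessed")     # assumed output folder
DST.mkdir(exist_ok=True)

for img_path in SRC.glob("*.jpg"):
    with Image.open(img_path) as img:
        img = ImageOps.exif_transpose(img)   # "Auto-Orient": apply EXIF rotation
        img = img.resize((640, 640))         # uniform 640 x 640 input size
        img.convert("RGB").save(DST / img_path.name, quality=95)
```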
Experimental design
This experiment was designed to train object detection models for identifying chemical laboratory apparatus by first constructing a well-structured and diverse dataset. Images were captured across multiple locations to include a wide range of laboratory equipment and environmental conditions. To enhance robustness and generalizability, we varied lighting, backgrounds, and camera angles, enabling the models to learn object variations caused by real-world factors. This approach minimizes bias and improves model performance in practical applications.
After image collection, the files were stored in both local and cloud storage to ensure accessibility and secure backup. The dataset then underwent a data cleaning process, during which low-quality, blurry, or irrelevant images were removed to maintain a high standard for subsequent annotation and model training. Following cleaning, data annotation was conducted using Roboflow, where each object was accurately labeled with bounding boxes and corresponding class names. To ensure labeling consistency and precision, multiple annotators participated in the process. The annotated dataset was then refined in the data preparation phase, involving preprocessing techniques such as orientation correction, resizing, normalization, and contrast enhancement. Additionally, feature extraction was performed to emphasize key visual attributes critical for effective object detection.
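As an optional, illustrative aid for this kind of cleaning (not necessarily the procedure used in this work), the variance of the Laplacian can serve as a sharpness proxy to flag potentially blurry images for review. The threshold and paths below are assumptions:

```python
# Illustrative sketch only: flag potentially blurry images using the variance
# of the Laplacian as a sharpness proxy. Threshold and paths are assumptions.
from pathlib import Path
import cv2

BLUR_THRESHOLD = 100.0   # assumed cutoff; tune per camera and scene

def is_blurry(image_path: Path, threshold: float = BLUR_THRESHOLD) -> bool:
    img = cv2.imread(str(image_path), cv2.IMREAD_GRAYSCALE)
    if img is None:
        return True  # unreadable files are also candidates for removal
    return cv2.Laplacian(img, cv2.CV_64F).var() < threshold

flagged = [p for p in Path("raw_images").glob("*.jpg") if is_blurry(p)]
print(f"{len(flagged)} images flagged for manual review")
```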
To ensure effective model training, the dataset was randomly split into three subsets: 70% for training, 20% for validation, and 10% for testing. This distribution allowed the models to learn from a substantial portion of the data while reserving unseen samples to evaluate generalization performance. During the training phase, we configured hyperparameters, selected suitable neural network architectures, and trained the models using the training set. The validation set was used to fine-tune parameters and mitigate overfitting.
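A minimal sketch of such a random 70/20/10 split is shown below, assuming a flat source folder of images with matching YOLO label files; the split folder names mirror the dataset layout, and the fixed seed only makes the example reproducible:

```python
# Minimal sketch of a random 70/20/10 split into train/valid/test folders.
# The source folder name is an assumption.
import random
import shutil
from pathlib import Path

random.seed(42)  # fixed seed for a reproducible example split

images = sorted(Path("all_images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.7 * n)],
    "valid": images[int(0.7 * n): int(0.9 * n)],
    "test":  images[int(0.9 * n):],
}

for split, files in splits.items():
    img_dir = Path(split) / "images"
    lbl_dir = Path(split) / "labels"
    img_dir.mkdir(parents=True, exist_ok=True)
    lbl_dir.mkdir(parents=True, exist_ok=True)
    for img in files:
        shutil.copy(img, img_dir / img.name)
        label = img.with_suffix(".txt")   # assumes labels sit beside images
        if label.exists():
            shutil.copy(label, lbl_dir / label.name)
```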
In the evaluation phase, further detailed in the technical validation, we assessed model performance using metrics such as precision, recall, confusion matrix, and mAP@50. We also conducted error analysis by identifying misclassified instances, correcting annotation inconsistencies, and refining preprocessing steps. When performance fell short of expectations, iterative adjustments were made across the pipeline. The overall experimental framework is depicted in Fig. 3.
Fig. 3
Framework of the experimental design pipeline.
Data Record
The dataset used in this study is freely available and can be accessed online through the figshare platform[23]. Researchers, developers, and anyone interested can download and use the dataset for their experiments or analyses without restrictions. Sharing the dataset openly promotes transparency, reproducibility, and further research in the field. A detailed breakdown of the dataset structure and the type of information it contains is provided below.
Dataset distribution analysis
The dataset comprises 4,599 images spanning 25 classes, with a total of 6,960 annotated instances. To support balanced and unbiased model training, we employed a random splitting technique to divide the dataset into three subsets: 70% (3,220 images) for training, 20% (920 images) for validation, and 10% (459 images) for testing. In this method, images are shuffled and distributed without regard to their original order or grouping[24], minimizing the risk of bias that could arise from sorted or clustered data. This randomized approach ensures that each subset contains a representative mix of classes and conditions, thereby enhancing model generalization and performance. Images for each apparatus class were captured using all four smartphones to incorporate device-level variation in the dataset. This deliberate strategy reduces device-specific bias and introduces variation in lighting, resolution, and angle. The complete list of classes, their annotation counts, and the annotation distribution across devices is presented in Table 3.
Folder structure of the dataset
We created a custom dataset specially designed for our research on detecting and recognizing objects in chemical laboratories. The dataset folder includes important files like metadata.csv, which lists detailed information about each image, and data.yaml, which tells the model how to find and use the data.
The dataset is divided into three main folders: train, valid, and test, to organize the data effectively. Each folder has two subfolders: images, which contain uniquely named pictures of chemical apparatus, and labels, which hold text files with annotations. These annotations provide essential details like class labels and bounding box coordinates for object detection. Figure 4 demonstrates the folder structure.
Fig. 4
Folder structure of the dataset.
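As a quick sanity check after downloading, the layout in Fig. 4 can be verified by loading data.yaml and counting image and label files per split. The root folder name below is an assumption, and PyYAML is required:

```python
# Minimal sketch: verify the folder layout by reading data.yaml and counting
# image/label files per split. The root folder name is an assumption.
from pathlib import Path
import yaml

root = Path("chem-lab-dataset")                       # assumed extraction folder
cfg = yaml.safe_load((root / "data.yaml").read_text())
print("Classes:", cfg.get("nc"), cfg.get("names"))

for split in ("train", "valid", "test"):
    imgs = list((root / split / "images").glob("*.jpg"))
    lbls = list((root / split / "labels").glob("*.txt"))
    print(f"{split}: {len(imgs)} images, {len(lbls)} label files")
```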
Metadata
The "metadata.csv" file stores relevant data about each image and its corresponding annotations. There is one record for each image, including the image name, which serves as a unique identifier for distinct images of chemical lab equipment. It also specifies the set type (train, valid, or test), indicating the role of each image in the model-building process. For interpretability, every image is referenced against a Class_ID and a Class_Name that identify the object class annotated within the image. The annotation also includes the X and Y coordinates and the Width and Height of the bounding box, which define the exact position and size of the identified object in the image. In addition, a Device_Name field identifies the device with which each image was captured. The organized and structured nature of the metadata facilitates convenient data management and supports successful model training for object detection tasks. The fields included in "metadata.csv" are briefly described in Table 4.
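A minimal sketch of working with metadata.csv is shown below, using pandas to compute per-class and per-device annotation counts similar to Table 3. The column names follow the field descriptions above, but the exact spellings in the released file may differ:

```python
# Minimal sketch for exploring metadata.csv with pandas.
# Column names ("Class_Name", "Device_Name", "set", "Image_Name") are assumed
# from the field descriptions; check the file header before running.
import pandas as pd

meta = pd.read_csv("metadata.csv")

# Annotations per class
print(meta["Class_Name"].value_counts())

# Annotations per class and capture device
print(meta.groupby(["Class_Name", "Device_Name"]).size().unstack(fill_value=0))

# Images per split (column names assumed)
print(meta.groupby("set")["Image_Name"].nunique())
```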
Technical Validation
Evaluation metrics
In this study, several performance metrics are used to evaluate the object detection models. Precision measures the proportion of correctly identified positive instances among all instances predicted as positive[25]. In object detection, it is the ratio of correctly predicted bounding boxes to the total number of predicted bounding boxes, reflecting the model's accuracy in positive predictions (Equation 1). Recall measures the model's ability to correctly identify all relevant objects within an image[26]. It is the ratio of correctly predicted positive instances to the total actual positives, indicating how well the model captures all target objects (Equation 2). Mean Average Precision (mAP) is a comprehensive evaluation metric for object detection that combines precision and recall across object classes[27]. It is calculated by first determining the Average Precision (AP) for each class, based on the area under the precision-recall curve, and then averaging the AP scores over all classes (Equation 3). Additionally, the confusion matrix summarizes classification outcomes by comparing predicted outputs with ground-truth labels, reporting the numbers of True Positives, True Negatives, False Positives, and False Negatives[28]. For binary, multi-class, and detection tasks involving classification, it provides the tabular representation of these four outcomes shown in Table 5.
$$\mathrm{Precision}=\frac{TP}{TP+FP}$$
(1)
$$\mathrm{Recall}=\frac{TP}{TP+FN}$$
(2)
$$\mathrm{mAP}=\frac{1}{N}\sum_{i=1}^{N}AP_{i}$$
(3)
Key terms and abbreviations used in the evaluation metrics are defined as follows:
TP (True Positives): Correctly detected objects.
TN (True Negatives): Correctly identified the absence of objects.
FP (False Positives): Incorrectly detected objects.
FN (False Negatives): Objects that were present in the image but not detected by the model.
N: Total number of object classes.
APi: Average Precision for class i.
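For reference, Equations (1)-(3) translate directly into code; the sketch below assumes the TP/FP/FN counts and per-class AP values have already been produced by the detector's evaluation pipeline:

```python
# Minimal sketch of Equations (1)-(3), assuming precomputed counts and
# per-class average precisions.
from typing import Sequence

def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_average_precision(ap_per_class: Sequence[float]) -> float:
    return sum(ap_per_class) / len(ap_per_class) if ap_per_class else 0.0

# Example with hypothetical counts:
# precision(90, 10) -> 0.90; recall(90, 5) -> ~0.947
```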
Result analysis
Our dataset comprises images containing various laboratory objects captured using multiple smartphone cameras. Each image is annotated with bounding boxes and class labels to indicate the location and type of each object. To use this annotated data effectively for tasks such as automatic identification and localization of objects in new images, object detection models are essential. These models simultaneously classify and locate objects within an image[29], making them critical for applications such as inventory management, real-time recognition, augmented reality, robotics, and smart device integration. Table 6 presents the distribution of images across the training, validation, and testing subsets, ensuring a balanced split for reliable model evaluation.
Among the most widely used object detection models are the various versions of the YOLO family[30]. YOLO models process images quickly while maintaining high accuracy, making them well suited for real-time object detection tasks[31]. The RF-DETR model is also important because it uses a transformer-based architecture to detect objects with greater contextual understanding, particularly in cluttered or complex scenes. The flexibility of our dataset, exported in both YOLO and COCO formats, ensures it can be used to train a variety of object detection models, enabling broad applicability across research and real-world scenarios.
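As a minimal, illustrative training sketch (not the exact configuration used in this study; the actual environment and hyperparameters are given in Tables 8 and 9), a YOLO model can be trained on the released data.yaml with the open-source Ultralytics package:

```python
# Illustrative training sketch with the Ultralytics package (pip install ultralytics).
# Weight file, epoch count, and batch size are assumptions, not the paper's settings.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")               # pretrained YOLOv11 nano weights
results = model.train(
    data="data.yaml",                    # dataset definition shipped with the data
    epochs=100,
    imgsz=640,
    batch=16,
)
metrics = model.val()                    # reports precision, recall, mAP@50
print(metrics.box.map50)
```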
We evaluated our dataset using seven state-of-the-art object detection models, all achieving strong performance with mAP@50 scores exceeding 0.90. RF-DETR, a transformer-based model[32](https://www.nature.com/articles/s41597-025-05952-3#ref-CR32 “Sapkota, R., Cheppally, R. H., Sharda, A. & Karkee, M. RF-DETR object detection vs YOLOv12: a study of transformer-based and CNN-based architectures for single-class and multi-class greenfruit detection in complex orchard environments under label ambiguity. arXiv, https://doi.org/10.48550/arXiv.2504.13099
(2025).“), achieved the highest accuracy with an impressive mAP@50 of 0.992, while YOLOv11[33](https://www.nature.com/articles/s41597-025-05952-3#ref-CR33 “Khanam, R. & Hussain, M. YOLOv11: an overview of the key architectural enhancements. arXiv, https://doi.org/10.48550/arXiv.2410.17725
(2024).“) delivered the best results among all YOLO models with a score of 0.987. Other models, including YOLOv9[34](https://www.nature.com/articles/s41597-025-05952-3#ref-CR34 “Wang, C. Y., Yeh, I. H. & Mark Liao, H. Y. YOLOv9: learning what you want to learn using programmable gradient information. Computer Vision - ECCV 2024 15089, 1–21, https://doi.org/10.1007/978-3-031-72751-1_1
(2025).“) (0.986), YOLOv5[35](https://www.nature.com/articles/s41597-025-05952-3#ref-CR35 “Khanam, R. & Hussain, M. What is YOLOv5: a deep look into the internal features of the popular object detector. arXiv, https://doi.org/10.48550/arXiv.2407.20892
(2024).“) (0.985), YOLOv8[36](https://www.nature.com/articles/s41597-025-05952-3#ref-CR36 “Varghese, R. & Sambath, M. YOLOv8: a novel object detection algorithm with enhanced performance and robustness. 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS) 1–6, https://doi.org/10.1109/ADICS58448.2024.10533619
(2024).“) (0.983), YOLOv7[37](https://www.nature.com/articles/s41597-025-05952-3#ref-CR37 “Wang, C. Y., Bochkovskiy, A. & Liao, H. Y. M. YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 7464–7475, https://doi.org/10.1109/CVPR52729.2023.00721
(2023).“) (0.947), and YOLOv12[38](https://www.nature.com/articles/s41597-025-05952-3#ref-CR38 “Alif, M. A. R. & Hussain, M. YOLOv12: a breakdown of the key architectural features. arXiv, https://doi.org/10.48550/arXiv.2502.14740
(2025).“) (0.92), also demonstrated solid results. Detailed performance metrics such as Precision, Recall, and Mean Average Precision (mAP)[39](https://www.nature.com/articles/s41597-025-05952-3#ref-CR39 “Padilla, R., Netto, S. L. & da Silva, E. A. B. A survey on performance metrics for object-detection algorithms. International Conference on Systems, Signals and Image Processing, IWSSIP, 237–242, https://doi.org/10.1109/IWSSIP48289.2020.9145130
(2020).“) of the best two models are presented in Table 7.
To visualize detection results, we also include confusion matrices for the two best-performing models[40]. The confusion matrices in Fig. 5 provide a detailed, class-by-class overview of how accurately each model classified the 25 laboratory object classes. The diagonal cells show correct predictions for each class, while the off-diagonal cells highlight where the model confused one object with another. The accompanying color scale visualizes the frequency of correct and incorrect predictions, making it easy to spot patterns of misclassification. Overall, these matrices clearly illustrate the models' strengths and weaknesses by comparing true and predicted labels side by side.
Fig. 5
Confusion matrices of YOLOv11 and RF-DETR object detection models. (a) Confusion matrix of YOLOv11. (b) Confusion matrix of RF-DETR.
In addition to the quantitative metrics, we provide model-wise visualizations such as F1-score curves, precision-recall (PR) curves, and result matrices for each object detection model[41]. These plots offer deeper insights into the performance characteristics of each model across different confidence thresholds. Figure 6 presents these curves for the top-performing models, YOLOv11 and RF-DETR.
Fig. 6
Performance analysis of YOLOv11 and RF-DETR object detection models. (a) F1 score versus confidence curve of YOLOv11. (b) Training and validation loss curves of YOLOv11. (c) Precision-recall curve of YOLOv11. (d) Training epoch evaluation metrics of RF-DETR.
Following the performance curves, we present visual examples of object detection results produced by each model. The images in Fig. 7 showcase how each model identifies and localizes objects within various scenes from the test dataset.
Fig. 7
Object detection results of YOLOv11 and RF-DETR models. (a) YOLOv11 predictions on batch 0. (b) YOLOv11 predictions on batch 1. (c) RF-DETR output on annotated batch. (d) RF-DETR predictions vs annotations on batch 1.
In our research, the configuration parameters of the training environment are outlined in Table 8, and the hyperparameters used during the training process are listed in Table 9.
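For completeness, a hedged inference sketch is shown below: it loads a trained checkpoint and saves annotated predictions similar to Fig. 7. The checkpoint path and confidence threshold are illustrative assumptions:

```python
# Illustrative inference sketch: run a trained checkpoint on the test images
# and save annotated outputs. Paths and threshold are assumptions.
import cv2
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")    # assumed checkpoint path
results = model.predict(source="test/images", conf=0.25)

for i, r in enumerate(results):
    annotated = r.plot()                             # image with drawn boxes/labels
    cv2.imwrite(f"prediction_{i}.jpg", annotated)
```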
Usage Notes
The dataset follows the standard YOLO annotation format, so most object detection methods, including the YOLO family (from YOLOv5 onward) and RF-DETR, can readily use it. Each image is paired with a ".txt" annotation file containing class IDs and bounding box coordinates. Images are in ".jpg" format and sorted into separate folders for training, validation, and testing. The dataset is freely available for researchers, institutions, and developers. Users are welcome to apply it to a wide range of applications in laboratory settings. We also encourage users to convert the dataset into other formats (e.g., COCO (JSON), XML, CSV) using common open-source tools to suit their pipelines. Although no special software is required, we recommend Python libraries such as OpenCV or Roboflow for previewing, pre-processing, or visualizing the data. We encourage users to share feedback, suggestions, and ideas, or to collaborate on future developments.
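As a small example of the OpenCV-based previewing suggested above, the following sketch overlays the YOLO-format annotations of one image (hypothetical file names) using the class names from data.yaml:

```python
# Illustrative preview sketch: draw the YOLO annotations of one image with
# OpenCV. Image and label file names are placeholders.
from pathlib import Path
import cv2
import yaml

names = yaml.safe_load(Path("data.yaml").read_text())["names"]

img = cv2.imread("test/images/sample.jpg")
h, w = img.shape[:2]

for line in Path("test/labels/sample.txt").read_text().splitlines():
    c, bx, by, bw, bh = map(float, line.split())
    x1, y1 = int((bx - bw / 2) * w), int((by - bh / 2) * h)
    x2, y2 = int((bx + bw / 2) * w), int((by + bh / 2) * h)
    cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(img, str(names[int(c)]), (x1, max(y1 - 5, 15)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cv2.imwrite("preview.jpg", img)
```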
Code availability
The dataset was manually annotated using Roboflow (https://universe.roboflow.com/). Training used open-source object detection frameworks. YOLO models were implemented with the Ultralytics codebase (https://github.com/ultralytics/ultralytics), and RF-DETR was trained using its official PyTorch repository (https://github.com/roboflow/rf-detr.git). We have shared all the code in a public GitHub repository: https://github.com/SakhawatHossain/chem-lab-equipment-recognition.git. This includes scripts for training, evaluation, and visualization, along with brief usage guidance. Minor code modifications were made for compatibility, but no proprietary code was created. We suggest using a virtual environment to avoid dependency conflicts.
References
Ma, D. Recent advances in deep learning based computer vision. 2022 International Conference on Computers, Information Processing and Advanced Education (CIPAE), 174–179, https://doi.org/10.1109/CIPAE55637.2022.00044 (2022).
Bertolini, M., Mezzogori, D., Neroni, M. & Zammori, F. Machine learning for industrial applications: A comprehensive literature review. Expert Syst. Appl. 175, 114820, https://doi.org/10.1016/j.eswa.2021.114820 (2021).
Macayana, L. & Mangarin, R. Why schools lack laboratory and equipment in science? Through the lens of research studies. International Journal of Research and Innovation in Social Science 8, 2845–2840, https://doi.org/10.47772/IJRISS.2024.8100238 (2024).
Razzak, M. I., Naz, S. & Zaib, A. Deep Learning for Medical Image Processing: Overview, Challenges and the Future. Classification in BioApps: Automation of Decision Making 26, 323–350, https://doi.org/10.1007/978-3-319-65981-7_12 (2018).
Zheng, H., Hu, B., Sun, Q., Cao, J. & Liu, F. Applying a chemical structure teaching method in the pharmaceutical analysis curriculum to improve student engagement and learning. Journal of Chemical Education 97, 421–426, https://doi.org/10.1021/acs.jchemed.9b00551 (2020).
Camacho, A. F. J. et al. Diseño de un espectrofotómetro UV-VIS de bajo costo para la industria bioquímica: una revisión. Pädi Boletín Científico de Ciencias Básicas e Ingenierías del ICBI 9, 19–28, https://doi.org/10.29057/icbi.v9iEspecial2.7788 (2021).
Lim, J. X. Y., Leow, D., Pham, Q. C. & Tan, C. H. Development of a robotic system for automatic organic chemistry synthesis. IEEE Transactions on Automation Science and Engineering 18, 2185–2190, https://doi.org/10.1109/TASE.2020.3036055 (2021).
Sasaki, R., Fujinami, M. & Nakai, H. Application of object detection and action recognition toward automated recognition of chemical experiments. Digital Discovery 3, 2458–2464, https://doi.org/10.1039/D4DD00015C (2024).
Shin, H. C. et al. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning. IEEE Transactions on Medical Imaging 35, 1285–1298, https://doi.org/10.1109/TMI.2016.2528162 (2016).
Xie, E. et al. Segmenting transparent objects in the wild with transformer. Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), 1194–1200, https://doi.org/10.24963/ijcai.2021/165 (2021).
Cheng, X. et al. Intelligent vision for the detection of chemistry glassware toward AI robotic chemists. Artificial Intelligence Chemistry. 1, 100016, https://doi.org/10.1016/j.aichem.2023.100016 (2023).
Ali, L. et al. Development of YOLOv5-Based Real-Time Smart Monitoring System for Increasing Lab Safety Awareness in Educational Institutions. Sensors 22, 8820, https://doi.org/10.3390/s22228820 (2022).
Sasaki, R., Fujinami, M. & Nakai, H. Annotated Chemical Apparatus Image Dataset. https://doi.org/10.17632/8p2hvgdvpn.1 (2023).
Ding, Z. et al. A new benchmark data set for chemical laboratory apparatus detection. Artificial Intelligence in Data and Big Data Processing 124, 201–210, https://doi.org/10.1007/978-3-030-97610-1_17 (2022).
Zou, L. et al. A benchmark dataset in chemical apparatus: recognition and detection. Multimedia Tools and Applications 83, 26419–26437, https://doi.org/10.1007/s11042-023-16563-8 (2024).
Eppel, S., Xu, H., Bismuth, M. & Aspuru-Guzik, A. Vector-LabPics Dataset: Images of materials in transparent vessels for computer vision applications. University of Toronto - ChemSelfies Project https://www.cs.toronto.edu/chemselfies/ (2020).
Khan, S. U., Møller, V. K., Frandsen, R. J. N. & Mansourvar, M. Real-time AI-driven quality control for laboratory automation: a novel computer vision solution for the Opentrons OT-2 liquid handling robot. Applied Intelligence 55, 524, https://doi.org/10.1007/s10489-025-06334-3 (2025).
Sasaki, R., Fujinami, M. & Nakai, H. Annotated Chemical Apparatus Image Dataset. Mendeley Data https://doi.org/10.17632/8p2hvgdvpn.1 (2023).
Sasaki, R., Fujinami, M. & Nakai, H. Application and integration of computer vision technologies for automated recognition and recording of chemical experiments. Bulletin of the Chemical Society of Japan 97, https://doi.org/10.1093/bulcsj/uoae110 (2024).
Hanafi, A., Elaachak, L. & Bouhorma, M. Machine learning based augmented reality for improved learning application through object detection algorithms. International Journal of Electrical and Computer Engineering (IJECE) 13, 1724–1733, https://doi.org/10.11591/ijece.v13i2.pp1724-1733 (2023).
Liu, Y. et al. Tri-band vehicle and vessel dataset for artificial intelligence research. Scientific Data 12, 592, https://doi.org/10.1038/s41597-025-04945-6 (2025).
Ravi, N., Naqvi, S. & El-Sharkawy, M. BIoU: An improved bounding box regression for object detection. Journal of Low Power Electronics and Applications 12, 51, https://doi.org/10.3390/jlpea12040051 (2022).
Hossain, M. S. et al. Chemistry lab image dataset covering 25 apparatus categories. figshare https://doi.org/10.6084/m9.figshare.29110433.v3 (2025).