Automatic plant leaf recognition is a notable application of computer vision and machine learning that identifies plant species from a photograph of their leaves. Deep learning is used to extract meaningful features from a leaf image and convert them into compact numerical representations known as embeddings. These embeddings capture key characteristics of shape, texture, vein patterns, and margins, enabling straightforward comparison and grouping. The fundamental idea is to build a system that can fingerprint a leaf image and match it against a database of known species.
A plant leaf recognition system operates by first detecting and isolating the leaf in an image, then encoding it as an embedding vector, and finally matching that vector against reference embeddings using a distance measure. Specifically, Euclidean distance is a straightforward way to measure similarity in high-dimensional spaces. For normalized embeddings, a smaller distance indicates greater similarity between two leaves, which allows nearest-neighbor classification methods to be used.
Our objective is threefold:
- Show how deep CNNs learn compact, discriminative embeddings of leaf images.
- Demonstrate that Euclidean distance reliably classifies species through nearest-neighbor matching.
- Build a fully reproducible pipeline on the UCI One-Hundred Plant Species Leaves dataset, covering the code, the evaluation, and the visualization of results.
Why Is Automated Plant Species Identification Significant?
The ability to automatically recognize plant species from leaf images has far-reaching scientific, environmental, agricultural, and educational consequences. Such systems support biodiversity conservation by providing an interface to the massive image datasets captured by camera traps and citizen-science platforms, allowing threatened or invasive plant species to be cataloged and tracked in seconds. This capability matters most in highly diverse ecosystems, such as tropical rainforests, where it enables real-time ecological decision-making and lets conservationists target their resources.
Key Areas of Impact:
• **Agriculture:** Enables precision farming by identifying crop diseases and weeds and optimizing pesticide use. Mobile applications let farmers scan leaves for immediate feedback, improving yields and minimizing environmental degradation.
• **Education:** Enables interactive learning in which users photograph leaves to learn about the ecological, medicinal, or cultural uses of species. It can also help museums and botanical gardens engage more deeply with their visitors.
• **Pharmacology:** Enables the correct identification of medicinal plants, which can accelerate the discovery of new bioactive substances for drug development.
• **Digital Libraries and IoT:** Automates the tagging, indexing, and retrieval of plant images in large databases. Integrated with IoT-enabled smart cameras, it offers a way to continuously monitor greenhouses and research areas.
Exploring the UCI One-Hundred Plant Species Leaves Dataset
Our recognition system relies on the One-Hundred Plant Species Leaves dataset, hosted on the UCI Machine Learning Repository (CC BY 4.0 license). It comprises 1,600 high-resolution images, with 16 samples for each of 100 species. The species range from common trees such as oaks to more exotic plants, giving a rich spread of leaf morphologies.
Each image shows a single leaf against a plain background, which minimizes distractions and keeps the main features clear. Real-world conditions, by contrast, usually involve complicated scenes, making processing steps such as segmentation necessary. The dataset includes species such as Acer palmatum (Japanese maple) and Quercus robur (English oak), each with distinctive yet variable characteristics.
Data is prepared by resizing the images to a standard input size (e.g., 224×224 pixels) and normalizing them. Augmentation techniques such as rotation and flipping can simulate natural variation and improve model robustness, as sketched below.
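As an illustrative sketch (the specific angles and probabilities below are our assumptions, not the article's tuned settings), such augmentations can be expressed with Torchvision transforms:

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),   # Simulate varied leaf orientation
    transforms.RandomHorizontalFlip(p=0.5),  # Mirror-image variation
    transforms.RandomVerticalFlip(p=0.5),
])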
The dataset's labels give ground-truth species, enabling supervised learning. We obtain an unbiased assessment by dividing the data into training (80%), validation (10%), and test (10%) sets.
The strengths of this dataset are that it is balanced and realistic, and it exhibits some difficulties, such as minor occlusions and color differences introduced by scanning. Compared with larger datasets such as PlantNet, it is easier to prototype with, yet offers sufficient diversity.
Sample Leaf Images from the Dataset
Photos by Cope et al. (n.d.), UC Irvine Machine Learning Repository
Deep Feature Embeddings with ResNet-50
Our feature-extraction backbone is ResNet-50, a deep convolutional neural network (CNN) pre-trained on ImageNet. With 50 layers organized as residual blocks, ResNet-50 alleviates the vanishing-gradient problem in deep networks through skip connections, making it well suited to visual recognition tasks. By using weights pre-trained on millions of natural images, we obtain general image representations that transfer to the plant-leaf domain while requiring little training data and computation.
For each leaf image, ResNet-50 produces a 2048-dimensional embedding vector, a compact numeric description that captures the image's most significant features. The embeddings are taken from the final average pooling layer, which summarizes the network's last feature maps into a one-dimensional vector. This summary encodes both subtle and obvious aspects of a leaf image, such as color, texture, vein geometry, and edge curvature. Each of the 2048 numbers represents a learned pattern, and together they form a fingerprint of the leaf in a high-dimensional mathematical space: similar leaves lie close together, while dissimilar species lie further apart.
These embedding vectors are then compared using Euclidean distance, enabling the measurement of similarity between two leaves. Smaller distances indicate closely related species or nearly identical leaf shapes, while larger distances indicate substantial differences. This comparison in the embedding space is the foundation of our recognition pipeline, providing a quick and interpretable way to match new samples against the species in our database.
Preprocessing Pipeline
Leaf images must pass through a uniform preprocessing pipeline before being fed to our deep model, to guarantee consistency and compatibility with the ResNet-50 input requirements. We built this pipeline from Torchvision transforms, which are applied one after another: each image is resized, center-cropped, converted to a tensor, and normalized.
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),        # Shorter side → 256 px
    transforms.CenterCrop(224),    # 224×224 center crop (ResNet-50 input)
    transforms.ToTensor(),         # PIL image → PyTorch tensor in [0, 1]
    transforms.Normalize(          # ImageNet normalization
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
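As a quick sanity check (leaf.jpg is a placeholder path of ours), the transform maps a PIL image to a 3×224×224 tensor:

from PIL import Image

img = Image.open('leaf.jpg').convert('RGB')  # 'leaf.jpg' is a placeholder
tensor = transform(img)
print(tensor.shape)  # torch.Size([3, 224, 224])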
To ensure that our data distribution matches that of the pre-trained model, we use the standard ImageNet normalization parameters. This normalizes the input channels toward zero mean and unit variance and improves the stability of the extracted embeddings. Each image is thereby converted into a tensor that can be used directly with our deep learning model.
Embedding Extraction
After the preprocessing stage, our system extracts deep feature embeddings. To do this, we modify the original ResNet-50 by removing the fully connected (FC) classification layer, since we are not interested in classifying images as such but in obtaining high-level feature representations of them.
import torch
from torchvision import models

model = models.resnet50(pretrained=True)                   # ImageNet weights
model = torch.nn.Sequential(*list(model.children())[:-1])  # Remove FC layer
model.eval()                                               # Set model to evaluation mode
Truncating the network at the global average pooling layer turns it into a feature extractor that produces one 2048-dimensional vector per image. These vectors capture patterns that discriminate between leaf species.
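A small check of our own confirms the output shape of the truncated network:

dummy = torch.randn(1, 3, 224, 224)  # One fake image at ResNet-50 input size
with torch.no_grad():
    out = model(dummy)
print(out.shape)  # torch.Size([1, 2048, 1, 1]) → a 2048-D vector after squeezing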
We wrap this procedure in an embedding function that we apply across our entire image set:
import numpy as np
from PIL import Image

def get_embedding(img_path):
    img = Image.open(img_path).convert('RGB')  # Open and ensure RGB format
    img_t = transform(img).unsqueeze(0)        # Apply preprocessing and add batch dimension
    with torch.no_grad():                      # Disable gradient tracking for efficiency
        emb = model(img_t).squeeze().numpy()   # Extract 2048-D embedding
    return emb / np.linalg.norm(emb)           # L2-normalize the vector
L2 normalization places the embeddings on a unit hypersphere, so that Euclidean distances are comparable across samples. This step removes scale variation and compares only the direction of the feature vectors, which is what matters when measuring similarity between leaf embeddings.
Finally, this embedding function is applied to all 1,600 leaf images across the 100 species. The resulting feature vectors are stored in a systematically organized, species-wise database that forms the backbone of our recognition system:
species_db = {
    species: [get_embedding(path) for path in paths]
    for species, paths in species_images.items()
}
Here, each species key maps to a list of normalized embeddings for that species. By rapidly computing pairwise distances between a query embedding and the stored samples, our system can perform accurate plant species recognition through similarity search.
Euclidean Distance for Similarity Matching
After obtaining the 2048-dimensional L2-normalized embeddings, we measure the similarity between two leaf images using Euclidean distance. Given two embeddings $x, y \in \mathbb{R}^{2048}$:

$$d(x, y) = \lVert x - y \rVert_2 = \sqrt{\sum_{i=1}^{2048} (x_i - y_i)^2}$$

Since all embeddings are normalized to unit length, this distance is a monotonic function of the angle between them:

$$d(x, y) = \sqrt{2\,(1 - \cos\theta)}$$

where $\cos\theta = x \cdot y$. A smaller Euclidean distance means that two embeddings are more similar in the feature space, which increases the probability that the leaves belong to the same species.
Image by Author
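A short NumPy check of this identity on two random unit vectors (our own verification sketch, not part of the pipeline):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2048); x /= np.linalg.norm(x)  # Random unit vector
y = rng.standard_normal(2048); y /= np.linalg.norm(y)

print(np.isclose(np.linalg.norm(x - y), np.sqrt(2 * (1 - x @ y))))  # True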
This metric lets our system rank the database images relative to a query embedding, enabling accurate and interpretable similarity-based classification.
Recognition Pipeline
The recognition pipeline automatically identifies the species of a query leaf image by matching it against the stored embeddings in the species database. The following function walks through this process step by step.
def recognize_leaf(query_path, threshold=0.68):
    query_emb = get_embedding(query_path)               # Extract embedding of query leaf
    min_dist = float('inf')
    best_species = None
    for species, embeddings in species_db.items():      # Iterate over all stored species embeddings
        for ref_emb in embeddings:
            dist = np.linalg.norm(query_emb - ref_emb)  # Compute Euclidean distance
            if dist < min_dist:
                min_dist = dist
                best_species = species
    if min_dist < threshold:                            # Decision based on similarity threshold
        return best_species, min_dist
    else:
        return "Unknown", min_dist
This brute-force search computes the Euclidean distance between the query embedding and every stored embedding and selects the nearest match. If that distance is below a predefined threshold (0.68), the system labels the leaf as the matched species; otherwise it answers Unknown. For large-scale or real-time applications, we recommend replacing the loop with a FAISS index, as sketched below, to speed up nearest-neighbor retrieval without losing accuracy.
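A minimal sketch of such an index, assuming the stored embeddings have been stacked into a float32 matrix with a parallel list of species labels (all_embeddings, all_labels, and recognize_leaf_faiss are our own illustrative names, not part of the original pipeline):

import faiss
import numpy as np

# Assumed inputs: all_embeddings is an (N, 2048) array of the stored
# L2-normalized embeddings; all_labels[i] is the species of row i.
xb = np.ascontiguousarray(all_embeddings, dtype=np.float32)
index = faiss.IndexFlatL2(2048)   # Exact L2 search over 2048-D vectors
index.add(xb)

def recognize_leaf_faiss(query_path, threshold=0.68):
    query = get_embedding(query_path).astype(np.float32).reshape(1, -1)
    dists, ids = index.search(query, 1)  # Nearest neighbor; returns squared L2
    dist = float(np.sqrt(dists[0, 0]))   # Back to plain Euclidean distance
    species = all_labels[ids[0, 0]]
    return (species, dist) if dist < threshold else ("Unknown", dist)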
Visualization and Analysis
t-SNE Projection of Embeddings
To better understand our learned feature space, we use t-distributed Stochastic Neighbor Embedding (t-SNE) to project the 2048-dimensional embeddings onto a 2D plane. This nonlinear dimensionality-reduction method preserves local relationships, letting us visualize how the embeddings group by species. Distinct, compact clusters reflect high intra-species similarity and high inter-species separation, showing that our deep model captures the distinguishing features of each plant species. A brief sketch of how this projection is produced follows the figure.
Each point represents a leaf embedding, color-coded by species; tight clusters show similar species, while well-separated groups confirm strong discriminative learning.
Image by Author
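As a sketch of how such a projection can be produced with scikit-learn (X and labels are our assumed names for the stacked embedding matrix and integer species IDs):

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# X: (N, 2048) array of embeddings; labels: (N,) integer species IDs (assumed)
X_2d = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap='tab20', s=8)
plt.title('t-SNE projection of leaf embeddings')
plt.show()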
Distance Distribution Analysis
To test the discriminative ability of our embeddings, we examine the distribution of Euclidean distances between pairs of images. Distances within the same species (intra-class) should be much smaller than distances between species (inter-class). Plotting the two distributions reveals a clear separation that suggests a similarity threshold (e.g., 0.68) for recognition decisions. This observation confirms that our embedding model clusters similar leaves together and separates different species in the feature space. A sketch of this computation follows the figure.
Image by Author
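One way to compute the two distributions, reusing the species_db built earlier (the pair-sampling strategy below is our own simplification, not the article's exact procedure):

import itertools
import numpy as np

intra, inter = [], []
species_list = list(species_db.items())
for i, (sp_a, embs_a) in enumerate(species_list):
    # Intra-class: all embedding pairs within the same species
    for e1, e2 in itertools.combinations(embs_a, 2):
        intra.append(np.linalg.norm(e1 - e2))
    # Inter-class: one representative pair per later species (for brevity)
    for sp_b, embs_b in species_list[i + 1:]:
        inter.append(np.linalg.norm(embs_a[0] - embs_b[0]))

print(f"intra mean: {np.mean(intra):.3f}, inter mean: {np.mean(inter):.3f}")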
ROC Curve for Threshold Tuning
To choose the decision boundary between true and false positives systematically, we plot the Receiver Operating Characteristic (ROC) curve, which shows the trade-off between True Positive Rate (TPR) and False Positive Rate (FPR) across thresholds. A curve rising steeply toward the top left indicates better discrimination between same-species and different-species pairs. The Area Under the Curve (AUC) summarizes overall performance; our system achieves an excellent AUC of 0.987, indicating highly reliable similarity-based recognition. The best threshold (0.68) is selected by maximizing Youden's J statistic, which balances sensitivity and specificity, as sketched after the figure.
Image by Author
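A sketch of this tuning with scikit-learn, assuming pair_dists holds the sampled pairwise distances and same_species holds 1 for same-species pairs (both names are ours). Since smaller distances mean better matches, the negated distance serves as the score:

import numpy as np
from sklearn.metrics import auc, roc_curve

fpr, tpr, thresholds = roc_curve(same_species, -np.asarray(pair_dists))
print(f"AUC: {auc(fpr, tpr):.3f}")

j = tpr - fpr                           # Youden's J statistic at each threshold
best_dist = -thresholds[np.argmax(j)]   # Undo the negation → distance threshold
print(f"Best distance threshold: {best_dist:.2f}")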
Precision–Recall Trade-off
To further evaluate recognition performance across decision thresholds, we examine the Precision–Recall (PR) curve, which weighs the system's ability to return correct matches (precision) against its ability to retrieve all relevant samples (recall). This view is particularly useful with imbalanced data, where some species may be underrepresented. Our model maintains high precision even at recall above 0.9, meaning it makes many correct predictions with few false ones and should generalize well to real-world conditions.
Image by Author
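The corresponding PR analysis uses the same assumed arrays as the ROC sketch above:

import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

scores = -np.asarray(pair_dists)  # Negate distances so higher = more similar
precision, recall, _ = precision_recall_curve(same_species, scores)
print(f"Average precision: {average_precision_score(same_species, scores):.3f}")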
Performance Evaluation
To evaluate the overall effectiveness of our recognition system, we measured its performance on independent training, validation, and test splits. The reference database was built from 1,280 leaf images, with 160 images each for validation and testing, balanced across the 100 species.
The findings, presented below, show high accuracy and strong generalization. We report Top-1 Accuracy (the proportion of queries whose nearest match is the correct species) and Top-5 Accuracy (the proportion whose correct species appears among the five closest predictions); the latter matters because visually overlapping species risk misidentification. A sketch of how these metrics can be computed follows the table.
| Split | Images | Top-1 Accuracy | Top-5 Accuracy |
|-------|--------|----------------|----------------|
| Train | 1,280  | –              | –              |
| Val   | 160    | 96.2%          | 99.4%          |
| Test  | 160    | 96.9%          | 99.4%          |
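As referenced above, a sketch of computing these metrics from the distance-ranked species list (top_k_accuracy and the test arrays are our own illustrative names):

import numpy as np

def top_k_accuracy(test_paths, test_labels, k=5):
    top1 = top5 = 0
    for path, true_sp in zip(test_paths, test_labels):
        q = get_embedding(path)
        # Best (smallest) distance per species, then rank species by it
        best = {sp: min(np.linalg.norm(q - e) for e in embs)
                for sp, embs in species_db.items()}
        ranked = sorted(best, key=best.get)
        top1 += ranked[0] == true_sp
        top5 += true_sp in ranked[:k]
    n = len(test_paths)
    return top1 / n, top5 / n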
Additional measurements also attest to the model's accuracy: a False Positive Rate of 0.8%, a False Negative Rate of 2.3%, and an average inference time of 12 milliseconds per image (CPU). These results indicate that the system is both efficient and accurate, supporting real-time plant leaf recognition at minimal computing cost.
Conclusion and Final Thoughts
In this article, we have shown that deep feature embeddings combined with Euclidean similarity provide a strong, interpretable mechanism for automatic plant leaf recognition. Our ResNet-50-based model, applied to the One-Hundred Plant Species Leaves dataset from the UCI Machine Learning Repository, achieved over 96% accuracy with efficient computational performance. The approach is useful not only for biodiversity monitoring and agricultural diagnostics but also as a scalable basis for future ecological and visual recognition systems.
About the Author
Sherin Sunny is a Senior Engineering Manager at Walmart Vizio, where he leads the core engineering team responsible for large-scale Automatic Content Recognition (ACR) in AWS Cloud. His work spans cloud migrations, AI/ML-driven intelligent pipelines, vector search systems, and real-time data platforms that power next-generation content analytics.