Table of Contents
📄Python Notebook 🍯Introduction 🔍Example ABC Agent Search Progress ⏳Agent Lifecycle in Swarm Optimization 🐝The 3 Bee Agent Roles 🪻Iris Dataset ❄ Clustering – No labels? No problem! 🏋️Fitness Model for Clustering 🤔Confusion Matrix as a Diagnostic Tool 🏃Running the Agentic AI Loop 📊Reporting Results 💬Designing Agent Prompts for Gemini ⚠️Gemini Agentic AI Issues ⚔️Agentic AI Competitive Landscape towards 2026 ✨Conclusion and Future Work
📄Python Notebook
Explore my interactive notebook on Google Colab — and feel free to connect with me on LinkedIn for any questions or feedback.
🍯 Introduction
With the incredible innovation going on around Agentic AI, I wanted to get hands‑on with a project that integrates LLM prompts into a Data Science workflow. The Artificial Bee Colony (ABC) algorithm is inspired by honey bees’ foraging behavior and works remarkably well in nature. It belongs to the family of swarm intelligence algorithms, designed for decentralized decision‑making processes whereby “bee agents” pursue their individual goals autonomously, while collectively improving the quality of the overall solution (the “honeypot”).
This popular technique has been widely applied to many fields, in particular: scheduling, routing, energy optimization, resource allocation and anomaly detection. Researchers often combine ABC with neural networks in a hybrid approach, for example, using ABC to tune hyperparameters or optimize model weights. The algorithm is particularly relevant when data is scarce or when the problem is combinatorial – when the solution space grows exponentially (or even factorially) with the number of features.
In this project, my approach has been to mimic Swarm Optimization for an Adaptive Grid Search. The creative twist is that I applied Google’s new Agentic AI tools to implement the bee agents. In the ABC algorithm, there are three types of autonomous bee agents, and I defined their roles using text prompts powered by the latest Gemini LLMs.
Each foraging cycle (algorithm iteration) proceeds as follows (a minimal code sketch of one cycle appears after the list):
- Scout bees explore → discover new food sources (candidate solutions).
- Employed bees exploit → refine those sources and dance to share information about the quality of the nectar (fitness function).
- Onlooker bees exploit further → guided by the dances, they **reinforce** the colony’s focus on the best food sources.
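To make the cycle concrete, here is a minimal, self-contained sketch of one foraging iteration in plain Python. The scout, employed and fitness helpers are toy stand-ins invented for illustration (a random k for KMeans and a fake fitness), not the notebook’s actual agent classes.

```python
import random

# Toy stand-ins for the three bee-agent roles (illustration only).
def scout(n):
    """Scout bees: propose n fresh candidates (here, a random k for KMeans)."""
    return [{"model": "KMeans", "params": {"n_clusters": random.randint(2, 8)}} for _ in range(n)]

def employed(candidate):
    """Employed bees: refine a candidate within a small local neighborhood."""
    k = max(2, candidate["params"]["n_clusters"] + random.choice([-1, 1]))
    return {"model": candidate["model"], "params": {"n_clusters": k}}

def fitness(candidate):
    """Toy fitness: pretend k = 3 is optimal (lower is better, like 1 - ARI)."""
    return abs(candidate["params"]["n_clusters"] - 3)

def foraging_cycle(pool, n_scouts=6, top_k=3):
    explored = scout(n_scouts)                       # exploration
    refined = [employed(c) for c in pool]            # exploitation
    candidates = pool + explored + refined
    return sorted(candidates, key=fitness)[:top_k]   # onlooker selection

pool = scout(3)
for _ in range(5):
    pool = foraging_cycle(pool)
print(pool[0])  # best candidate found
```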
🔍Example ABC Agent Search Progress
Source: Author
⏳Agent Lifecycle in Swarm Optimization
The ABC algorithm was first proposed by Derviş Karaboğa in 2005. In my modernized meta‑heuristic adaptation, I focused on the goal of improving clustering performance for an unsupervised dataset.
Below are the Python classes I implemented:
- WebResearcher: Responsible for researching and summarizing scikit-learn clustering algorithms and their key hyperparameters. The information gathered is crucial for generating accurate and effective prompts for the bee agents, and this class is implemented as an LLM‑based agent.
- ScoutBeeAgent: Generates diverse initial candidate clustering solutions for the Iris dataset, leveraging the parameter summaries provided by the WebResearcher.
- EmployedBeeAgent: Refines existing candidate solutions by exploring local parameter neighborhoods, using the WebResearcher’s insights to make informed adjustments.
- OnlookerBeeAgent: Evaluates the generated and refined candidates, selecting the most promising ones to carry forward to the next iteration.
- Runner: **Orchestrates** the overall ABC optimization loop, organizing and coordinating the Gemini AI agent flow. It manages sequencing between the different bee agents and tracks global progress. While the Runner ensures structure and oversight, each bee agent operates in a fully distributed and autonomous manner, independently performing its specialized tasks without centralized control.
- FitnessModel: Evaluates the quality of each candidate solution using the Adjusted Rand Index (ARI), with the objective of minimizing 1 – ARI to achieve better clustering solutions.
- Reporter: Visualizes the convergence of the best ARI values over iterations and compares the top‑performing solutions against baseline clustering models.
🐝The 3 Bee Agent Roles
The agents determine parameter values and ranges through natural language prompts provided to the Gemini generative AI model. All three agents inherit from the BeeAgent base class, which handles shared setup and candidate tracking. Part of each prompt is informed by the WebResearcher, which summarizes scikit-learn clustering algorithms and their key hyperparameters to ensure accuracy and relevance. Here’s how each agent works:
- 🐝ScoutBeeAgent (Initial Parameter Generation): Constructs prompts that allow the LLM some creativity within defined constraints. The allowed_algorithms parameter guides which models to consider from the popular clustering algorithms in scikit‑learn. The Gemini model interprets these instructions and generates diverse candidate solutions, ensuring no duplicates and balanced distribution across algorithms.
- 🐝EmployedBeeAgent (Parameter Refinement): Generates prompts with refining instructions, directing the LLM to adjust parameters by approximately ±10–20%, remain within valid ranges, and avoid inventing unsupported parameters. It takes the current solutions and applies these rules to create slightly varied (refined) candidates within the local neighborhood of the existing parameter space.
- 🐝OnlookerBeeAgent (Evaluation and Selection): Produces prompts that evaluate the candidates generated and refined by the other agents. Using a fitness score based on the Adjusted Rand Index (ARI), it selects the top‑k promising solutions, maintains algorithm diversity, and avoids duplicates. This reinforces the colony’s focus on the strongest candidates.
In essence, the Python code defines the task goal, parameters, constraints, and return values as text within the prompts. The generative AI model (Gemini) then “reads” and “understands” these instructions to produce or modify the actual numerical and categorical parameter values for the clustering algorithms. Different LLMs may respond differently to subtle changes in the input text, so it is important to experiment with the wording of prompts for the three agent classes. To refine the wording further, you can always consult your preferred LLM.
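Because the raw responses shown in the logs later in this article arrive wrapped in Markdown json code fences, the agent code has to parse them defensively before the candidates can be used. Below is a minimal sketch of that parsing step; the function name parse_candidates and the exact fence handling are illustrative assumptions, not the notebook’s code.

```python
import json
import re

def parse_candidates(raw_text: str) -> list[dict]:
    """Extract a list of candidate dicts from an LLM response that may be
    wrapped in ```json ... ``` fences (as seen in the raw response logs)."""
    # Strip Markdown code fences if present.
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw_text.strip())
    try:
        candidates = json.loads(cleaned)
    except json.JSONDecodeError:
        return []  # caller can retry or fall back to the previous pool
    # Keep only well-formed entries: a model name plus a params dict.
    return [c for c in candidates
            if isinstance(c, dict) and "model" in c and isinstance(c.get("params"), dict)]

example = '```json[{"model": "KMeans", "params": {"n_clusters": 3}}]```'
print(parse_candidates(example))
```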
🪻Iris Dataset
A *natural* choice for this study is Sir Ronald Fisher’s classic Iris flower dataset, introduced in his 1936 paper. In the subsequent sections, this dataset is utilized as a small, well‑defined demonstration case to illustrate how the proposed ABC optimization method can be applied within the context of a clustering problem.
The Iris dataset (License: CC0 1.0) comprises 150 labeled samples, each belonging to one of 3 Iris classes: Iris Setosa, Iris Versicolor, Iris Virginica. Each flower sample is associated with 4 numeric features: Sepal length, Sepal width, Petal length, Petal width.
Source: Flores de Íris by Dcbmariano via Wikimedia Commons, licensed under CC BY‑SA 4.0.
Source: Author (see Google Colab notebook)
Source: Author (see Google Colab notebook)
As shown in both the pairwise relationship plots and the mutual information feature‑importance plots, petal length and petal width are by far the most informative features when measured against the target labels of the Iris dataset.
Mutual Information (MI) is computed feature‑wise with respect to the labels, whereas the Adjusted Rand Index (ARI), used in this project for fitness evaluation, measures the agreement between two partitions (predicted cluster labels versus true labels). Note that even if feature selection is applied, since Iris Versicolor and Iris Virginica share similar petal lengths and widths, their clusters overlap in feature space. As a result, the ARI can be strong but cannot reach a perfect score of 1.0.
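For readers who want to reproduce the feature‑wise MI ranking, a short sketch with scikit-learn might look like this (the exact scores vary slightly with the estimator’s nearest‑neighbor randomness):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

# Mutual information is computed per feature against the class labels.
mi = mutual_info_classif(X, y, random_state=42)
for name, score in sorted(zip(feature_names, mi), key=lambda t: -t[1]):
    print(f"{name:20s} MI = {score:.3f}")
# Petal length and petal width typically rank highest, matching the plots above.
```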
❄ Clustering – No labels? No problem!
**Clustering** algorithms are a cornerstone of unsupervised learning and so I chose to focus on the goal of blindly determining the flower classes based solely on their features. In other words, the model was not trained on the flower labels; those labels were used only to validate performance metrics. Traditional clustering algorithms such as **KMeans** or **DBSCAN** often struggle with parameter sensitivity and dataset variability. Therefore, a meta-heuristic like ABC, which balances exploration vs exploitation, appears promising.
Note that in clustering algorithms, parameters should technically be referred to as hyperparameters, because they’re not learned from the data during training (as weights in a neural network or regression coefficients are) but they are set externally. Nevertheless, for brevity, they are often referred to as parameters.
Here’s a concise visual comparison of different clustering algorithms applied to several toy datasets; different colors represent the clusters each algorithm found in these 2D representations:
Source: Image from the scikit‑learn documentation (BSD 3‑Clause License)
In the classic Iris dataset, the two most similar species — versicolor and virginica — often pose a challenge for clustering algorithms. Many methods mistakenly group them into a single cluster, treating them as one continuous dense region. In contrast, the more distinct setosa species is consistently identified as a separate cluster.
Table comparing several popular clustering algorithms available in the scikit‑learn library:
| Algorithm | Summary | Key Hyperparameters | Efficiency | Accuracy |
| --- | --- | --- | --- | --- |
| KMeans | Centroid-based, partitions data into k spherical clusters; simple and fast. | n_clusters, init, n_init, max_iter, random_state, tol | Fast on medium–large datasets; scales well; benefits from multiple restarts. | Strong for well-separated, convex clusters; poor on non-convex or varying-density shapes. |
| DBSCAN | Density-based, finds arbitrarily shaped clusters and marks noise without needing k. | eps, min_samples, metric, leaf_size | Moderate; slower in high dimensions; efficient with spatial indexing. | Excellent for irregular shapes and noise; sensitive to eps and density differences. |
| Agglomerative (Hierarchical) | Builds a dendrogram by iteratively merging clusters; no fixed k until cut. | n_clusters, affinity, linkage, distance_threshold | Slower (often O(n²)); memory-heavy for large n. | Good structural discovery; linkage choice impacts results; handles non-spherical clusters. |
| Gaussian Mixture Models (GMM) | Probabilistic mixture of Gaussians using EM (Expectation Maximization); soft assignments. | n_components, covariance_type, tol, max_iter, n_init, random_state | Moderate; EM can be costly with full covariance. | High when data is near-Gaussian; flexible shapes; risk of overfitting without constraints. |
| Spectral clustering | Graph-based; embeds data via eigenvectors before clustering (often KMeans). | n_clusters, assign_labels, n_neighbors, random_state, affinity | Slow on large n due to eigen-decomposition; best for small–medium sets. | Strong for manifold/complex structures; quality hinges on graph construction and affinity. |
| MeanShift | Mode-seeking via kernel density; no need to predefine k. | bandwidth, cluster_all, max_iter, n_jobs | Slow; expensive with many points/features. | Good for discovering cluster modes; performance highly dependent on bandwidth choice. |
Source: Table by author, generated with GPT-5
K‑Means as a Basic Clustering Example
K‑Means is among the most widely used clustering algorithms, valued for its simplicity and efficiency. Because of its prevalence, I will outline it here in more detail as a representative example of how clustering is commonly performed. Despite these strengths, it does have limitations: a key drawback is that the number of clusters k must be specified in advance.
How K‑Means Works
- Initialize Centroids: Select k starting centroids, either randomly or with smarter strategies like K‑Means++, which spreads them out to improve clustering quality.
- Assign Points to Clusters: Represent each data point as an n-dimensional vector, where each component corresponds to one feature. Assign points to the nearest centroid using a distance metric (commonly Euclidean). In high‑dimensional spaces, this step is complicated by the Curse of Dimensionality, where distances lose discriminative power.
- Update Centroids & Repeat: Recompute each centroid as the mean of all points in its cluster, then reassign points to the nearest centroid. Repeat until assignments stabilize — this is convergence.
Practical Considerations
- Curse of Dimensionality: In very high dimensions, distance metrics become less effective, reducing clustering reliability.
- Dimensionality Reduction: Techniques like PCA or t‑SNE are often applied before K‑Means to simplify the feature space and improve results.
- Choosing K: Methods such as the Elbow Method, Silhouette Score, or meta‑heuristics (e.g., ABC optimization) help estimate the optimal number of clusters (see the sketch after this list).
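As a concrete example of the “Choosing K” point above, the following sketch runs scikit-learn’s KMeans for a small range of k values on the Iris features and compares silhouette scores; the k = 2–6 range is an arbitrary choice for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)

# Evaluate a small range of k values and compare silhouette scores.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=42).fit_predict(X)
    print(f"k={k}: silhouette = {silhouette_score(X, labels):.3f}")
```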
🏋️Fitness Model for Clustering
The FitnessModel evaluates clustering candidate solutions on a dataset. The goal of a good clustering algorithm is to produce clusters that map closely to the true classes, although a perfect match is rare. ARI (Adjusted Rand Index) is used to measure the similarity between two clusterings (predicted vs. ground truth) – it is a widely used metric for evaluating clustering performance because it corrects for chance agreement, works across different clustering algorithms, and provides a clear scale from −1 to +1 that’s easy to interpret.
| ARI Range | Meaning | Typical Edge Case Scenario |
| --- | --- | --- |
| +1.0 | Perfect agreement | Predicted clustering exactly matches the ground truth labels |
| ≈ 0.0 | Random clustering (chance level) | Random assignments; or all points forced into one cluster (unless the ground truth is also one cluster) |
| < 0.0 | Worse than random | Systematic disagreement (clusters consistently mismatched or flipped); or each point in its own cluster when the ground truth differs |
| Low/negative (close to −1) | Strong disagreement | Extreme imbalance or mislabeling across clusters |
Source: Table by author, generated with GPT-5
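Before looking at the example run below, here is a minimal sketch of how a single candidate can be scored this way with scikit-learn; the notebook’s actual FitnessModel class may differ in detail.

```python
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

X, y_true = load_iris(return_X_y=True)

# Fit a candidate clustering model and score it against the ground-truth labels.
labels = GaussianMixture(n_components=3, covariance_type="full", random_state=42).fit_predict(X)
ari = adjusted_rand_score(y_true, labels)
fitness = 1 - ari  # the ABC loop minimizes 1 - ARI, so lower is better
print(f"ARI = {ari:.3f}, fitness = {fitness:.3f}")
```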
Fitness = 1 – ARI, so lower fitness is better. This allows ABC to directly optimize clustering quality. Shown below is an example run of the initial iterations of the ABC with Gemini agents that I developed, including a preview of the raw LLM response texts. Note how the GMM (Gaussian Mixture Models) candidate steadily improves as new candidates are selected on each iteration by the different bee agents. Refer to the Google Colab notebook for logs of further iterations.
Starting ABC run with Fitness Model for dataset: Iris
Features: 4, Classes: 3
Baseline Models (ARI): {'DBSCAN': 0.6309344087637648, 'KMeans': 0.6201351808870379, 'Agglomerative': 0.6153229932145449, 'GMM': 0.5164585360868599, 'Spectral': 0.6451422031981431, 'MeanShift': 0.5681159420289855}
Runner: Initiating Scout Agent for initial solutions...
Scout Generating initial candidate solutions...
Scout : Sending prompt to Gemini model... n_candidates=12
Scout : Received response from Gemini model.
Scout : Raw response text: ```json[{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":4,"init":"random","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":5,"init":"k-mean...
Scout : Initial candidates generated.
Runner: Scout Agent returned 12 initial solutions.
Runner: Starting iteration 1/8...
Runner: Agents completed actions for iteration 1.
--- Iteration 1 Details ---
GMM Candidate 1 (Origin: Scout-10010) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 2 (Origin: Scout-10000): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
DBSCAN Candidate 3 (Origin: Scout-10004): Best previous ARI=0.550, Current ARI=0.550, Params: {'eps': 0.7, 'min_samples': 4}
GMM Candidate 4 (Origin: Scout-10009) : Best previous ARI=0.820, Current ARI=0.516, Params: {'n_components': 3, 'covariance_type': 'full', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 5 (Origin: Scout-10001): Best previous ARI=0.620, Current ARI=0.462, Params: {'n_clusters': 4, 'init': 'random', 'n_init': 10, 'random_state': 42}
DBSCAN Candidate 6 (Origin: Scout-10003): Best previous ARI=0.550, Current ARI=0.442, Params: {'eps': 0.5, 'min_samples': 5}
KMeans Candidate 7 (Origin: Scout-10002): Best previous ARI=0.620, Current ARI=0.435, Params: {'n_clusters': 5, 'init': 'k-means++', 'n_init': 5, 'random_state': 42}
DBSCAN Candidate 8 (Origin: Scout-10005): Best previous ARI=0.550, Current ARI=0.234, Params: {'eps': 0.4, 'min_samples': 6}
*** Global Best so far: ARI=0.820, Candidate={'model': 'GMM', 'params': {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}, 'origin_agent': 'Scout-10010', 'current_ari_for_display': 0.8202989638185834}
-----------------------------
Runner: Starting iteration 2/8...
Scout Generating initial candidate solutions...
Scout : Sending prompt to Gemini model... n_candidates=12
Employed Refining current solutions...
Employed : Sending prompt to Gemini model... n_variants=12
Onlooker Evaluating candidates and selecting promising ones...
Onlooker : Sending prompt to Gemini model... top_k=5
Scout : Received response from Gemini model.
Scout : Raw response text: ```json[{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":4,"init":"random","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":5,"init":"k-mean...
Scout : Initial candidates generated.
Employed : Received response from Gemini model.
Employed : Raw response text: ```json[{"model":"GMM","params":{"n_components":5,"covariance_type":"tied","max_iter":100,"random_state":42}},{"model":"GMM","params":{"n_components":3,"covariance_type":"full","max_iter":100,"random_state":42}},{"model":"KMeans","params":{"n_cluster...
Employed : Solutions refined.
Onlooker : Received response from Gemini model.
Onlooker : Raw response text: ```json[{"model":"GMM","params":{"n_components":4,"covariance_type":"tied","max_iter":100,"random_state":42}},{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"DBSCAN","params":{"eps":0.7,"min_sam...
Onlooker : Promising candidates selected.
Runner: Agents completed actions for iteration 2.
--- Iteration 2 Details ---
GMM Candidate 1 (Origin: Scout-10022) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 2 (Origin: Scout-10010) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 3 (Origin: Onlooker-30000): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 4 (Origin: Employed-20007): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 80, 'random_state': 42}
GMM Candidate 5 (Origin: Employed-20006): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 120, 'random_state': 42}
GMM Candidate 6 (Origin: Employed-20000): Best previous ARI=0.820, Current ARI=0.693, Params: {'n_components': 5, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 7 (Origin: Scout-10012): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
KMeans Candidate 8 (Origin: Scout-10000): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
*** Global Best so far: ARI=0.820, Candidate={'model': 'GMM', 'params': {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}, 'origin_agent': 'Scout-10010', 'current_ari_for_display': 0.8202989638185834}
Source: Author (see Google Colab notebook)
🤔Confusion Matrix as a Diagnostic Tool
While the Adjusted Rand Index (ARI) provides a single score for clustering quality, the Confusion Matrix reveals where misclassifications occur by showing how true classes are distributed across predicted clusters.
In the Iris dataset, scikit‑learn encodes the species in a fixed order:
0 = Setosa, 1 = Versicolor, 2 = Virginica.
Even though there are only three true species, the algorithm below mistakenly produced four clusters. The matrix illustrates this mismatch:
[[ 0  6 44  0]
 [ 2  0  0 48]
 [49  0  0  1]
 [ 0  0  0  0]]
⚠️ Note: The order of the columns (clusters) does not necessarily correspond to the order of the rows (true classes). Cluster IDs are arbitrary labels assigned by the algorithm, and they don’t carry any inherent meaning.
Row-by-row Interpretation (row and column IDs start from 0)
- Row 0: [ 0 6 44 0] Setosa class → Its samples fall only into columns 1 and 2, with no overlap with Versicolor or Virginica. These two columns should really have been recognized as a single cluster corresponding to Setosa.
- Row 1: [ 2 0 0 48] Versicolor class → Split between columns 0 and 3, showing that the algorithm did not isolate Versicolor cleanly.
- Row 2: [49 0 0 1] Virginica class → Also split between columns 0 and 3, overlapping with Versicolor rather than forming its own distinct cluster.
- Row 3: [ 0 0 0 0] Extra mistaken cluster → No true samples here, reflecting that the algorithm produced 4 clusters for a dataset with only 3 classes.
📌The confusion matrix shows that Setosa is distinct (its clusters don’t overlap with the other species), while Versicolor and Virginica are not separated cleanly – both are spread across the same two clusters (columns 0 and 3). This overlap highlights the algorithm’s difficulty in distinguishing between them. The confusion matrix makes these misclassifications visible in a way that a single ARI score cannot.
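For reference, a confusion matrix like the one above can be computed directly from the true labels and the predicted cluster assignments. The sketch below deliberately asks KMeans for four clusters on the three-class dataset to mimic the mismatch; the exact counts will differ from the example matrix.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix

X, y_true = load_iris(return_X_y=True)

# Deliberately ask for 4 clusters on a 3-class dataset to reproduce the mismatch.
pred = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# Rows = true classes (0=Setosa, 1=Versicolor, 2=Virginica), columns = arbitrary cluster IDs.
print(confusion_matrix(y_true, pred))
```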
🏃Running the Agentic AI Loop
The Runner **orchestrates** iterations:
- Scout bees propose diverse solutions.
- Employed bees refine them.
- Onlooker bees select promising ones.
- The solution pool is updated.
- The best ARI per iteration is tracked.
In the Runner class and throughout the Artificial Bee Colony (ABC) algorithm, a candidate refers to a specific clustering model together with its defined parameters. In the example solution pool shown below, two candidates are returned.
The agents are orchestrated using Python’s concurrent.futures.ThreadPoolExecutor, which enables parallel execution. As a result, the ScoutBeeAgent, EmployedBeeAgent, and OnlookerBeeAgent run asynchronously in separate threads during each iteration of the algorithm.
The runner.run() method returns two objects:
solution_pool: This is a list of the pool_size most promising candidates (each being a dictionary containing a model and its parameters) found across all iterations. This list is sorted by fitness (ARI), so the very first element, solution_pool[0], will represent the best-fitting model and its specific parameters that the ABC algorithm discovered.
best_history: A list that tracks the best Adjusted Rand Index found at each iteration, together with the model and parameters that achieved it.
For example:
solution_pool = [
    {
        "model": "KMeans",
        "params": {"n_clusters": 3, "init": "k-means++"},
        "origin_agent": "Employed",
        "current_ari_for_display": 0.742
    },
    {
        "model": "AgglomerativeClustering",
        "params": {"n_clusters": 3, "linkage": "ward"},
        "origin_agent": "Onlooker",
        "current_ari_for_display": 0.715
    }
]
best_history = [
    {"ari": 0.642, "model": "KMeans", "params": {"n_clusters": 3, "init": "random"}},
    {"ari": 0.742, "model": "KMeans", "params": {"n_clusters": 3, "init": "k-means++"}}
]
Solution Pool Setup with ThreadPoolExecutor
ThreadPoolExecutor(): Initializes a pool of worker threads that can execute tasks concurrently.
ex.submit(…): Submits each agent’s act method as a separate task to the thread pool.
from concurrent.futures import ThreadPoolExecutor
import copy
# ... inside Runner.run() ...
for it in range(iterations):
    print(f"Runner: Starting iteration {it+1}/{iterations}...")
    if it == 0:
        results = []
    else:
        # Use threads instead of processes
        with ThreadPoolExecutor() as ex:
            futures = [
                ex.submit(self.scout.act),
                ex.submit(self.employed.act, solution_pool),
                ex.submit(self.onlooker.act, solution_pool)
            ]
            results = [f.result() for f in futures]
    print(f"Runner: Agents completed actions for iteration {it+1}.")
    # ... rest of the loop unchanged ...
Each agent’s act method is dispatched to the thread pool, allowing them to run in parallel. The call to f.result() ensures that the Runner waits for all tasks to finish before moving forward.
This design achieves two things:
- Parallel execution within an iteration — agents act simultaneously, mimicking real bee colony behavior.
- Sequential iteration control — the Runner only advances once all agents have completed their work, keeping the overall loop orderly and deterministic.
From the Runner’s perspective, iterations still appear sequential, but internally each iteration benefits from concurrent execution of agent tasks.
Solution Pool Setup with ProcessPoolExecutor
While ThreadPoolExecutor provides concurrency through threads, it can be seamlessly replaced with ProcessPoolExecutor to achieve true parallel CPU execution.
With ProcessPoolExecutor, each agent runs in its own separate process, which bypasses Python’s **GIL** (Global Interpreter Lock). The GIL is a **mutex** (mutual exclusion lock) that ensures only one thread executes Python bytecode at a time, even on multi‑core systems. By using processes instead of threads, heavy numerical workloads can fully leverage multiple CPU cores, enabling genuine parallelism and improved performance for compute‑intensive tasks.
from concurrent.futures import ProcessPoolExecutor
import copy
# ... inside Runner.run() ...
for it in range(iterations):
    print(f"Runner: Starting iteration {it+1}/{iterations}...")
    if it == 0:
        results = []
    else:
        # Use processes instead of threads
        with ProcessPoolExecutor() as ex:
            futures = [
                ex.submit(self.scout.act),
                ex.submit(self.employed.act, solution_pool),
                ex.submit(self.onlooker.act, solution_pool)
            ]
            results = [f.result() for f in futures]
    print(f"Runner: Agents completed actions for iteration {it+1}.")
    # ... rest of the loop unchanged ...
Key Differences between ProcessPoolExecutor and ThreadPoolExecutor
- ProcessPoolExecutor launches separate Python processes, not threads.
- Each agent runs independently on a different CPU core.
- This avoids the GIL, so CPU‑bound tasks (like clustering, fitness evaluation, numerical optimization) truly run in parallel. A CPU‑bound task is any computation where the limiting factor is the processor’s speed rather than waiting for input/output (I/O).
- Since processes run in separate memory spaces, they can’t directly share objects. Instead, anything passed between them must be serialized (pickled). Simple Python objects like dictionaries, lists, strings, and numbers are picklable, so candidate dictionaries can be exchanged safely.
📌Key Takeaway:
✅ Use ProcessPoolExecutor if your agents do heavy computation (matrix ops, clustering, ML training).
❌ Stick with ThreadPoolExecutor if your agents are mostly I/O‑bound (waiting for data, network, disk).
Why are some of the candidate parameter values repeated in different iterations?
The repetition of candidate parameter values across iterations is a natural outcome of how the Artificial Bee Colony algorithm works and how the agents interact:
Scout Bee Agent’s Exploration: The ScoutBeeAgent is tasked with generating new and diverse candidate solutions. While it aims for diversity, given a limited parameter space or if the generative model finds certain parameter combinations consistently effective, it might suggest similar solutions in different iterations.
Employed Bee Agent’s Exploitation: The EmployedBeeAgent refines existing promising solutions. If a solution is already very good or close to an optimal configuration, the “local neighborhood” exploration (e.g., adjusting parameters by ±10-20%) might lead back to the same or very similar parameter values, especially after rounding or if the parameter adjustments are small.
Onlooker Bee Agent’s Selection: The OnlookerBeeAgent selects the top_k most promising solutions from a larger set of candidates (which includes newly scouted, refined by employed, and previously promising solutions). If the algorithm is converging, or if several distinct solutions yield very similar high-fitness scores, the OnlookerBeeAgent might repeatedly select parameter sets that are effectively identical from one iteration to the next.
Solution Pool Management: The Runner maintains a solution_pool of a fixed pool_size. It sorts this pool by fitness and keeps the best ones. If the top solutions remain consistently the same, or if new good solutions are identical to previous ones, those parameter sets will persist and thus be “repeated” in the iteration details.
Convergence: As the ABC algorithm progresses, it’s expected to converge towards optimal or near-optimal solutions. This convergence often means that the search space narrows, and agents repeatedly find the same high-performing parameter configurations unless some kind of pruning method (like deduplication) is applied.
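As a simple illustration of the deduplication idea mentioned above, candidates can be keyed by their model name and a canonical serialization of their parameters. This is a sketch, not the notebook’s pool-management code.

```python
import json

def deduplicate(candidates: list[dict]) -> list[dict]:
    """Drop candidates whose model + params are identical, keeping first occurrences."""
    seen, unique = set(), []
    for c in candidates:
        key = (c["model"], json.dumps(c["params"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique

pool = [
    {"model": "GMM", "params": {"n_components": 4, "covariance_type": "tied"}},
    {"model": "GMM", "params": {"covariance_type": "tied", "n_components": 4}},  # same params, different order
]
print(deduplicate(pool))  # only one GMM candidate survives
```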
📊Reporting Results
Benchmarking Standard Clustering Algorithms
Before applying ABC, it is useful to establish a baseline by evaluating the performance of standard clustering methods. I ran a comparison benchmark using default configurations for the following algorithms:
- KMeans
- DBSCAN
- Agglomerative Clustering
- Gaussian Mixture Models (GMM)
- Spectral Clustering
- MeanShift
As shown in the Google Colab notebook, the ABC agents discovered parameter sets that significantly improved the Adjusted Rand Index (ARI), reducing misclassifications between the closely related classes Versicolor and Virginica.
Reporter Outputs
The Reporter class is responsible for generating final evaluation outputs after running the Artificial Bee Colony (ABC) optimization. It provides three main functions:
- Comparison Table
- Compares each candidate solution’s Adjusted Rand Index (ARI) against baseline clustering models.
- Reports the improvement (candidate_ari – baseline_ari).
- Confusion Matrix Display
- Prints the confusion matrix of the best candidate solution to show class-level performance and misclassifications.
- Convergence Visualization
- Plots the progression of the best ARI across iterations.
- Annotates the plot with model names and parameters for each iteration (a simplified plotting sketch follows this list).
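Using the best_history structure shown earlier, a simplified convergence plot could be produced roughly as follows (a sketch with matplotlib, not the notebook’s Reporter implementation):

```python
import matplotlib.pyplot as plt

# Example best_history entries, following the structure shown earlier.
best_history = [
    {"ari": 0.642, "model": "KMeans", "params": {"n_clusters": 3, "init": "random"}},
    {"ari": 0.742, "model": "KMeans", "params": {"n_clusters": 3, "init": "k-means++"}},
]

iterations = range(1, len(best_history) + 1)
ari_values = [h["ari"] for h in best_history]

plt.plot(iterations, ari_values, marker="o")
for it, h in zip(iterations, best_history):
    plt.annotate(h["model"], (it, h["ari"]))  # label each point with its model name
plt.xlabel("Iteration")
plt.ylabel("Best ARI")
plt.title("ABC convergence of best ARI per iteration")
plt.show()
```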
💬Designing Agent Prompts for Gemini
I decided to design each agent’s prompt with the following template for a structured approach (a sketch of an assembled prompt follows the list):
• Task Goal: What the agent must achieve.
• Parameters: Inputs like dataset name, number of candidates for the agent type, allowed algorithms and the hyperparameter input dictionary returned by the WebResearcher via its LLM prompt.
• Constraints: Ensure each candidate is unique, maintain balanced distribution across algorithms, require hyperparameters to stay within valid ranges.
• Return Values: JSON list of candidate solutions.
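Here is a sketch of how such a structured prompt might be assembled for the Scout agent; the wording, the build_scout_prompt helper, and the researcher_summary placeholder are illustrative assumptions rather than the notebook’s actual prompt.

```python
def build_scout_prompt(dataset_name, n_candidates, allowed_algorithms, researcher_summary):
    """Assemble a structured Scout prompt: Task Goal, Parameters, Constraints, Return Values."""
    return f"""
Task Goal: Propose {n_candidates} diverse initial clustering candidates for the {dataset_name} dataset.
Parameters: Allowed algorithms: {allowed_algorithms}. Hyperparameter guidance: {researcher_summary}
Constraints: Each candidate must be unique, balanced across algorithms, and use only valid hyperparameter ranges.
Return Values: ONLY output a JSON list of objects, each with "model" and "params" keys.
""".strip()

prompt = build_scout_prompt(
    dataset_name="Iris",
    n_candidates=12,
    allowed_algorithms=["KMeans", "DBSCAN", "GMM"],
    researcher_summary={"KMeans": ["n_clusters", "init", "n_init"]},
)
print(prompt)
```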
To ensure deterministic LLM behavior, I used this generation_config. In particular, a temperature of zero leaves the model no room for creativity, so the same prompt reproduces the same response on repeated calls.
generation_config = {
    "temperature": 0.0,
    "top_p": 1.0,
    "top_k": 1,
    "max_output_tokens": 4096
}
res = genai_model.generate_content(prompt, generation_config=generation_config)
While developing new code, as in this project, it is important to ensure that the same input always produces the same output.
⚠️Gemini Agentic AI Issues
Gemini AI Model Types
- Lite (Flash‑Lite): Prioritizes speed and cost efficiency. Ideal for bulk tasks like translation or classification.
- Flash: Well‑suited for production workloads requiring scale and moderate reasoning.
- Pro: The flagship tier – best for complex reasoning, multimodal comprehension (text, images, audio, video), and agentic AI use cases.
Why Prompts Alone Fail in Lite Models
I ran into a common limitation for the “Lite” models: LLMs don’t reliably obey instructions like “always include these parameters” just because you put them in the prompt. As of today, models often revert to defaults or minimal sets unless structure is enforced after generation. Why the explicit prompt still failed:
- Natural language instructions are weak constraints. Even “always include exactly these parameters” is interpreted probabilistically.
- No schema enforcement. When parsing JSON, you need to validate that required keys exist.
- Deduplication addresses duplicates, not gaps. It eliminates identical candidates but does not restore missing parameters.
📌Key Takeaway: Prompts alone won’t guarantee compliance. You need prompt + schema enforcement to ensure outputs consistently include required parameters.
Prompt Compliance Issues and Schema Solutions
Models can prioritize other parts of the prompt or simplify outputs despite emphasis on required items.
- Example instruction: “Return Values: ONLY output a JSON-style dictionary. Return string must be no longer than 1024 characters.”
- Observed outcome: len(res_text) = 1036 – responses exceeded the limit.
- Missing fields: Required items sometimes did not appear, even when stated clearly. Providing concrete output examples improved adherence.
- Practical fix: Pair prompts with schema enforcement (e.g., validate required keys, length checks) and post‑generation normalization to guarantee structure (see the sketch after this list).
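Below is a minimal sketch of that kind of post‑generation schema check; the required‑key set and the character limit mirror the constraints discussed above but are otherwise assumptions.

```python
REQUIRED_KEYS = {"model", "params"}   # assumed schema for each candidate
MAX_RESPONSE_CHARS = 1024             # the limit stated in the prompt

def validate_response(res_text: str, candidates: list[dict]) -> list[str]:
    """Return a list of violations; an empty list means the output passed the schema checks."""
    problems = []
    if len(res_text) > MAX_RESPONSE_CHARS:
        problems.append(f"response too long: {len(res_text)} > {MAX_RESPONSE_CHARS}")
    for i, c in enumerate(candidates):
        missing = REQUIRED_KEYS - set(c)
        if missing:
            problems.append(f"candidate {i} missing keys: {sorted(missing)}")
    return problems

print(validate_response("x" * 1036, [{"model": "KMeans"}]))
```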
Empty Candidate Errors in Gemini API
On occasion, I got this response:
>> ScoutAgent: Error during API call (Attempt 1/3): Invalid operation: The response.text quick accessor requires the response to contain a valid Part, but none were returned. The candidate’s finish_reason is 2.
That error message means the model didn’t actually return any usable content in its response, so when my code tried to access response.text, there was no valid “Part” to read. The key clue is **finish_reason = 2**, which signals that generation terminated before producing a valid text part (see the FinishReason documentation linked below for the exact meaning of each value).
Why it happens:
- Empty candidate: The API call succeeded, but the model produced no output.
- FinishReason = 2: Indicates the generation stopped before yielding a valid part.
- Quick accessor failure: Since response.text expects at least one valid text part, it throws an error when none exist.
How to handle it (a defensive sketch follows this list):
- Check finish_reason before accessing response.text. Only read text if the candidate includes a valid part.
- Add fallback logic: If no text is returned, log the finish reason and retry or handle gracefully.
- Schema enforcement: Validate that required fields exist in the response before parsing.
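A defensive sketch of this check, assuming the standard candidates/content/parts response shape of the google-generativeai SDK (not the notebook’s exact code):

```python
from typing import Optional

def safe_text(response) -> Optional[str]:
    """Return the response text only if a valid part exists; otherwise log and return None."""
    if not response.candidates:
        print("No candidates returned by the model.")
        return None
    candidate = response.candidates[0]
    if not getattr(candidate.content, "parts", None):
        # The model stopped without producing text: log the reason instead of crashing.
        print(f"Empty candidate, finish_reason={candidate.finish_reason}")
        return None
    return response.text  # safe: at least one valid part exists
```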
📌 Key Takeaway: This isn’t a network error — it’s the model signaling that it stopped without generating text. You can find the full list of FinishReason values and guidance on interpreting them in Google’s documentation: Generate Content API – FinishReason.
Intermittent API Connection Errors
On occasion, the Gemini API call failed with:
- Error: ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
📌 Key Takeaway: This is a network error and occurred without code changes, indicating transient network or service issues. Add retries with exponential backoff, timeouts, and robust logging to capture context (request size, rate limits, finish_reason) and recover gracefully.
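A generic retry wrapper with exponential backoff, as suggested above, might look like this sketch; the attempt count and delays are arbitrary and should be tuned to your rate limits.

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=2.0):
    """Call fn(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError as err:
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)   # 2s, 4s, 8s, ...
            print(f"Attempt {attempt} failed ({err}); retrying in {delay:.0f}s")
            time.sleep(delay)

# Example usage (hypothetical): wrap the Gemini call made by an agent.
# result = call_with_retries(lambda: genai_model.generate_content(prompt))
```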
Agent Security Considerations
One more thing to pay attention to, especially if you are using Agents for corporate use – security is mission-critical!
⚠️ **Provide strict guardrails between Agents and the LLM.** Actively prevent agents from deleting critical files, taking off‑topic actions, making unauthorized external API calls, etc.
📌 Key takeaway: Apply the Principle of Least Privilege
- Scope: Restrict each agent’s permissions strictly to its assigned task.
- Isolation: Block filesystem writes, external calls, or off‑topic actions unless explicitly authorized.
- Audit: Record all actions and require approvals for sensitive operations.
⚔️Agentic AI Competitive Landscape towards 2026
Model Providers
This table outlines how the agentic AI market is expected to develop in the near future. It highlights the main companies, emerging competitors, and the trends that will shape the space as we move towards 2026. Presented as a non‑exhaustive list of direct competitors to Gemini, it aims to give readers a clear picture of the strategic environment in which agentic AI is evolving.
| Provider | Core Focus | Strengths | Notes |
| --- | --- | --- | --- |
| Google Gemini API | Multimodal LLM service (text, vision, code, etc.) | High‑quality generative outputs; Google Cloud integration; strong multimodal capabilities | Primarily a model API; Gemini 3 is explicitly designed to support orchestration of agentic workflows |
| OpenAI GPT APIs | Text + code generation | Widely adopted; strong ecosystem; fine‑tuning options | Limited multimodal support compared to Gemini |
| Anthropic Claude | Safety‑focused text LLMs | Strong alignment and safety features; long context handling | Less multimodal capability |
| Mistral AI | Open and enterprise models | Flexible deployment; community driven; customizable | Requires infrastructure setup |
| Meta LLaMA | Open‑weight research models | Open source; strong research backing; customizable | Needs infra and ops for production |
| Cohere | Enterprise NLP and embeddings | Enterprise features; embeddings; privacy options | Narrower scope than general LLMs |
Source: Table by author, generated with GPT-5
Agent Orchestration Frameworks
This table examines the management and orchestration aspects of agentic AI. It highlights how different frameworks handle coordination, reliability, and integration to enable scalable agent systems.
| Framework | Core Focus | Strengths | Notes |
| --- | --- | --- | --- |
| LangGraph | Graph‑based orchestration | Models workflows as nodes/edges; strong memory; multi‑agent collaboration | Requires developer setup; orchestration only |
| LangChain | Agent/workflow orchestration | Rich ecosystem; tool integration; memory/state handling | Can increase token usage and complexity |
| CrewAI | Role‑based crew orchestration | Role specialization; collaboration patterns; good for teamwork scenarios | Depends on external LLMs |
| OpenAI Swarm | Lightweight multi‑agent orchestration | Simple handoffs; ergonomic routines | Good for running experiments |
| AutoGen (Microsoft) | Multi‑agent framework | Research + production focus; extensible | Still evolving; requires Microsoft ecosystem |
| AutoGPT | Autonomous agent prototype | Fast prototyping; community driven | Varying production readiness |
Source: Table by author, generated with GPT-5
✨Conclusion and Future Work
This project was my first experiment with Gemini’s agentic AI, adapting the Artificial Bee Colony algorithm to an optimization task. Even on a small dataset, it demonstrated how LLMs can take on bee‑like roles in a meta‑heuristic process, while also revealing both the promise and the practical challenges of this approach. Feel free to copy and adapt the Google Colab notebook for your own projects.
Future Work
- Applying the ABC meta‑heuristic to larger and more diverse datasets.
- Extending the WebResearcher agent to automatically construct datasets from domain‑specific sources (e.g. Royal Botanic Gardens Kew – POWO), inspired by Sir Ronald Fisher’s pioneering work in statistical botany.
- Running experiments with expanded pools of worker threads and adjusting the number of candidates per bee agent type.
- Exploring semi‑supervised clustering, where a small labeled dataset complements a larger unlabeled one.
- Comparing results from Google’s Gemini API with outputs from other providers’ APIs.