Introduction
When Google released their Gemini 2.0 models with image generation capabilities, I was excited to try them out. But as a developer who lives in the terminal, I wanted something more than just copying code snippets from documentation. I wanted a tool that felt natural, powerful, and ready for real-world use.
So I built gemini-imagen — a comprehensive Python library and CLI for Google Gemini’s image generation capabilities. In this article, I’ll walk you through the journey of building this tool, the technical decisions I made, and the lessons I learned along the way.
The Vision
I had three main goals:
- Developer-friendly CLI — Something you can use directly from the terminal without writing any code
- Production-ready library — A Python API that’s type-safe, async-first, and ready for integration
- Real-world features — S3 storage, observability with LangSmith, proper error handling, and configuration management
 
Starting Simple: The Core API
I started with the Python library first. The Google GenAI SDK provides the foundation, but I wanted to add a layer that made common tasks trivial.
Here’s what the simplest use case looks like:
import asyncio

from gemini_imagen import GeminiImageGenerator

async def main():
    generator = GeminiImageGenerator()
    result = await generator.generate(
        prompt="A serene Japanese garden with cherry blossoms",
        output_images=["garden.png"],
    )
    print(f"Image saved to: {result.image_location}")

asyncio.run(main())
Type Safety with Pydantic
One of my early decisions was to use Pydantic v2 for all data validation. This gives users immediate feedback if they pass invalid parameters, and enables excellent IDE autocomplete.
from typing import List, Optional

from pydantic import BaseModel, Field

class GenerationResult(BaseModel):
    """Result from image generation."""

    text: Optional[str] = Field(None, description="Generated text response")
    image_locations: List[str] = Field(default_factory=list, description="Local file paths")
    image_s3_uris: List[str] = Field(default_factory=list, description="S3 URIs")
    image_http_urls: List[str] = Field(default_factory=list, description="Public HTTP URLs")
    model: str = Field(description="Model used for generation")
Type hints everywhere meant fewer bugs and better developer experience.
Async from Day One
Modern Python is async, and image generation involves I/O-heavy operations (downloading images, uploading to S3). Making everything async from the start was crucial:
async def generate(
    self,
    prompt: str,
    input_images: Optional[List[ImageSource]] = None,
    output_images: Optional[List[OutputImageSpec]] = None,
    **kwargs,
) -> GenerationResult:
    # Load input images in parallel
    if input_images:
        loaded_images = await asyncio.gather(*[
            self._load_image(img) for img in input_images
        ])

    # Generate
    result = await self._call_api(...)

    # Save output images in parallel
    if output_images:
        await asyncio.gather(*[
            self._save_image(img, path) for img, path in zip(...)
        ])

    return result
This parallelization made the library 5x faster when working with multiple images.
Building the CLI with Click
For the CLI, I chose Click — it’s the gold standard for Python command-line tools. The goal was to make the CLI feel intuitive while exposing the full power of the library.
Command Structure
I organized commands by use case:
imagen generate "a sunset"        # Text-to-image
imagen analyze photo.jpg          # Image analysis
imagen edit "make it brighter"    # Image editing
imagen upload local.png s3://...  # S3 integration
imagen config set temperature 0.8 # Configuration
Piping Support
One feature I’m particularly proud of is stdin support. You can pipe prompts directly:
echo "a robot reading a book" | imagen generate -o robot.png
cat prompts.txt | while read -r prompt; do
  imagen generate "$prompt" -o "$(echo "$prompt" | tr ' ' '_').png"
done
This makes batch processing trivial and follows Unix philosophy.
JSON Output for Scripting
For automation, I added a --json flag to every command:
result=$(imagen generate "a landscape" -o output.png --json)
s3_uri=$(echo "$result" | jq -r '.s3_uri')
http_url=$(echo "$result" | jq -r '.http_url')
This makes it easy to integrate with shell scripts, CI/CD pipelines, and other tools.
Real-World Features
S3 Integration
Cloud storage is essential for production use. I integrated boto3 with async support (aiobotocore) to make S3 a first-class citizen:
# Input from S3, output to S3
imagen generate "enhance this" \
  -i s3://bucket/input.jpg \
  -o s3://bucket/output.jpg
# The result includes both S3 URI and public HTTP URL
# S3 URI: s3://bucket/output.jpg
# HTTP URL: https://bucket.s3.amazonaws.com/output.jpg
The library automatically detects S3 URIs (anything starting with s3://) and handles authentication using standard AWS credential chains.
LangSmith Tracing
Observability is crucial for debugging and monitoring. I integrated LangSmith to automatically log:
- Input prompts and parameters
- Generated images (as S3 URLs)
- Model used and execution time
- Safety filtering information
- Custom metadata and tags
 
# Enable tracing
import os

os.environ["LANGSMITH_TRACING"] = "true"

result = await generator.generate(
    prompt="A robot in a library",
    output_images=["robot.png"],
    metadata={"user_id": "demo", "session": "example"},
    tags=["demo", "robot"],
)
# View traces at https://smith.langchain.com/
This made debugging API issues and monitoring production usage much easier.
Safety Settings
Google’s safety filters can sometimes be restrictive. I added comprehensive control over safety thresholds:
# Use a preset
imagen generate "artistic photo" -o output.png --safety-setting preset:relaxed
# Fine-grained control
imagen generate "portrait" -o output.png \
  --safety-setting SEXUALLY_EXPLICIT:BLOCK_ONLY_HIGH \
  --safety-setting DANGEROUS_CONTENT:BLOCK_LOW_AND_ABOVE
You can also set global defaults:
imagen config set safety_settings relaxed
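Parsing those flags into the settings mapping the API call needs can be sketched as follows; the preset contents and helper name here are illustrative, not the library’s actual values:

```python
# Hypothetical preset definitions — illustrative thresholds only.
PRESETS = {
    "relaxed": {
        "SEXUALLY_EXPLICIT": "BLOCK_ONLY_HIGH",
        "DANGEROUS_CONTENT": "BLOCK_ONLY_HIGH",
    },
    "strict": {
        "SEXUALLY_EXPLICIT": "BLOCK_LOW_AND_ABOVE",
        "DANGEROUS_CONTENT": "BLOCK_LOW_AND_ABOVE",
    },
}

def parse_safety_settings(values: list[str]) -> dict[str, str]:
    """Merge preset and CATEGORY:THRESHOLD entries, with later flags winning."""
    settings: dict[str, str] = {}
    for value in values:
        if value.startswith("preset:"):
            settings.update(PRESETS[value.removeprefix("preset:")])
        else:
            category, _, threshold = value.partition(":")
            settings[category] = threshold
    return settings
```

Because later flags override earlier ones, a preset can serve as a baseline that individual `CATEGORY:THRESHOLD` flags then fine-tune.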
Configuration Management
I wanted configuration to be flexible but opinionated. The precedence is:
- Command-line flags (highest priority)
- Environment variables
- Config file (~/.config/imagen/config.yaml)
- Default values (lowest priority)
 
This means you can:
# Set global defaults
imagen config set default_model gemini-2.0-flash-exp
imagen config set temperature 0.8
imagen config set aspect_ratio 16:9
# Override per-command
imagen generate "test" -o out.png --temperature 0.4
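The precedence chain itself is straightforward to express as a lookup; here is a minimal sketch (the `IMAGEN_<KEY>` env-var naming and `resolve_setting` helper are assumptions for illustration):

```python
import os

def resolve_setting(key, cli_value=None, config=None, defaults=None):
    """Resolve one setting: CLI flag > env var > config file > default."""
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get(f"IMAGEN_{key.upper()}")
    if env_value is not None:
        return env_value
    if config and key in config:
        return config[key]
    return (defaults or {}).get(key)
```

Keeping the resolution in one place like this is what makes the precedence predictable: every command resolves every setting through the same four-step lookup.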
Development Experience
Modern Python Tooling
I used uv for package management — it’s incredibly fast and handles dependency resolution better than pip. Combined with:
- ruff for linting and formatting (replaces flake8, black, isort)
- mypy for type checking
- pytest with async support for testing
- pre-commit hooks for automatic code quality
 
The development experience was smooth and fast.
Testing Strategy
I separated tests into two categories:
- Unit tests — Mocked, no API keys required
- Integration tests — Real API calls, marked with @pytest.mark.integration
@pytest.mark.integration
async def test_real_generation():
    """Test actual image generation with real API."""
    generator = GeminiImageGenerator()
    result = await generator.generate(
        prompt="A simple test image",
        output_images=["test_output.png"],
    )
    assert result.image_location.exists()
Integration tests automatically skip if API keys aren’t configured, making CI/CD easier.
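One way to get that auto-skip behavior is a reusable `skipif` marker; a sketch, assuming the key lives in `GOOGLE_API_KEY`:

```python
import os

import pytest

# Reusable marker: tests carrying it are skipped (not failed) when no
# API key is configured, so CI without secrets still goes green.
requires_api_key = pytest.mark.skipif(
    not os.environ.get("GOOGLE_API_KEY"),
    reason="GOOGLE_API_KEY not set; skipping integration tests",
)
```

Stacking `@requires_api_key` with `@pytest.mark.integration` keeps the skip logic in one place instead of repeating the environment check in every test.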
CI/CD with GitHub Actions
The CI pipeline runs:
- Lint job — ruff and mypy on all code
- Test job — pytest on Python 3.12 and 3.13
- Integration tests — Only on main branch with secrets
- Build job — Package building and verification
 
Plus automated dependency updates with Dependabot.
Documentation Strategy
Good documentation is crucial for adoption. I organized it into four files:
- README.md — CLI quick start and features (for casual users)
- LIBRARY.md — Python API reference (for developers integrating the library)
- ADVANCED_USAGE.md — S3, LangSmith, scripting, automation (for power users)
- CONTRIBUTING.md — Development setup and guidelines (for contributors)
 
Each file cross-references the others, making it easy to navigate.
Lessons Learned
1. Start with Types
Using Pydantic from day one caught so many bugs early. The upfront cost of defining models pays off immediately.
2. Async is Worth It
Making everything async added complexity, but the performance gains (especially with multiple images) made it worthwhile.
3. CLI UX Matters
Small touches like:
- Piping support
- JSON output for scripting
- Helpful error messages with suggestions
- Progress indicators
 
These made the CLI feel professional and polished.
4. Configuration is Hard
Getting the precedence right (CLI flags > env vars > config file > defaults) took several iterations. But once done, it made the tool much more flexible.
5. Documentation Drives Design
Writing documentation early forced me to think about the user experience. If something was hard to explain, it was probably badly designed.
What’s Next?
Some features I’m considering:
- Batch processing — Built-in support for processing multiple prompts
- Template system — Reusable prompt templates with variables
- Webhook support — Async notifications when generation completes
- Local model support — Fallback to local models when API is unavailable
- More output formats — WebP, AVIF support
 
Try It Yourself
Getting started is simple:
pip install gemini-imagen
export GOOGLE_API_KEY="your-key-here"
imagen generate "a serene landscape" -o output.png
The code is open source on GitHub: https://github.com/aviadr1/gemini-imagen
Conclusion
Building gemini-imagen was a journey in creating developer tools that feel right. It’s not just about wrapping an API — it’s about understanding how developers work and building something that fits naturally into their workflow.
Whether you’re generating images in production, experimenting with AI art, or just need a quick CLI tool for image generation, I hope gemini-imagen makes your life easier.
Feel free to try it out, open issues, or contribute! I’d love to hear how you use it.
Aviad Rozenhek is a software engineer passionate about developer tools and AI. Connect on GitHub.