# Code Evolution: Self-Improving Software with LLMs and Python

Workshop for PyDay Barcelona 2025

**Level:** Beginner/Intermediate
**Requirements:** Basic Python knowledge, familiarity with APIs

## Abstract

What if your code could fix its own bugs, evolve better solutions, and create new capabilities on demand? This workshop explores the intersection of Large Language Models and evolutionary computation to build software that improves itself.

Through four hands-on demos, participants will implement increasingly sophisticated self-improvement patterns: from simple error correction to Darwinian evolution of code, agents that write their own tools, and finally a multi-agent team that evolves its own system prompts through competition.
## Workshop Structure

| Section | Description |
|---|---|
| Introduction | Theory: Why combine Evolution + LLMs? |
| Environment Setup | API keys, Google Colab configuration |
| Demo 1: Self-Repair | Basic error correction loop |
| Demo 2: Code Evolution | Genetic algorithms with LLM operators |
| Demo 3: Toolmaker | Runtime self-modification |
| Demo 4: Evolving Team | Multi-agent prompt evolution |
## The Four Demos

### Demo 1: Self-Repair (Basic)

**Concept:** Self-Correction / Reflexion
A script that automatically fixes its own bugs by:
- Executing code and capturing errors (stderr)
- Sending the code + traceback to an LLM
- Receiving and applying the corrected code
- Re-executing successfully
This demonstrates the fundamental feedback loop: using execution output as a "learning signal" for improvement.
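Below is a minimal sketch of the loop, assuming an OpenAI-style client (any of the workshop's providers works the same way) and a naive fence-stripping helper; names like `self_repair` and `extract_code` are illustrative, not the notebook's exact API.

```python
import subprocess
import sys

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set (see Setup Instructions)

FENCE = "`" * 3  # a markdown code fence, built up to avoid nesting issues

def run(code: str) -> subprocess.CompletedProcess:
    """Execute the code in a fresh interpreter, capturing stdout/stderr."""
    return subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=30)

def extract_code(reply: str) -> str:
    """Naively strip a markdown code fence from the LLM's reply."""
    if FENCE in reply:
        reply = reply.split(FENCE)[1].removeprefix("python")
    return reply.strip()

def self_repair(code: str, max_attempts: int = 3) -> str:
    """Run -> capture traceback -> ask the LLM for a fix -> retry."""
    for _ in range(max_attempts):
        result = run(code)
        if result.returncode == 0:
            return code  # the code now runs cleanly
        # The traceback is the "learning signal" fed back to the LLM
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                f"Fix this Python code.\n\nCode:\n{code}\n\n"
                f"Traceback:\n{result.stderr}\n\nReturn only the corrected code."}],
        )
        code = extract_code(reply.choices[0].message.content)
    raise RuntimeError("could not repair the code within the attempt budget")
```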
**Related Research:**
- Self-Refine: Iterative Refinement with Self-Feedback
- Reflexion: Language Agents with Verbal Reinforcement Learning
### Demo 2: Code Evolution (Intermediate)

**Concept:** Genetic Algorithm with LLM as Mutation Operator
Instead of fixing errors, we optimize solutions through evolution:
- **Population:** The LLM generates initial candidates
- **Evaluation:** A fitness function scores each candidate
- **Selection:** Keep the best performers
- **Mutation/Crossover:** The LLM combines winning traits to create offspring
- **Repeat** for multiple generations
**The problem:** generate passwords that meet complex, interacting constraints (length, emoji requirements, a digit sum of exactly 15, etc.).
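A sketch of this loop follows; the `llm(prompt) -> str` helper stands in for any chat-completion call (like the one in the Demo 1 sketch), and every constraint except the digit sum of 15 is an illustrative assumption.

```python
import re

def fitness(password: str) -> float:
    """Score a candidate against the constraints (partial, illustrative set)."""
    score = 0.0
    score += 12 <= len(password) <= 20                           # length range (assumed)
    score += sum(int(c) for c in password if c.isdigit()) == 15  # digit sum == 15
    score += bool(re.search(r"[!@#$%]", password))               # special symbol (assumed)
    return score

def evolve(llm, population: list[str], generations: int = 5) -> str:
    """LLM-driven genetic loop: select parents, ask the LLM to breed offspring."""
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:3]  # selection
        prompt = ("These candidate passwords scored best:\n"
                  + "\n".join(parents)
                  + "\nCombine their winning traits into 5 new candidates, one per line.")
        children = [c.strip() for c in llm(prompt).splitlines() if c.strip()]
        population = parents + children  # the LLM acts as crossover + mutation
    return max(population, key=fitness)
```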
**Related Research:**
- FunSearch: Making new discoveries in mathematical sciences using LLMs (Nature, 2023)
- Eureka: Human-Level Reward Design via Coding LLMs
### Demo 3: The Agent Toolmaker

**Concept:** Self-Modification / Hot-Swapping
An agent that creates capabilities it doesn’t have:
- Agent receives a request (e.g., sentiment analysis)
- Detects a missing tool (`ToolNotFoundError`)
- Uses the LLM to write a new Python function
- Adds it to `tools.py` and hot-reloads it via `importlib.reload()`
- Completes the original request with its new capability
The system gains capability with use: each tool persists for future interactions.
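A minimal sketch of the pattern, assuming `llm` and `extract_code` helpers like the Demo 1 sketch's, a `tools.py` file next to the notebook, and a `ToolNotFoundError` recreated here from the description above:

```python
# `llm` and `extract_code` are the chat helper and fence-stripper from the
# Demo 1 sketch; a `tools.py` module is assumed to exist in the working dir.
import importlib

import tools  # the agent's growing toolbox

class ToolNotFoundError(Exception):
    """Raised when the agent is asked for a capability it doesn't have yet."""

def call_tool(name: str, *args):
    fn = getattr(tools, name, None)
    if fn is None:
        raise ToolNotFoundError(name)
    return fn(*args)

def handle(request: str, tool_name: str, *args):
    try:
        return call_tool(tool_name, *args)
    except ToolNotFoundError:
        # Ask the LLM to write the missing function, persist it, hot-reload it
        code = extract_code(llm(f"Write a Python function `{tool_name}` that "
                                f"{request}. Return only the code."))
        with open("tools.py", "a") as f:
            f.write("\n\n" + code)
        importlib.reload(tools)  # hot-swap: the new capability is live immediately
        return call_tool(tool_name, *args)
```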
**Related Research:**
- Voyager: An Open-Ended Embodied Agent with LLMs
- MetaGPT: Meta Programming for Multi-Agent Collaboration
### Demo 4: The Evolving Dev Team

**Concept:** Self-Evolving Agent Prompts / Multi-Agent Competition
Two AI developer agents with different philosophies compete to solve coding problems:
- **Round 1:** Both solve a problem with their initial system prompts
- **Evaluation:** A Tech Lead runs actual tests and scores solutions
- **Evolution:** Each agent rewrites their own system prompt based on feedback
- **Round 2:** Agents use their evolved prompts on a new problem
- **Analysis:** Compare performance before and after evolution
The key insight: agents that reflect on their mistakes can improve their own instructions. We observe emergent strategies that weren’t explicitly programmed.
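One evolution step for a single agent might look like the sketch below; `llm(system, user) -> str` and `run_tests` are placeholders for the demo's chat helper and the Tech Lead's scoring harness, not its exact API.

```python
def evolve_prompt(llm, run_tests, system_prompt: str, problem: str) -> str:
    """One round: attempt -> evaluation -> the agent rewrites its own prompt."""
    solution = llm(system_prompt, problem)  # Round 1 attempt
    score, feedback = run_tests(solution)   # Tech Lead runs actual tests
    # Reflection: the agent updates its own instructions from the feedback
    return llm(
        "You are improving your own system prompt.",
        f"Current prompt:\n{system_prompt}\n\n"
        f"Your solution scored {score}. Feedback:\n{feedback}\n\n"
        "Rewrite the prompt so your next attempt avoids these mistakes. "
        "Return only the new prompt.",
    )
```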
**Related Research:**
- Self-Refine: Iterative Refinement with Self-Feedback
- OPRO: Large Language Models as Optimizers
- Constitutional AI: Harmlessness from AI Feedback
## Technical Requirements

### For Participants

- Google account (for Colab)
- One of the following API keys:
  - OpenAI API key with credits (~1 euro should be sufficient; recommended)
  - Google Gemini API key (free, from Google AI Studio)
  - Groq API key (free and very fast, from console.groq.com)
- Modern web browser
### Technologies Used

- Python 3.10+
- LLM API (OpenAI gpt-4o-mini, Google Gemini, or Groq Llama 3.1)
- Google Colab
- TextBlob (for sentiment analysis demo)
## Setup Instructions

1. Open the notebooks in Google Colab.
2. Configure your API key in Colab Secrets (see the snippet below):
   - Click the key icon in the left sidebar
   - Add your API key as a secret (choose one):
     - `OPENAI_API_KEY` for OpenAI (recommended)
     - `GEMINI_API_KEY` for Google Gemini (free)
     - `GROQ_API_KEY` for Groq (free, very fast)
   - Enable notebook access
3. In each notebook's setup cell, uncomment the section for your chosen provider.
4. Run cells sequentially.
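In a setup cell, reading the secret looks like this (OpenAI shown; the other providers follow the same pattern with their own secret name):

```python
from google.colab import userdata  # Colab's secrets API
from openai import OpenAI

client = OpenAI(api_key=userdata.get("OPENAI_API_KEY"))
```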
### Using Free Alternatives

**Google Gemini (free):**

1. Get a free API key from Google AI Studio
2. Add the secret `GEMINI_API_KEY` in Colab Secrets
3. In each notebook's setup cell, comment out OpenAI and uncomment the Gemini section

**Groq (free, very fast):**

1. Get a free API key from console.groq.com
2. Add the secret `GROQ_API_KEY` in Colab Secrets
3. In each notebook's setup cell, comment out OpenAI and uncomment the Groq section

Note: Groq serves the Llama 3.1 model, which is very capable for these demos.
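Both free providers expose OpenAI-compatible endpoints, so switching is a matter of changing the base URL, secret, and model name; the URLs and model names below reflect the providers' documentation at the time of writing and may change.

```python
from google.colab import userdata
from openai import OpenAI

# Google Gemini via its OpenAI-compatible endpoint
client = OpenAI(
    api_key=userdata.get("GEMINI_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)
MODEL = "gemini-1.5-flash"

# Groq (swap in instead of the block above)
# client = OpenAI(
#     api_key=userdata.get("GROQ_API_KEY"),
#     base_url="https://api.groq.com/openai/v1",
# )
# MODEL = "llama-3.1-8b-instant"
```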
## Files

```text
pydaybcn2025-code-evolution/
├── README.md                        # This file
├── demo_01_self_reparation.ipynb    # Bug-fixing feedback loop
├── demo_02_evolution.ipynb          # Genetic algorithm with LLM
├── demo_03_toolmaker.ipynb          # Self-modifying agent
└── demo_04_evolving_team.ipynb      # Multi-agent prompt evolution
```
## Key Concepts

### The Feedback Loop

All four demos share a common pattern:

```text
Action -> Outcome -> Evaluation -> LLM Improvement -> Better Action
```

The key insight is treating program output (errors, scores, missing capabilities) as a training signal for the LLM.
### LLM as Intelligent Operator

Unlike traditional genetic algorithms with random mutations, LLMs provide:
- Semantic understanding of what makes solutions good
- Directed improvement rather than random exploration
- Ability to combine abstract concepts across candidates
### Open-Ended Systems

Demo 3 illustrates open-endedness: systems that grow in complexity without predefined limits. The agent’s capability space expands with each new tool it creates.
### Self-Evolving Prompts

Demo 4 shows meta-learning: agents that improve their own instructions. Through competition and reflection, they discover strategies we never explicitly taught them.
## Safety Considerations

These demos use `exec()` and dynamic code generation. In production (a minimal sandboxing sketch follows this list):
- Sandbox all LLM-generated code (containers, VMs)
- Validate code before execution
- Implement human review for critical changes
- Maintain audit trails of self-modifications
- Set resource limits to prevent runaway processes
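As one inexpensive layer of the above, generated code can run in a subprocess with a timeout instead of an in-process `exec()`; this is only a sketch, not real isolation (containers or VMs are).

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout: int = 10) -> str:
    """Run LLM-generated code in a separate, isolated interpreter process."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode (no user site dirs)
        capture_output=True, text=True,
        timeout=timeout,                     # resource limit: wall-clock cutoff
    )
    return result.stdout if result.returncode == 0 else result.stderr
```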
## Further Reading

### Papers

- Madaan et al. (2023). "Self-Refine: Iterative Refinement with Self-Feedback"
- Shinn et al. (2023). "Reflexion: Language Agents with Verbal Reinforcement Learning"
- Romera-Paredes et al. (2023). "Mathematical discoveries from program search with LLMs" (FunSearch)
- Ma et al. (2023). "Eureka: Human-Level Reward Design via Coding LLMs"
- Wang et al. (2023). "Voyager: An Open-Ended Embodied Agent with LLMs"
- Hong et al. (2023). "MetaGPT: Meta Programming for Multi-Agent Collaboration"
- Yang et al. (2023). "Large Language Models as Optimizers" (OPRO)
- Bai et al. (2022). "Constitutional AI: Harmlessness from AI Feedback"
### Code Improvement (Author's Research)

- Chacon Sartori & Blum (2025). "irace-evo: Automatic Algorithm Configuration Extended With LLM-Based Code Evolution"
- Chacon Sartori et al. (2025). "Metaheuristics and Large Language Models Join Forces: Toward an Integrated Optimization Approach" (IEEE Access)
- Chacon Sartori & Blum (2025). "Optimizing the Optimizer: An Example Showing the Power of LLM Code Generation" (FedCSIS ’25)
- Chacon Sartori & Blum (2025). "Combinatorial Optimization for All: Using LLMs to Aid Non-Experts in Improving Optimization Algorithms"
### Related Concepts

- Evolutionary Computation
- Genetic Programming
- Neural Architecture Search
- Program Synthesis
- Automated Machine Learning (AutoML)
## License

MIT License - Feel free to use and adapt for your own workshops.
## Author

Workshop developed for PyDay Barcelona 2025.