For our NexHacks project, we wanted to explore a problem within prediction markets.
We started with a pretty simple frustration: prediction markets are powerful, but they’re hard to reason about unless you already think in probabilities. Most interfaces show a final price and expect users to translate that price into risk, confidence, and position sizing on their own. That gap is where people get confused.
Users place bets on gut feeling, but could you actually answer these (a rough numeric sketch follows the list):
- Your expected value if you're 80% confident?
- The Kelly-optimal position size?
- Your 95% Value at Risk?
- Whether the risk-reward ratio favors you?
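For concreteness, here’s a minimal sketch of the arithmetic behind those four questions (not our engine’s actual code; the function name and the $100 stake are illustrative), assuming a binary YES share priced between 0 and 1 that pays $1 on resolution:

```python
# Minimal sketch (not our production engine): core metrics for a binary
# YES share priced at `price` that pays $1 on a YES resolution.
# `confidence` is your own probability estimate that YES resolves true.

def position_metrics(price: float, confidence: float, stake: float = 100.0) -> dict:
    profit_if_win = 1.0 - price          # per-share profit when YES resolves
    loss_if_lose = price                 # per-share loss when NO resolves

    # Expected value per dollar staked
    ev_per_share = confidence * profit_if_win - (1 - confidence) * loss_if_lose
    ev_per_dollar = ev_per_share / price

    # Kelly-optimal fraction of bankroll: f* = (b*p - q) / b,
    # where b is the net odds received on the wager
    b = profit_if_win / loss_if_lose
    kelly_fraction = max(0.0, (b * confidence - (1 - confidence)) / b)

    # Crude 95% VaR for a single binary position: if the chance of losing
    # exceeds 5%, the 95th-percentile loss is the entire stake
    var_95 = stake if (1 - confidence) > 0.05 else 0.0

    # Risk-reward ratio: potential upside per unit of downside
    risk_reward = profit_if_win / loss_if_lose

    return {
        "ev_per_dollar": ev_per_dollar,
        "kelly_fraction": kelly_fraction,
        "var_95": var_95,
        "risk_reward": risk_reward,
    }

# Example: market price 0.60, you are 80% confident
print(position_metrics(price=0.60, confidence=0.80))
```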
Our original goal was modest: build something that helps people understand how a prediction market position behaves as probabilities and time change, using real Polymarket data. On top of that, our risk decision structure labels the simulated bet with a clear verdict: +EV, -EV, or no edge (a tiny sketch of that decision rule follows).
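The verdict itself is just a thresholding step on expected value; the cutoff below is a placeholder, not our shipped value:

```python
# Illustrative decision rule (the 2% threshold is a placeholder, not our
# shipped value): bucket expected value per dollar into a verdict.
def classify_edge(ev_per_dollar: float, threshold: float = 0.02) -> str:
    if ev_per_dollar > threshold:
        return "+EV"      # market price underrates your confidence
    if ev_per_dollar < -threshold:
        return "-EV"      # market price overrates your confidence
    return "no edge"      # expected value is roughly zero

print(classify_edge(0.33))  # "+EV" for the example above
```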
The Problem We Didn’t Expect
It was 4 AM. We were integrating multiple sponsor tools at once: Polymarket data, numeric reasoning, explanations, evaluation, observability. Each one worked fine on its own, but stitching them together was another story, especially under hackathon time pressure and with no sleep after my flight.
Every change required touching multiple pieces. Debugging meant guessing where things went wrong. And as the system grew, we realized the biggest risk wasn’t performance or features; it was coordination. The answer here wasn’t just “more instances” or more AI tokens.
That’s when we stopped treating DevSwarm like “an AI coding tool” and started using it as the backbone of how we worked.
Reframing the System
Instead of thinking in terms of endpoints and API calls, we asked a different question:
“What if analysis itself was a pipeline of specialists?”
So we broke our /explain flow into clear stages, each with a single responsibility, and let DevSwarm handle the orchestration between them.
At runtime, a user interaction kicks off a sequence that looks like this:
- One agent focuses purely on numeric reasoning (expected value, Kelly sizing, risk metrics)
- One agent focuses on compressing context so we stay efficient
- One agent turns that structured data into a clear explanation
- One agent evaluates the result for consistency and quality
Each stage runs independently, hands off structured output, and reports its timing and status.
DevSwarm Runtime Agent Pipeline
Context Builder (Wood Wide AI, ~50ms)
↓
Compressor (Token Company, ~25ms)
↓
Tutor (LLM, ~150ms)
↓
Evaluator (Arize Phoenix, ~15ms)
Each stage runs independently and reports timing and status in real time.
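Conceptually, the orchestration reduces to a very small loop: each stage takes the previous stage’s structured output, returns its own, and the runner records per-stage timing and status. The sketch below is illustrative only; it mirrors the stage names in the diagram but is not DevSwarm’s actual API.

```python
import time
from typing import Any, Callable

# Illustrative pipeline runner (not DevSwarm's actual API): each stage is a
# function that accepts the previous stage's structured output and returns
# its own, while the runner records per-stage timing and status.

def run_pipeline(stages: list[tuple[str, Callable[[dict], dict]]], payload: dict) -> dict:
    report: list[dict[str, Any]] = []
    for name, stage in stages:
        start = time.perf_counter()
        try:
            payload = stage(payload)
            status = "ok"
        except Exception as exc:
            status = f"error: {exc}"
        elapsed_ms = round((time.perf_counter() - start) * 1000, 1)
        report.append({"stage": name, "ms": elapsed_ms, "status": status})
        if status != "ok":
            break
    return {"result": payload, "report": report}

# Hypothetical stage functions standing in for the real integrations
stages = [
    ("context_builder", lambda p: {**p, "metrics": {"ev": 0.2, "kelly": 0.5}}),
    ("compressor",      lambda p: {**p, "context": "compressed"}),
    ("tutor",           lambda p: {**p, "explanation": "structured 3-part answer (stub)"}),
    ("evaluator",       lambda p: {**p, "eval_score": 0.92}),
]
print(run_pipeline(stages, {"market": "example-polymarket-slug"}))
```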
How it all fits together
With DevSwarm, we weren’t just wiring features together; we were building infrastructure.
The agent pipeline isn't just for show—it's how we:
- Debugged sponsor integrations in real-time
- Measured each sponsor's contribution
- Showed users exactly how their analysis was computed
On top of our human team, six DevSwarm builder agents worked in parallel, each with:
- Dedicated git branch (git-native isolation)
- Specialized prompt (500+ lines of context per agent)
- Clear ownership of files and responsibilities
- Conventional commit prefixes for merge coordination
We split development into parallel DevSwarm builders, each working on a dedicated branch with clear ownership:
- one focused on backend data and Polymarket integration,
- one on numeric reasoning,
- one on evaluation and observability,
- one on UI polish,
- and one whose only job was keeping the demo stable.
This sounds obvious in hindsight, but it completely changed our pace. We stopped stepping on each other’s work. Merges became predictable. The main branch stayed demo-ready, which is critical in a hackathon.
Our judge agent scanning for defects and issues against the scoring criteria

Older judge verdicts, which we used for continuous integration

Conditional "GO" state, after clearing major issues and reaching a high score on the product and technical judge

What We Ended Up Shipping
By the end of the 24 hours, we had something that felt surprisingly cohesive:
- A 3D visualization that shows how risk and payoff evolve as probabilities and time change
- A numeric engine that computes EV, Kelly sizing, Sharpe-style metrics, and VaR
- Live Polymarket data flowing through the system
- An automated evaluation loop using Arize Phoenix that let us measure and improve explanation quality instead of guessing
None of this felt rushed, even though the clock was always ticking. DevSwarm absorbed most of the coordination overhead that usually slows teams down.
| Agent | Sponsor | Function | Timing |
|---|---|---|---|
| context_builder | Wood Wide AI | Compute Kelly, Sharpe, VaR, EV, and break-even metrics | ~50ms |
| compressor | The Token Company | Reduce context size by 40%+ and track estimated cost savings | ~25ms |
| tutor | LLM | Generate a structured 3-part explanation from compressed context | ~150ms |
| evaluator | Arize Phoenix | Score explanation against numeric ground truth and consistency checks | ~15ms |
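The evaluator’s core check is mechanical: every metric the tutor quotes should match the numeric engine’s ground truth within a tolerance. The real loop runs through Arize Phoenix; the sketch below (generic Python, illustrative tolerance) only shows the idea.

```python
import re

# Simplified consistency check (the real evaluator runs through Arize Phoenix;
# this only sketches the idea): every ground-truth metric quoted in the
# explanation should match the numeric engine's value within a tolerance.

def consistency_score(explanation: str, ground_truth: dict[str, float],
                      rel_tol: float = 0.05) -> float:
    quoted = [float(x) for x in re.findall(r"-?\d+\.?\d*", explanation)]
    hits = 0
    for name, value in ground_truth.items():
        if any(abs(q - value) <= rel_tol * max(abs(value), 1e-9) for q in quoted):
            hits += 1
    return hits / max(len(ground_truth), 1)

truth = {"ev_per_dollar": 0.33, "kelly_fraction": 0.50}
text = "Your expected value is about 0.33 per dollar; Kelly suggests staking 0.50 of bankroll."
print(consistency_score(text, truth))   # 1.0 when every quoted metric matches
```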
The Biggest Takeaway
The most valuable insight wasn’t about speed or AI. It was about structure:
- Parallel execution where possible
- Structured handoffs between agents
- Git-native versioning of agent configurations
- Real-time visibility into each agent's contribution
When users can see how an answer is produced, not just the answer itself, they engage differently. They experiment more. They ask better questions. They actually learn.
For us, DevSwarm changed how we thought about structuring intelligence, both at runtime and during development.
Looking Back
We placed 3rd Overall at NexHacks, which was an awesome outcome, especially since this was only my second hackathon. But the bigger win was realizing there’s a better way to build AI-heavy products under pressure, and getting to play with so many new integrations along the way.
Not by hiding complexity, but by organizing it. Once we experienced that, it became hard to imagine going back to the old way.
Check out the result here!


