A month ago, I wrote about building NutriAgent, my AI nutrition tracker that logs meals from Telegram and the web into a Google Sheet I own (you can read the original post here). I got it working, posted the article, and figured that was the end of the story.
Then I started using it every single day. And that’s when the real problems began to show up.
Not bugs. Not crashes. Just... little things that made me think "wait, this is annoying" multiple times per day. Things you only notice when you’re the actual user solving a real problem, not just demoing a cool idea.
Two problems broke the experience completely.
The Two Spreadsheets Problem (Why My Data Felt Broken)
I’d log my breakfast quickly on Telegram from my phone. Then at lunch, I’d be at my computer and use the web interface because it was easier. But at the end of the day, when I wanted to see my full nutrition breakdown, I had my data split across two different accounts and two different spreadsheets. I had to manually copy rows and merge them just to get a simple daily total.
The agent stored my Telegram meals under one user ID. My web chats were under another. When I asked "what did I eat this week?" the answer depended entirely on which platform I was using. My nutrition data was fragmented, making any real analysis impossible.
I realized that "make it multi-user" wasn’t enough. I needed one identity across both channels.
Since I found both channels useful in different scenarios, I decided to keep using both while finding a way to keep my data integrated and easy to visualize and analyze.
How the Linking Actually Works
I thought about building this feature into the main agent as a tool: "Send your email to link your account." But typing emails in chat felt clunky, and waiting for verification codes in Telegram felt slower than just clicking a button.
Some features are just faster in a web interface. Account linking is one of them.
So I built a Settings page in the web app that generates a short-lived linking code. You copy it, paste it into Telegram, and the bot connects your accounts. That’s it.
The flow:
- Get a code from the web Settings
- Send it to the Telegram bot
- The backend validates the code and binds your telegram_user_id to your clerk_user_id
- Chat histories and nutrition logs are merged so everything lives in a single user account
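To make that concrete, here is a minimal sketch of the code-generation side. It's an illustration, not the actual NutriAgent code: the function name, the in-memory store, and the ten-minute lifetime are all assumptions.

import secrets
import time

# Hypothetical store: linking code -> (clerk_user_id, expiry timestamp).
# In the real app this would live next to the user records in the database.
pending_codes: dict[str, tuple[str, float]] = {}
CODE_TTL_SECONDS = 600  # assumed ten-minute lifetime

def create_link_code(clerk_user_id: str) -> str:
    # Short, URL-safe code the user copies from the web Settings page
    code = secrets.token_hex(4)
    pending_codes[code] = (clerk_user_id, time.time() + CODE_TTL_SECONDS)
    return code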
Under the Hood: One User, Two Channels, One Source of Truth
The core decision was to pick a single canonical user identity and force everything else to align with it.
On the web side, authentication is handled by Clerk, which gives me a stable clerk_user_id. Instead of inventing a parallel identity system for Telegram, I decided to make clerk_user_id the primary key everywhere.
On the backend, the user model now looks roughly like this:
- clerk_user_id → primary identifier
- telegram_user_id → optional, nullable
- email → metadata and debugging
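As a sketch, that model could be declared like this (SQLAlchemy is my choice for illustration; the table and column details are assumptions about shape, not the actual schema):

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class User(Base):
    __tablename__ = "users"

    # Canonical identity: everything is keyed off the Clerk user ID
    clerk_user_id: Mapped[str] = mapped_column(primary_key=True)
    # Filled in only after the user links their Telegram account
    telegram_user_id: Mapped[int | None] = mapped_column(unique=True)
    # Kept for metadata and debugging, never used to match accounts
    email: Mapped[str | None] = mapped_column()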
This means:
- Telegram is no longer a “separate user”
- It’s just another interface attached to the same account
- All nutrition logs, chat history, and summaries are keyed off the same ID
The linking code flow is intentionally simple:
- The web app generates a short-lived code bound to clerk_user_id
- Telegram sends the code back to the backend
- If valid, the backend attaches telegram_user_id to the existing user record
No guessing. No heuristics. No email matching. If the code matches, the user explicitly intended to link the accounts.
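Continuing the code-generation sketch from earlier, the redemption side can stay equally small (again illustrative; the function name and return convention are my assumptions):

def redeem_link_code(code: str) -> str | None:
    # Single-use: pop the code whether or not it turns out to be valid
    entry = pending_codes.pop(code, None)
    if entry is None:
        return None
    clerk_user_id, expires_at = entry
    if time.time() > expires_at:
        return None
    # Caller binds telegram_user_id to this clerk_user_id and merges
    # the old Telegram-only history into that record
    return clerk_user_id

An invalid or expired code returns nothing, and no accounts are touched.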
This small constraint eliminated an entire class of edge cases I didn’t want to debug later.
The "One Meal, Three Messages" Telegram Headache
Once I got both channels working smoothly, I started using them interchangeably. That’s when I noticed something else. The web version lets me attach multiple images to a single message, for instance, a photo of my food plus a screenshot of the nutrition label. This made the AI estimates much more accurate.
But when I tried the same thing on Telegram, it fired off three separate messages, and I got three separate AI responses with different calorie counts. Each photo arrived as its own webhook event and was processed in isolation, without the context of the others. The experience gap was frustrating. The agent felt smart on web, broken on Telegram.
How I Fixed the Multiple Images Problem
Telegram marks photos that are sent together as a media group, so I introduced a MediaGroupHandler in the webhook handler to deal with multiple photos sent at once. It's a simple batching system:
- When the bot receives an image as part of a media group, it waits 1 second to start processing the request
- If more images arrive in that chat within the window, it groups them and resets the delay
- Once the window closes, it sends them all as list[bytes] to the agent in one call
The agent’s analyze() method already accepts list[bytes], so no changes needed there. The fix was purely in the Telegram handler.
Now I can send three angles of my plate plus a nutrition label and get one smart response.
Why This Fix Lives in the Telegram Layer (Not the Agent)
One important detail: I didn’t change the agent at all to support multiple images.
The agent already accepts list[bytes] for images. The real bug wasn’t model capability — it was message orchestration.
Telegram delivers images as:
- Separate webhook events
- Sometimes grouped under a shared media_group_id
- Sometimes arriving milliseconds apart, out of order
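For context, this is roughly what the webhook receives. The route path here is my assumption, but photo, media_group_id, and chat.id are standard fields in Telegram's update payload:

from fastapi import FastAPI, Request

app = FastAPI()

@app.post("/telegram/webhook")  # assumed route path
async def telegram_webhook(request: Request):
    update = await request.json()
    message = update.get("message", {})
    if "photo" in message:
        chat_id = message["chat"]["id"]
        # Only present when the photo was sent as part of an album
        media_group_id = message.get("media_group_id")
        # Telegram lists several sizes; the last entry is the largest
        file_id = message["photo"][-1]["file_id"]
        # From here: download the file via getFile, then hand the bytes
        # to the batching layer shown below
    return {"ok": True}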
Originally, each webhook triggered an agent call immediately. That meant:
- One image = one analysis
- Zero shared context
- Conflicting calorie estimates
The fix was to treat Telegram messages as signals, not requests.
I introduced a lightweight batching layer in the Telegram handler:
- Images with the same media_group_id are buffered
- A short debounce window (1 second) waits for more images
- Each new image resets the timer
- When the window closes, all images are sent together
Conceptually, it’s:
“Wait until the user is done talking, then think.”
import asyncio

# Buffered images and pending debounce tasks, keyed by media_group_id
media_groups: dict[str, list[bytes]] = {}
tasks: dict[str, asyncio.Task] = {}
lock = asyncio.Lock()

async def handle_image(chat_id, media_group_id, image_bytes):
    async with lock:
        # Buffer the image and restart the 1-second debounce timer
        media_groups.setdefault(media_group_id, []).append(image_bytes)
        if media_group_id in tasks:
            tasks[media_group_id].cancel()
        tasks[media_group_id] = asyncio.create_task(
            process_after_delay(media_group_id, chat_id)
        )

async def process_after_delay(media_group_id, chat_id):
    # Cancelled (and rescheduled) whenever another image arrives in time
    await asyncio.sleep(1)
    images = media_groups.pop(media_group_id, [])
    tasks.pop(media_group_id, None)
    await agent.analyze(images=images, chat_id=chat_id)
By keeping this logic inside the Telegram adapter:
- The agent stays platform-agnostic
- The same analysis pipeline works for web uploads, Telegram albums, or future mobile clients
- Telegram quirks don’t leak into core business logic
This ended up being one of those fixes that made everything feel smarter without making the system more complex.
Another side effect of this implementation was that it forced me to go deeper into asynchronous programming with FastAPI and Uvicorn. I already had some exposure to asyncio, but this was the first time I had to reason explicitly about timing, cancellation, and shared state in a real user-facing flow.
To keep the solution simple, I used in-memory storage combined with asyncio.Lock() and cancellable asyncio.Tasks to implement the batching and debounce logic. This works well because the bot currently runs with a single worker, so I don’t need external coordination or persistence.
The important part is that this wasn’t a shortcut — it was a conscious tradeoff. The same pattern would translate cleanly to Redis, a queue, or a background worker if I needed to scale horizontally. For now, the simpler solution keeps the system easier to reason about, test, and evolve.
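As a rough sketch of that translation, here is how the same debounce could look with the buffer moved into Redis using redis-py's asyncio client. The key names are made up, agent is the same agent object used above, and the timer itself still lives in the process, so treat this as an illustration of the shape rather than a production design:

import asyncio
import redis.asyncio as redis

r = redis.Redis()

async def handle_image(chat_id: int, media_group_id: str, image_bytes: bytes):
    # Buffer in Redis so the group survives restarts and extra workers
    await r.rpush(f"imgs:{media_group_id}", image_bytes)
    # Atomic counter marks which image is currently the latest
    version = await r.incr(f"ver:{media_group_id}")
    asyncio.create_task(flush_if_idle(chat_id, media_group_id, version))

async def flush_if_idle(chat_id: int, media_group_id: str, version: int):
    await asyncio.sleep(1)
    current = await r.get(f"ver:{media_group_id}")
    # Only the task scheduled by the last image flushes the group
    if current is None or int(current) != version:
        return
    images = await r.lrange(f"imgs:{media_group_id}", 0, -1)
    await r.delete(f"imgs:{media_group_id}", f"ver:{media_group_id}")
    await agent.analyze(images=images, chat_id=chat_id)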
The "Oh, That’s Actually Smooth Now" Moment
After the changes, I logged lunch on Telegram during a break and used the web chat when I was at my computer. That evening, I opened a single spreadsheet with the whole picture of my day, ready to analyze and compare with the rest of the week.
I sent three images of dinner—no spam, just one clean response. The product finally feels intentional instead of held together with duct tape.
What Dogfooding Actually Teaches You
Building for yourself is different than building for a hypothetical user. You feel the pain immediately. You can’t ignore bad UX because you’re the one suffering.
The gap between "it works" and "it works well enough to use daily" is massive—and only dogfooding reveals it.
I learned that context engineering is more important than overloading prompts. I learned that some features belong in web UIs, not chat. And I learned that starting with a no-code tool is great for testing, but real usage demands real architecture.
It’s a Real Product Now
NutriAgent stopped being a toy project when I started needing it. These changes didn’t just add features—they made it something I can share and scale.
The project is live at https://nutriagent.juandago.dev. The code for both the agent and the web UI is open source.
This was my journey, but I’d love to hear your thoughts. Let’s continue the conversation on X or LinkedIn.