Over the past weekend, I decided to come out of a rut and do some recreational programming. For this time, I decided to dig deeper into Go and build a distributed log ingestion system.
In just 2-3 days, I was able to build the following:
- Built ingest nodes that accept logs from producer applications
- Used FNV-1a hashing for deterministic partition routing to distribute logs across multiple storage nodes by service name
- Implemented append-only log-structured storage with JSON line format
- Created HTTP APIs for batch log ingestion with automatic metadata enrichment (timestamps, client IP, node ID)
- Implemented fan-out/fan-in writes to storage nodes via goroutines, reducing latency from the sum of sequential writes to roughly the latency of the slowest node
- Added partition-aware batching (1 network call per partition, not per log)
- Refactored to clean architecture separating business logic from HTTP logic
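To make the routing and concurrency points above concrete, here is a minimal Go sketch of how FNV-1a partition routing, partition-aware batching, and fan-out/fan-in writes can fit together. The function names and log shape are my own illustrations, not the project's actual code.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// partitionFor maps a service name to a storage-node index using FNV-1a.
// The same service always hashes to the same partition, so its logs
// always land on the same node.
func partitionFor(service string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(service))
	return int(h.Sum32() % uint32(numPartitions))
}

// fanOut issues one write per partition concurrently; total latency is
// bounded by the slowest node rather than the sum of all writes.
func fanOut(batches map[int][]string, write func(partition int, batch []string) error) []error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)
	for p, batch := range batches {
		wg.Add(1)
		go func(p int, batch []string) {
			defer wg.Done()
			if err := write(p, batch); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(p, batch)
	}
	wg.Wait()
	return errs
}

func main() {
	logs := []struct{ Service, Msg string }{
		{"auth", "login ok"},
		{"billing", "invoice sent"},
		{"auth", "token refreshed"},
	}
	const numPartitions = 4

	// Partition-aware batching: group logs by partition first, so the
	// fan-out makes one call per partition instead of one per log.
	batches := make(map[int][]string)
	for _, l := range logs {
		p := partitionFor(l.Service, numPartitions)
		batches[p] = append(batches[p], l.Msg)
	}

	errs := fanOut(batches, func(p int, batch []string) error {
		fmt.Printf("partition %d <- %d logs\n", p, len(batch))
		return nil
	})
	fmt.Println("failed writes:", len(errs))
}
```

Grouping before fanning out is what keeps it to one network call per partition rather than one per log entry.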
And I did this all with help from AI, but not in the way you might think.
Some context
Before AI, if I were to undertake this project, I'd have to spend days researching the ins and outs of distributed systems and logging platforms. It can be draining to do heavy research when you're sitting down just to write some weekend recreational code. I'm not saying that researching and going deep on things is bad, just that sometimes I simply want to code something up.
With AI, all it takes is a few prompts to get direction. I use a technique that I like to call the 'Breadcrumbs approach'.
The approach
The core concept of this approach is that we don't use AI as a teacher, but as a compass: you walk the trail, and the AI only points the direction. There is an interesting article about Learning with LLMs that you can read for more context on why leaning on AI to do your learning is a bad idea.
In this approach, we don't ask the AI for code, boilerplate setup, or answers to programming questions. Instead, we give it context on what we intend to build and then ask it only for the next immediate step.
For example, in the case of the 'Distributed log ingestion system', I told the AI what I intended to build. I had no idea how a distributed log ingestion system worked, so I asked it for a brief introduction. From this, I picked up keywords such as ingest node, storage node, partitioning, etc.
Then I asked the AI for a high-level overview of the components I needed to implement. It gave me the following:
- Ingest node: takes in the logs from the producers
- Partitioning: figures out how to split and store the logs across n storage nodes
- Storage node: accepts requests from ingest nodes and actually writes the logs to disk based on the partition
Once I had this sort of overview, I was still a bit confused, so I asked the AI what the first immediate step was. It suggested: 'Implement a /ingest endpoint on the ingest node to accept logs from producers.' That sounded like something I could do without any AI intervention, so I jumped into the code. From this point on, I restricted myself from using agentic coding tools or LLMs. Instead, I used the good old-fashioned search engine and Stack Overflow to figure things out.
Then, whenever I got stuck or didn't know what came next, I'd ask the AI for the next immediate step. Make sure to tell it to give you only enough to move forward, not so much that you offload your thinking and simply copy and paste.
The major issue with learning from AI is that it mainly gives you summaries, and a summary makes you feel like you understood the concept. This is a major pitfall when using AI to learn. Refer to the article I mentioned above to understand more.
Once I was done with all the features I intended to build, I asked the AI to suggest potential improvements and concepts I could apply to grow as an engineer. It gave me a list of things I had no idea about:
- Segment rotation with offset indexes
- Add service discovery instead of hard-coded URLs
- Add a dashboard to expose metrics
- Load-balancing techniques such as round robin
- Backpressure if producers flood the ingest node
- Move to protobuf + gRPC (to reduce payload size and improve serialization performance).
- Retry with exponential back-off
- Add search
Right now I'm working on implementing these. If you'd like to see the code I wrote using this approach, please check out the GitHub repo.
We're fortunate to be living through a major technological revolution. While these tools are super helpful, we should be careful not to downgrade ourselves in the process. Learn to wield the tool, and not be controlled by it. If you found this article helpful or want to reach out, feel free to contact me. I'd love to talk.