These are my thoughts on LLMs in software engineering after using them in a professional setting for about a year:
- Chatbots are amazing at codebase exploration.
- Chatbots are good at checking for regressions while iterating on an idea, especially Codex.
- Claude is better than the others on code quality.
- Local models aren’t much help, not even for easier tasks. The models you can run locally on 24-40 GB of VRAM are underwhelming and slow. Agentic flows, especially, quickly build up large KV caches that are too big and too slow to handle locally (see the sketch after this list), and multiple concurrent 100k+ chat sessions are out of the question. Economies of scale win here at extracting the most value out of a given capex spent on hardware. Models like Gemini Flash are fast, good, and cheap.
- That said, the biggest open-source models can now basically match the GPTs and Claudes of the world at a fraction of the cost. Since they are too big for most people to run locally, the only viable option is one of the various third-party hosts, but those are often not trusted enough to be used with internal company codebases. This means we are mostly left with OpenAI’s, Anthropic’s, or Google’s models.
- Since code generation is now cheap (thanks to LLMs), going out of your way for thoughtful tests, readability, and PR documentation is the bare minimum.
- Code cannot be merged at the rate it is produced, because you have to own what was generated. The main gain is being elevated from generating to checking, which is faster but not a substitute for skill.
- Because you have to own the work, you have to be competent in that area. Paradoxically, if LLMs are relied on too much, they can hinder your ability to develop enough competence to supervise the work.
- On the flip side, LLMs expose you to the problem space much faster: fail fast → solve → get better (rapid iteration). In other words, they complement your agency. It remains an open question which of these two effects wins out for developing competence.
- Rapid comprehension appears to be the standout capability of LLMs over humans. So the longer and richer the context, the more we can get out of them.
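
To make the local-hardware point concrete, here is a rough back-of-the-envelope sketch of KV-cache memory for long contexts. The model shape used (80 layers, 8 KV heads, head dim 128, fp16 cache) is an assumption, roughly a Llama-3-70B-class dense model with GQA, not a measurement of any particular setup.

```python
# Rough KV-cache sizing sketch. The model shape below is an assumption
# (roughly a Llama-3-70B-class dense model with grouped-query attention).

def kv_cache_bytes(context_tokens: int,
                   n_layers: int = 80,
                   n_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache for one sequence: K and V (hence the 2) stored
    per layer, per KV head, per head dimension, for every token."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens

if __name__ == "__main__":
    for tokens in (8_000, 32_000, 100_000):
        gib = kv_cache_bytes(tokens) / 2**30
        print(f"{tokens:>7} tokens -> ~{gib:.1f} GiB of KV cache per session")
```

Under these assumptions a single 100k-token session needs on the order of 30 GiB of cache before counting the model weights at all, which is why several concurrent long agentic sessions do not fit in a 24-40 GB VRAM budget, while hosted providers can amortize batching and cache management across many users.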