CodeChat-Eval: Evaluating Large Language Models in Multi-Turn Code Refinement Dialogues

Large Language Models (LLMs) are increasingly used in software engineering to generate and refine code. In practice, developers often follow an initial code generation request with refinement instructions, such as requests to improve style, restructure the implementation, or change the execution strategy while preserving the intended behaviour. However, existing benchmarks generally omit this multi-turn code refinement dialogue sett...
