cs.SE updates on arXiv.org · Scour

Improving MPI Error Detection and Repair with Large Language Models and Bug References

A Synthesis Method of Safe Rust Code Based on Pushdown Colored Petri Nets

IndustryCode: A Benchmark for Industry Code Generation

Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy

An Initial Exploration of Contrastive Prompt Tuning to Generate Energy-Efficient Code

SkillRT: Compiling Skills for Efficient Execution Everywhere

DrugPlayGround: Benchmarking Large Language Models and Embeddings for Drug Discovery

UniCon: A Unified System for Efficient Robot Learning Transfers

A Survey of Real-Time Support, Analysis, and Advancements in ROS 2

Is the Cure Still Worse Than the Disease? Test Overfitting by LLMs in Automated Program Repair

An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications

Probing Pre-trained Language Models on Code Changes: Insights from ReDef, a High-Confidence Just-in-Time Defect Prediction Dataset

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

A Multi-Language Perspective on the Robustness of LLM Code Generation

DOne: Decoupling Structure and Rendering for High-Fidelity Design-to-Code Generation

LLMs as Idiomatic Decompilers: Recovering High-Level Code from x86-64 Assembly for Dart

Computational Foundations for Strategic Coopetition: Formalizing Sequential Interaction and Reciprocity

ProdCodeBench: A Production-Derived Benchmark for Evaluating AI Coding Agents

Evaluation of gNB Monostatic Sensing for UAV Use Case

EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild
