A Benchmark for Language Models in Real-World System Building
arxiv.org·17h
🏗️Compiler Archaeology
Preview
Report Post

arXiv:2601.12927v1 Announce Type: new Abstract: During migration across instruction set architectures (ISAs), software package build repair is a critical task for ensuring the reliability of software deployment and the stability of modern operating systems. While Large Language Models (LLMs) have shown promise in tackling this challenge, prior work has primarily focused on single instruction set architecture (ISA) and homogeneous programming languages. To address this limitation, we introduce a new benchmark designed for software package build repair across diverse architectures and languages. Comprising 268 real-world software package build failures, the benchmark provides a standardized evaluation pipeline. We evaluate six state-of-the-art LLMs on the benchmark, and the results show that…

Similar Posts

Loading similar posts...

Keyboard Shortcuts

Navigation
Next / previous item
j/k
Open post
oorEnter
Preview post
v
Post Actions
Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Recommendations
Add interest / feed
Enter
Not interested
x
Go to
Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Browse
gb
Search
/
General
Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help