By Francesco Evangelista, Sam L. Thomas, Alex Matrosov
Software and its supply chain are becoming increasingly complex, with developers relying more and more on third-party frameworks and libraries to ship fast and fulfill business objectives. While the prevalence of such libraries and frameworks is a boon, allowing developers to focus on specific business logic as opposed to “reinventing the wheel,” these dependencies often introduce vulnerabilities into otherwise secure codebases.
This problem is compounded by the advent of AI-driven software development. We have entered a new era of scale for cyber attacks, where easy targets can be compromised simply because they are exposed to the internet. Because LLMs often produce code that depends on outdated library versions, we risk creating software that is functionally correct yet unmaintainable and vulnerable. But the risk is bidirectional: the reasoning ability of AI models has progressed so quickly that they can now independently develop exploits from existing knowledge, a task previously reserved for human experts. As recent experiments have demonstrated, the only things that matter now are context (domain-specific expertise) and the speed of access to knowledge about new attack classes. This shift makes identifying security issues a vital, time-sensitive part of today’s software development process.
While there is an abundance of tools for auditing and analyzing source code, few exist for dependencies shipped as binaries. More often than not, developers can do little more than trust that the library or framework they’re using is free of vulnerabilities. Most existing tools for checking whether binaries contain known vulnerabilities (including most binary SCA solutions) rely on scanning for version strings or simple byte patterns, such as those matched by YARA rules. Unfortunately, this approach breaks down when library maintainers backport fixes without updating version numbers, a scenario frequently observed in open source projects (like Lighttpd). Moreover, even when such approaches successfully identify a library version, they tell us nothing about whether the potentially vulnerable code paths are reachable. The result is a huge number of false positives and little actionable insight.
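To make the backporting problem concrete, here is a minimal sketch of a version-string scanner. Everything in it is an assumption for illustration: the library name `libexample`, the set of vulnerable versions, and the two byte blobs standing in for binaries are all hypothetical, and real binary SCA tools are considerably more involved.

```python
import re
from typing import List

# Hypothetical advisory data: versions of an imaginary "libexample"
# that a vulnerability database lists as affected.
VULNERABLE_VERSIONS = {"1.4.0", "1.4.1"}

# Hypothetical banner format embedded in the binary, e.g. "libexample/1.4.1".
VERSION_PATTERN = re.compile(rb"libexample/(\d+\.\d+\.\d+)")


def scan_for_vulnerable_versions(blob: bytes) -> List[str]:
    """Return embedded version strings that appear in the vulnerable set."""
    found = {m.group(1).decode("ascii") for m in VERSION_PATTERN.finditer(blob)}
    return sorted(found & VULNERABLE_VERSIONS)


# Two stand-in "binaries" with the same version banner. The second one
# represents a build where the fix was backported without a version bump.
unpatched = b"\x7fELF\x00libexample/1.4.1\x00<code with the bug>"
backported = b"\x7fELF\x00libexample/1.4.1\x00<code with the fix applied>"

print(scan_for_vulnerable_versions(unpatched))
print(scan_for_vulnerable_versions(backported))
```

Both calls report the same version as vulnerable, because the scanner only ever looks at the banner and never at the code. The backported build is a false positive, and neither result says whether the affected code path is reachable.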