Next-Generation Binary Analysis Techniques and Platform

Researcher: David Brumley

Cross Cutting Thrusts: Software Security | Formal Methods

Abstract

Scope: We propose continued research on advanced and novel binary code analysis for software security. We need software security techniques that require access only to binary code, because binary code is ubiquitous. Further, binary code analysis allows us to reason about the security of the code that will execute, not just the code that was compiled. There are a tremendous number of security applications in defense (e.g., [4, 8, 9]), offense [5], and a wealth of other tasks [2, 3, 6, 7, 10, 11, 13] that can take advantage of binary analyses. These security applications require that the binary analysis rest on a firm foundation in order to provide the necessary overall security assurances.

The main goal of our research is to put binary program analyses on a sound foundation. Despite the advantages of binary analysis, there has been relatively little principled work in the area. Most existing work has been developed by the IT industry as hacks that tend to work on a limited problem domain and provide best-effort results. Unfortunately, hacks are insufficient for long-term security. Very basic elements, such as stating the assumptions of an analysis, understanding the complexity and termination properties of the algorithm, and rigorous experimentation, are lacking more often than provided.

Our approach consists of two thrusts. First, we need to develop program analysis theory that is appropriate for binary code. Second, we will use this theory to guide principled implementations of binary analysis and reverse engineering. The two thrusts are complementary: principled binary program analysis theory is necessary so that reverse engineering and binary analysis have a solid foundation, and reverse engineering tasks motivate specific directions for developing new theory.

The main challenge for this line of work is that previous program analyses have typically targeted source code and do not work well for binary code. For example, while higher-level languages have types, functions, pointers, loops, and local variables, assembly has no types, no functions, one globally addressed memory region, gotos instead of loops, and stack frames instead of local variables. Source-level analyses assume that these abstractions are present, and our experiments and experience show that these assumptions do not necessarily hold even for benign programs compiled from a high-level language.

Outcomes: Our central measure of success is to demonstrate deeper, faster, and more accurate analysis in security-critical scenarios. We plan to demonstrate our success by publishing in appropriate academic venues. In particular, we plan to publish in all work areas described in § 2. A successful research program will also result in the release of usable tools to the ARO, CyLab partners, the DoD, and researchers. CyLab partners and collaborators are actively using our current tools and will benefit from the proposed improvements.