Junyuan Zeng, a PhD student in the Computer Science (CS) department at UT Dallas, is developing an efficient and effective method to convert low-level binary code into high-level C code.
Typically, computer software is written in a high-level language such as C, then compiled and linked into low-level binary code (1s and 0s) that a computer executes. Sometimes, in older legacy systems, the source code has been lost while the executable versions are still in use.
In any system, old or new, malicious executable software (malware) may find its way onto the system. Malicious executable code (1s and 0s) is often difficult to distinguish from useful executable code (also 1s and 0s). Tools and techniques such as those developed by Junyuan Zeng, which convert low-level binary code to high-level source code, make it easier to identify and eliminate malware that has invaded a system.
After the malware has been eliminated, the software can be repaired and reused, which is the goal of fixing the corrupted code. Once the malware has been removed, the generated high-level code can be recompiled and safely reused.
At the CS Mixer on 10/10/2014, Junyuan Zeng talked about his research, which is summarized in his own words below.
Binary code is everywhere, and binary code reuse is valuable to many security applications, including function transplanting and malware analysis. While prior approaches have shown that binary code can be extracted and reused, they are often based on static analysis and fall short, especially when dealing with obfuscated binaries. Our trace-oriented programming (TOP) is a general framework for developing new software from existing binary code.
The substantial difference compared with existing work is that TOP gains the benefits of dynamic analysis (such as resilience to obfuscation and no need for points-to analysis), and it elevates the low-level binary code into high-level C code. Thus, this approach can be used for malware analysis, especially malware function inspection and classification.
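To give a rough sense of what elevating a recorded execution trace into C might look like, here is a toy sketch in Python. It is not the TOP implementation: the trace format (tuples of opcode, destination, source), the four supported opcodes, and the function name `trace_to_c` are all assumptions made up for this illustration. Because the input is a list of instructions that actually ran, code that an obfuscator inserted but that never executed simply does not appear in the output.

```python
# Toy illustration only. The trace format and the translation rules below are
# assumptions for this sketch, not the format or rules used by TOP.

# Hypothetical mapping from a traced opcode to a C statement template.
C_TEMPLATES = {
    "mov": "{dst} = {src};",
    "add": "{dst} += {src};",
    "sub": "{dst} -= {src};",
    "xor": "{dst} ^= {src};",
}


def trace_to_c(trace, func_name="recovered", result="eax"):
    """Turn a recorded list of (opcode, dst, src) tuples into C source text."""
    # Every string operand is treated as a register and becomes a C local.
    registers = sorted(
        {operand for _, dst, src in trace for operand in (dst, src)
         if isinstance(operand, str)}
    )
    lines = [f"int {func_name}(void) {{"]
    lines.append("    int " + ", ".join(f"{r} = 0" for r in registers) + ";")
    # Emit one C statement per executed instruction, in execution order.
    for opcode, dst, src in trace:
        lines.append("    " + C_TEMPLATES[opcode].format(dst=dst, src=src))
    lines.append(f"    return {result};")
    lines.append("}")
    return "\n".join(lines)


# A made-up trace of four executed instructions.
example_trace = [
    ("mov", "eax", 5),
    ("mov", "ebx", 7),
    ("add", "eax", "ebx"),
    ("xor", "ebx", "ebx"),
]
c_source = trace_to_c(example_trace)
print(c_source)
```

A real system would also have to recover memory accesses, control flow, and function boundaries from the trace; this sketch handles only straight-line register arithmetic.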
The Department of Computer Science at UT Dallas is one of the largest CS departments in the United States, with more than 750 undergraduate, 500 master's, and 125 PhD students. The department is committed to exceptional teaching and research in a culture that is as daring as it is supportive.