Using McSema lifted bitcode #24
This will need some progress on #4.
So, after one day's worth of effort, I've been able to take (very slightly modified) McSema-produced bitcode for test.c, run it through fcd and get the following output:

```c
uint64_t sub_400566_main(uint64_t RSP8, uint64_t RSP16, uint64_t RSP24, uint64_t RSP32, uint64_t RSP40, uint64_t RSP48)
{
    uint64_t alloca7;
    uint64_t alloca11;
    uint64_t alloca14;
    uint64_t alloca15;
    uint64_t alloca16;
    uint64_t alloca1 = RSP8;
    uint64_t alloca2 = RSP16;
    uint64_t alloca3 = RSP48;
    uint64_t alloca4 = RSP40;
    uint64_t alloca5 = RSP32;
    uint64_t alloca6 = RSP24;
    uint64_t anon8 = (uint64_t){{0, 0, 0, 0}};
    alloca7 = anon8;
    uint64_t anon10 = (uint64_t)&alloca11 | 1;
    alloca9 = anon10 + 42;
    uint64_t anon12 = (uint64_t){{1, 0, 2, 0, 0, 0, 0, 0}, {71, 108, 111, 98, 97, 108, 32, 118, 97, 114, 105, 97, 98, 108, 101, 32, 39, 97, 39, 32, 111, 102, 32, 118, 97, 108, 117, 101, 32, 37, 117, 32, 97, 116, 32, 97, 100, 100, 114, 101, 115, 115, 32, 37, 112, 32, 105, 115, 32, 0}, {101, 118, 101, 110, 46, 0}, {111, 100, 100, 46, 0}};
    uint32_t* anon13 = (uint32_t*)anon8;
    printf(anon12 + 8 & 0xffffffff, (__zext uint64_t)*anon13, anon8, __undefined, __undefined, __undefined, alloca14, alloca7, alloca15, *(uint64_t*)alloca16, alloca1, alloca2, alloca6, alloca5, alloca4, alloca3);
    uint64_t alloca9 = anon10 + ((*anon13 & 1) != 0 ? 67 : 55) + 10;
    if ((*anon13 & 1) != 0)
    {
        puts(anon12 + 64 & 0xffffffff);
    }
    else
    {
        puts(anon12 + 58 & 0xffffffff);
    }
    return 0;
}
```

I think the whole experiment can be summarized in the following points:
- In principle, it works, but it is likely unstable, and the output is of fairly low quality.
- Currently, the two main reasons fcd crashes on McSema bitcode are: a) …
- For fcd, the gain would be better CFG recovery, better recovery of binary data other than executable code (global variables, static constants, ...) and, last but not least, test cases for argument recovery, stack recovery and pseudocode generation. For McSema, it would be access to an easily hackable LLVM pass pipeline with support for passes written in Python, and a quick overview of a binary in a high-level language.
- In my opinion, definitely. CFG recovery and lifting of all binary data (not just executable code) is not a trivial task. Reusing existing code lets developers spend their time on something more meaningful. Also, I think that with each iterative improvement to …
As it is, fcd only has very basic CFG recovery and doesn't handle a number of aspects of the symbol information usually present in a binary. McSema is more capable in this regard and, after lifting, produces bitcode similar to fcd's. It would definitely be worth checking whether bitcode lifted by McSema could be used in fcd in some way.