Using McSema lifted bitcode #24

surovic · 2018-04-04T21:56:28Z

As it is, fcd only has very basic CFG recovery and doesn't handle a number of aspects of symbol information usually present in a binary. McSema has more capabilities in this regard and, after lifting, produces bitcode similar to that of fcd. It would definitely be worth checking whether bitcode lifted by McSema could be used in fcd in some way.

surovic · 2018-04-05T21:11:23Z

This will need some progress on #4.

surovic · 2018-04-09T19:34:52Z

So, after one day's worth of effort, I've been able to take (very slightly modified) McSema-produced bitcode for test.c, run it through the RemillArgumentRecovery and RemillStackRecovery IR passes, and produce C pseudocode using the AST passes in fcd. The output for function main() is as follows:

uint64_t sub_400566_main(uint64_t RSP8, uint64_t RSP16, uint64_t RSP24, uint64_t RSP32, uint64_t RSP40, uint64_t RSP48)
{
    uint64_t alloca7;
    uint64_t alloca11;
    uint64_t alloca14;
    uint64_t alloca15;
    uint64_t alloca16;
    uint64_t alloca1 = RSP8;
    uint64_t alloca2 = RSP16;
    uint64_t alloca3 = RSP48;
    uint64_t alloca4 = RSP40;
    uint64_t alloca5 = RSP32;
    uint64_t alloca6 = RSP24;
    uint64_t anon8 = (uint64_t){{0, 0, 0, 0}};
    alloca7 = anon8;
    uint64_t anon10 = (uint64_t)&alloca11 | 1;
    alloca9 = anon10 + 42;
    uint64_t anon12 = (uint64_t){{1, 0, 2, 0, 0, 0, 0, 0}, {71, 108, 111, 98, 97, 108, 32, 118, 97, 114, 105, 97, 98, 108, 101, 32, 39, 97, 39, 32, 111, 102, 32, 118, 97, 108, 117, 101, 32, 37, 117, 32, 97, 116, 32, 97, 100, 100, 114, 101, 115, 115, 32, 37, 112, 32, 105, 115, 32, 0}, {101, 118, 101, 110, 46, 0}, {111, 100, 100, 46, 0}};
    uint32_t* anon13 = (uint32_t*)anon8;
    printf(anon12 + 8 & 0xffffffff, (__zext uint64_t)*anon13, anon8, __undefined, __undefined, __undefined, alloca14, alloca7, alloca15, *(uint64_t*)alloca16, alloca1, alloca2, alloca6, alloca5, alloca4, alloca3);
    uint64_t alloca9 = anon10 + ((*anon13 & 1) != 0 ? 67 : 55) + 10;
    if ((*anon13 & 1) != 0)
    {
        puts(anon12 + 64 & 0xffffffff);
    }
    else 
    {
        puts(anon12 + 58 & 0xffffffff);
    }
    return 0;
}
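The byte arrays in the initializer of anon12 are easier to judge once decoded. The following is a minimal Python sketch, not part of fcd or McSema, that assumes the four arrays are laid out contiguously in memory and that anon12 is the base address of that region. The helper names (flatten, c_string_at) are made up for illustration.

```python
# Byte arrays copied verbatim from the initializer of anon12 in the output above.
DATA_SEGMENTS = [
    [1, 0, 2, 0, 0, 0, 0, 0],
    [71, 108, 111, 98, 97, 108, 32, 118, 97, 114, 105, 97, 98, 108, 101, 32,
     39, 97, 39, 32, 111, 102, 32, 118, 97, 108, 117, 101, 32, 37, 117, 32,
     97, 116, 32, 97, 100, 100, 114, 101, 115, 115, 32, 37, 112, 32, 105,
     115, 32, 0],
    [101, 118, 101, 110, 46, 0],
    [111, 100, 100, 46, 0],
]


def flatten(segments):
    """Concatenate the segments, assuming they are contiguous in memory."""
    return bytes(b for segment in segments for b in segment)


def c_string_at(blob, offset):
    """Read a NUL-terminated ASCII string starting at the given offset."""
    end = blob.index(0, offset)
    return blob[offset:end].decode("ascii")


blob = flatten(DATA_SEGMENTS)

# Offsets taken from the pseudocode: anon12 + 8, anon12 + 58, anon12 + 64.
for offset in (8, 58, 64):
    print(offset, repr(c_string_at(blob, offset)))
```

Under that layout assumption, offset 8 holds the printf format string "Global variable 'a' of value %u at address %p is ", offset 58 holds "even." and offset 64 holds "odd.". This agrees with the branch in the pseudocode, which passes anon12 + 64 to puts() when the low bit of the value is set and anon12 + 58 otherwise.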

I think the whole experiment can be summarized in the following points:

Does fcd work with McSema bitcode?

In principle, it does. But it's likely unstable and the output is of fairly low quality.

If it crashes, what seems to be the issue?

Currently the two main reasons fcd crashes with McSema bitcode are: a) __remill_basic_block() was not preserved; b) the AST passes don't support an IR construct present in the McSema bitcode.

What opportunities might there be with running on McSema-lifted bitcode?

For fcd, it's better CFG recovery, better recovery of binary data other than executable code (global variables, static constants, ...) and last but not least, test cases for argument recovery, stack recovery and pseudocode generation.

For McSema, it's access to an easily hackable LLVM pass pipeline with support for passes written in Python, and a quick overview of a binary in a high-level language.

Is it worth the time?

In my opinion, definitely. CFG recovery and lifting of all binary data (not just executable code) are not trivial tasks. Reusing existing code frees developer time for more meaningful work. I also think that with each iterative improvement to RemillArgumentRecovery, RemillStackRecovery and the AST passes we'll see very noticeable improvements in the output.

surovic added the enhancement (New feature or request) label Apr 4, 2018

surovic added this to the Milestone 5: Generating C Pseudocode milestone Apr 4, 2018

surovic self-assigned this Apr 4, 2018

surovic removed this from the Milestone 5: Generating C Pseudocode milestone Apr 9, 2018
