This repository has been archived by the owner on Jun 16, 2019. It is now read-only.

Using McSema lifted bitcode #24

Open
surovic opened this issue Apr 4, 2018 · 2 comments

surovic commented Apr 4, 2018

As it is, fcd only has very basic CFG recovery and doesn't handle a number of aspects of symbol information usually present in a binary. McSema has more capabilities in this regard and produces bitcode similar to that of fcd after lifting. It would definitely be worth checking whether bitcode lifted by McSema could be used in fcd in some way.

@surovic surovic added the enhancement New feature or request label Apr 4, 2018
@surovic surovic self-assigned this Apr 4, 2018

surovic commented Apr 5, 2018

This will need some progress on #4


surovic commented Apr 9, 2018

So, after one day's worth of effort, I've been able to take (very slightly modified) McSema-produced bitcode for test.c, run it through the RemillArgumentRecovery and RemillStackRecovery IR passes, and produce C pseudocode using the AST passes in fcd. The output for function main() is as follows:

uint64_t sub_400566_main(uint64_t RSP8, uint64_t RSP16, uint64_t RSP24, uint64_t RSP32, uint64_t RSP40, uint64_t RSP48)
{
    uint64_t alloca7;
    uint64_t alloca11;
    uint64_t alloca14;
    uint64_t alloca15;
    uint64_t alloca16;
    uint64_t alloca1 = RSP8;
    uint64_t alloca2 = RSP16;
    uint64_t alloca3 = RSP48;
    uint64_t alloca4 = RSP40;
    uint64_t alloca5 = RSP32;
    uint64_t alloca6 = RSP24;
    uint64_t anon8 = (uint64_t){{0, 0, 0, 0}};
    alloca7 = anon8;
    uint64_t anon10 = (uint64_t)&alloca11 | 1;
    alloca9 = anon10 + 42;
    uint64_t anon12 = (uint64_t){{1, 0, 2, 0, 0, 0, 0, 0}, {71, 108, 111, 98, 97, 108, 32, 118, 97, 114, 105, 97, 98, 108, 101, 32, 39, 97, 39, 32, 111, 102, 32, 118, 97, 108, 117, 101, 32, 37, 117, 32, 97, 116, 32, 97, 100, 100, 114, 101, 115, 115, 32, 37, 112, 32, 105, 115, 32, 0}, {101, 118, 101, 110, 46, 0}, {111, 100, 100, 46, 0}};
    uint32_t* anon13 = (uint32_t*)anon8;
    printf(anon12 + 8 & 0xffffffff, (__zext uint64_t)*anon13, anon8, __undefined, __undefined, __undefined, alloca14, alloca7, alloca15, *(uint64_t*)alloca16, alloca1, alloca2, alloca6, alloca5, alloca4, alloca3);
    uint64_t alloca9 = anon10 + ((*anon13 & 1) != 0 ? 67 : 55) + 10;
    if ((*anon13 & 1) != 0)
    {
        puts(anon12 + 64 & 0xffffffff);
    }
    else 
    {
        puts(anon12 + 58 & 0xffffffff);
    }
    return 0;
}
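
The offsets passed to printf() and puts() are easier to follow once the byte arrays in anon12 are decoded. The Python sketch below does that, assuming the four initializer groups sit back to back in memory with no padding (the output does not state the layout, so this is an assumption). The helper `cstring_at` is hypothetical and not part of fcd or McSema.

```python
# Byte groups copied verbatim from the anon12 initializer in the pseudocode.
HEADER = [1, 0, 2, 0, 0, 0, 0, 0]
FORMAT = [71, 108, 111, 98, 97, 108, 32, 118, 97, 114, 105, 97, 98, 108, 101,
          32, 39, 97, 39, 32, 111, 102, 32, 118, 97, 108, 117, 101, 32, 37,
          117, 32, 97, 116, 32, 97, 100, 100, 114, 101, 115, 115, 32, 37, 112,
          32, 105, 115, 32, 0]
EVEN = [101, 118, 101, 110, 46, 0]
ODD = [111, 100, 100, 46, 0]

# Assumption: the groups are contiguous, with no padding between them.
blob = bytes(HEADER + FORMAT + EVEN + ODD)


def cstring_at(data: bytes, offset: int) -> str:
    """Return the NUL-terminated ASCII string that starts at `offset`."""
    end = data.index(0, offset)  # position of the first NUL at or after offset
    return data[offset:end].decode("ascii")


if __name__ == "__main__":
    # Offsets taken from the printf() and puts() calls in the pseudocode.
    for offset in (8, 58, 64):
        print(offset, repr(cstring_at(blob, offset)))
```

Under that layout assumption, offset 8 is the printf() format string, offset 58 is "even." and offset 64 is "odd.", which is consistent with the branch on `*anon13 & 1`.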

I think the whole experiment can be summarized in the following points:

  1. Does fcd work with McSema bitcode?

In principle, it does, but it's likely unstable and the output is pretty low quality.

  2. If it crashes, what seems to be the issue?

Currently, the two main reasons fcd crashes with McSema bitcode are: a) __remill_basic_block() was not preserved; b) the AST passes don't support an IR construct present in the McSema bitcode.

  3. What opportunities might there be with running on McSema-lifted bitcode?

For fcd, the benefits are better CFG recovery, better recovery of binary data other than executable code (global variables, static constants, ...) and, last but not least, test cases for argument recovery, stack recovery and pseudocode generation.

For McSema, the benefits are access to an easily hackable LLVM pass pipeline with support for passes written in Python, and a quick overview of a binary in a high-level language.

  4. Is it worth the time?

In my opinion, definitely. CFG recovery and lifting of all binary data (not just executable code) are not trivial tasks. Reusing existing code frees developer time for something more meaningful. I also think that with each iterative improvement to RemillArgumentRecovery, RemillStackRecovery and the AST passes, we'll see very noticeable improvements in the output.
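
Crash cause a) above can be checked for before handing a module to fcd. The sketch below assumes that "not preserved" means the definition of __remill_basic_block() is missing from the module, and that the bitcode has first been converted to textual IR (for example with llvm-dis). `has_basic_block_definition` is a hypothetical helper, not an fcd or McSema API.

```python
import re
import sys

# Matches a function *definition* (a `define` line, not a `declare`)
# of __remill_basic_block in textual LLVM IR.
_DEFINE_RE = re.compile(
    r"^define\b[^\n]*@__remill_basic_block\(", re.MULTILINE
)


def has_basic_block_definition(ir_text: str) -> bool:
    """Return True if the textual IR defines __remill_basic_block."""
    return _DEFINE_RE.search(ir_text) is not None


if __name__ == "__main__":
    # Usage (hypothetical file name): python check_ir.py lifted.ll
    if len(sys.argv) != 2:
        sys.exit("usage: check_ir.py <file.ll>")
    with open(sys.argv[1], encoding="utf-8") as ir_file:
        found = has_basic_block_definition(ir_file.read())
    print("__remill_basic_block defined:", found)
```

A text scan like this is only a quick sanity check. A bare declaration is deliberately not counted, since only a function body would carry the information the recovery passes could use.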

@surovic surovic removed this from the Milestone 5: Generating C Pseudocode milestone Apr 9, 2018