Output IP addresses, domain names, file manipulations, and (potentially) registry details #1914
Hi @mr-tz, I hope you're having a nice week. Sorry for all the messages. I have changed the program design since the last commits: I realized that the default extraction and rendering functions should be easily extensible to the verbose modes, so I have restructured the program to reflect this (I haven't pushed any additional code to GitHub yet). Since the design of these features has changed fairly significantly, I wanted to update this PR. In the initial PR, the extraction functions (like "extract_domain_names") don't fully handle both dynamic and static inputs. These extraction functions (1) extract strings from the inputs and (2) parse those strings with regexes for, e.g., domain names and IP addresses. Furthermore, some static inputs in capa implement "extract_file_strings" in slightly different ways. This is roughly what I was thinking:
However, I have had trouble implementing the [...]
Also, can you suggest some features for a vverbose mode for outputting domain names? I am unsure what would be most useful (although I am still mulling over ideas!). Apologies again for all the messages - just wanted to make sure my approach sounds reasonable before proceeding too far! But no worries if you're on holiday - Merry Christmas and Happy New Year!
This PR addresses issue #1907. It extracts IP addresses, domain names, and file manipulations from CAPE sandbox traces and outputs them. It may eventually extract and present registry details as well. It uses a regex to identify IPv4 and IPv6 addresses. It uses another regex to identify web domains and potential subdomains.
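To make the two-step approach concrete (collect strings, then regex-parse them), here is a minimal sketch. The patterns and the function names `extract_ip_addresses` and `extract_domain_names` as written here are illustrative assumptions, not the regexes or signatures actually used in this PR. Candidate IP addresses are validated with the standard library's `ipaddress` module so that strings like `999.1.1.1` are rejected.

```python
import ipaddress
import re

# Hypothetical patterns for illustration; the PR's actual regexes may differ.
# IPv4 candidate: four dot-separated groups of 1-3 digits.
IPV4_CANDIDATE = re.compile(r"(?<![\w.])(?:\d{1,3}\.){3}\d{1,3}(?![\w.])")
# IPv6 candidate: hex groups with at least two colons (validated afterwards).
IPV6_CANDIDATE = re.compile(
    r"(?<![\w:])(?:[0-9A-Fa-f]{0,4}:){2,7}[0-9A-Fa-f]{0,4}(?![\w:])"
)
# Domain: one or more labels (covers subdomains) followed by an alphabetic TLD.
DOMAIN = re.compile(
    r"(?<![\w.-])(?:[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\.)+"
    r"[A-Za-z]{2,63}(?![\w-])"
)


def extract_ip_addresses(strings):
    """Return unique, valid IPv4/IPv6 addresses in first-seen order."""
    found = []
    for s in strings:
        for pattern in (IPV4_CANDIDATE, IPV6_CANDIDATE):
            for candidate in pattern.findall(s):
                try:
                    # Rejects out-of-range octets, malformed IPv6, etc.
                    ipaddress.ip_address(candidate)
                except ValueError:
                    continue
                if candidate not in found:
                    found.append(candidate)
    return found


def extract_domain_names(strings):
    """Return unique, lowercased domain-like strings in first-seen order."""
    found = []
    for s in strings:
        for match in DOMAIN.findall(s):
            domain = match.lower()
            if domain not in found:
                found.append(domain)
    return found


if __name__ == "__main__":
    sample = ["GET http://google.com/ HTTP/1.1", "connect 10.0.0.1:443"]
    print(extract_ip_addresses(sample))
    print(extract_domain_names(sample))
```

A pattern this loose also matches file names such as `setup.exe`, so a real implementation would likely need extra filtering (for example, a TLD allowlist).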
The extraction and rendering functions require tests (still a work in progress). This PR may require changelog and documentation updates.
Most of the significant features are in common.py and default.py; the changes in the other files make them compatible with the main additions. It passes most of the tests locally. I'm still working on a few issues and need to improve the tests (especially the rendering tests). What do you think of the general design? Do you have any questions or suggestions?
The output would look like this:
+----------------+
| Domain Names   |
|----------------|
| google.com     |
| web.domain.net |
| mywebsite.edu  |
+----------------+
+--------------+
| IP Addresses |
|--------------|
| 10.0.0.1     |
| 192.1.23.45  |
+--------------+
+------------+-------------------------+
| APIs       | File Names              |
|------------+-------------------------|
| CreateFile | /path/to/file.txt       |
| WriteFile  | /path/to/other_file.txt |
+------------+-------------------------+
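For reference, tables in the style shown above can be produced with a few lines of standard-library Python. This is only a sketch: `render_table` is a hypothetical helper, and the PR itself may well rely on a table library rather than hand-rolled formatting.

```python
def render_table(headers, rows):
    """Render headers and rows as a psql-style ASCII table."""
    # Transpose so each entry holds one column's header plus its cells.
    columns = list(zip(headers, *rows))
    widths = [max(len(str(cell)) for cell in column) for column in columns]

    def line(cells):
        padded = (str(cell).ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    dashes = ["-" * (width + 2) for width in widths]
    border = "+" + "+".join(dashes) + "+"      # top and bottom edges
    separator = "|" + "+".join(dashes) + "|"   # line under the header row
    body = [line(row) for row in rows]
    return "\n".join([border, line(headers), separator, *body, border])


if __name__ == "__main__":
    print(render_table(["IP Addresses"], [["10.0.0.1"], ["192.1.23.45"]]))
    print(
        render_table(
            ["APIs", "File Names"],
            [
                ["CreateFile", "/path/to/file.txt"],
                ["WriteFile", "/path/to/other_file.txt"],
            ],
        )
    )
```

Column widths are computed from the longest cell, which keeps the borders aligned regardless of the extracted values.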
The ResultDocument class seems to be the staging ground for outputting results. I have added a couple of attributes that I use for presenting IPs and domains, and that might be useful if capa outputs further dynamic analysis results in the future. The default, verbose, and vverbose modes all output the same results right now, but the verbose and vverbose modes can probably be expanded.
Also, @mr-tz can you please tell me a bit more about what type of registry key/value analysis you think would be useful? I was thinking about listing: what keys are created, modified, and deleted; whether any of these registry keys are run when the computer starts up; whether functions like RegSetKeySecurity are called, which can grant malware increased privileges; and the registry keys that all of these actions relate to. But I don't want to add too many details for a simple default output - it might be good to add additional details in verbose and vverbose mode. What do you think of these things?
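To illustrate the registry summary I have in mind, here is a rough sketch. It assumes the trace has already been reduced to (API name, key path) pairs, which is a hypothetical input shape and not how CAPE reports are actually structured. The prefix table and the `summarize_registry_activity` name are likewise made up for this example.

```python
from collections import defaultdict

# Hypothetical mapping from Windows registry API name prefixes to an action.
ACTION_BY_API_PREFIX = {
    "RegCreateKey": "created",
    "RegSetValue": "modified",
    "RegDeleteKey": "deleted",
    "RegDeleteValue": "deleted",
    "RegSetKeySecurity": "security changed",
}

# Substring of the common Run/RunOnce autostart keys (compared lowercased).
RUN_KEY_MARKER = r"\software\microsoft\windows\currentversion\run"


def classify(api_name):
    """Map an API name such as RegSetValueExW to an action, or None."""
    for prefix, action in ACTION_BY_API_PREFIX.items():
        if api_name.startswith(prefix):
            return action
    return None


def summarize_registry_activity(calls):
    """Group key paths by action and collect keys that run at startup."""
    by_action = defaultdict(list)
    autostart = []
    for api_name, key_path in calls:
        action = classify(api_name)
        if action is None:
            continue  # e.g. read-only calls like RegQueryValueExW
        if key_path not in by_action[action]:
            by_action[action].append(key_path)
        if RUN_KEY_MARKER in key_path.lower() and key_path not in autostart:
            autostart.append(key_path)
    return dict(by_action), autostart


if __name__ == "__main__":
    run_key = r"HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Run"
    actions, autostart = summarize_registry_activity(
        [("RegSetValueExW", run_key), ("RegQueryValueExW", run_key)]
    )
    print(actions)
    print(autostart)
```

The default output could show only the per-action key lists, with the autostart and security details reserved for the verbose modes.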
EDIT: I actually think it would be better to refactor the new features out of "common.py" into their own file(s), and also their rendering functions into files like "render_ip_addresses.py", "render_domains.py", "render_file_names.py", etc. Default, verbose, and vverbose rendering functions could be included in each of these files. I am going to work on this, and also work on making sure the suggested changes pass the CI tests.
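As a sketch of what one of these per-feature files could look like (using "render_domains.py" as the example), each module could expose one function per mode plus a small dispatcher. All names below are hypothetical. Because the three modes currently produce the same output, the verbose variants simply delegate to the default one as placeholders.

```python
# Sketch of a hypothetical render_domains.py module.


def render_default(domains):
    """Render a heading followed by the unique domains, sorted."""
    return "\n".join(["Domain Names", *sorted(set(domains))])


def render_verbose(domains):
    """Placeholder: same as default until verbose-only details are decided."""
    return render_default(domains)


def render_vverbose(domains):
    """Placeholder: same as verbose until vverbose-only details are decided."""
    return render_verbose(domains)


RENDERERS = {
    "default": render_default,
    "verbose": render_verbose,
    "vverbose": render_vverbose,
}


def render(domains, mode="default"):
    """Dispatch to the renderer for the requested mode."""
    try:
        renderer = RENDERERS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return renderer(domains)


if __name__ == "__main__":
    print(render(["web.domain.net", "google.com"], mode="verbose"))
```

Keeping the dispatcher in each module means a mode can later gain extra detail without touching the other feature files.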