-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Show prevalence of rules in the output #1737
base: master
Are you sure you want to change the base?
Show prevalence of rules in the output #1737
Conversation
…Goel-04/capa into Aayush-Goel-04/Issue#520
Update default.py Update CHANGELOG.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start!
Do you have an idea on if/how to display this data for the other output modes (verbose, very verbose, and also JSON)?
We can also add create a new field capa/capa/render/result_document.py Lines 559 to 575 in 9d21add
What are your thoughts @mr-tz |
Delete try.py, rules_prevalence.pickle
cdcd32f
to
f6058b1
Compare
26e4096
to
712cee3
Compare
712cee3
to
c5302cd
Compare
Co-authored-by: Moritz <[email protected]>
That could work well if we find a place that requires few modifications and is flexible. I think we'd want to keep prevalence data and rule information separate (with a separate DB as you're proposing here). |
for
we can do similar for |
capa/render/result_document.py
Outdated
CD = Path(__file__).resolve().parent.parent.parent | ||
file = CD / "assets" / "rules_prevalence_data" / "rules_prevalence.json.gz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use get_default_root()
Line 444 in 210a13d
def get_default_root() -> Path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using get_default_root
works well locally but it cause circular import when being during pyinstaller build.
@williballenthin I suggest moving such functions to capa.helpers
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moving it makes sense (see #1821 also)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mr-tz I suggest we move ahead with proposal 3 in above mentioned PR.
moving below to a new capa.loader
or we can move them to capa.helper
has_file_limitation
is_supported_format
is_supported_arch
get_arch
is_supported_os
get_os
is_running_standalone
get_default_root
get_default_signatures
get_workspace
get_extractor
get_file_extractors
get_signatures
get_sample_analysis
collect_metadata
compute_dynamic_layout
compute_static_layout
compute_layout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds good to me!
@@ -521,6 +544,7 @@ def from_capa(cls, rule: capa.rules.Rule) -> "RuleMetadata": | |||
return cls( | |||
name=rule.meta.get("name"), | |||
namespace=rule.meta.get("namespace"), | |||
prevalence=load_rules_prevalence().get(rule.meta.get("name"), "unknown"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the rule prevalence database distributed with capa the library? i think its important that people be able to use capa the library without maintaining this database. so perhaps we want to handle the case of the database not existing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can provide a warning if no db is found (in case that's not already there) pointing to one and explaining shortly what it does
capa/render/result_document.py
Outdated
if not file.exists(): | ||
return {} | ||
with gzip.open(file, "rb") as gzfile: | ||
return json.loads(gzfile.read().decode("utf-8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while we're at it, is it worth defining a pydantic data model for the DB file/format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the format is dict[rule name, prevalence]
which will be hard to represent in pydantic, unless we enumerate all the rule names as potential values. i think the type hint above is a good start. still, adding some comments here showing a snippet of the file would be valuable.
Comments on loading rules_prevalence and warning if file not found
Apologies for disappearing there for a bit – college placement stuff got pretty intense. Back to PR, Tests are failing in pyinstaller due circular import when trying to fetch path for rules_prevalence database
What are ur thoughts @mr-tz @williballenthin |
I think moving to Python files analogous to the COM DB files sounds good. There's been a bunch of changes recently on the API, so please ensure the PR is up to date with master. |
I have converted the database to python file. Now we just need the actual prevalence values for rules, and this will be good to go. |
""" | ||
Load and return a dictionary containing prevalence information for rules defined in capa. | ||
|
||
Returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returns: | |
Return: |
relates to #520
Checklist