-
Notifications
You must be signed in to change notification settings - Fork 457
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Plugins: Yarascan + Vadyarascan Context #1287
base: develop
Are you sure you want to change the base?
Plugins: Yarascan + Vadyarascan Context #1287
Conversation
This attempts to address limitations with the current implementation of the `YaraScan` and `VadYaraScan` plugins that seriously impacts their usefulness in the CLI; namely, the inability to view user-defined context surrounding yara matches in a hexdump format. In the CLI, users must now enumerate yara hits with one of one the plugins, then copy information about the hit, such as the PID and offset, to another location, and re-read the data from the layer in which the match occurred within volshell, which is a laborious process. Within volshell, there is no publicly available API on either the `YaraScan` or `VadYaraScan` classes to enumerate hits and interact with those values programatically outside of constructing an instance of the plugin and retrieving values from the `TreeGrid` returned by the `run` method. In addition to the changes proposed here, we may want to consider providing classmethods for performing at least yara string searches without requiring users to manually update the configuration and construct the plugins via their constructors.
bff3f5b
to
fbcda54
Compare
@ikelos exposing the full hex+ascii dump contents of hits is pretty critical. Right now the results of the plugins are pretty bare and useless, and its been the main complaining point from students in workshops and is a fully valid complaint in my opinion. |
Yeah, I think you've found an issue that will be encountered by many of our plugins. I'm still debating the best of dealing with this though. I can't figure out if we should get the requested window size from the user/UI to the plugin (which then requires the user to rerun the plugin just to get a little more data) or if we invent a new return type that says "Data" and contains the layer and the offset to identify it? It would then be up to the UI to retrieve an amount of data and decide how to display it. That would also allow the UI to move forward and back at will (I'm thinking a web UI that "scrolls" you to the right part of the memory image) and in general be much more flexible than the CLI being handed a chunk of data. It shouldn't really affect things that consume the output of the plugin, because they should have the context they were working on and therefore access to all the layers. I guess the only sticky issue would be if we ever wanted to support live layers, but I think that's not as terribly a useful feature as people may think. So what do you say? The steps we'd need to take would be:
So that sounds like about 3 pull requests, whaddya think? |
Thanks, I appreciate the feedback, but this whole paragraph feels unnecessary. It comes across as an attempt to apply pressure on me, and it puts my back up. I'm trying to give this a fair decision, but defensively trying to oversell it to me before you've even heard my response doesn't help its cause. "It's a fully valid complaint in [your] opinion" suggests that you think that in my opinion, it wouln't be a valid complaint, or that I'm immediately going to go against it. Have my other decisions been so unreasonable, or felt targetted against you, that you feel you need to put that in before I've said anything? I'd have much preferred "this is a problem a lot of students encounter", that doesn't denigrate what the plugins currently do, it doesn't push for a particular solution, it doesn't add pressure and it doesn't pit you and your students against any decision I might make that isn't the one being suggested, it just adds facts. I know you feel under pressure at the moment, but we're all on the same side, we're trying to make a good tool, and my job is to make sure it'll work long term, be maintainable and makes sense so we don't have to change it in six months. |
I wasn't trying to put pressure or you vs. me. I will word better in the future. I was trying to relay student response as we have been running some workshops (20-30 people) as well as having close friends/users go through some hands on exercises on their own to get feedback on features and use cases. Putting "in my opinion" just meant I agreed with the students. Also, I typed that comment quickly as my flight was getting close to descending, and they would make us put our laptops away. I didn't mean it hostile, but it definitely reads that way. Sorry for that. |
@dgmcdona maybe one for a different PR once all these bits have been worked out but it would probably make sense to also update the linux VmaYaraScan plugin too? Or does this do that already in a way I've missed? |
Thanks @eve-mem, I had totally overlooked that one. I'll make sure it's included in the effort when it's time to convert the plugins. |
Just to update, this is waiting on me to develop a better "data" output that will allow a UI to specify how much context should be provided to data blocks. #1310, which I hope to work on in a couple of weeks if not sooner. |
There are limitations with the current implementation of the
YaraScan
andVadYaraScan
plugins that seriously impacts their usefulness in the CLI; namely, the inability to view user-defined amounts of context surrounding yara matches in a hexdump format.In the CLI, users must now enumerate yara hits with one of one the plugins, then copy information about the hit, such as the PID and offset, to another location, and re-read the data from the layer in which the match occurred within volshell, which is a laborious process. Additionally, neither of the plugins currently render the data using
format_hints.HexBytes
which would provide a more useful view of the matched bytes.Within volshell, there is no publicly available API on either the
YaraScan
orVadYaraScan
classes to enumerate hits and interact with those values programatically outside of constructing an instance of the plugin and retrieving values from theTreeGrid
returned by therun
method. In addition to the changes proposed here, we may want to consider providing classmethods for performing at least yara string searches without requiring consumers to manually update the configuration tree and construct the plugins via their constructors.This attempts to address these issues by:
format_hints.HexBytes
in theTreeGrid
s for both theYaraScan
andVadYaraScan
pluginsIntRequirements
toYaraScan
that allow users to specify additional context surrounding matches, akin to the-A
/-B
grep flagsenumerate_matches()
method inVadYaraScan
to provide an interface for interacting with hits beyond just accessing nodes in theTreeGrid
.@ikelos This is resurrecting #1131, which I read your comment on. This at least partially addresses the issue of placing the new
IntRequirement
s in a better location, although I'm not 100% sure this will meet your requirements if the idea is for these new requirements to specify the additional context before and after binary data returned from other places, and not just within these two plugins.If you still feel that this is the wrong approach to this problem, and that more parts of the framework than just the
yarascan
andvadyarascan
modules can make use of these requirements, I'm happy to work with you to identify the correct way to do this.One thing that I am not yet certain about with this PR is the decision to extend the interface with a new
enumerate_matches
method in theVadYaraScan
class and not in theYaraScan
class; these two plugins don't return the exact same values in theirTreeGrid
s (VadYaraScan
includes a PID as well), so if an equivalent method was added toYaraScan
, it likely wouldn't have the same signature, although there is no inheritance relationship between the two plugin classes here, so it may not be an issue.I also noticed that
YaraScan
will scan multiple layers, but the returned offsets don't specify the layer in which the offset is valid. If we do move this logic out of the_generator()
method and into a publicly exposed method, we will likely want to include the layer name in the returned tuple as well.