Plugins: Yarascan + Vadyarascan Context #1287

dgmcdona · 2024-10-02T21:34:09Z

There are limitations with the current implementation of the YaraScan and VadYaraScan plugins that seriously impacts their usefulness in the CLI; namely, the inability to view user-defined amounts of context surrounding yara matches in a hexdump format.

In the CLI, users must now enumerate yara hits with one of one the plugins, then copy information about the hit, such as the PID and offset, to another location, and re-read the data from the layer in which the match occurred within volshell, which is a laborious process. Additionally, neither of the plugins currently render the data using format_hints.HexBytes which would provide a more useful view of the matched bytes.

Within volshell, there is no publicly available API on either the YaraScan or VadYaraScan classes to enumerate hits and interact with those values programatically outside of constructing an instance of the plugin and retrieving values from the TreeGrid returned by the run method. In addition to the changes proposed here, we may want to consider providing classmethods for performing at least yara string searches without requiring consumers to manually update the configuration tree and construct the plugins via their constructors.

This attempts to address these issues by:

Using format_hints.HexBytes in the TreeGrids for both the YaraScan and VadYaraScan plugins
Adding two IntRequirements to YaraScan that allow users to specify additional context surrounding matches, akin to the -A/-B grep flags
Creating an enumerate_matches() method in VadYaraScan to provide an interface for interacting with hits beyond just accessing nodes in the TreeGrid.

@ikelos This is resurrecting #1131, which I read your comment on. This at least partially addresses the issue of placing the new IntRequirements in a better location, although I'm not 100% sure this will meet your requirements if the idea is for these new requirements to specify the additional context before and after binary data returned from other places, and not just within these two plugins.

If you still feel that this is the wrong approach to this problem, and that more parts of the framework than just the yarascan and vadyarascan modules can make use of these requirements, I'm happy to work with you to identify the correct way to do this.

One thing that I am not yet certain about with this PR is the decision to extend the interface with a new enumerate_matches method in the VadYaraScan class and not in the YaraScan class; these two plugins don't return the exact same values in their TreeGrids (VadYaraScan includes a PID as well), so if an equivalent method was added to YaraScan, it likely wouldn't have the same signature, although there is no inheritance relationship between the two plugin classes here, so it may not be an issue.

I also noticed that YaraScan will scan multiple layers, but the returned offsets don't specify the layer in which the offset is valid. If we do move this logic out of the _generator() method and into a publicly exposed method, we will likely want to include the layer name in the returned tuple as well.

volatility3/framework/plugins/yarascan.py

This attempts to address limitations with the current implementation of the `YaraScan` and `VadYaraScan` plugins that seriously impacts their usefulness in the CLI; namely, the inability to view user-defined context surrounding yara matches in a hexdump format. In the CLI, users must now enumerate yara hits with one of one the plugins, then copy information about the hit, such as the PID and offset, to another location, and re-read the data from the layer in which the match occurred within volshell, which is a laborious process. Within volshell, there is no publicly available API on either the `YaraScan` or `VadYaraScan` classes to enumerate hits and interact with those values programatically outside of constructing an instance of the plugin and retrieving values from the `TreeGrid` returned by the `run` method. In addition to the changes proposed here, we may want to consider providing classmethods for performing at least yara string searches without requiring users to manually update the configuration and construct the plugins via their constructors.

atcuno · 2024-10-02T22:13:59Z

@ikelos exposing the full hex+ascii dump contents of hits is pretty critical. Right now the results of the plugins are pretty bare and useless, and its been the main complaining point from students in workshops and is a fully valid complaint in my opinion.

ikelos · 2024-10-06T16:11:33Z

Yeah, I think you've found an issue that will be encountered by many of our plugins. I'm still debating the best of dealing with this though. I can't figure out if we should get the requested window size from the user/UI to the plugin (which then requires the user to rerun the plugin just to get a little more data) or if we invent a new return type that says "Data" and contains the layer and the offset to identify it? It would then be up to the UI to retrieve an amount of data and decide how to display it. That would also allow the UI to move forward and back at will (I'm thinking a web UI that "scrolls" you to the right part of the memory image) and in general be much more flexible than the CLI being handed a chunk of data. It shouldn't really affect things that consume the output of the plugin, because they should have the context they were working on and therefore access to all the layers.

I guess the only sticky issue would be if we ever wanted to support live layers, but I think that's not as terribly a useful feature as people may think. So what do you say? The steps we'd need to take would be:

Create a new data output type (like the Disassembly Type). We might have to figure out how to sneak it through the "simple data types rule" for return values, but I'd probably recommend allowing a tuple type, and then keeping the types within that simple (like really simple, so no tuples-within-tuples. Also bump the main framework MINOR number, since that should just be an additive change.
Implement the display of such return values in the CLI (using default parameters, ideally with sensible defaults and/or the config file somehow, rather than muddying up the CLI, I don't really want to see more than about 2 numeric config options added, but that's not a hard and fast rule)
Go around plugins changing data output to this new output type (I'd suggest an autoconverter that could take an object and pull out the layer and offset, just to make it easier to return, although I can already feel people suggesting pushing that to accepting a complex object which I think would be a bad idea because it's leaking what the plugin should do out into the UI). This would include bumping their version number and their required framework version.

So that sounds like about 3 pull requests, whaddya think?

ikelos · 2024-10-06T16:22:35Z

@ikelos exposing the full hex+ascii dump contents of hits is pretty critical. Right now the results of the plugins are pretty bare and useless, and its been the main complaining point from students in workshops and is a fully valid complaint in my opinion.

Thanks, I appreciate the feedback, but this whole paragraph feels unnecessary. It comes across as an attempt to apply pressure on me, and it puts my back up. I'm trying to give this a fair decision, but defensively trying to oversell it to me before you've even heard my response doesn't help its cause. "It's a fully valid complaint in [your] opinion" suggests that you think that in my opinion, it wouln't be a valid complaint, or that I'm immediately going to go against it. Have my other decisions been so unreasonable, or felt targetted against you, that you feel you need to put that in before I've said anything? I'd have much preferred "this is a problem a lot of students encounter", that doesn't denigrate what the plugins currently do, it doesn't push for a particular solution, it doesn't add pressure and it doesn't pit you and your students against any decision I might make that isn't the one being suggested, it just adds facts. I know you feel under pressure at the moment, but we're all on the same side, we're trying to make a good tool, and my job is to make sure it'll work long term, be maintainable and makes sense so we don't have to change it in six months.

atcuno · 2024-10-06T17:17:00Z

I wasn't trying to put pressure or you vs. me. I will word better in the future. I was trying to relay student response as we have been running some workshops (20-30 people) as well as having close friends/users go through some hands on exercises on their own to get feedback on features and use cases.

Putting "in my opinion" just meant I agreed with the students. Also, I typed that comment quickly as my flight was getting close to descending, and they would make us put our laptops away. I didn't mean it hostile, but it definitely reads that way. Sorry for that.

eve-mem · 2024-10-08T05:55:46Z

@dgmcdona maybe one for a different PR once all these bits have been worked out but it would probably make sense to also update the linux VmaYaraScan plugin too? Or does this do that already in a way I've missed?

dgmcdona · 2024-10-08T17:48:49Z

Thanks @eve-mem, I had totally overlooked that one. I'll make sure it's included in the effort when it's time to convert the plugins.

This is a port of #1287

ikelos · 2024-11-06T23:26:02Z

Just to update, this is waiting on me to develop a better "data" output that will allow a UI to specify how much context should be provided to data blocks. #1310, which I hope to work on in a couple of weeks if not sooner.

github-advanced-security bot found potential problems Oct 2, 2024

View reviewed changes

volatility3/framework/plugins/yarascan.py Fixed Show resolved Hide resolved

dgmcdona force-pushed the dgmcdona/plugins-yarascan-context branch from bff3f5b to fbcda54 Compare October 2, 2024 21:40

atcuno added the parity-release label Oct 3, 2024

dgmcdona mentioned this pull request Oct 11, 2024

Framework: Add 'Data' type to base types #1310

Open

iMHLv2 pushed a commit that referenced this pull request Oct 20, 2024

Plugins: Yarascan + Vadyarascan Context #1287

0073734

This is a port of #1287

atcuno pushed a commit that referenced this pull request Oct 23, 2024

Plugins: Yarascan + Vadyarascan Context #1287

6031d53

This is a port of #1287

atcuno pushed a commit that referenced this pull request Oct 24, 2024

Plugins: Yarascan + Vadyarascan Context #1287

3a4696f

This is a port of #1287

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Plugins: Yarascan + Vadyarascan Context #1287

Plugins: Yarascan + Vadyarascan Context #1287

dgmcdona commented Oct 2, 2024

atcuno commented Oct 2, 2024

ikelos commented Oct 6, 2024

ikelos commented Oct 6, 2024

atcuno commented Oct 6, 2024

eve-mem commented Oct 8, 2024

dgmcdona commented Oct 8, 2024

ikelos commented Nov 6, 2024 •

edited

Loading

Plugins: Yarascan + Vadyarascan Context #1287

Are you sure you want to change the base?

Plugins: Yarascan + Vadyarascan Context #1287

Conversation

dgmcdona commented Oct 2, 2024

atcuno commented Oct 2, 2024

ikelos commented Oct 6, 2024

ikelos commented Oct 6, 2024

atcuno commented Oct 6, 2024

eve-mem commented Oct 8, 2024

dgmcdona commented Oct 8, 2024

ikelos commented Nov 6, 2024 • edited Loading

ikelos commented Nov 6, 2024 •

edited

Loading