Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugins: Yarascan + Vadyarascan Context #1287

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

dgmcdona
Copy link
Contributor

@dgmcdona dgmcdona commented Oct 2, 2024

There are limitations with the current implementation of the YaraScan and VadYaraScan plugins that seriously impacts their usefulness in the CLI; namely, the inability to view user-defined amounts of context surrounding yara matches in a hexdump format.

In the CLI, users must now enumerate yara hits with one of one the plugins, then copy information about the hit, such as the PID and offset, to another location, and re-read the data from the layer in which the match occurred within volshell, which is a laborious process. Additionally, neither of the plugins currently render the data using format_hints.HexBytes which would provide a more useful view of the matched bytes.

Within volshell, there is no publicly available API on either the YaraScan or VadYaraScan classes to enumerate hits and interact with those values programatically outside of constructing an instance of the plugin and retrieving values from the TreeGrid returned by the run method. In addition to the changes proposed here, we may want to consider providing classmethods for performing at least yara string searches without requiring consumers to manually update the configuration tree and construct the plugins via their constructors.

This attempts to address these issues by:

  • Using format_hints.HexBytes in the TreeGrids for both the YaraScan and VadYaraScan plugins
  • Adding two IntRequirements to YaraScan that allow users to specify additional context surrounding matches, akin to the -A/-B grep flags
  • Creating an enumerate_matches() method in VadYaraScan to provide an interface for interacting with hits beyond just accessing nodes in the TreeGrid.

@ikelos This is resurrecting #1131, which I read your comment on. This at least partially addresses the issue of placing the new IntRequirements in a better location, although I'm not 100% sure this will meet your requirements if the idea is for these new requirements to specify the additional context before and after binary data returned from other places, and not just within these two plugins.

If you still feel that this is the wrong approach to this problem, and that more parts of the framework than just the yarascan and vadyarascan modules can make use of these requirements, I'm happy to work with you to identify the correct way to do this.

One thing that I am not yet certain about with this PR is the decision to extend the interface with a new enumerate_matches method in the VadYaraScan class and not in the YaraScan class; these two plugins don't return the exact same values in their TreeGrids (VadYaraScan includes a PID as well), so if an equivalent method was added to YaraScan, it likely wouldn't have the same signature, although there is no inheritance relationship between the two plugin classes here, so it may not be an issue.

I also noticed that YaraScan will scan multiple layers, but the returned offsets don't specify the layer in which the offset is valid. If we do move this logic out of the _generator() method and into a publicly exposed method, we will likely want to include the layer name in the returned tuple as well.

This attempts to address limitations with the current implementation
of the `YaraScan` and `VadYaraScan` plugins that seriously impacts their
usefulness in the CLI; namely, the inability to view user-defined context
surrounding yara matches in a hexdump format.

In the CLI, users must now enumerate yara hits with one of one the plugins,
then copy information about the hit, such as the PID and offset, to
another location, and re-read the data from the layer in which the match
occurred within volshell, which is a laborious process.

Within volshell, there is no publicly available API on either the
`YaraScan` or `VadYaraScan` classes to enumerate hits and interact with
those values programatically outside of constructing an instance of the
plugin and retrieving values from the `TreeGrid` returned by the `run`
method. In addition to the changes proposed here, we may want to
consider providing classmethods for performing at least yara string
searches without requiring users to manually update the configuration
and construct the plugins via their constructors.
@dgmcdona dgmcdona force-pushed the dgmcdona/plugins-yarascan-context branch from bff3f5b to fbcda54 Compare October 2, 2024 21:40
@atcuno
Copy link
Contributor

atcuno commented Oct 2, 2024

@ikelos exposing the full hex+ascii dump contents of hits is pretty critical. Right now the results of the plugins are pretty bare and useless, and its been the main complaining point from students in workshops and is a fully valid complaint in my opinion.

@ikelos
Copy link
Member

ikelos commented Oct 6, 2024

Yeah, I think you've found an issue that will be encountered by many of our plugins. I'm still debating the best of dealing with this though. I can't figure out if we should get the requested window size from the user/UI to the plugin (which then requires the user to rerun the plugin just to get a little more data) or if we invent a new return type that says "Data" and contains the layer and the offset to identify it? It would then be up to the UI to retrieve an amount of data and decide how to display it. That would also allow the UI to move forward and back at will (I'm thinking a web UI that "scrolls" you to the right part of the memory image) and in general be much more flexible than the CLI being handed a chunk of data. It shouldn't really affect things that consume the output of the plugin, because they should have the context they were working on and therefore access to all the layers.

I guess the only sticky issue would be if we ever wanted to support live layers, but I think that's not as terribly a useful feature as people may think. So what do you say? The steps we'd need to take would be:

  • Create a new data output type (like the Disassembly Type). We might have to figure out how to sneak it through the "simple data types rule" for return values, but I'd probably recommend allowing a tuple type, and then keeping the types within that simple (like really simple, so no tuples-within-tuples. Also bump the main framework MINOR number, since that should just be an additive change.
  • Implement the display of such return values in the CLI (using default parameters, ideally with sensible defaults and/or the config file somehow, rather than muddying up the CLI, I don't really want to see more than about 2 numeric config options added, but that's not a hard and fast rule)
  • Go around plugins changing data output to this new output type (I'd suggest an autoconverter that could take an object and pull out the layer and offset, just to make it easier to return, although I can already feel people suggesting pushing that to accepting a complex object which I think would be a bad idea because it's leaking what the plugin should do out into the UI). This would include bumping their version number and their required framework version.

So that sounds like about 3 pull requests, whaddya think?

@ikelos
Copy link
Member

ikelos commented Oct 6, 2024

@ikelos exposing the full hex+ascii dump contents of hits is pretty critical. Right now the results of the plugins are pretty bare and useless, and its been the main complaining point from students in workshops and is a fully valid complaint in my opinion.

Thanks, I appreciate the feedback, but this whole paragraph feels unnecessary. It comes across as an attempt to apply pressure on me, and it puts my back up. I'm trying to give this a fair decision, but defensively trying to oversell it to me before you've even heard my response doesn't help its cause. "It's a fully valid complaint in [your] opinion" suggests that you think that in my opinion, it wouln't be a valid complaint, or that I'm immediately going to go against it. Have my other decisions been so unreasonable, or felt targetted against you, that you feel you need to put that in before I've said anything? I'd have much preferred "this is a problem a lot of students encounter", that doesn't denigrate what the plugins currently do, it doesn't push for a particular solution, it doesn't add pressure and it doesn't pit you and your students against any decision I might make that isn't the one being suggested, it just adds facts. I know you feel under pressure at the moment, but we're all on the same side, we're trying to make a good tool, and my job is to make sure it'll work long term, be maintainable and makes sense so we don't have to change it in six months.

@atcuno
Copy link
Contributor

atcuno commented Oct 6, 2024

I wasn't trying to put pressure or you vs. me. I will word better in the future. I was trying to relay student response as we have been running some workshops (20-30 people) as well as having close friends/users go through some hands on exercises on their own to get feedback on features and use cases.

Putting "in my opinion" just meant I agreed with the students. Also, I typed that comment quickly as my flight was getting close to descending, and they would make us put our laptops away. I didn't mean it hostile, but it definitely reads that way. Sorry for that.

@eve-mem
Copy link
Contributor

eve-mem commented Oct 8, 2024

@dgmcdona maybe one for a different PR once all these bits have been worked out but it would probably make sense to also update the linux VmaYaraScan plugin too? Or does this do that already in a way I've missed?

@dgmcdona
Copy link
Contributor Author

dgmcdona commented Oct 8, 2024

Thanks @eve-mem, I had totally overlooked that one. I'll make sure it's included in the effort when it's time to convert the plugins.

iMHLv2 pushed a commit that referenced this pull request Oct 20, 2024
atcuno pushed a commit that referenced this pull request Oct 23, 2024
atcuno pushed a commit that referenced this pull request Oct 24, 2024
@ikelos
Copy link
Member

ikelos commented Nov 6, 2024

Just to update, this is waiting on me to develop a better "data" output that will allow a UI to specify how much context should be provided to data blocks. #1310, which I hope to work on in a couple of weeks if not sooner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants