Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: keyword schema mapping v1 #11951

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

lukashino
Copy link
Contributor

Explained in the MD document that is part of the PR - switch to "rich diff" mode for conversion to a readable format.

@jasonish
Copy link
Member

jasonish commented Oct 13, 2024

It does look like a schema can be extended with custom fields (https://json-schema.org/draft/2019-09/json-schema-core#rfc.section.6.5), my only comment here would be some sort of prefix to make clear they are suricata extensions to easily differentiate what parts are jsonschema and what are our custom extensions to it.

Copy link

codecov bot commented Oct 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.74%. Comparing base (d5dd549) to head (ecaa8a1).
Report is 18 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #11951      +/-   ##
==========================================
+ Coverage   82.70%   82.74%   +0.03%     
==========================================
  Files         912      910       -2     
  Lines      249102   249008      -94     
==========================================
+ Hits       206018   206035      +17     
+ Misses      43084    42973     -111     
Flag Coverage Δ
fuzzcorpus 60.78% <ø> (+0.10%) ⬆️
livemode 18.70% <ø> (-0.02%) ⬇️
pcap 44.10% <ø> (-0.01%) ⬇️
suricata-verify 62.19% <ø> (+0.04%) ⬆️
unittests 59.01% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Copy link
Member

@inashivb inashivb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow. Thanks. This is very elaborate.
When I had heard the problem statement, I had misunderstood it to be a problem of a simple script that flattens eve.json and tries a regex match against the --list-keywords output and vice-versa and gives an output about possibly missing fields in both.

This would indeed be helpful in cases where we do not have an exact mapping.

Q: I think I do not understand the concept of an N:N mapping here. What does it mean?
I thought we would be looking for 1:1 mapping b/w schema:implemented rule keywords..

@lukashino
Copy link
Contributor Author

N:N mapping means that we can have multiple keywords can be related to one Suricata output and multiple Suricata outputs can be related to multiple detection keywords. The best example is probably from the files where:

To eve.json output fields like:

  • nfs.filename
  • ftp_data.filename
  • files[].filename
  • fileinfo.filename

Following keywords are somehow related:

  • file.name (full file name - mycustomfilename.pdf)
  • filename (only the prefix - mycustomfilename)
  • fileext (only the suffix - .pdf)

1:1 mapping is not the goal here, as I think it can be more user-friendly to use the filename keyword to match on the actual filename.
On user friendliness (or rulewriter friendliness) example might better used http.uri and http.uri.raw example where both keywords relate to http.uri eve.json output field. But as http.uri is normalized, the rule writer doesn't need to care in what form the original URI was written, the person simply writes the content that should be searched for.

@lukashino
Copy link
Contributor Author

Now we need to agree if this suggestion would be sufficient and then we can proceed with the tooling. (Added also a prefix to the keyword.)

"url": {
    "type": "string",
    "mapping-detect-keywords": [
        "http.uri.raw",
        "http.uri",
        "http.urilen"
    ]
},

- fileext

- file.name, filename, fileext can access eve.json fields:
- nfs.filename
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these nfs and ftp_data fields are different from the fileinfo records


To make it managable in a text editor I thought of describing the relationship primarily in one direction e.g. what eve.json fields are described by what keywords. The other direction, what fields are affected by what keywords, can be obtained by inversing the data structure.

The keyword and the eve.json fields can be in three states (somewhat similar to Git) - tracked, unassigned, ignored.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what could tracked look like? I assume we'd have a redmine ticket reference there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm by tracked I meant mapped (so as it should be).
Ignored could possibly reference/contain an explanation.
Unassigned would be the screaming one.

@victorjulien
Copy link
Member

Would it make sense to have a mapping specify whether its an exact match, a fuzzy match, a subset or a superset?

Like http.request_line contains raw uri, method, etc. So http.request_line can be used to match on those fields, and could be in the "superset" group... It's almost like a hierarchy: raw content -> http.request frame -> http.request_line -> http.uri. Each of them gets more precise.

@lukashino
Copy link
Contributor Author

indeed, I think it might be a good idea to have at least the possibility for that.
Maybe creating a hierarchy would be "too much". For instance, http.urilen would not directly fit in but is related to http.uri.raw. Same with http.uri and http.uri.raw, what is more precise?
Also, I am not convinced you could be as precise with raw content as you are with http-related keywords.
So instead of creating a hierarchy, we could just aim for related keywords.

The other question is, will it help us with anything?
The current task doesn't require that.

For http.uri eve.json field you could have:
keywords-exact-match:

  • http.uri
  • http.uri.raw
  • http.urilen

keywords-partly-match:

  • http.request_line
  • frame.http.request

I'll expand on this more. Thanks for bringing up request_line, missed that.

@lukashino
Copy link
Contributor Author

Some extra notes for the next revision:

  • try to make a more hierarchy-like structure
  • maybe leave out urilen for now, try to propose a structure, and maybe then we will try to adhere to more consistent keyword matching
  • urilen can be replaced with bsize now for every buffer there - urilen could be made obsolete legacy keyword now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants