add support for analysis of source code/scripted languages #1080
Could we use `format` for this? E.g. `format: C#`.
pro:
con:
The problem with overloading the file format feature is that file to language is a one-to-many mapping, e.g. there can be embedded templates that contain multiple different script languages such as C# for server-side scripts and JavaScript for client-side.
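To illustrate the one-to-many point, here is a minimal sketch. The names `LANGUAGES_BY_EXTENSION` and `languages_for` are hypothetical and not taken from this PR; the table only covers the cases mentioned above.

```python
from pathlib import Path
from typing import Dict, FrozenSet

# Hypothetical table: a single file type may embed several script languages.
LANGUAGES_BY_EXTENSION: Dict[str, FrozenSet[str]] = {
    ".cs": frozenset({"C#"}),
    ".js": frozenset({"JavaScript"}),
    # An ASPX template can carry server-side C# and client-side JavaScript.
    ".aspx": frozenset({"C#", "JavaScript"}),
}


def languages_for(path: str) -> FrozenSet[str]:
    """Return every language that may appear in the given file."""
    suffix = Path(path).suffix.lower()
    return LANGUAGES_BY_EXTENSION.get(suffix, frozenset())
```

A single `format` value cannot express the `.aspx` row, which is why a separate language notion seems useful.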
does this mean we only support Linux?
Tree-sitter needs to compile its (C) language bindings. Although I have a limited knowledge of package management, I've suggested to Moritz that we should precompile and package the supported tree-sitter bindings for each platform we support. The current state is a temporary measure.
Hm, let's find a better place for this initialization.
global in this file is a good place to start
can this call generate any exceptions?
It can throw type errors (which I believe we prevent with mypy) and a ValueError if parsing fails completely. In that case the parse method throws a ValueError, so the engine throws a ValueError, and so on. I can handle it in the following way at the Extractor level:
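A rough sketch of this kind of handling, using hypothetical names (`parse_source`, `UnsupportedSourceError`, `SourceExtractor`); this is an illustration under those assumptions and not the PR's actual code:

```python
class UnsupportedSourceError(Exception):
    """Hypothetical error raised by the extractor when parsing fails."""


def parse_source(buf: bytes) -> str:
    # Stand-in for the real parser call; assumed to raise ValueError
    # when parsing fails completely.
    if not buf.strip():
        raise ValueError("nothing to parse")
    return buf.decode("utf-8")


class SourceExtractor:
    def __init__(self, buf: bytes):
        try:
            self.tree = parse_source(buf)
        except ValueError as e:
            # Translate the low-level error into one the caller can report.
            raise UnsupportedSourceError(f"failed to parse source: {e}") from e
```

Catching at the extractor boundary keeps the parser's ValueError from leaking into callers that only know about extractor-level errors.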
what if it is neither?
I believe there is no easy way to remedy this. As I understand templates in general, the syntax is determined by the templating engine. In other words, there is no easy way to detect from an unknown template which templating engine is being used (ASP.NET (and if so, which language), Razor, EJS, ERB, Mako, Jinja2, Django, Cheetah, Go's html/template, etc.), not to mention that each has its own syntax: some use regular programming languages like C# to embed server logic, while others contain only very rudimentary placeholders/logic.
Here I am assuming that, for embedded templates, we only support EJS and C# in ASPX at the moment. This is because the Tree-sitter embedded-template parser can only parse EJS and ERB (and we are not interested in embedded Ruby at the moment, as far as I'm concerned). What's more, the default language for ASPX is VB, so anyone who wants to use C# needs to include a @ Page directive with a Language attribute (see: https://docs.microsoft.com/en-us/previous-versions/aspnet/k33801s3(v=vs.100), https://docs.microsoft.com/en-us/previous-versions/dotnet/netframework-4.0/ydy4x04a(v=vs.100)?redirectedfrom=MSDN, https://docs.microsoft.com/en-us/previous-versions/aspnet/fbdt8kk7(v=vs.100)).
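For illustration, the `@ Page` directive check could look roughly like this. The regular expression and the name `aspx_page_language` are assumptions made for this sketch, not the detection logic in the PR:

```python
import re
from typing import Optional

# Matches e.g. <%@ Page Language="C#" ... %>.
# Directive and attribute names are matched case-insensitively.
PAGE_LANGUAGE = re.compile(
    r"<%@\s*Page\b[^%]*?\bLanguage\s*=\s*[\"']?([^\"'\s%]+)",
    re.IGNORECASE,
)


def aspx_page_language(text: str) -> Optional[str]:
    """Return the declared Language attribute, or None if there is none.

    None means no language was declared, in which case ASPX falls back
    to its default (VB, per the comment above).
    """
    match = PAGE_LANGUAGE.search(text)
    return match.group(1) if match else None
```

For `<%@ Page Language="C#" AutoEventWireup="true" %>` this returns the string `C#`; a page without the directive yields `None`.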
Should we still assume that it's JS whenever it's not CS?
Could raise an Exception instead or are there other safe-guards in place before we get here?
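One way to make the "neither" case explicit instead of falling back to JS, sketched with hypothetical names (`LANGUAGE_ALIASES`, `resolve_script_language`); the alias table is an assumption for illustration only:

```python
# Hypothetical alias table: declared language name -> internal identifier.
LANGUAGE_ALIASES = {
    "c#": "cs",
    "cs": "cs",
    "csharp": "cs",
    "javascript": "js",
    "js": "js",
}


def resolve_script_language(declared: str) -> str:
    """Map a declared language name to an identifier, or fail loudly."""
    key = declared.strip().lower()
    if key not in LANGUAGE_ALIASES:
        # No silent default: anything unrecognized is reported to the caller.
        raise ValueError(f"unsupported template language: {declared!r}")
    return LANGUAGE_ALIASES[key]
```

Whether this is needed depends on which safeguards already run before this point.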
Can we document why we are only/specially handling `ASPX` here?
This is related to the issue discussed here: #1080 (comment).
I'd prefer to add an explicit encoding for these calls.
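For example, passing `encoding` explicitly avoids depending on the platform's default locale encoding. This sketch assumes UTF-8 is the intended encoding; the helper names and the sample file are hypothetical:

```python
import os
import tempfile


def write_text(path: str, text: str) -> None:
    # An explicit encoding keeps behavior identical across platforms.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_text(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


# Round-trip a file containing non-ASCII text.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.aspx")
    write_text(sample, "<%-- café --%>")
    roundtrip = read_text(sample)
```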