This repository contains a binary built with tree-sitter that lets you:
- Inspect the concrete syntax tree of a source file
- Use pre-written tree-sitter query files to locate important symbols in source code
- Format output in JSON to use the results in your own applications
In particular, this repo provides a binary prepackaged with:
- A recent version of the tree-sitter library
- A large number of tree-sitter grammars
- An implementation of many common query predicates
Contributions are welcome, and we encourage using this tool for any application that involves code syntax analysis. For example, these queries are used by Codeium Search to index code locally for repo-wide semantic search. If you use Codeium Search, adding queries for your language here will enable it to work better on your own code!
# Print all names and arguments from function definitions.
fd -e js \
| xargs -I '{}' ./parse -quiet -use_tags_query -json -json_include_path -file '{}' \
| jq -r '.
| select(.captures."definition.function" != null)
| .file + ":" + .captures.name[0].text + .captures."codeium.parameters"[0].text'
# Output:
# examples/example.js:add(a, b)
$ ./download_parse.sh
$ ./parse -file examples/example.js -named_only
program [0, 0] - [4, 0] "// Adds two numbers.\n…"
comment [0, 0] - [0, 20] "// Adds two numbers."
function_declaration [1, 0] - [3, 1] "function add(a, b) {\n…"
name: identifier [1, 9] - [1, 12] "add"
parameters: formal_parameters [1, 12] - [1, 18] "(a, b)"
identifier [1, 13] - [1, 14] "a"
identifier [1, 16] - [1, 17] "b"
body: statement_block [1, 19] - [3, 1] "{\n…"
return_statement [2, 4] - [2, 17] "return a + b;"
binary_expression [2, 11] - [2, 16] "a + b"
left: identifier [2, 11] - [2, 12] "a"
right: identifier [2, 15] - [2, 16] "b"
$ ./parse -file examples/example.js -use_tags_query -json | jq ".captures.doc[0].text"
"// Adds two numbers."
Queries try to follow the conventions established by tree-sitter. Most captures also include documentation as @doc. @definition.function and @definition.method also capture @codeium.parameters.
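To use these captures from your own application, one option is to read the -json output directly. The following is a minimal sketch, not part of codeium-parse. It assumes each match is a JSON object with a captures map (plus a file field when -json_include_path is set), which is what the jq commands above rely on. SAMPLE is a hand-written stand-in for real output, and iter_json_objects and summarize_functions are hypothetical helper names.

```python
import json

# Hand-written stand-in for one match from `./parse -json -json_include_path`.
# Only the fields used by the jq examples are included; real output may
# contain more.
SAMPLE = """
{"file": "examples/example.js",
 "captures": {
   "definition.function": [{"text": "function add(a, b) { return a + b; }"}],
   "name": [{"text": "add"}],
   "codeium.parameters": [{"text": "(a, b)"}],
   "doc": [{"text": "// Adds two numbers."}]}}
"""


def iter_json_objects(text):
    """Yield each JSON object from a string of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def summarize_functions(text):
    """Return 'file:name(params)' for every match with @definition.function."""
    results = []
    for match in iter_json_objects(text):
        captures = match.get("captures", {})
        if "definition.function" not in captures:
            continue
        name = captures["name"][0]["text"]
        # @codeium.parameters may be absent; fall back to an empty string.
        params = captures.get("codeium.parameters", [{"text": ""}])[0]["text"]
        results.append(f'{match.get("file", "")}:{name}{params}')
    return results


if __name__ == "__main__":
    for line in summarize_functions(SAMPLE):
        print(line)
```

Replacing SAMPLE with sys.stdin.read() would let the same functions consume output piped from ./parse, provided the real output has the shape assumed here.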
Top-level capture | Python | TypeScript | JavaScript | Go | Java | C++ | PHP | Ruby | C# | Perl | Kotlin | Dart | Bash | C |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@definition.class | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
@definition.function | ✓ | ✓1 | ✓ | ✓ | N/A | ✓ | ✓ | N/A | N/A | ✓ | ✓ | ✓ | ✓ | ✓ |
@definition.method | ✓2 | ✓1 | ✓ | ✓ | ✓ | ✓2 | ✓ | ✓ | ✓ | ✓2 | ✓ | ✓2 | ✓ | ✓ |
@definition.constructor | ✓ | ✓ | ✓ | N/A | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ | ✓ | ✗ | N/A | N/A |
@definition.interface | N/A | ✓ | N/A | ✓ | ✓ | N/A | ✓ | ✗ | ✓ | N/A | ✗ | ✗ | N/A | N/A |
@definition.namespace | N/A | ✓ | N/A | N/A | N/A | ✓ | ✓ | N/A | ✓ | ✗ | ✗ | N/A | N/A | N/A |
@definition.module | N/A | ✓ | N/A | N/A | N/A | ✗ | N/A | ✓ | N/A | N/A | N/A | ✗ | N/A | N/A |
@definition.type | N/A | ✓ | N/A | ✓ | N/A | ✗ | ✗ | N/A | N/A | N/A | N/A | ✗ | N/A | N/A |
@definition.constant | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | N/A | ✗ |
@definition.enum | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | N/A | ✓ | N/A | ✗ | ✗ | N/A | ✗ |
@definition.import | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | N/A | ✓ | ✗ | ✓ | ✓ | ✗ | N/A | ✓ |
@definition.include | N/A | N/A | N/A | N/A | N/A | ✗ | ✗ | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
@definition.package | N/A | N/A | N/A | ✓ | ✓ | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
@reference.call | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
@reference.class | ✓3 | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ | N/A | N/A |
Language | Supported injections |
---|---|
Vue | JavaScript, TypeScript |
HTML | JavaScript |
Want to write a query for a new language? tags.scm and other queries in each language's tree-sitter repository, like tree-sitter-javascript, are a good place to start.
$ ./parse -supported_predicates
#eq?/#not-eq?
(#eq? <@capture|"literal"> <@capture|"literal">)
Checks if two values are equal.
#has-parent?/#not-has-parent?
(#has-parent? @capture node_type...)
Checks if @capture has a parent node of any of the given types.
#has-type?/#not-has-type?
(#has-type? @capture node_type...)
Checks if @capture has a node of any of the given types.
#lineage-from-name!
(#lineage-from-name! "literal")
If the name captures scopes, split by "literal" and retain the last element
as the name. The other elements are appended to the lineage.
#match?/#not-match?
(#match? @capture "regex")
Checks if the text for @capture matches the given regular expression.
#select-adjacent!
(#select-adjacent! @capture @anchor)
Selects @capture nodes contiguous with @anchor (all starting and ending on
adjacent lines).
#set!
(#set! key <@capture|"literal">)
Store metadata as a side effect of a match.
#strip!
(#strip! @capture "regex")
Removes all matching text from all @capture nodes.
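As an illustration of what two of these directives describe, here is a small sketch of #lineage-from-name! and #strip! operating on plain strings. It is written from the help text above and is not the tool's implementation. The function names are hypothetical, and Python's re module is assumed as a stand-in for whatever regex engine the binary actually uses.

```python
import re


def lineage_from_name(name, separator, lineage=()):
    """Split a scoped name and keep only its last element as the name.

    The leading elements are appended to the existing lineage, mirroring
    the description of (#lineage-from-name! "literal").
    """
    parts = name.split(separator)
    return parts[-1], list(lineage) + parts[:-1]


def strip_matches(text, pattern):
    """Remove every match of `pattern` from `text`, as (#strip! ...) describes."""
    return re.sub(pattern, "", text)


if __name__ == "__main__":
    print(lineage_from_name("Foo::Bar::baz", "::"))
    print(strip_matches("// Adds two numbers.", r"^//\s*"))
```

With these definitions, a scoped name such as Foo::Bar::baz yields the name baz with Foo and Bar added to the lineage, and stripping ^//\s* from a line comment leaves only the comment text.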
Need a predicate which hasn't been implemented? File an issue! We try to use predicates from nvim-treesitter.
$ ./parse -supported_languages
ada
c
cpp
csharp
css
dart
go
hcl
html
java
javascript
json
julia
kotlin
latex
markdown
ocaml
ocaml_interface
perl
php
protobuf
python
ruby
rust
shell
svelte
swift
toml
tree_sitter_query
tsx
typescript
vue
yaml
Looking for support for another language? File an issue with a link to the repo that contains the grammar.
Pull requests are welcome. For non-issue discussions about codeium-parse, join our Discord.
- You can create new source files with patterns you want to target in test_files/.
- Look at the syntax tree using ./parse -file test_files/<your file> to get a sense of how to capture the pattern.
- Learn the query syntax from tree-sitter documentation.
- Run ./goldens.sh to see what your query captures.