Aleksandrs Berdicevskis committed Nov 24, 2022
2 parents 7e6218d + b76bd57 commit 643ca10
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions README.md
@@ -2,16 +2,16 @@
This is a very preliminary release of some scripts we are using within the Cassandra project to study language change in contemporary Swedish.

## Requirements:
- Ruby (recommended version: 2.6.3p62. Any other version should in principle also be fine, but if you use 3.0+, you may have to tweak some of the scripts, since some methods have become deprecated)
- The necessary gems (=libraries, packages) should be included in the default Ruby installation, but if not, use "gem install gem_name". For plotting, you would have to install the `rinruby` gem
- For plotting, you would need to have R installed (tested on 4.0.2)
- Ruby (recommended version: 2.6.3p62. Any other version should in principle also be fine, but if you use 3.0+, you may have to tweak some of the scripts, since some methods have become deprecated);
- The necessary gems (=libraries, packages) should be included in the default Ruby installation, but if not, use "gem install gem_name". For plotting, you would have to install the `rinruby` gem;
- For plotting, you would need to have R installed (tested on 4.0.2).

## Usage

### General
`korp16.rb` will output a json and a tsv that contain the relative frequencies of the given variant(s) across the years. It will do so by running `count_time` in the Korp API: https://ws.spraakbanken.gu.se/docs/korp. So basically it's a wrapper for running this command and processing its output in a convenient way
`korp16.rb` will output a json and a tsv that contain the relative frequencies of the given variant(s) across the years. It will do so by running `count_time` in the Korp API: https://ws.spraakbanken.gu.se/docs/korp. So basically it's a wrapper for running this command and processing its output in a convenient way.

`plot.rb` draws a nice graph, using the tsv file
`plot.rb` draws a nice graph, using the tsv file.

### Variable
`korp16.rb` needs to know what to look for. You have to describe your variable using the CQP language (https://cwb.sourceforge.io/files/CQP_Tutorial.pdf) in the file `korp_queries.rb`. The description consists of two or three lines.
@@ -22,7 +22,7 @@ Line 2: variant1 = your_variant1_in_CQP (see examples in the file)

Line 3 (optional): variant2 = your_variant2_in_CQP (see examples in the file)

Use only variant1 if you want to know how its frequency (both absolute and normalized by corpus size, measured in instances per million, ipm) change over time). Use variant1 and variant2 if you want to know how they compete against each other. In this case, you will see frequencies of both variants, both absolute and relative (normalized by the sum of the frequencies of both variant1 and variant2; measured as proportion of 0 to 1). It is recommended to have the innovative variant as variant2. `plot.rb` will plot the relative frequency of variant2.
Use only variant1 if you want to know how its frequency (both absolute and normalized by corpus size, measured in instances per million, ipm) changes over time. Use variant1 and variant2 if you want to know how they compete against each other. In this case, you will see frequencies of both variants, both absolute and relative ("relative" = normalized by the sum of the frequencies of both variant1 and variant2; measured as proportion of 0 to 1). It is recommended to have the innovative variant as variant2. `plot.rb` will plot the relative frequency of variant2.
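
The two normalizations described above can be sketched roughly as follows. This is only an illustration, not the code used in `korp16.rb`: the helper names `ipm` and `relative_frequency` are hypothetical, and the counts and corpus size are invented.

```ruby
# Hypothetical helpers illustrating the two normalizations; not taken from korp16.rb.

# Instances per million (ipm): absolute count normalized by corpus size in tokens.
def ipm(count, corpus_size)
  count.to_f * 1_000_000 / corpus_size
end

# Proportion (0 to 1) of one variant within the sum of both variants.
def relative_frequency(count, other_count)
  total = count + other_count
  return nil if total.zero? # undefined when neither variant occurs in a period
  count.to_f / total
end

# Invented numbers for a single year.
variant1_count = 300
variant2_count = 100
corpus_size = 2_000_000

puts ipm(variant1_count, corpus_size)                    # 150.0
puts relative_frequency(variant2_count, variant1_count)  # 0.25
```

With two variants, the second value is the kind of number `plot.rb` is described as plotting for variant2.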

When launching `korp16.rb`, provide the label of your variable in the command line: `ruby korp16.rb --variable your_label --corpus your_corpus`
If you are using only one variant, add `--nvariants 1`. The default is `--nvariants 2`, so it is not necessary to specify it.
@@ -50,7 +50,7 @@ Run `ruby plot.rb --variable your_label --corpus familjeliv-all [--nvariants n]`

There is some additional functionality (which may not always work properly), for instance:
`--granularity`: (m)onth or (y)ear (default)
`--user`: whether the script should look only in texts authored by the user with a specific `username`. Might now work correctly for some corpora.
`--user`: whether the script should look only in texts authored by the user with a specific `username`. Might not work correctly for some corpora.

The script is using `corpus_tools.rb` as an auxiliary script.
Output will be stored in the folder `variables` (`\\variable_name\\corpus_name\\subcorpus_name\\username.{json,tsv}`). By default, `username = all_users`.
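
As a rough illustration of how the options and defaults described above fit together with the output location, here is a minimal sketch using Ruby's standard `OptionParser`. It is an assumption about one possible implementation, not the parsing code of `korp16.rb`: `parse_options` and `output_path` are hypothetical names, the subcorpus name is passed in explicitly because the README does not say how it is derived, and `File.join` produces forward slashes rather than the backslashes shown above.

```ruby
require 'optparse'

# Hypothetical sketch; the real scripts may parse their arguments differently.
def parse_options(argv)
  # Defaults as described in the README: two variants, yearly granularity, all users.
  options = { nvariants: 2, granularity: 'y', user: 'all_users' }
  OptionParser.new do |opts|
    opts.on('--variable LABEL')       { |v| options[:variable] = v }
    opts.on('--corpus NAME')          { |v| options[:corpus] = v }
    opts.on('--nvariants N', Integer) { |v| options[:nvariants] = v }
    opts.on('--granularity G')        { |v| options[:granularity] = v }
    opts.on('--user USERNAME')        { |v| options[:user] = v }
  end.parse(argv)
  options
end

# Hypothetical helper: builds variables/<variable>/<corpus>/<subcorpus>/<username>.<ext>.
def output_path(options, subcorpus, extension)
  File.join('variables', options[:variable], options[:corpus], subcorpus,
            "#{options[:user]}.#{extension}")
end

opts = parse_options(%w[--variable your_label --corpus familjeliv-all --nvariants 1])
puts output_path(opts, 'your_subcorpus', 'tsv')
```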