For texts written in contemporary Swedish, Sparv can generate the following types of annotations:

Part of speech tagging:
- pos: part of speech
- msd: morphosyntactic tag
Tool: Hunpos
Model: in-house model trained on SUC 3.0
Tag set: MSD tags
SALDO-based analysis:
- baseform: citation form
- lemgram: lemgram, identifies the inflectional table (using SALDO tags)
- sense: identifies a sense in SALDO and its probability
- (saldo: identifies a sense in SALDO - will be removed soon)
- sentiment: sentiment score using SenSALDO
Compound analysis (also based on SALDO):
- complemgram: compound lemgram
- compwf: compound word form
- (prefix: initial part of a compound - will be removed soon)
- (suffix: final part of a compound - will be removed soon)
Dependency analysis:
- ref: the position of the word in the sentence
- dephead: dependency head, the ref of the word that the current word modifies or is dependent on
- deprel: dependency relation, the relation of the current word to its dependency head
Tool: MaltParser
Model: swemalt, trained on Swedish Treebank
Tag set: Mamba-Dep
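
To illustrate how ref, dephead and deprel relate to each other, here is a minimal Python sketch. The sentence, its analysis and the helper names `head_of` and `dependents_of` are hand-made for illustration and are not actual Sparv output or Sparv API; the sketch also assumes that the root word has no dephead (represented as `None`).

```python
# Illustrative, hand-made analysis of "Hunden jagar katten" (not Sparv output).
# Assumption: the root word has no head, represented here as None.
tokens = [
    {"ref": 1, "word": "Hunden", "dephead": 2, "deprel": "SS"},
    {"ref": 2, "word": "jagar", "dephead": None, "deprel": "ROOT"},
    {"ref": 3, "word": "katten", "dephead": 2, "deprel": "OO"},
]


def head_of(token, sentence):
    """Return the token whose ref equals this token's dephead, or None for the root."""
    by_ref = {t["ref"]: t for t in sentence}
    return by_ref.get(token["dephead"])


def dependents_of(token, sentence):
    """Return all tokens in the sentence that point to this token as their head."""
    return [t for t in sentence if t["dephead"] == token["ref"]]


for t in tokens:
    head = head_of(t, tokens)
    head_word = head["word"] if head else "(root)"
    print(f'{t["word"]} --{t["deprel"]}--> {head_word}')
```

Because dephead stores a ref rather than a word, the lookup only works within a single sentence, where ref values are unique.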
Named entity recognition:
- ne.ex: named entity (name expression, numerical expression or time expression)
- ne.type: named entity type
- ne.subtype: named entity subtype
Tool: hfst-SweNER
References: "HFST-SweNER – A New NER Resource for Swedish"; "Reducing the effect of name explosion"
Readability metrics:
- text.lix: the Swedish readability metric LIX, läsbarhetsindex
- text.ovix: the Swedish readability metric OVIX, ordvariationsindex
- text.nk: the Swedish readability metric nominalkvot
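
As a rough illustration of what two of these metrics measure, the sketch below computes LIX and OVIX using their commonly cited definitions. It is not Sparv's implementation: the function names, the pre-tokenized input format and the handling of edge cases are assumptions made for this example.

```python
import math


def lix(sentences):
    """LIX: average sentence length plus the percentage of words longer than six characters.

    `sentences` is a list of sentences, each a list of word strings
    (assumed to be tokenized already, without punctuation tokens).
    """
    words = [w for s in sentences for w in s]
    if not words:
        return 0.0
    long_words = sum(1 for w in words if len(w) > 6)
    return len(words) / len(sentences) + 100 * long_words / len(words)


def ovix(sentences):
    """OVIX: log(n) / log(2 - log(u) / log(n)), with n tokens and u unique tokens.

    Returns None when the value is undefined (fewer than two tokens,
    or every token unique, which makes the denominator zero).
    """
    words = [w.lower() for s in sentences for w in s]
    n, u = len(words), len(set(words))
    if n < 2 or u == n:
        return None
    return math.log(n) / math.log(2 - math.log(u) / math.log(n))


example = [
    ["Det", "här", "är", "en", "mening"],
    ["Det", "här", "är", "universitetet"],
]
print(lix(example), ovix(example))
```

A higher LIX indicates longer sentences and more long words, while a higher OVIX indicates greater lexical variation.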
Lexical classes:
- blingbring: lexical class from the Blingbring resource (on word level)
- swefn: frames from Swedish FrameNet (on word level)
- text.blingbring: lexical class from the Blingbring resource (on document level)
- text.swefn: frames from Swedish FrameNet (on document level)

Older Swedish texts or texts written in other languages can often be annotated with a subset of the above annotation types.

The msd annotation for non-Swedish languages is based on different tag sets, depending on the language being annotated and the annotation tool being used. The attribute contains the part of speech and, in many cases, morphosyntactic information. The pos annotation contains only part-of-speech information and uses the universal POS tag set.
