code clean-up and documentation
implement IDF ngram selection in subseq
fix min_seq_length parameter not being respected in subseq
add IDF-weight penalty in score
more efficient vocab
simplify code - remove vocab from SuffixArray
FuzzyMatch-cli displays number of matches / number of inputs
add cas penalty token (default) to enable/disable case normalization
add nbr penalty token (default) to enable/disable number normalization
fix empty "perfect" and "no-perfect" subseq matches for 1-length string
better implementation of subseq matching
ignore fuzzy thresholds for subseq matching
introduce max subseq matching
remove remaining dependencies
simpler serialization format for fast uncompressed index reading
code simplification
do not require extra dependency on API file
implement support of joiner as an alternative penalty token
fix performance loss introduced by adding penalty token logic
skip empty segments during indexing
add min subsequence length and ratio (--ml, --mr) options for lookup
generalize penalty for tags, separators and punctuation, optionally selected with the --penalty_tokens option
remove tags entirely from the actual index; only consider them for penalty
implement special fuzzy match penalties for tag differences, entity differences, and case differences
set fuzzy match score precision to 0.01% to avoid odd fuzzy rates
sort equivalent fuzzy matches by id
simplify code
