  • code clean-up and documentation
  • implement IDF ngram selection in subseq
  • fix min_seq_length parameter not being respected in subseq
  • add IDF-weight penalty in score
  • more efficient vocab
  • simplify code - remove vocab from SuffixArray
  • FuzzyMatch-cli displays number of matches / number of inputs
  • add cas penalty token (default), allowing case normalization to be enabled/disabled
  • add nbr penalty token (default), allowing number normalization to be enabled/disabled
  • fix empty "perfect" and "no-perfect" subseq matches for strings of length 1
  • better implementation of subseq matching
  • ignore fuzzy thresholds for subseq matching
  • introduce max subseq matching
  • remove remaining dependencies
  • simpler serialization format for fast uncompressed index reading
  • code simplification
  • do not require extra dependency on API file
  • implement support for joiner as an alternative penalty token
  • fix performance loss introduced by the penalty token logic
  • skip empty segments during indexing
  • add min subsequence length and ratio (--ml, --mr) options for lookup
  • generalize penalties for tags, separators and punctuation, optionally selected with the --penalty_tokens option
  • completely remove tags from the actual index; only consider them for penalties
  • implement special fuzzy match penalties for tag differences, entity differences, and case differences
  • make fuzzy match score precision 0.01% to avoid odd fuzzy rates
  • sort equivalent fuzzy matches by id
  • simplify code