My training set has 100,000 doc samples and 1,000 tags, but I found that the tags follow a long-tail distribution. Some tags appear in fewer than 10 docs, while others are included in almost every doc. How should I deal with this?
Magpie will likely learn to almost never recommend the classes from the long tail and will frequently default to the most common class. If that's not the behaviour you want, you might want to repartition your dataset to have a more balanced class distribution.
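One rough way to repartition is to drop tags that are too rare to learn and tags so common that they carry little signal, then discard docs left with no tags. The sketch below is plain Python and does not use any Magpie API. The function name `rebalance_tags`, the `doc_tags` mapping, and the thresholds `min_docs` and `max_doc_fraction` are all assumptions made for illustration, so tune them to your own data.

```python
from collections import Counter


def rebalance_tags(doc_tags, min_docs=10, max_doc_fraction=0.9):
    """Drop tags that are too rare or too common, then drop untagged docs.

    doc_tags: dict mapping a doc id to the collection of tags on that doc
              (hypothetical input format, not something Magpie prescribes).
    Returns (filtered_doc_tags, kept_tags).
    """
    n_docs = len(doc_tags)

    # Count how many docs each tag appears in (each doc counts once per tag).
    doc_freq = Counter(tag for tags in doc_tags.values() for tag in set(tags))

    # Keep tags that are frequent enough to learn but not near-universal.
    kept_tags = {
        tag
        for tag, count in doc_freq.items()
        if count >= min_docs and count / n_docs <= max_doc_fraction
    }

    # Strip dropped tags from every doc, then remove docs with no tags left.
    filtered = {doc: set(tags) & kept_tags for doc, tags in doc_tags.items()}
    filtered = {doc: tags for doc, tags in filtered.items() if tags}
    return filtered, kept_tags


if __name__ == "__main__":
    toy = {
        "d1": {"physics", "common"},
        "d2": {"physics", "common"},
        "d3": {"biology", "common"},
        "d4": {"rare", "common"},
    }
    balanced, kept = rebalance_tags(toy, min_docs=2, max_doc_fraction=0.75)
    print(kept)
    print(balanced)
```

Filtering like this shrinks the label set instead of fixing the imbalance among the tags that remain, so you may still want to downsample docs that only carry the most frequent tags.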