Software routinely reports various measures of feature importance and even marginal dependence on features. Explaining what these measures are and showing how they can fail seems like a potentially very useful Lab topic.
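As a concrete starting point, here is a minimal sketch (my illustration, not from this issue) of one such measure, partial dependence: pin one feature to a grid value, average the model's predictions over the data, and repeat across the grid. The toy model and names are assumptions for the demo; the averaging step is also where the measure can fail, since it washes out interactions between features.

```python
import random

def model(x0, x1):
    # toy "fitted" model with an interaction term (x0 * x1)
    return x0 + x0 * x1

rng = random.Random(0)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]

def partial_dependence(grid):
    # PD for feature 0: average prediction with x0 pinned to each grid value
    return [sum(model(v, x1) for _, x1 in data) / len(data) for v in grid]

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
pd_curve = partial_dependence(grid)
```

Because x1 is centered near zero, the interaction averages away and the curve looks purely linear in x0, even though the model's behavior for any individual row depends strongly on x1. That masking of interactions is exactly the kind of failure mode the Lab could demonstrate.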
Related subtopic: Tree-based models that require one-hot encoding of categoricals, vs. out-of-box support for categoricals.
Example dataset: Consider looking at https://archive.ics.uci.edu/ml/datasets/Adult (categoricals for occupation/education/etc). Ideally we'd find a dataset where the categoricals rank differently between the two runs (with and without one-hot encoding).
Use catboost or lightgbm with and without a one-hot encoding preprocessor.
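To show what we'd expect those two runs to reveal, here is a hedged, self-contained sketch. It is not the catboost/lightgbm comparison itself: instead of a fitted model it uses a known ground-truth function, with single-shuffle permutation importance, so the dilution effect is exact. All names and numbers are assumptions for the demo.

```python
import random

rng = random.Random(0)

LEVELS = ["a", "b", "c", "d", "e"]
EFFECT = {"a": -2.0, "b": -1.0, "c": 0.0, "d": 1.0, "e": 2.0}
N = 1000

cats = [rng.choice(LEVELS) for _ in range(N)]
xs = [rng.uniform(-1.0, 1.0) for _ in range(N)]
# target: strong categorical effect + moderate numeric effect
ys = [EFFECT[c] + 2.0 * x for c, x in zip(cats, xs)]

def perm_mse(predict):
    # MSE after replacing one column's value in each row with a permuted
    # row's value; the "model" is exact, so baseline MSE is 0 and this
    # number is the permutation importance itself.
    order = list(range(N))
    rng.shuffle(order)
    preds = [predict(i, order[i]) for i in range(N)]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / N

# Importance of the raw (native) categorical: shuffle the whole column.
native_imp = perm_mse(lambda i, j: EFFECT[cats[j]] + 2.0 * xs[i])

# Importance of the numeric feature.
numeric_imp = perm_mse(lambda i, j: EFFECT[cats[i]] + 2.0 * xs[j])

# Importance of each one-hot dummy column, shuffled one at a time.
def dummy_imp(level):
    d = [1.0 if c == level else 0.0 for c in cats]
    def predict(i, j):
        # swap in a permuted value for this level's dummy, keep the rest
        base = EFFECT[cats[i]] - EFFECT[level] * d[i]
        return base + EFFECT[level] * d[j] + 2.0 * xs[i]
    return perm_mse(predict)

onehot_imps = {lvl: dummy_imp(lvl) for lvl in LEVELS}
```

The native categorical column comes out more important than the numeric feature, but every individual one-hot dummy comes out less important than it: the ranking flips, which is the behavior we'd hope to reproduce on a real dataset like Adult. Shuffling all of a categorical's dummies together as a group would recover the native-column number.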
Somewhat related to feature importance is feature selection. Here's a paper on non-linear feature selection with gradient boosting. Haven't read it yet but could be interesting: http://alicezheng.org/papers/gbfs.pdf