Software routinely reports various measures of feature importance and even marginal dependence on features. Explaining what these measures are and showing how they can fail seems like a potentially very useful Lab topic.
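As a concrete starting point, here is a minimal sketch (my illustration, not from this issue) of one such measure, partial dependence: pin one feature to a grid value, average the model's predictions over the data, and repeat across the grid. The toy model and names are assumptions for the demo; the averaging step is also where the measure can fail, since it washes out interactions between features.

```python
import random

def model(x0, x1):
    # toy "fitted" model with an interaction term (x0 * x1)
    return x0 + x0 * x1

rng = random.Random(0)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(500)]

def partial_dependence(grid):
    # PD for feature 0: average prediction with x0 pinned to each grid value
    return [sum(model(v, x1) for _, x1 in data) / len(data) for v in grid]

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
pd_curve = partial_dependence(grid)
```

Because x1 is centered near zero, the interaction averages away and the curve looks purely linear in x0, even though the model's behavior for any individual row depends strongly on x1. That masking of interactions is exactly the kind of failure mode the Lab could demonstrate.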
Related subtopic: Tree-based models that require one-hot encoding of categoricals, vs. out-of-box support for categoricals.
Example dataset: Consider looking at https://archive.ics.uci.edu/ml/datasets/Adult (categoricals for occupation/education/etc). Ideally we'd find a dataset where the categoricals rank differently between the two runs (with and without one-hot encoding).
Use catboost or lightgbm with and without a one-hot encoding preprocessor.
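To show what we'd expect those two runs to reveal, here is a hedged, self-contained sketch. It is not the catboost/lightgbm comparison itself: instead of a fitted model it uses a known ground-truth function, with single-shuffle permutation importance, so the dilution effect is exact. All names and numbers are assumptions for the demo.

```python
import random

rng = random.Random(0)

LEVELS = ["a", "b", "c", "d", "e"]
EFFECT = {"a": -2.0, "b": -1.0, "c": 0.0, "d": 1.0, "e": 2.0}
N = 1000

cats = [rng.choice(LEVELS) for _ in range(N)]
xs = [rng.uniform(-1.0, 1.0) for _ in range(N)]
# target: strong categorical effect + moderate numeric effect
ys = [EFFECT[c] + 2.0 * x for c, x in zip(cats, xs)]

def perm_mse(predict):
    # MSE after replacing one column's value in each row with a permuted
    # row's value; the "model" is exact, so baseline MSE is 0 and this
    # number is the permutation importance itself.
    order = list(range(N))
    rng.shuffle(order)
    preds = [predict(i, order[i]) for i in range(N)]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / N

# Importance of the raw (native) categorical: shuffle the whole column.
native_imp = perm_mse(lambda i, j: EFFECT[cats[j]] + 2.0 * xs[i])

# Importance of the numeric feature.
numeric_imp = perm_mse(lambda i, j: EFFECT[cats[i]] + 2.0 * xs[j])

# Importance of each one-hot dummy column, shuffled one at a time.
def dummy_imp(level):
    d = [1.0 if c == level else 0.0 for c in cats]
    def predict(i, j):
        # swap in a permuted value for this level's dummy, keep the rest
        base = EFFECT[cats[i]] - EFFECT[level] * d[i]
        return base + EFFECT[level] * d[j] + 2.0 * xs[i]
    return perm_mse(predict)

onehot_imps = {lvl: dummy_imp(lvl) for lvl in LEVELS}
```

The native categorical column comes out more important than the numeric feature, but every individual one-hot dummy comes out less important than it: the ranking flips, which is the behavior we'd hope to reproduce on a real dataset like Adult. Shuffling all of a categorical's dummies together as a group would recover the native-column number.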
Somewhat related to feature importance is feature selection. Here's a paper on non-linear feature selection with gradient boosting. Haven't read it yet but could be interesting: http://alicezheng.org/papers/gbfs.pdf