
[Bug] Wrong output shapes for Shapley values #342

Open
RAMitchell opened this issue Mar 10, 2023 · 3 comments

Comments

@RAMitchell
Contributor

In the case of a binary classification model from sklearn, we expect output for both the positive and negative classes (this would be consistent with the normal prediction output). Because the model is transferred in the treelite format, it has num_classes set to 1 in the xgboost style, so the Shapley values are written as a single column. There is no way to detect whether the model is a regression or binary classification model from the information given in treelite, so we cannot simply mirror the output to correct the result on the triton-fil side without also doing the same for every regression model.
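A minimal sketch of the shape mismatch described above, using a toy sklearn random forest (the data and model sizes are illustrative only, not from triton-fil):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy data: 20 rows, 4 features (sizes are illustrative only).
rng = np.random.RandomState(0)
X = rng.rand(20, 4)
y = (X[:, 0] > 0.5).astype(int)

clf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# sklearn reports probabilities for BOTH classes of a binary model:
print(clf.predict_proba(X).shape)  # (20, 2)

# Shapley output consistent with that prediction would therefore need one
# (n_features + 1)-wide block per class, i.e. shape (20, 5, 2), not the
# single column produced when treelite carries num_classes == 1.
```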

@RAMitchell
Contributor Author

At this stage I think the path of least resistance is to output the Shapley values only for the positive class. This is not ideal, because generally we want Shapley values to add up to the normal prediction output, e.g. shapley_values.sum(axis=-1) == prediction_output.
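The additivity property referred to here can be checked numerically; a small sketch with hypothetical Shapley values (the numbers are made up, and the last column holds the base value, following the xgboost pred_contribs convention):

```python
import numpy as np

# Hypothetical Shapley matrix: 2 rows x (3 features + 1 bias column).
shapley_values = np.array([
    [0.10, -0.05, 0.20, 0.40],
    [-0.15, 0.25, 0.05, 0.40],
])

# Additivity: each row of Shapley values sums to that row's raw prediction.
prediction_output = shapley_values.sum(axis=-1)
print(prediction_output)  # [0.65 0.55]
```

When only the positive-class column is emitted for a binary classifier, this check can no longer be performed against the two-column probability output.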

@hcho3
Collaborator

hcho3 commented Mar 15, 2023

There is no way to detect if the model is a regression or binary classification model from the information given in treelite

Would it be useful if Treelite stored a flag indicating whether the model is a regression model? I can get it in for Treelite 4.0.

@RAMitchell
Contributor Author

Yes, this is a good idea. This is currently only a problem for random forest models. In the case of xgboost we can tell from the output transformation that it is classification, but random forest classification uses the identity transform, so we can't actually tell the difference.

I think this will become a problem in the future for multi-output regression models as well. The current implementation assumes that all multi-output models are classification models; it may be helpful for downstream applications to be able to differentiate these cases.
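The ambiguity described in this comment can be sketched as a (hypothetical) detection heuristic over treelite-style metadata; `looks_like_binary_classifier` and its parameters are names invented here for illustration, not part of any real API:

```python
# Hypothetical heuristic: with only treelite-style metadata available,
# an xgboost binary classifier is identifiable by its sigmoid transform,
# but a random-forest classifier uses the identity transform and is
# therefore indistinguishable from a regressor.
def looks_like_binary_classifier(pred_transform: str, num_class: int) -> bool:
    if num_class > 1:
        return True                     # multiclass: unambiguous
    return pred_transform == "sigmoid"  # xgboost-style binary classifier

print(looks_like_binary_classifier("sigmoid", 1))   # True  (xgboost binary)
print(looks_like_binary_classifier("identity", 1))  # False (RF classifier OR regressor)
```

The `"identity"` case is exactly the gap an explicit regression flag in Treelite 4.0 would close.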
