This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

featureImportances method doesn't exist #16

Open
gnani4444 opened this issue Jul 22, 2020 · 7 comments

@gnani4444

gnani4444 commented Jul 22, 2020

Hi,
I am using the XGBoost Spark 3.0 GPU version.

I couldn't find a featureImportances method on the model object. Can you guide me on how to get feature importances from the trained model?

Also, can you share a notebook or code for hyper-parameter tuning with hyperopt, if you already have one?

Thanks in advance

@wbo4958
Collaborator

wbo4958 commented Jul 22, 2020

The PySpark support for XGBoost is completely different from the built-in XGBoost Python package. XGBoost PySpark is just a wrapper around XGBoost4J, so there is no such method.

But I suppose what you're looking for is

model.nativeBooster.getScore("xxx", "xxx")
  /**
    * Get importance of each feature based on information gain or cover
    * Supported: ["gain", "cover", "total_gain", "total_cover"]
    *
    * @return featureScoreMap  key: feature index, value: feature importance score
    */
  @throws(classOf[XGBoostError])
  def getScore(featureMap: String, importanceType: String): Map[String, Double] = {
    Map(booster.getScore(featureMap, importanceType)
        .asScala.mapValues(_.doubleValue).toSeq: _*)
  }

  /**
    * Get importance of each feature based on information gain or cover,
    * with specified feature names.
    * Supported: ["gain", "cover", "total_gain", "total_cover"]
    *
    * @return featureScoreMap  key: feature name, value: feature importance score
    */
  @throws(classOf[XGBoostError])
  def getScore(featureNames: Array[String], importanceType: String): Map[String, Double] = {
    Map(booster.getScore(featureNames, importanceType)
        .asScala.mapValues(_.doubleValue).toSeq: _*)
  }
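
With an empty feature map, the first overload returns a map keyed by feature index rather than by column name (in XGBoost these keys typically look like `f0`, `f1`, ...). Below is a minimal Python sketch of mapping such keys back to names. It assumes the scores have already been copied into a plain Python dict, and that `feature_cols` (a hypothetical name) lists the feature columns in the order they were given to the model. The helper `name_importances` is illustrative and not part of any library.

```python
def name_importances(scores, feature_cols):
    """Map index-style keys such as "f3" to feature_cols[3].

    Assumes keys have the form "f<index>"; any other key is kept as-is.
    Features absent from `scores` are simply absent from the result.
    """
    named = {}
    for key, value in scores.items():
        index = key[1:]
        if key.startswith("f") and index.isdecimal() and int(index) < len(feature_cols):
            named[feature_cols[int(index)]] = value
        else:
            named[key] = value
    return named


# Hypothetical scores and column names, for illustration only.
example = name_importances({"f0": 2.5, "f2": 0.75}, ["age", "income", "balance"])
print(example)
```

Keeping unrecognized keys unchanged means the helper is harmless if the map is already keyed by feature name, as with the second overload.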

@wbo4958
Collaborator

wbo4958 commented Jul 22, 2020

We don't have an example for hyperopt, but we do have some notebooks that use CrossValidator:

https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/examples/notebooks/python/cv-mortgage-gpu.ipynb

@gnani4444
Author

the pyspark supporting for XGBoost is totally different from XGBoost-built-in-python package. Actually xgboost pyspark is just a wrapper of XGBoost4j. So there is no such method.

But I suppose what you're looking for is

model.nativeBooster.getScore("xxx", "xxx")
  /**
    * Get importance of each feature based on information gain or cover

I got an error saying the method doesn't exist (screenshot attached).

@wbo4958
Collaborator

wbo4958 commented Jul 22, 2020

Can you try:

model.nativeBooster.getScore("", "gain")

@gnani4444
Author

gnani4444 commented Jul 22, 2020

@wbo4958
I got a Java object back (screenshot attached).

When I added a feature name, I got an error (screenshot attached).

@sdev2030

@gnani4444
You can assign the object to a variable and print it. You can also convert the object to a Java list by calling its toList() method. Once you have the list, loop through it to extract the index and score for each feature, then build a pandas DataFrame that can be plotted as a graph. Hope this helps.
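
The suggestion above could look roughly like the sketch below. It is written under assumptions and has not been verified against this library: it assumes the object returned through py4j is a Scala `Map[String, Double]` (as the Scala signature quoted earlier in this thread suggests), and it walks the map with the standard Scala `iterator`, `hasNext`, `next`, `_1` and `_2` methods rather than `toList()`. The helper names `scala_map_to_dict` and `importance_rows` are hypothetical.

```python
def scala_map_to_dict(scala_map):
    """Copy a Scala Map exposed through py4j into a Python dict.

    Assumption: each entry is a Scala Tuple2, so _1() is the key
    and _2() is the value.
    """
    result = {}
    iterator = scala_map.iterator()
    while iterator.hasNext():
        entry = iterator.next()
        result[str(entry._1())] = float(entry._2())
    return result


def importance_rows(scores):
    """Return (feature, score) rows sorted from most to least important."""
    return sorted(scores.items(), key=lambda row: row[1], reverse=True)


# Usage on a trained model (not run here, it needs a Spark session):
#   scores = scala_map_to_dict(model.nativeBooster.getScore("", "gain"))
#   rows = importance_rows(scores)
#   pandas.DataFrame(rows, columns=["feature", "score"])  # if pandas is installed
```

Copying the values into a plain dict first keeps the py4j-specific part in one small function, so the sorting and plotting steps work on ordinary Python data.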

@wbo4958
Collaborator

wbo4958 commented Jun 9, 2021

@gnani4444, do you still have any issues?
