UNABLE_TO_INFER_SCHEMA on pyspark #677

Open
cometta opened this issue Jan 18, 2024 · 0 comments

cometta commented Jan 18, 2024

I get the error below when converting a Spark ML model to ONNX:

An error was encountered:
AnalysisException
Traceback (most recent call last):
  File "/tmp/spark-591fcd26-f35c-4194-9d93-9e4fa0b7a634/shell_wrapper.py", line 113, in exec
    self._exec_then_eval(code)
  File "/tmp/spark-591fcd26-f35c-4194-9d93-9e4fa0b7a634/shell_wrapper.py", line 106, in _exec_then_eval
    exec(compile(last, '<string>', 'single'), self.globals)
  File "<string>", line 1, in <module>
  File "/home/user/work/.python_libs/lib/python3.10/site-packages/onnxmltools/convert/main.py", line 302, in convert_sparkml
    return convert(
  File "/home/user/work/.python_libs/lib/python3.10/site-packages/onnxmltools/convert/sparkml/convert.py", line 101, in convert
    onnx_model = convert_topology(
  File "/home/user/work/.python_libs/lib/python3.10/site-packages/onnxconverter_common/topology.py", line 776, in convert_topology
    get_converter(operator.type)(scope, operator, container)
  File "/home/user/work/.python_libs/lib/python3.10/site-packages/onnxmltools/convert/sparkml/operator_converters/random_forest_regressor.py", line 31, in convert_random_forest_regressor
    tree_df = save_read_sparkml_model_data(
  File "/home/user/work/.python_libs/lib/python3.10/site-packages/onnxmltools/convert/sparkml/operator_converters/tree_ensemble_common.py", line 113, in save_read_sparkml_model_data
    df = spark.read.parquet(os.path.join(path, "data"))
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 531, in parquet
    return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
  File "/opt/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py", line 175, in deco
    raise converted from None
pyspark.errors.exceptions.captured.AnalysisException: [UNABLE_TO_INFER_SCHEMA] Unable to infer schema for Parquet. It must be specified manually.

print(initial_types)

[('some_column_name', StringTensorType(shape=[None, 1])), ('var1', StringTensorType(shape=[None, 1])), ('var2', FloatTensorType(shape=[None, 1])), ('var3', FloatTensorType(shape=[None, 1])), ('var4', FloatTensorType(shape=[None, 1])), ('var5', FloatTensorType(shape=[None, 1])), ('var6', FloatTensorType(shape=[None, 1])), ('var7', FloatTensorType(shape=[None, 1])), ('var9', FloatTensorType(shape=[None, 1]))]

The model I am converting is a fitted pipeline that contains a random forest stage. The conversion call is:

onnx_model = convert_sparkml(model, 'pyspark test', initial_types, spark_session=spark)
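
One way to narrow this down, offered as a sketch rather than a confirmed diagnosis: the traceback shows the converter reading Parquet from a `data` subdirectory of a saved model path, and Spark typically raises UNABLE_TO_INFER_SCHEMA when it finds no Parquet files at the path it is given. The helper below is hypothetical (`find_parquet_parts` is not part of onnxmltools or PySpark) and only inspects the local filesystem. It assumes the tree model was saved manually to a known directory first.

```python
import os


def find_parquet_parts(model_dir):
    """List Parquet part files under <model_dir>/data on the local filesystem.

    Hypothetical diagnostic helper, not part of onnxmltools or PySpark.
    Returns an empty list if the data directory is missing or has no part files.
    """
    data_dir = os.path.join(model_dir, "data")
    if not os.path.isdir(data_dir):
        return []
    return sorted(
        name
        for name in os.listdir(data_dir)
        # Skip Spark marker/checksum files such as _SUCCESS and .*.crc
        if name.endswith(".parquet") and not name.startswith(("_", "."))
    )


# Assumed usage on the driver, after saving the tree stage yourself, e.g.:
#   tree_model.write().overwrite().save("/tmp/rf_debug")   # path is an example
#   print(find_parquet_parts("/tmp/rf_debug"))
```

If this returns an empty list on the driver even though the save succeeded, the model data was probably written somewhere the driver's local path does not point to (for example executor-local disk or a different default filesystem), which would be consistent with the error above. That is an assumption to verify, not an established cause.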

@cometta cometta changed the title UNABLE_TO_INFER_SCHEMA UNABLE_TO_INFER_SCHEMA on pyspark Jan 18, 2024