[BUG] [Spark 4] Raw Cudf/JNI exception encountered during casting with ANSI enabled #11552

Open · mythrocks opened this issue on Oct 1, 2024 · Labels: bug, Spark 4.0+
Description
With ANSI enabled, a failed cast operation surfaces a raw cuDF/JNI exception rather than the appropriate Spark exception.

This problem is not exclusive to Spark 4; the same behaviour occurs on Spark 3.x, but only with ANSI enabled.

Repro
Consider the following string-cast example:

Seq("", "", "").toDF("a").write.mode("overwrite").parquet("/tmp/myth/test_input")

spark.read.parquet("/tmp/myth/test_input").selectExpr("CAST(a AS INTEGER)").show

With ANSI enabled, empty strings should cause exceptions rather than yield NULLs.
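
(The repro assumes ANSI mode is already on; in a spark-shell session it can be switched on with the standard configuration key before running the read:)

spark.conf.set("spark.sql.ansi.enabled", "true")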

On Apache Spark 4, the exception looks like:

org.apache.spark.SparkNumberFormatException: [CAST_INVALID_INPUT] The value '' of the type "STRING" cannot be cast to "INT" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error. SQLSTATE: 22018
== SQL (line 1, position 1) ==
cast(a as integer)
^^^^^^^^^^^^^^^^^^

  at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidInputInCastToNumberError(QueryExecutionErrors.scala:145)
  at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.withException(UTF8StringUtils.scala:51)
  at org.apache.spark.sql.catalyst.util.UTF8StringUtils$.toIntExact(UTF8StringUtils.scala:34)
  at org.apache.spark.sql.catalyst.expressions.Cast.$anonfun$castToInt$2(Cast.scala:801)
  ...

When running the same query on the spark-rapids plugin with ANSI enabled, one sees instead:

com.nvidia.spark.rapids.jni.CastException: Error casting data on row 0:
        at com.nvidia.spark.rapids.jni.CastStrings.toInteger(Native Method)
        at com.nvidia.spark.rapids.jni.CastStrings.toInteger(CastStrings.java:50)
        at com.nvidia.spark.rapids.jni.CastStrings.toInteger(CastStrings.java:37)
        at com.nvidia.spark.rapids.GpuCast$.doCast(GpuCast.scala:551)
        at com.nvidia.spark.rapids.GpuCast.doColumnar(GpuCast.scala:1816)
        at com.nvidia.spark.rapids.GpuUnaryExpression.doItColumnar(GpuExpressions.scala:276)

Expected behavior
One would expect the cuDF exception to be caught and handled (or wrapped in a Spark-specific exception), so that the GPU path reports the same error as the CPU path.
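
As a rough illustration, a fix might catch the JNI exception at the cast call site and re-throw the Spark-side ANSI error. This is a minimal sketch only: the RapidsErrorUtils.invalidStringCastError entry point is hypothetical, and the CastStrings.toInteger signature is assumed.

import ai.rapids.cudf.{ColumnVector, DType}
import com.nvidia.spark.rapids.jni.{CastException, CastStrings}
import org.apache.spark.sql.types.IntegerType

def ansiCastStringToInt(input: ColumnVector): ColumnVector =
  try {
    // Assumed signature: (column, ansiEnabled, targetType).
    CastStrings.toInteger(input, true, DType.INT32)
  } catch {
    case e: CastException =>
      // Hypothetical shim entry point: per Spark version it would delegate
      // to the matching QueryExecutionErrors method (e.g.
      // invalidInputInCastToNumberError) so users see CAST_INVALID_INPUT
      // instead of the raw JNI failure.
      throw RapidsErrorUtils.invalidStringCastError(e.getMessage, IntegerType)
  }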

Environment details

  • ANSI enabled
  • Spark 4.0 and 3.x

Additional context
This pertains to ANSI-mode testing. It will not be addressed as part of #11009, and is likely to require shim work in RapidsErrorUtils.
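
Once the exception is wrapped, an ANSI-mode test could assert on the Spark exception type directly. A hedged sketch using ScalaTest's intercept (the assertion shape is illustrative, not an existing plugin test):

import org.apache.spark.SparkNumberFormatException

// The GPU path should surface the same exception type as the CPU path.
val e = intercept[SparkNumberFormatException] {
  spark.read.parquet("/tmp/myth/test_input")
    .selectExpr("CAST(a AS INTEGER)")
    .collect()
}
assert(e.getMessage.contains("CAST_INVALID_INPUT"))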
