Job aborted due to stage failure while reading a simple Text File from HDFS #49

radianv · 2018-02-05T16:59:45Z

I'm working with Spark Notebook, following the Scalable Spark/HDFS Workbench using Docker walkthrough:

```scala
val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")

textFile: org.apache.spark.rdd.RDD[String] = /user/root/vannbehandlingsanlegg.csv MapPartitionsRDD[1] at textFile at <console>:67
```

It should show the execution time and the number of lines in the CSV file, but instead I got the following error:

```
cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
```

From searching, it seems this could be related to executor dependencies. Any ideas?

radianv · 2018-02-05T17:09:21Z

As additional information, I ran the same test directly in the spark-master container, and it worked:

```
scala> val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")
textFile: org.apache.spark.rdd.RDD[String] = /user/root/vannbehandlingsanlegg.csv MapPartitionsRDD[1] at textFile at <console>:24

scala> textFile.count
res4: Long = 4385
```

So the issue is probably in the Spark Notebook configuration.

earthquakesan · 2018-03-14T13:23:45Z

Hi @radianv,

Sorry for the late reply. I had a lot of issues with Spark Notebook and switched to Apache Zeppelin in the end. The issue you had is most likely a Spark version mismatch between Spark Notebook and the Spark master.
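If a version mismatch is the cause, one quick check is to compare the versions the notebook's driver runs with the versions the cluster runs. Below is a minimal Scala sketch under stated assumptions: `VersionCheck`, `binaryVersion` and `mismatches` are hypothetical helpers, the cluster's versions are assumed to be read by hand (for example from the banner `spark-shell` prints inside the spark-master container), and the version strings in `main` are placeholders, not values taken from this issue. In the notebook itself, the driver side would come from `sc.version` and `scala.util.Properties.versionNumberString`.

```scala
// Hypothetical helper (not part of Spark or Spark Notebook): compares the
// driver's Spark/Scala versions with the versions the cluster reports.
object VersionCheck {
  // "2.11.8" -> "2.11": Scala 2.x releases are binary-compatible only within a major.minor line.
  def binaryVersion(v: String): String = v.split('.').take(2).mkString(".")

  // Returns human-readable problems; an empty list means no mismatch was detected.
  def mismatches(driverSpark: String, driverScala: String,
                 clusterSpark: String, clusterScala: String): List[String] = {
    // Spark versions are compared exactly, the safest assumption for driver vs. executors.
    val sparkIssues =
      if (driverSpark != clusterSpark) List(s"Spark: driver $driverSpark vs cluster $clusterSpark")
      else Nil
    // Scala versions are compared only on their binary (major.minor) line.
    val scalaIssues =
      if (binaryVersion(driverScala) != binaryVersion(clusterScala))
        List(s"Scala: driver $driverScala vs cluster $clusterScala")
      else Nil
    sparkIssues ++ scalaIssues
  }

  def main(args: Array[String]): Unit = {
    // Placeholder values; in a notebook, pass sc.version and
    // scala.util.Properties.versionNumberString for the driver side.
    val problems = mismatches("2.1.0", "2.11.8", "2.2.0", "2.11.12")
    problems.foreach(println) // prints: Spark: driver 2.1.0 vs cluster 2.2.0
  }
}
```

Either kind of mismatch is a plausible cause of the `List$SerializationProxy` error, since the driver and executors then deserialize RDD objects against different class definitions, though it is not the only possible cause.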

MahsaSeifikar · 2020-04-19T11:40:40Z

I have the same issue! Any solution?

SuperElectron · 2020-04-21T16:23:25Z

This also fails inside the spark-master container for `val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")`.

Judging from the abundance of HDFS- and node/worker-related errors in the issues, something in the configuration is definitely missing.

It is also worth noting that the steps in the walkthrough blog post do not work: https://www.big-data-europe.eu/scalable-sparkhdfs-workbench-using-docker/

Has anyone successfully completed the steps in the blog post linked above?
