
Job aborted due to stage failure while reading a simple Text File from HDFS #49

Open
radianv opened this issue Feb 5, 2018 · 4 comments



radianv commented Feb 5, 2018

I'm working with Spark notebooks, following the Scalable Spark/HDFS Workbench using Docker setup.

```scala
val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")
// textFile: org.apache.spark.rdd.RDD[String] = /user/root/vannbehandlingsanlegg.csv MapPartitionsRDD[1] at textFile at <console>:67
```

Calling `textFile.count` should show the execution time and the number of lines in the CSV file, but instead I got the following error:

```
cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
```

I have been searching, and it seems it could be related to executor dependencies. Any ideas?
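In case it helps with debugging: a bare path like `/user/root/...` relies on the notebook's `fs.defaultFS` pointing at the right namenode. A minimal sketch using a fully qualified HDFS URI instead — the `namenode:8020` host/port is an assumption and must be adjusted to your docker-compose service name:

```scala
// Fully qualified HDFS URI; "namenode:8020" is an assumed hostname/port,
// adjust it to match your docker-compose setup and fs.defaultFS.
val textFile = sc.textFile("hdfs://namenode:8020/user/root/vannbehandlingsanlegg.csv")

// count() forces evaluation; a driver/executor mismatch typically
// surfaces here, when the task is deserialized on an executor.
println(textFile.count())
```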


radianv commented Feb 5, 2018

As additional information: I ran the same test connecting directly to the spark-master container, and it worked fine:

```scala
scala> val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")
textFile: org.apache.spark.rdd.RDD[String] = /user/root/vannbehandlingsanlegg.csv MapPartitionsRDD[1] at textFile at <console>:24

scala> textFile.count
res4: Long = 4385
```

The issue is probably in the spark notebook configuration.

@earthquakesan (Member)

Hi @radianv,

Sorry for the late reply. I had a lot of issues with spark notebook and switched to Apache Zeppelin in the end. The issue you had is most likely a Spark version mismatch between spark notebook and the Spark master.
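A quick way to check for such a mismatch (a hedged sketch — `sc` is the already-created SparkContext in either environment) is to print the Spark and Scala versions in both the notebook and a `spark-shell` on the master and compare them:

```scala
// Run this in both the notebook and a spark-shell on spark-master;
// the outputs should match (at least major.minor), otherwise tasks
// can fail to deserialize with errors like the SerializationProxy one above.
println(s"Spark: ${sc.version}")
println(s"Scala: ${scala.util.Properties.versionString}")
```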

@MahsaSeifikar

I have the same issue! Any solution?

@SuperElectron

This error also occurs inside the spark-master container for
`val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")`.

From the abundance of errors in the issues related to HDFS and nodes/workers, it seems like something in the configuration is definitely missing.

It is also worth noting that the steps in the walk-through blog post do not work: https://www.big-data-europe.eu/scalable-sparkhdfs-workbench-using-docker/

Can anyone successfully complete the steps in this ^^^^ blog post?
