exception trying to run Mnist example #121
Comments
Just to be sure, can you tell me what command you're running to launch the Mnist app? Also, did you download the Mnist data with the provided get_mnist.sh script?
Yes, I used get_mnist.sh/get_cifar10.sh and ran the command shown in the readme.
I'd suggest running the individual commands from a Spark shell and seeing specifically where the error occurs. Also, are there any error messages on the workers?
The error happens after the data is loaded. The Caffe network config loads and runs a bit, then it crashes with the ArrayIndexOutOfBoundsException.
Since it's on EC2, if you want to share the image with us, it'd be easy for us to look into it. It should work fine on the image that we provide (in the readme).
Did you solve the problem? I have a similar problem: the SparkNet assembly is also using the SPARKNETCPU artifacts, and it crashes with the same ArrayIndexOutOfBoundsException.
I have not. I was severely pressed for time, so I got CaffeOnSpark working and decided to go with that one. I would still like to get SparkNet working, though.
Hey, thanks for keeping us updated. I think I can reproduce the problem now: it seems to occur in local mode with more than one SparkNet worker. That is not a regime we typically use, so we haven't run into it yet. I'll keep you updated if I find out why the problem occurs.
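An illustrative sketch of one way the worker count could matter, assuming (hypothetically, this is not a confirmed diagnosis) that the data is split evenly across partitions while the code copies a fixed batch size from each one:

```java
// Hypothetical sketch: if examples are split evenly across workers but a
// full batchSize is always copied per worker, adding workers can shrink a
// partition below batchSize, making a copy read past the buffer.
// Numbers and names here are illustrative, not SparkNet's actual values.
public class PartitionSketch {
    public static void main(String[] args) {
        int numExamples = 100;
        int batchSize = 64;
        for (int workers : new int[]{1, 2}) {
            int perWorker = numExamples / workers; // examples on each worker
            boolean fullBatchFits = perWorker >= batchSize;
            System.out.println(workers + " worker(s): " + perWorker
                + " examples each, full batch fits: " + fullBatchFits);
        }
    }
}
```

With these illustrative numbers, one worker holds 100 examples (a 64-example batch fits), while two workers hold 50 each (it does not), which would only surface with more than one worker.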
Hi!
I'm trying to run SparkNet on a MapR cluster running Spark 1.5.2
I can get Caffe to run locally, including the Python bindings, and the SparkNet assembly is using the SPARKNETCPU artefacts (with JavaCPP on the 03-16 version, as indicated in another post).
The job starts up and completes Stage 3 successfully, but then throws an exception:
16/04/10 10:18:52 WARN TaskSetManager: Lost task 3.0 in stage 14.0 (TID 41, 10.0.0.217): java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at libs.JavaNDArray.baseFlatInto(JavaNDArray.java:67)
at libs.JavaNDArray.recursiveFlatInto(JavaNDArray.java:79)
at libs.JavaNDArray.recursiveFlatInto(JavaNDArray.java:82)
at libs.JavaNDArray.flatCopy(JavaNDArray.java:93)
at libs.JavaNDArray.toFlat(JavaNDArray.java:111)
at libs.NDArray.toFlat(NDArray.scala:32)
at libs.TensorFlowUtils$.tensorFromNDArray(TensorFlowUtils.scala:71)
at libs.TensorFlowNet$$anonfun$setWeights$1.apply(TensorFlowNet.scala:114)
at libs.TensorFlowNet$$anonfun$setWeights$1.apply(TensorFlowNet.scala:112)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
at scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:102)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
at scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:102)
at libs.TensorFlowNet.setWeights(TensorFlowNet.scala:112)
at apps.MnistApp$$anonfun$main$4.apply$mcVI$sp(MnistApp.scala:96)
at apps.MnistApp$$anonfun$main$4.apply(MnistApp.scala:96)
at apps.MnistApp$$anonfun$main$4.apply(MnistApp.scala:96)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:894)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$28.apply(RDD.scala:894)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1850)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:88)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Any help would be greatly appreciated.
Note: the Cifar example also fails with what seems to be the exact same error.
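For context, the failure mode at the top of the trace above can be reproduced in isolation: System.arraycopy throws ArrayIndexOutOfBoundsException whenever the requested copy length runs past the end of the source array, which is what would happen if an NDArray's declared shape implied more elements than its backing buffer holds. A minimal hypothetical sketch (not SparkNet's actual code):

```java
// Hypothetical sketch of the failure seen in JavaNDArray.baseFlatInto:
// System.arraycopy throws ArrayIndexOutOfBoundsException when offset + count
// exceeds the source array's length.
public class FlatCopyDemo {
    // Copy `count` floats from data (starting at `offset`) into result.
    static void flatInto(float[] data, int offset, float[] result, int count) {
        System.arraycopy(data, offset, result, 0, count);
    }

    public static void main(String[] args) {
        float[] data = new float[10];   // backing buffer: 10 elements
        float[] result = new float[12]; // shape claims e.g. 3 x 4 = 12 elements
        try {
            flatInto(data, 0, result, 12); // 12 > data.length -> out of bounds
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("caught: " + e);
        }
    }
}
```

This only demonstrates the mechanism of the exception; whether SparkNet's shape bookkeeping actually disagrees with the buffer size in this setup is the open question above.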