I am running the PySpark performance tests on a cluster with 12 nodes, 20 cores per node, and 60 GB of memory per node. I get output for the first few tests (sort, agg, count, etc.), but when the run reaches the broadcast test the job terminates. Based on the .err file in the results folder, I assume it is running out of memory: ensureFreeSpace(4194304) called with curMem=610484012, maxMem=611642769. How can I increase the maxMem value? This is the content of my config/config.py file.
COMMON_JAVA_OPTS = [
# Fraction of JVM memory used for caching RDDs.
JavaOptionSet("spark.storage.memoryFraction", [0.66]),
JavaOptionSet("spark.serializer", ["org.apache.spark.serializer.JavaSerializer"]),
JavaOptionSet("spark.executor.memory", ["9g"]),
and
# Set driver memory here
SPARK_DRIVER_MEMORY = "20g"
The test output shows the running command as follows.
Setting env var SPARK_SUBMIT_OPTS: -Dspark.storage.memoryFraction=0.66 -Dspark.serializer=org.apache.spark.serializer.JavaSerializer -Dspark.executor.memory=9g -Dspark.locality.wait=60000000 -Dsparkperf.commitSHA=unknown
Running command: /nfs/15/soottikkal/local/spark-1.5.2-bin-hadoop2.6//bin/spark-submit --master spark://r0111.ten.osc.edu:7077 pyspark-tests/core_tests.py BroadcastWithBytes --num-trials=10 --inter-trial-wait=3 --num-partitions=400 --reduce-tasks=400 --random-seed=5 --persistent-type=memory --num-records=200000000 --unique-keys=20000 --key-length=10 --unique-values=1000000 --value-length=10 --broadcast-size=209715200 1>> results/python_perf_output__2016-01-28_23-35-54_logs/python-broadcast-w-bytes.out 2>> results/python_perf_output__2016-01-28_23-35-54_logs/python-broadcast-w-bytes.err
Is the spark-submit command picking up the memory settings from config.py here? maxMem is only about 611 MB, which looks like 0.66 * 1 GB, i.e. Spark's default memory setting. Changing spark.executor.memory or SPARK_DRIVER_MEMORY in config/config.py has no effect on maxMem, but changing spark.storage.memoryFraction from 0.66 to 0.88 does increase maxMem. How can I control the maxMem value so the jobs use the large amount of memory that is already available on the cluster?
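For reference, here is a minimal back-of-the-envelope sketch of where that maxMem figure appears to come from, assuming Spark 1.5's legacy storage-memory formula (heap * spark.storage.memoryFraction * spark.storage.safetyFraction, with safetyFraction defaulting to 0.9); the constants below are my assumptions, not values taken from the test output.

# Hypothetical check, assuming the executor JVM is still running with Spark's
# default 1g heap rather than the 9g set in config.py.
heap_bytes = 1 * 1024 * 1024 * 1024   # ~1g default executor memory (assumed)
memory_fraction = 0.66                # spark.storage.memoryFraction from config.py
safety_fraction = 0.9                 # spark.storage.safetyFraction default (assumed)

print(int(heap_bytes * memory_fraction * safety_fraction))
# ~637 MB, close to the reported maxMem=611642769 (the JVM exposes slightly
# less usable heap than -Xmx, which would account for the gap).

If the executors really were getting 9g, the same formula would put maxMem somewhere around 5 GB, so the number above is consistent with the suspicion that spark.executor.memory from config.py is not reaching the executors.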