Update code to support newer java versions #586
base: master
Conversation
b20b1f1 to cd25224 (compare)
Hi @carsonwang, I work for SSP Intel on the Data Analytics Reference Stack. Best regards.
@luisfponce, thank you for working on this. We are reviewing and validating this.
common/src/main/scala/com/intel/hibench/common/streaming/metrics/KafkaCollector.scala (Outdated)
autogen/src/main/java/org/apache/hadoop/fs/dfsioe/TestDFSIO.java (Outdated)
731dcac to 6455c21 (compare)
90502d6 to 51e5c71 (compare)
@gczsjdy, can you help take a look at the latest update?
@carsonwang No problem.
README.md (Outdated)
- Hadoop: Apache Hadoop 2.x, CDH5, HDP
- Spark: Spark 1.6.x, Spark 2.0.x, Spark 2.1.x, Spark 2.2.x
### Supported Hadoop/Spark releases: ###
- Hadoop: Apache Hadoop 2.x, 3.x, CDH5, HDP
Did you test Hadoop 3.0, 3.1, 3.3? Otherwise this should say 2.x, 3.2?
Why do you separate streaming/non-streaming frameworks? I don't see a very good reason.
Q: Did you test Hadoop 3.0, 3.1, 3.3?
A: No, you are right; otherwise it should say 2.x, 3.2.
Q: Why do you separate streaming/non-streaming frameworks?
A: Because Scala < 2.12 does not compile on the Java 11 JDK, and Scala 2.12 requires changing (or bumping) the org.apache.kafka package from 0.8.2.1 to at least 0.10.2.2, after which all the code related to Kafka and streaming testing must be ported.
This newer Kafka version (0.10.2.2) would require modifying the following classes:
- KafkaCollector.scala
- KafkaConsumer.scala
- MetricsUtil.scala
So, bottom line, as mentioned in the previous comment for @carsonwang, the streaming/non-streaming frameworks were split to avoid breaking the streaming benchmarks on Scala 2.11 and 2.10. A rough sketch of what the port implies is shown below.
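For context, here is a minimal sketch (not HiBench code) of the kind of change that port implies: the 0.8-era consumer usage would have to be rewritten against the post-0.10 org.apache.kafka.clients.consumer.KafkaConsumer API. The helper object, group id, and topic handling below are illustrative assumptions only.

import java.util.{Collections, Properties}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer

// Hypothetical helper, not an existing HiBench class: reads one batch of
// records from a metrics topic using the Kafka >= 0.10 consumer API that a
// ported KafkaCollector.scala would need instead of the old 0.8 consumer.
object MetricsTopicReader {
  def readOnce(bootstrapServers: String, topic: String): Seq[String] = {
    val props = new Properties()
    props.put("bootstrap.servers", bootstrapServers)
    props.put("group.id", "hibench-metrics-reader")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("auto.offset.reset", "earliest")

    val consumer = new KafkaConsumer[String, String](props)
    try {
      consumer.subscribe(Collections.singletonList(topic))
      // A single poll is enough for the sketch; a real collector would loop until the topic is drained.
      consumer.poll(1000L).asScala.map(_.value()).toSeq
    } finally {
      consumer.close()
    }
  }
}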
Done: Otherwise 2.x, 3.2.
docs/build-hibench.md (Outdated)
mvn clean package -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11

Supported frameworks only: hadoopbench, sparkbench, (Not yet tested flinkbench, stormbench, gearpumpbench)
Supported modules includes: micro, ml(machine learning), websearch and graph (not tested streaming and structuredStreaming) (Does not support sql)
I think all modules can be built under JDK8? We normally use 8 in our environment.
Spark 2.4 won't support the sql benchmarks; Hive is not used anymore.
I can be more specific on this and document that for a given Spark version the SQL benchmarks are not supported.
I'm not sure about the "Not yet tested" part; leaving it on master seems... @carsonwang
I got rid of it, to avoid causing noise in master.
docs/build-hibench.md (Outdated)
Supported modules includes: micro, ml(machine learning), websearch and graph (not tested streaming and structuredStreaming) (Does not support sql)

### Build using JDK 1.11
If you are interested in building using Java 11 indicate that streaming benchmarks won't be compiled also, specify scala, spark and hadoop version as below
Should we also specify:
- Which Scala version (besides 2.12) is compatible with JDK 11?
- Which Hadoop/Spark version is compatible with JDK 11?
About the streaming benchmarks support, I think it's okay to lack some streaming (Flink, Gearpump, Spark Streaming, but not Structured Streaming) support on new versions, as long as we point it out clearly.
Q: Which Hadoop/Spark version is compatible with JDK 11?
A: This is not my area, but the documentation could be more specific if required.
At least it can be written down that Scala 2.12 + JDK 11 + Spark 2.4 (compiled with Scala 2.12) works, excluding the streaming and SQL benchmarks.
Yes, please indicate that.
Done
docs/build-hibench.md (Outdated)
mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dexclude-streaming

Supported frameworks only: hadoopbench, sparkbench (Does not support flinkbench, stormbench, gearpumpbench)
Supported modules includes: micro, ml(machine learning), websearch and graph (does not support streaming, structuredStreaming and sql)
- What problem did the SQL module meet? It's an essential part of Spark; leaving it out makes not much sense : )
- Structured Streaming is a part of SQL, so making SQL work can also benefit SS.
Q: What problem did the SQL module meet?
A: In newer versions of Spark, HiveContext is deprecated. I can point out in the documentation that if -Dspark=2.4 or a later version is required, then SQLBench won't work.
(Again, an update/port of the ScalaSparkSQLBench.scala code would be needed here.)
HiveContext is deprecated
In Spark 2, HiveContext is deprecated. Replace all usage with an instantiation of the singleton SparkSession:
val spark: SparkSession = SparkSession.builder
  .config(conf)
  .enableHiveSupport()
  .getOrCreate()
Most functionality of HiveContext is now available directly on the SparkSession instance. Note that, if you need them, SparkContext and SQLContext are now properties of SparkSession:
val sc = spark.sparkContext
val sqlContext = spark.sqlContext
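As an illustration only (a sketch, not the actual ScalaSparkSQLBench.scala change): once HiveContext is replaced, the benchmark query would run through the session directly. The object name, app name, and placeholder query below are assumptions, not existing HiBench code.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only; not the real HiBench class. In the benchmark, queryString would
// come from the workload's SQL file rather than being hard-coded.
object SqlBenchSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ScalaSparkSQLBench-sketch")
    val queryString = "SELECT 1"   // placeholder for the workload query

    val spark = SparkSession.builder
      .config(conf)
      .enableHiveSupport()
      .getOrCreate()

    // spark.sql replaces hiveContext.sql and returns a DataFrame.
    val result = spark.sql(queryString)
    result.count()   // force execution so the run actually measures the query
    spark.stop()
  }
}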
Thanks, I got it.
I think supporting Spark 2.4 without the SQL module is quite weird. I can think of 2 ways:
- Drop Spark 1.6 support and modify ScalaSparkSQLBench.scala to use SparkSession, which was introduced in Spark 2.0
- Create another separate ScalaSparkSQLBench, deciding which class to use by Spark version
I like the first one better; a newer HiBench version should drop some old codebase. cc @carsonwang
Thanks @luisfponce , I left some comments.
.travis.yml (Outdated)
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Does it have to be root? What if we don't set these environment variables?
Q: Does it have to be root?
A: Not really, it depends on the user.
Q: What if we don't set these environment variables?
A: If those variables are not set when just starting the Hadoop 3.2 services, I got:
- start-dfs.sh:
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [ubuntu-hib]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
- start-yarn.sh
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
However, I moved:
HDFS_NAMENODE_USER=$USER
HDFS_DATANODE_USER=$USER
HDFS_SECONDARYNAMENODE_USER=$USER
YARN_RESOURCEMANAGER_USER=$USER
YARN_NODEMANAGER_USER=$USER
to hadoop-env.sh, so now it is user agnostic and travis.yml looks cleaner.
Great!
.travis.yml (Outdated)
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_INSTALL=$HADOOP_HOME
export SPARK_DIST_CLASSPATH=$(/opt/$HADOOP_BINARIES_FOLDER/bin/hadoop classpath)
I think it's better to remove unnecessary envs (lines 46-54); I suppose even if they are not set, Spark/Hadoop will probe the right HOME, and that's verified in the original travis (for Spark 1.6, though).
Correct, I will get rid of them.
Done
.travis.yml (Outdated)
sudo -E ./travis/configssh.sh
sudo -E ./travis/restart_hadoop_spark.sh
sudo -E ./bin/run_all.sh
:nit new line
And other files. : )
Done
- cp ./travis/spark.conf ./conf/
- /opt/hadoop-2.6.5/bin/yarn node -list 2
- sudo -E ./bin/run_all.sh
- |
:nit remove this
I used this pipe (literal block style) because, from my perspective, it looks cleaner when putting a script in YAML files, avoiding writing \ at the end of every line.
Otherwise it would look like this example:
script:
- if [[ "$java_ver" == 11 ]]; then \
mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming \
elif [[ "$java_ver" == 8 ]]; then \
mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11 \
elif [[ "$java_ver" == 7 ]]; then \
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11 \
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11 \
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10 \
else \
exit 1 \
fi
- sudo -E ./travis/configssh.sh
- sudo -E ./travis/restart_hadoop_spark.sh
- sudo -E ./bin/run_all.sh
instead of how it currently is:
script:
- |
if [[ "$java_ver" == 11 ]]; then
mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
elif [[ "$java_ver" == 8 ]]; then
mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
elif [[ "$java_ver" == 7 ]]; then
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10
else
exit 1
fi
sudo -E ./travis/configssh.sh
sudo -E ./travis/restart_hadoop_spark.sh
sudo -E ./bin/run_all.sh
Up to you; for me both ways still work (and Mr. Yaml lint indicates both ways work too).
Thanks, but it seems
if [[ "$java_ver" == 11 ]]; then
mvn clean package -q -Psparkbench -Phadoopbench -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
elif [[ "$java_ver" == 8 ]]; then
mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
elif [[ "$java_ver" == 7 ]]; then
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.2 -Dscala=2.11
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=2.0 -Dscala=2.11
mvn clean package -q -Dmaven.javadoc.skip=true -Dspark=1.6 -Dscala=2.10
else
exit 1
fi
sudo -E ./travis/configssh.sh
sudo -E ./travis/restart_hadoop_spark.sh
sudo -E ./bin/run_all.sh
without any pipes is also valid; will the \n's be automatically escaped in Travis?
travis/config_hadoop_spark.sh (Outdated)
cp ./travis/artifacts/hadoop32/mapred-site.xml $HADOOP_CONF_DIR
cp ./travis/artifacts/hadoop32/yarn-site.xml $HADOOP_CONF_DIR
sed -i "s|<maven.compiler.source>1.6</maven.compiler.source>|<maven.compiler.source>1.8</maven.compiler.source>|g" pom.xml
sed -i "s|<maven.compiler.target>1.6</maven.compiler.target>|<maven.compiler.target>1.8</maven.compiler.target>|g" pom.xml
Why not 1.11?
My bad, I will change it to Java 11 + maven compiler plugin version 3.8.
Source: Choose Java Version
Done.
docs/build-hibench.md (Outdated)
### Build using JDK 1.8
If you are interested in building using Java 11 specify scala, spark and hadoop version as below

mvn clean package -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
This might be misleading, suggesting that Java 8 can only be used with the specified Scala/Hadoop/Spark versions. I think we can drop this section and only leave the JDK 11 section.
No problem.
Done
51e5c71 to bc996f9 (compare)
* sparkbench/assembly/pom.xml:
  * Changed property name activation on `allModules` profile.
  * Added new profile that excludes the `sparkbench-streaming` artifact.
* sparkbench/pom.xml:
  * Changed property name activation on `allModules` profile.
  * Added new profile that excludes the `streaming` module.
  * Added profile spark2.4 because spark-core_2.12 supports > 2.4.0 versions.
  * Added profile scala2.12; Scala < 2.12 does not compile on the Java 11 JDK.
  * Added profile hadoop3.2 to propagate this variable to all Spark benchmarks.
* sparkbench/streaming/pom.xml:
  * Added profile spark2.4 on the sparkbench-streaming POM with spark-streaming-kafka-0-8_2.11 version 2.4.0.
Signed-off-by: Luis Ponce <[email protected]>
bc996f9 to a300f3b (compare)
Hi @carsonwang, @gczsjdy, important questions here: Apache Hadoop 3.x supports only Java 8 according to the official Hadoop Java Versions page, and Java 11 support is WIP. So, if HiBench is built using source and target JDK 11 and then run in Travis CI, we get:
Nevertheless, Clearlinux (and possibly other clients that have patched Hadoop too) has compiled Hadoop 3.2 and Spark 2.4 using Java 11 patches, as the Data Analytics Reference Stacks documentation did. Is there a way to compile HiBench using Java 11 but skip the testing part? We'd like to get HiBench JDK 11 support from upstream, and that's why this PR contribution exists.
HiBench log on the following link:
a97527b to e631f22 (compare)
* Moved mapred-site and yarn-site xml files to a new folder that contains the artifacts for either hadoop 2.6 or 3.2; those will be picked up depending on the testing needs in travis.yml.
* Moved the spark-env file to a new folder that contains the artifacts for either spark 1.6 or 2.4; those will be picked up depending on the testing needs in travis.yml.
* Created a hadoop-env.sh file for Hadoop 3.2 to store the required environment variables to start hdfs and yarn services.
* Removed hardcoded values from hadoop.conf and spark.conf; these will be filled in depending on the testing needs.
* Added an `install_hadoop_spark` script that will download hadoop and spark binaries depending on the testing needs.
* Added a `config_hadoop_spark` script that will set up hadoop, spark and hibench depending on the testing needs.
* Added a `jdk_ver` script to pick up the current java version installed for Travis CI.
* Modified the `restart_hadoop_spark` script to be agnostic to the binaries required for testing.
* travis/config_hadoop_spark.sh:
  * For Java 8 and 11, skip the `sql` test since Hive is no longer used to perform queries. Newer Spark versions perform queries using `SparkSession`; `import org.apache.spark.sql` is no longer used.
* .travis.yml:
  * Added `dist: trusty` to keep using this distro; Travis picks up xenial if not specified, and any greater Ubuntu version in Travis won't support openjdk 7.
  * Refactored the CI flow to download, set up, run and test hadoop and spark depending on the required JDK version, either 7, 8 or 11.
  * HiBench will be configured depending on the required JDK version, either 7, 8 or 11.
  * HiBench will be built depending on the required JDK version, either 7, 8 or 11.
  * Benchmarks will be run for all JDK versions set.
Signed-off-by: Luis Ponce <[email protected]>
* autogen/pom.xml:
  * Add hadoop mr2 profile to be used for hadoop hdfs and client.
Signed-off-by: Luis Ponce <[email protected]>
* docs/build-hibench.md:
  * Update 2.4 version to specify Spark version.
  * Add Specify Hadoop version documentation.
  * Add Build using JDK 11 documentation.
* README.md:
  * Update Supported Hadoop/Spark releases to hadoop 3.2 and spark 2.4.
Signed-off-by: Luis Ponce <[email protected]>
e631f22 to 0e48596 (compare)
@luisfponce, I noticed one issue in the pom. Others look good to me.
@@ -37,6 +37,11 @@ default . For example , if we want use spark2.0 and scala2.11 to build hibench.
package` , but for spark2.0 and scala2.10 , we need use the command `mvn -Dspark=2.0 -Dscala=2.10 clean package` .
Similarly , the spark1.6 is associated with the scala2.10 by default.

### Specify Hadoop Version ###
To specify the spark version, use -Dhadoop=xxx(3.2). By default, it builds for hadoop 2.4
nit: spark version -> hadoop version
@@ -159,7 +159,43 @@
</dependencies>
<activation>
<property>
<name>!modules</name>
<name>!exclude-streaming</name>
If a user specifies modules=xxx and doesn't specify exclude-streaming, this allModules will be activated, which is not expected.
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>

</configuration>
:nit empty line
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
Thanks
@@ -28,7 +28,7 @@ Because some Maven plugins cannot support Scala version perfectly, there are som

### Specify Spark Version ###
To specify the spark version, use -Dspark=xxx(1.6, 2.0, 2.1 or 2.2). By default, it builds for spark 2.0
To specify the spark version, use -Dspark=xxx(1.6, 2.0, 2.1, 2.2 or 2.4). By default, it builds for spark 2.0
Actually, this Spark 2.4 support doesn't include the SQL module?
This is the only main remaining concern for this patch, see #586 (comment).
I think we can drop Spark 1.6 support and modify the SQL module code to support 2.4 in HiBench 8.0; whoever needs 1.6 can go to HiBench 7.0. @carsonwang
if [[ "$java_ver" == 11 ]]; then
mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dmaven-compiler-plugin.version=3.8.0 -Dexclude-streaming
elif [[ "$java_ver" == 8 ]]; then
mvn clean package -q -Dmaven.javadoc.skip=true -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
Curious about this: even if we don't run SQL module tests for Spark 2.4, how did the compiling work...
Is there any progress on this ticket? When will it be available?
Will retake it, resolve conflicts and get back to you @william-wang @gczsjdy
Compile HiBench using JDK 1.11 for hadoop 3.2.0 and spark 2.4.0
supporting the following benchmarks:
Environment variables:
JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk/
Compile command:
mvn clean package -Psparkbench -Phadoopbench -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.12 -Dexclude-streaming
Log:
Compile HiBench using JDK 1.8 for hadoop 3.2.0 and spark 2.4.0
supporting the following benchmarks:
Environment variables:
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk/
Compile command:
mvn clean package -Dhadoop=3.2 -Dspark=2.4 -Dscala=2.11
Log: