This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

Allow specifying the Hadoop minor version (2.4 and 2.6 at the moment) #56

Open
Wants to merge 31 commits into base: branch-2.0
31 commits (changes shown from 29 commits)
fde24d2
adding hadoop 2.6 support
jamborta Sep 30, 2016
87153f2
a few lines of description
jamborta Sep 30, 2016
a242d0b
adding parameter to template file
jamborta Sep 30, 2016
064abd9
validate minor version
jamborta Sep 30, 2016
c282e06
validate minor version
jamborta Sep 30, 2016
a68ca9b
validate minor version
jamborta Sep 30, 2016
14f0d75
validate minor version
jamborta Sep 30, 2016
a308f74
bugfix
jamborta Oct 1, 2016
97cbb6b
correct path for ephemeral hdfs
jamborta Oct 1, 2016
e46020c
scala 2.11 for spark 2
jamborta Oct 1, 2016
c34d93e
document s3 changes
jamborta Oct 1, 2016
d8d4803
document s3 changes
jamborta Oct 1, 2016
c13a437
document s3 changes
jamborta Oct 1, 2016
9e6920c
Update README.md
jamborta Oct 1, 2016
833f2de
Update README.md
jamborta Oct 1, 2016
750ede8
Update README.md
jamborta Oct 1, 2016
1c34483
adding hadoop 2.7
jamborta Oct 8, 2016
46b6394
adding static variable VALID_HADOOP_MINOR_VERSIONS
jamborta Oct 12, 2016
ad525d7
disable tachyon for spark 2 and yarn
jamborta Oct 12, 2016
03e70b7
typo in version
jamborta Oct 13, 2016
653f338
separate case for each range of spark versions
jamborta Oct 19, 2016
21e03d0
update hadoop dependency download path
jamborta Oct 24, 2016
4a4f4a5
exhaustive checking of hadoop versions
jamborta Oct 24, 2016
d7e73bf
typo in options
jamborta Oct 24, 2016
332b90b
return -1 for unknown hadoop version
jamborta Oct 24, 2016
71c7047
return 1 for unknown hadoop version
jamborta Oct 24, 2016
a924690
safeguard for hadoop minor version 2.7
jamborta Oct 24, 2016
e3ee4e2
safeguard for hadoop minor version 2.6
jamborta Oct 24, 2016
db15dcc
resolve conflicts
jamborta Oct 24, 2016
246b888
update based on comments
jamborta Oct 24, 2016
9b0f1d1
use latest hadoop maintenance version
jamborta Oct 31, 2016
10 changes: 10 additions & 0 deletions README.md
@@ -197,6 +197,15 @@ EC2. These scripts are intended to be used by the default Spark AMI and is *not*
expected to work on other AMIs. If you wish to start a cluster using Spark,
please refer to http://spark-project.org/docs/latest/ec2-scripts.html

## Using S3 with Hadoop 2.6 or newer

Starting with Hadoop 2.6.0, the S3 filesystem connector has been moved to a separate library called hadoop-aws.

- To make the package available, add it as a dependency: `libraryDependencies += "org.apache.hadoop" % "hadoop-aws" % "2.6.4"`.
- It can also be passed directly to spark-submit: `spark-submit --packages org.apache.hadoop:hadoop-aws:2.6.4 SimpleApp.py`.

On a related note, it is recommended to use the `s3a` filesystem rather than `s3n` starting with Hadoop 2.6.0.
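
For example, a minimal PySpark sketch reading from S3 through the `s3a` connector might look like the following (the bucket name and file path are illustrative assumptions, and credentials are expected to come from the usual AWS environment variables or Hadoop configuration):

```python
from pyspark import SparkContext

# Requires hadoop-aws on the classpath, e.g.:
#   spark-submit --packages org.apache.hadoop:hadoop-aws:2.6.4 s3a_example.py
sc = SparkContext(appName="s3a-example")

# Hypothetical bucket and prefix, shown only to illustrate the s3a:// scheme.
lines = sc.textFile("s3a://my-example-bucket/input/*.txt")
print(lines.count())

sc.stop()
```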

## spark-ec2 Internals

The Spark cluster setup is guided by the values set in `ec2-variables.sh`.`setup.sh`
@@ -237,3 +246,4 @@ after the templates have been configured. You can use the environment variables
get a list of slave hostnames and `/root/spark-ec2/copy-dir` to sync a directory across machines.

5. Modify `spark_ec2.py` to add your module to the list of enabled modules.

1 change: 1 addition & 0 deletions deploy.generic/root/spark-ec2/ec2-variables.sh
@@ -27,6 +27,7 @@ export MODULES="{{modules}}"
export SPARK_VERSION="{{spark_version}}"
export TACHYON_VERSION="{{tachyon_version}}"
export HADOOP_MAJOR_VERSION="{{hadoop_major_version}}"
export HADOOP_MINOR_VERSION="{{hadoop_minor_version}}"
export SWAP_MB="{{swap}}"
export SPARK_WORKER_INSTANCES="{{spark_worker_instances}}"
export SPARK_MASTER_OPTS="{{spark_master_opts}}"
1 change: 1 addition & 0 deletions deploy_templates.py
@@ -73,6 +73,7 @@
"spark_version": os.getenv("SPARK_VERSION"),
"tachyon_version": os.getenv("TACHYON_VERSION"),
"hadoop_major_version": os.getenv("HADOOP_MAJOR_VERSION"),
"hadoop_minor_version": os.getenv("HADOOP_MINOR_VERSION"),
"java_home": os.getenv("JAVA_HOME"),
"default_tachyon_mem": "%dMB" % tachyon_mb,
"system_ram_mb": "%d" % system_ram_mb,
26 changes: 21 additions & 5 deletions ephemeral-hdfs/init.sh
@@ -30,11 +30,27 @@ case "$HADOOP_MAJOR_VERSION" in
cp /root/hadoop-native/* /root/ephemeral-hdfs/lib/native/
;;
yarn)
wget http://s3.amazonaws.com/spark-related-packages/hadoop-2.4.0.tar.gz
echo "Unpacking Hadoop"
tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
rm hadoop-*.tar.gz
mv hadoop-2.4.0/ ephemeral-hdfs/
if [[ "$HADOOP_MINOR_VERSION" == "2.4" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/hadoop-2.4.0.tar.gz
echo "Unpacking Hadoop"
tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
rm hadoop-*.tar.gz
mv hadoop-2.4.0/ ephemeral-hdfs/
elif [[ "$HADOOP_MINOR_VERSION" == "2.6" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/hadoop-2.6.0.tar.gz
echo "Unpacking Hadoop"
tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
rm hadoop-*.tar.gz
mv hadoop-2.6.0/ ephemeral-hdfs/
elif [[ "$HADOOP_MINOR_VERSION" == "2.7" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/hadoop-2.7.0.tar.gz
echo "Unpacking Hadoop"
tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
rm hadoop-*.tar.gz
mv hadoop-2.7.0/ ephemeral-hdfs/
else
echo "ERROR: Unknown Hadoop version"
fi
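
Since the three branches differ only in the version string, a more compact equivalent is possible; a sketch (assuming the `hadoop-<minor>.0.tar.gz` naming holds for every supported minor version; not part of this PR) could be:

```bash
case "$HADOOP_MINOR_VERSION" in
  2.4|2.6|2.7)
    # Download, unpack, and move the matching Hadoop release in one branch.
    wget "http://s3.amazonaws.com/spark-related-packages/hadoop-${HADOOP_MINOR_VERSION}.0.tar.gz"
    echo "Unpacking Hadoop"
    tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
    rm hadoop-*.tar.gz
    mv "hadoop-${HADOOP_MINOR_VERSION}.0/" ephemeral-hdfs/
    ;;
  *)
    echo "ERROR: Unknown Hadoop version"
    ;;
esac
```

The same pattern would apply to the matching block in persistent-hdfs/init.sh.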

# Have single conf dir
rm -rf /root/ephemeral-hdfs/etc/hadoop/
26 changes: 21 additions & 5 deletions persistent-hdfs/init.sh
@@ -29,11 +29,27 @@ case "$HADOOP_MAJOR_VERSION" in
cp /root/hadoop-native/* /root/persistent-hdfs/lib/native/
;;
yarn)
wget http://s3.amazonaws.com/spark-related-packages/hadoop-2.4.0.tar.gz
echo "Unpacking Hadoop"
tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
rm hadoop-*.tar.gz
mv hadoop-2.4.0/ persistent-hdfs/
if [[ "$HADOOP_MINOR_VERSION" == "2.4" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/hadoop-2.4.0.tar.gz
echo "Unpacking Hadoop"
tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
rm hadoop-*.tar.gz
mv hadoop-2.4.0/ persistent-hdfs/
elif [[ "$HADOOP_MINOR_VERSION" == "2.6" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/hadoop-2.6.0.tar.gz
echo "Unpacking Hadoop"
tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
rm hadoop-*.tar.gz
mv hadoop-2.6.0/ persistent-hdfs/
elif [[ "$HADOOP_MINOR_VERSION" == "2.7" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/hadoop-2.7.0.tar.gz
echo "Unpacking Hadoop"
tar xvzf hadoop-*.tar.gz > /tmp/spark-ec2_hadoop.log
rm hadoop-*.tar.gz
mv hadoop-2.7.0/ persistent-hdfs/
else
echo "ERROR: Unknown Hadoop version"
fi

# Have single conf dir
rm -rf /root/persistent-hdfs/etc/hadoop/
7 changes: 6 additions & 1 deletion scala/init.sh
@@ -11,10 +11,15 @@ SCALA_VERSION="2.10.3"

if [[ "0.7.3 0.8.0 0.8.1" =~ $SPARK_VERSION ]]; then
SCALA_VERSION="2.9.3"
wget http://s3.amazonaws.com/spark-related-packages/scala-$SCALA_VERSION.tgz
Contributor: I'm not sure we need a Scala installation on the cluster anymore, as Spark should just work with a JRE. But it seems fine to have this if people find it useful.

Author: I've never tried Spark without Scala. Does even spark-shell not need Scala?

Contributor: Yes, recent Spark distributions include the Scala libraries that provide the shell and other support. But since this is useful irrespective, let's keep it.

elif [[ "2.0.0" =~ $SPARK_VERSION ]]; then
SCALA_VERSION="2.11.8"
wget http://downloads.lightbend.com/scala/2.11.8/scala-$SCALA_VERSION.tgz
Contributor: I've also uploaded this to the S3 bucket. Let's switch to that to avoid depending on the Lightbend source?

else
wget http://s3.amazonaws.com/spark-related-packages/scala-$SCALA_VERSION.tgz
fi

echo "Unpacking Scala"
wget http://s3.amazonaws.com/spark-related-packages/scala-$SCALA_VERSION.tgz
tar xvzf scala-*.tgz > /tmp/spark-ec2_scala.log
rm scala-*.tgz
mv `ls -d scala-* | grep -v ec2` scala
147 changes: 54 additions & 93 deletions spark/init.sh
@@ -24,119 +24,80 @@ then

# Pre-packaged spark version:
else
case "$SPARK_VERSION" in
0.7.3)
case "$SPARK_VERSION" in
# 0.7.3 - 1.0.2
0\.[7-9]\.[0-3]|1\.0\.[0-2])
Contributor: I'm not sure this will work, as the 0.8.0 and 0.9.0 releases have "incubating" in their package names. My take would be to keep the existing long form for these early versions.
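
For illustration only (not part of this PR), the compact pattern could account for the suffix by computing it separately; this sketch covers just the Hadoop 1 package name, assumes the suffix applies exactly to the 0.8.x and 0.9.0 releases, and sets aside 0.7.3, whose package uses the different "prebuilt" naming:

```bash
# The 0.8.x and 0.9.0 releases were Apache incubator releases, so their
# package names carry an "-incubating" suffix.
SUFFIX=""
if [[ "$SPARK_VERSION" == 0.8.* || "$SPARK_VERSION" == "0.9.0" ]]; then
  SUFFIX="-incubating"
fi
wget "http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION$SUFFIX-bin-hadoop1.tgz"
```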

if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-0.7.3-prebuilt-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-0.7.3-prebuilt-cdh4.tgz
fi
;;
0.8.0)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-0.8.0-incubating-bin-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-0.8.0-incubating-bin-cdh4.tgz
fi
;;
0.8.1)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-0.8.1-incubating-bin-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-0.8.1-incubating-bin-cdh4.tgz
fi
;;
0.9.0)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-0.9.0-incubating-bin-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-0.9.0-incubating-bin-cdh4.tgz
fi
;;
0.9.1)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-0.9.1-bin-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-0.9.1-bin-cdh4.tgz
fi
;;
0.9.2)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-0.9.2-bin-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-0.9.2-bin-cdh4.tgz
fi
;;
1.0.0)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.0.0-bin-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-1.0.0-bin-cdh4.tgz
fi
;;
1.0.1)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.0.1-bin-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-1.0.1-bin-cdh4.tgz
fi
;;
1.0.2)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.0.2-bin-hadoop1.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-1.0.2-bin-cdh4.tgz
fi
;;
1.1.0)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.1.0-bin-hadoop1.tgz
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-prebuilt-hadoop1.tgz
elif [[ "$HADOOP_MAJOR_VERSION" == "2" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.1.0-bin-cdh4.tgz
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-prebuilt-cdh4.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-1.1.0-bin-hadoop2.4.tgz
echo "ERROR: Unsupported Hadoop major version"
return 1
fi
;;
1.1.1)
;;
# 1.1.0 - 1.3.0
1\.[1-2]\.[0-9]*|1\.3\.0)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.1.1-bin-hadoop1.tgz
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop1.tgz
elif [[ "$HADOOP_MAJOR_VERSION" == "2" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.1.1-bin-cdh4.tgz
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-cdh4.tgz
elif [[ "$HADOOP_MAJOR_VERSION" == "yarn" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop2.4.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-1.1.1-bin-hadoop2.4.tgz
echo "ERROR: Unsupported Hadoop major version"
return 1
fi
;;
1.2.0)
;;
# 1.3.1 - 1.6.2
1\.[3-6]\.[0-2])
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.2.0-bin-hadoop1.tgz
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop1.tgz
elif [[ "$HADOOP_MAJOR_VERSION" == "2" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.2.0-bin-cdh4.tgz
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-cdh4.tgz
elif [[ "$HADOOP_MAJOR_VERSION" == "yarn" ]]; then
if [[ "$HADOOP_MINOR_VERSION" == "2.4" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop2.4.tgz
elif [[ "$HADOOP_MINOR_VERSION" == "2.6" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop2.6.tgz
else
echo "ERROR: Unknown Hadoop minor version"
return 1
fi
else
wget http://s3.amazonaws.com/spark-related-packages/spark-1.2.0-bin-hadoop2.4.tgz
echo "ERROR: Unsupported Hadoop major version"
return 1
fi
;;
1.2.1)
;;
# 2.0.0 - 2.0.1
2\.0\.[0-1])
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.2.1-bin-hadoop1.tgz
wget http://s3.amazonaws.com/spark-related-packages/spark-2.0.0-bin-hadoop1.tgz
Contributor: The version numbers here should not be hardcoded to 2.0.0, should they? Also, it might be good to keep the future-proof solution we had of `spark-$SPARK_VERSION-bin-hadoop$HADOOP_MINOR_VERSION.tgz`?

Contributor: Thinking more about this, I think the idea should be that we do the checking of available Spark/Hadoop version combinations in the Python file (it's easier to read, review, and maintain than bash). Then the bash script just does the downloading and setup to handle corner cases like "incubating" etc. What do you think?
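
A sketch of the future-proof form the comment refers to, assuming every relevant package follows the `spark-<version>-bin-hadoop<minor>.tgz` naming (illustration only, not part of this PR):

```bash
# Hypothetical: derive the package name from both version variables instead of
# hardcoding spark-2.0.0 and enumerating each supported Hadoop minor version.
if [[ "$HADOOP_MAJOR_VERSION" == "yarn" ]]; then
  wget "http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop$HADOOP_MINOR_VERSION.tgz"
fi
```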

elif [[ "$HADOOP_MAJOR_VERSION" == "2" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-1.2.1-bin-cdh4.tgz
wget http://s3.amazonaws.com/spark-related-packages/spark-2.0.0-bin-cdh4.tgz
elif [[ "$HADOOP_MAJOR_VERSION" == "yarn" ]]; then
if [[ "$HADOOP_MINOR_VERSION" == "2.4" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-2.0.0-bin-hadoop2.4.tgz
elif [[ "$HADOOP_MINOR_VERSION" == "2.6" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-2.0.0-bin-hadoop2.6.tgz
elif [[ "$HADOOP_MINOR_VERSION" == "2.7" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-2.0.0-bin-hadoop2.7.tgz
else
echo "ERROR: Unknown Hadoop version"
return 1
fi
else
wget http://s3.amazonaws.com/spark-related-packages/spark-1.2.1-bin-hadoop2.4.tgz
echo "ERROR: Unsupported Hadoop major version"
return 1
fi
;;
;;
*)
if [[ "$HADOOP_MAJOR_VERSION" == "1" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop1.tgz
elif [[ "$HADOOP_MAJOR_VERSION" == "2" ]]; then
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-cdh4.tgz
else
wget http://s3.amazonaws.com/spark-related-packages/spark-$SPARK_VERSION-bin-hadoop2.4.tgz
fi
if [ $? != 0 ]; then
echo "ERROR: Unknown Spark version"
return -1
return 1
fi
esac
;;
esac

echo "Unpacking Spark"
tar xvzf spark-*.tgz > /tmp/spark-ec2_spark.log
41 changes: 34 additions & 7 deletions spark_ec2.py
@@ -82,6 +82,12 @@
"2.0.1"
])

VALID_HADOOP_MINOR_VERSIONS = set([
"2.4",
"2.6",
"2.7"
])

SPARK_TACHYON_MAP = {
"1.0.0": "0.4.1",
"1.0.1": "0.4.1",
@@ -241,7 +247,11 @@ def parse_args():
parser.add_option(
"--hadoop-major-version", default="yarn",
help="Major version of Hadoop. Valid options are 1 (Hadoop 1.0.4), 2 (CDH 4.2.0), yarn " +
"(Hadoop 2.4.0) (default: %default)")
"(Hadoop 2.x) (default: %default)")
parser.add_option(
"--hadoop-minor-version", default="2.4",
help="Minor version of Hadoop. Valid options are 2.4 (Hadoop 2.4.0), 2.6 (Hadoop 2.6.0) and 2.7 (Hadoop 2.7.0). " +
"This only has any effect if yarn is specified as Hadoop major version/ (default: %default)")
parser.add_option(
"-D", metavar="[ADDRESS:]PORT", dest="proxy_port",
help="Use SSH dynamic port forwarding to create a SOCKS proxy at " +
@@ -371,19 +381,35 @@ def get_or_make_group(conn, name, vpc_id):
print("Creating security group " + name)
return conn.create_security_group(name, "Spark EC2 group", vpc_id)

def validate_spark_hadoop_version(spark_version, hadoop_version):

def validate_spark_hadoop_version(spark_version, hadoop_version, hadoop_minor_version):
if "." in spark_version:
parts = spark_version.split(".")
if parts[0].isdigit():
if parts[0].isdigit() and parts[0].isdigit():
Contributor: Redundant if check? I guess this should be parts[1].isdigit()?
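
A minimal sketch of the check the comment suggests, validating both components before converting them (illustration only, not the code as submitted):

```python
spark_version = "2.0.1"  # example input
parts = spark_version.split(".")
# Check both the major and the minor component, not the major one twice.
if parts[0].isdigit() and parts[1].isdigit():
    spark_major_version = float(parts[0])
    spark_minor_version = float(parts[1])
```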

spark_major_version = float(parts[0])
if spark_major_version > 1.0 and hadoop_version != "yarn":
print("Spark version: {v}, does not support Hadoop version: {hv}".
spark_minor_version = float(parts[1])
spark_major_minor_version = spark_major_version + (spark_minor_version / 10)
if spark_major_minor_version > 1.0 and hadoop_version != "yarn":
print("Spark version: {v}, does not support Hadoop major version: {hv}".
format(v=spark_version, hv=hadoop_version), file=stderr)
sys.exit(1)
if hadoop_version == "yarn" and hadoop_minor_version not in VALID_HADOOP_MINOR_VERSIONS:
print("Spark version: {v}, does not support Hadoop minor version: {hm}, supported minor versions: {sv}".
format(v=spark_version, hm=hadoop_minor_version, sv=",".join(VALID_HADOOP_MINOR_VERSIONS)), file=stderr)
sys.exit(1)
if hadoop_minor_version == "2.7" and spark_major_minor_version < 2.0:
print("Spark version: {v}, does not support Hadoop minor version: {hm}".
Contributor: It might be useful to list the supported minor versions. Also, can we make this a list at the top of the file? It might be easier to add more Hadoop versions later on.

Author: OK, added.

format(v=spark_version, hm=hadoop_minor_version, sv=",".join(VALID_HADOOP_MINOR_VERSIONS)), file=stderr)
Contributor: The variable sv is not used in this error message or the next one?

sys.exit(1)
if hadoop_minor_version == "2.6" and spark_major_minor_version < 1.3:
print("Spark version: {v}, does not support Hadoop minor version: {hm}".
format(v=spark_version, hm=hadoop_minor_version, sv=",".join(VALID_HADOOP_MINOR_VERSIONS)), file=stderr)
sys.exit(1)
else:
print("Invalid Spark version: {v}".format(v=spark_version), file=stderr)
sys.exit(1)


def get_validate_spark_version(version, repo):
if "." in version:
# Remove leading v to handle inputs like v1.5.0
@@ -1086,7 +1112,7 @@ def deploy_files(conn, root_dir, opts, master_nodes, slave_nodes, modules):
if "." in opts.spark_version:
# Pre-built Spark deploy
spark_v = get_validate_spark_version(opts.spark_version, opts.spark_git_repo)
validate_spark_hadoop_version(spark_v, opts.hadoop_major_version)
validate_spark_hadoop_version(spark_v, opts.hadoop_major_version, opts.hadoop_minor_version)
tachyon_v = get_tachyon_version(spark_v)
else:
# Spark-only custom deploy
@@ -1113,6 +1139,7 @@ def deploy_files(conn, root_dir, opts, master_nodes, slave_nodes, modules):
"spark_version": spark_v,
"tachyon_version": tachyon_v,
"hadoop_major_version": opts.hadoop_major_version,
"hadoop_minor_version": opts.hadoop_minor_version,
"spark_worker_instances": worker_instances_str,
"spark_master_opts": opts.master_opts
}
@@ -1297,7 +1324,7 @@ def real_main():

# Input parameter validation
spark_v = get_validate_spark_version(opts.spark_version, opts.spark_git_repo)
validate_spark_hadoop_version(spark_v, opts.hadoop_major_version)
validate_spark_hadoop_version(spark_v, opts.hadoop_major_version, opts.hadoop_minor_version)

if opts.wait is not None:
# NOTE: DeprecationWarnings are silent in 2.7+ by default.
3 changes: 3 additions & 0 deletions tachyon/init.sh
@@ -13,6 +13,9 @@
# Not yet supported
echo "Tachyon git hashes are not yet supported. Please specify a Tachyon release version."
# Pre-package tachyon version
if [[ "$HADOOP_MAJOR_VERSION" == "yarn" || "$SPARK_VERSION" > "2" ]]
then
echo "Tachyon is not supported with yarn or Spark 2.0.0 and newer."
else
case "$TACHYON_VERSION" in
0.3.0)