feat(build): add Spark v3.4.1 #40

Merged: 11 commits, Sep 11, 2023
24 changes: 24 additions & 0 deletions .github/workflows/ci.yml
@@ -111,6 +111,30 @@ jobs:
     scala: "2.13"
     with_hive: "true"
     with_pyspark: "true"
+  - spark: "3.4.1"
+    java: "8"
+    hadoop: "3.3.4"
+    scala: "2.12"
+    with_hive: "true"
+    with_pyspark: "true"
+  - spark: "3.4.1"
+    java: "8"
+    hadoop: "3.3.4"
+    scala: "2.13"
+    with_hive: "true"
+    with_pyspark: "true"
+  - spark: "3.4.1"
+    java: "11"
+    hadoop: "3.3.4"
+    scala: "2.12"
+    with_hive: "true"
+    with_pyspark: "true"
+  - spark: "3.4.1"
+    java: "11"
+    hadoop: "3.3.4"
+    scala: "2.13"
+    with_hive: "true"
+    with_pyspark: "true"
 runs-on: ubuntu-20.04
 env:
   IMAGE_NAME: "spark-k8s"
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -3,8 +3,9 @@
 ## v3

 - (Temporarily drop support for R due to keyserver issues)
-- Only supports for for 3.1.3, 3.2.2, 3.3.0 (dropped 2.4.8).
+- Only supports for for 3.1.3, 3.2.2, 3.3.0, 3.4.1 (dropped 2.4.8).
 - Supports both Java 8 and 11 for Spark 3 builds.
+- Add Ubuntu-based image since the migration to eclipse-temurin for jre image source.

 ## v2

16 changes: 11 additions & 5 deletions README.md
@@ -12,16 +12,22 @@ Debian:
 - `3.3.0`
 - `3.2.2`
 - `3.1.3`
+- `3.4.1`

 ## Note

 (R builds are temporarily suspended due to keyserver issues at current time.)

-All the build images here are Debian based as the official Spark repository now
-uses `openjdk:<java>-jdk-slim-buster` as the base image for Kubernetes build.
-Because currently the official Dockerfiles do not pin the Debian distribution,
-they are incorrectly using the latest Debian `bullseye`, which does not have
-support for Python 2, and its Python 3.9 do not work well with PySpark.
+Build image for Spark 3.4.1 is Ubuntu based because openjdk is deprecated and
+going forward the official Spark repository uses `eclipse-temurin:<java>-jre`
+where slim variants of jre images are not available at the moment.
+
+All the build images with Spark before v3.4.0 are Debian based as the official
+Spark repository now uses `openjdk:<java>-jre-slim-buster` as the base image
+for Kubernetes build. Because currently the official Dockerfiles do not pin
+the Debian distribution, they are incorrectly using the latest Debian `bullseye`,
+which does not have support for Python 2, and its Python 3.9 do not work well
+with PySpark.

 Hence some Dockerfile overrides are in-place to make sure that Spark 2 builds
 can still work.
11 changes: 10 additions & 1 deletion make-distribution.sh
@@ -67,6 +67,15 @@ else
   DOCKERFILE_PY="./resource-managers/kubernetes/docker/src/main/dockerfiles/spark/bindings/python/Dockerfile"
 fi

+if [[ ${SPARK_MAJOR_VERSION} -eq 3 && ${SPARK_MINOR_VERSION} -ge 4 ]]; then # >=3.4
+  # From Spark v3.4.0 onwards, openjdk is not the preferred base image source as it is
+  # deprecated and taken over by eclipse-temurin. slim-buster variants are not available
+  # on eclipse-temurin at the moment.
+  IMAGE_VARIANT="jre"
+else
+  IMAGE_VARIANT="jre-slim-buster"
+fi
+
 # Temporarily remove R build due to keyserver issue
 # DOCKERFILE_R="./resource-managers/kubernetes/docker/src/main/dockerfiles/R/Dockerfile"

@@ -83,7 +92,7 @@ TAG_NAME="${SELF_VERSION}_${SPARK_LABEL}_hadoop-${HADOOP_VERSION}_scala-${SCALA_
 # build

 ./bin/docker-image-tool.sh \
-    -b java_image_tag=${JAVA_VERSION}-jre-slim-buster \
+    -b java_image_tag=${JAVA_VERSION}-${IMAGE_VARIANT} \
     -r "${IMAGE_NAME}" \
     -t "${TAG_NAME}" \
     -f "${DOCKERFILE_BASE}" \
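
The version gate and the `java_image_tag` build argument in the two `make-distribution.sh` hunks can be exercised in isolation. The following is a minimal sketch, not the script's actual code: the function names `image_variant` and `java_image_tag` are hypothetical helpers introduced here for illustration, and only the comparison logic and the two variant strings come from the diff.

```shell
# Hypothetical helper mirroring the version gate added in this PR.
# $1 = Spark major version, $2 = Spark minor version
image_variant() {
  if [ "$1" -eq 3 ] && [ "$2" -ge 4 ]; then
    # Spark 3.4 and later 3.x: eclipse-temurin tags, no slim-buster variant
    echo "jre"
  else
    # Everything else falls back to the openjdk-style tag suffix
    echo "jre-slim-buster"
  fi
}

# Hypothetical helper composing the value passed as `-b java_image_tag=...`.
# $1 = Java version, $2 = Spark major version, $3 = Spark minor version
java_image_tag() {
  echo "$1-$(image_variant "$2" "$3")"
}

java_image_tag 11 3 4   # prints 11-jre
java_image_tag 8 3 3    # prints 8-jre-slim-buster
```

One consequence of the condition as written is that it only matches major version 3, so a future 4.x build would take the `jre-slim-buster` branch unless the gate is extended.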
5 changes: 5 additions & 0 deletions templates/vars.yml
@@ -15,3 +15,8 @@ versions:
     java: ['8', '11']
     hadoop: ['3.3.2']
     scala: ['2.12', '2.13']
+
+  - spark: ['3.4.1']
+    java: ['8', '11']
+    hadoop: ['3.3.4']
+    scala: ['2.12', '2.13']
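
The four entries added to `.github/workflows/ci.yml` line up with the cross product of the `java` and `scala` lists in this new `vars.yml` entry. The sketch below only illustrates that correspondence; it assumes the CI matrix is derived from `vars.yml` this way, and `expand_matrix` is a hypothetical helper, not part of the repository's templating.

```shell
# Hypothetical helper: print one line per Java/Scala combination
# for a given Spark and Hadoop version.
# $1 = Spark version, $2 = Hadoop version
expand_matrix() {
  for java in 8 11; do
    for scala in 2.12 2.13; do
      echo "spark=$1 java=${java} hadoop=$2 scala=${scala}"
    done
  done
}

# Yields four combinations, in the same order as the ci.yml entries.
expand_matrix 3.4.1 3.3.4
```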