[#4551] feat(iceberg): add S3 and GCS support for IcebergRESTService docker image (#5377)

### What changes were proposed in this pull request?
1. Add the AWS and GCP bundle jars to the IcebergRESTServer Docker image.
2. Allow the server configuration to be changed through environment variables.

### Why are the changes needed?


Fix: #4551 

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
Ran SQL queries that access data in S3 and GCS.

Co-authored-by: FANNG <[email protected]>
github-actions[bot] and FANNG1 authored Oct 30, 2024
1 parent 6832d1c commit de27d58
Showing 7 changed files with 162 additions and 5 deletions.
1 change: 1 addition & 0 deletions bundles/gcp-bundle/build.gradle.kts
@@ -49,6 +49,7 @@ tasks.withType(ShadowJar::class.java) {
  relocate("org.apache.httpcomponents", "org.apache.gravitino.shaded.org.apache.httpcomponents")
  relocate("org.apache.commons", "org.apache.gravitino.shaded.org.apache.commons")
  relocate("com.google", "org.apache.gravitino.shaded.com.google")
  relocate("com.fasterxml", "org.apache.gravitino.shaded.com.fasterxml")
}

tasks.jar {
2 changes: 1 addition & 1 deletion dev/docker/iceberg-rest-server/Dockerfile
@@ -26,4 +26,4 @@ COPY packages/gravitino-iceberg-rest-server /root/gravitino-iceberg-rest-server

EXPOSE 9001

-ENTRYPOINT ["/bin/bash", "/root/gravitino-iceberg-rest-server/bin/gravitino-iceberg-rest-server.sh", "start"]
+ENTRYPOINT ["/bin/bash", "/root/gravitino-iceberg-rest-server/bin/start-iceberg-rest-server.sh"]
28 changes: 28 additions & 0 deletions dev/docker/iceberg-rest-server/iceberg-rest-server-dependency.sh
@@ -34,6 +34,34 @@ cd distribution
tar xfz gravitino-iceberg-rest-server-*.tar.gz
cp -r gravitino-iceberg-rest-server*-bin ${iceberg_rest_server_dir}/packages/gravitino-iceberg-rest-server

cd ${gravitino_home}
./gradlew :bundles:gcp-bundle:jar
./gradlew :bundles:aws-bundle:jar

# prepare bundle jar
cd ${iceberg_rest_server_dir}
mkdir -p bundles
cp ${gravitino_home}/bundles/gcp-bundle/build/libs/gravitino-gcp-bundle-*.jar bundles/
cp ${gravitino_home}/bundles/aws-bundle/build/libs/gravitino-aws-bundle-*.jar bundles/

iceberg_gcp_bundle="iceberg-gcp-bundle-1.5.2.jar"
if [ ! -f "bundles/${iceberg_gcp_bundle}" ]; then
  curl -L -s -o bundles/${iceberg_gcp_bundle} https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-gcp-bundle/1.5.2/${iceberg_gcp_bundle}
fi

iceberg_aws_bundle="iceberg-aws-bundle-1.5.2.jar"
if [ ! -f "bundles/${iceberg_aws_bundle}" ]; then
  curl -L -s -o bundles/${iceberg_aws_bundle} https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/1.5.2/${iceberg_aws_bundle}
fi

# download jdbc driver
curl -L -s -o bundles/sqlite-jdbc-3.42.0.0.jar https://repo1.maven.org/maven2/org/xerial/sqlite-jdbc/3.42.0.0/sqlite-jdbc-3.42.0.0.jar

cp bundles/*jar ${iceberg_rest_server_dir}/packages/gravitino-iceberg-rest-server/libs/

cp start-iceberg-rest-server.sh ${iceberg_rest_server_dir}/packages/gravitino-iceberg-rest-server/bin/
cp rewrite_config.py ${iceberg_rest_server_dir}/packages/gravitino-iceberg-rest-server/bin/

# Keeping the container running at all times
cat <<EOF >> "${iceberg_rest_server_dir}/packages/gravitino-iceberg-rest-server/bin/gravitino-iceberg-rest-server.sh"
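The download URLs in this hunk follow the standard Maven Central repository layout: the group ID with dots turned into slashes, then the artifact ID, the version, and `artifact-version.jar`. A small Python sketch of that pattern, assuming a plain jar with no classifier; `maven_central_url` is a hypothetical helper for illustration and is not used by the script:

```python
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_central_url(group_id, artifact_id, version):
    """Build the Maven Central URL of a plain (classifier-free) jar."""
    # Group IDs map to directories: org.apache.iceberg -> org/apache/iceberg
    group_path = group_id.replace(".", "/")
    jar_name = "{}-{}.jar".format(artifact_id, version)
    return "/".join([MAVEN_CENTRAL, group_path, artifact_id, version, jar_name])


print(maven_central_url("org.apache.iceberg", "iceberg-aws-bundle", "1.5.2"))
print(maven_central_url("org.xerial", "sqlite-jdbc", "3.42.0.0"))
```

Keeping the version in one place like this would make a later bundle upgrade a single-argument change, though the script as committed hardcodes each URL.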
78 changes: 78 additions & 0 deletions dev/docker/iceberg-rest-server/rewrite_config.py
@@ -0,0 +1,78 @@
#!/usr/bin/env python
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import os

env_map = {
    "GRAVITINO_IO_IMPL" : "io-impl",
    "GRAVITINO_URI" : "uri",
    "GRAVITINO_WAREHOUSE" : "warehouse",
    "GRAVITINO_CREDENTIAL_PROVIDER_TYPE" : "credential-provider-type",
    "GRAVITINO_GCS_CREDENTIAL_FILE_PATH" : "gcs-credential-file-path",
    "GRAVITINO_S3_ACCESS_KEY" : "s3-access-key-id",
    "GRAVITINO_S3_SECRET_KEY" : "s3-secret-access-key",
    "GRAVITINO_S3_REGION" : "s3-region",
    "GRAVITINO_S3_ROLE_ARN" : "s3-role-arn",
    "GRAVITINO_S3_EXTERNAL_ID" : "s3-external-id"
}

init_config = {
    "catalog-backend" : "jdbc",
    "jdbc-driver" : "org.sqlite.JDBC",
    "uri" : "jdbc:sqlite::memory:",
    "jdbc-user" : "iceberg",
    "jdbc-password" : "iceberg",
    "jdbc-initialize" : "true",
    "jdbc.schema-version" : "V1"
}


def parse_config_file(file_path):
    config_map = {}
    with open(file_path, 'r') as file:
        for line in file:
            stripped_line = line.strip()
            if stripped_line and not stripped_line.startswith('#'):
                key, value = stripped_line.split('=')
                key = key.strip()
                value = value.strip()
                config_map[key] = value
    return config_map

config_prefix = "gravitino.iceberg-rest."

def update_config(config, key, value):
    config[config_prefix + key] = value

config_file_path = 'conf/gravitino-iceberg-rest-server.conf'
config_map = parse_config_file(config_file_path)

for k, v in init_config.items():
    update_config(config_map, k, v)

for k, v in env_map.items():
    if k in os.environ:
        update_config(config_map, v, os.environ[k])

if os.path.exists(config_file_path):
    os.remove(config_file_path)

with open(config_file_path, 'w') as file:
    for key, value in config_map.items():
        line = "{} = {}\n".format(key, value)
        file.write(line)
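One thing worth noting about `parse_config_file`: `stripped_line.split('=')` is unpacked into exactly two names, so a line whose value itself contains `=` (for example a JDBC URI with query parameters) would raise `ValueError`. A minimal sketch of a more tolerant variant, assuming it is acceptable to split only on the first `=`; `parse_config_lines` and the sample URI are hypothetical and not part of this commit:

```python
def parse_config_lines(lines):
    """Parse `key = value` lines, splitting only on the first '='."""
    config_map = {}
    for line in lines:
        stripped_line = line.strip()
        # Skip blank lines and comments, as the committed parser does.
        if not stripped_line or stripped_line.startswith('#'):
            continue
        key, sep, value = stripped_line.partition('=')
        if not sep:
            # No '=' on the line: ignore it rather than fail (an assumption).
            continue
        config_map[key.strip()] = value.strip()
    return config_map


sample = [
    "# comment\n",
    "\n",
    "gravitino.iceberg-rest.uri = jdbc:mysql://example-host:3306/db?useSSL=false\n",
]
parsed = parse_config_lines(sample)
print(parsed)
```

`str.partition` always returns three parts, so the unpacking cannot fail regardless of how many `=` characters the value contains.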
29 changes: 29 additions & 0 deletions dev/docker/iceberg-rest-server/start-iceberg-rest-server.sh
@@ -0,0 +1,29 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

set -ex
bin_dir="$(dirname "${BASH_SOURCE-$0}")"
iceberg_rest_server_dir="$(cd "${bin_dir}/../">/dev/null; pwd)"

cd ${iceberg_rest_server_dir}

python bin/rewrite_config.py

./bin/gravitino-iceberg-rest-server.sh start
8 changes: 7 additions & 1 deletion docs/docker-image-details.md
@@ -51,11 +51,17 @@ You can deploy the standalone Gravitino Iceberg REST server with the Docker imag
Container startup commands

```shell
-docker run --rm -d -p 9001:9001 apache/gravitino-iceberg-rest:0.6.1-incubating
+docker run --rm -d -p 9001:9001 apache/gravitino-iceberg-rest:0.7.0-incubating
```

Changelog

- apache/gravitino-iceberg-rest:0.7.0-incubating
  - Using JDBC catalog backend.
  - Supports S3 and GCS storage.
  - Supports credential vending.
  - Supports changing configuration by environment variables.

- apache/gravitino-iceberg-rest:0.6.1-incubating
  - Based on Gravitino 0.6.1-incubating; see the 0.6.1-incubating release notes for more information.

21 changes: 18 additions & 3 deletions docs/iceberg-rest-service.md
@@ -399,13 +399,28 @@ SELECT * FROM dml.test;
You can run the Gravitino Iceberg REST server through a Docker container:

```shell
-docker run -d -p 9001:9001 apache/gravitino-iceberg-rest:0.6.0
+docker run -d -p 9001:9001 apache/gravitino-iceberg-rest:0.7.0-incubating
```

-Or build it manually to add custom logics:
By default, the Gravitino Iceberg REST server in the Docker image can access local storage. If your storage is cloud or remote storage such as S3, set the following environment variables; refer to the [storage section](#storage) for more details.

| Environment variables | Configuration items | Since version |
|--------------------------------------|---------------------------------------------------|-------------------|
| `GRAVITINO_IO_IMPL` | `gravitino.iceberg-rest.io-impl` | 0.7.0-incubating |
| `GRAVITINO_URI` | `gravitino.iceberg-rest.uri` | 0.7.0-incubating |
| `GRAVITINO_WAREHOUSE` | `gravitino.iceberg-rest.warehouse` | 0.7.0-incubating |
| `GRAVITINO_CREDENTIAL_PROVIDER_TYPE` | `gravitino.iceberg-rest.credential-provider-type` | 0.7.0-incubating |
| `GRAVITINO_GCS_CREDENTIAL_FILE_PATH` | `gravitino.iceberg-rest.gcs-credential-file-path` | 0.7.0-incubating |
| `GRAVITINO_S3_ACCESS_KEY` | `gravitino.iceberg-rest.s3-access-key-id` | 0.7.0-incubating |
| `GRAVITINO_S3_SECRET_KEY` | `gravitino.iceberg-rest.s3-secret-access-key` | 0.7.0-incubating |
| `GRAVITINO_S3_REGION` | `gravitino.iceberg-rest.s3-region` | 0.7.0-incubating |
| `GRAVITINO_S3_ROLE_ARN` | `gravitino.iceberg-rest.s3-role-arn` | 0.7.0-incubating |
| `GRAVITINO_S3_EXTERNAL_ID` | `gravitino.iceberg-rest.s3-external-id` | 0.7.0-incubating |
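
As a rough illustration of the table above, the mapping can be sketched in Python. This mirrors the `env_map` lookup in `rewrite_config.py` but is not the script itself; `resolve_overrides`, the bucket name, and the unrelated `PATH` entry are hypothetical, and only three of the ten variables are shown:

```python
CONFIG_PREFIX = "gravitino.iceberg-rest."

# Subset of the environment-variable table; the other rows follow the same pattern.
ENV_MAP = {
    "GRAVITINO_IO_IMPL": "io-impl",
    "GRAVITINO_WAREHOUSE": "warehouse",
    "GRAVITINO_S3_REGION": "s3-region",
}


def resolve_overrides(environ):
    """Return the config entries produced by the variables set in `environ`."""
    return {
        CONFIG_PREFIX + item: environ[name]
        for name, item in ENV_MAP.items()
        if name in environ
    }


# Hypothetical S3 setup; variables outside the table are ignored.
example_env = {
    "GRAVITINO_IO_IMPL": "org.apache.iceberg.aws.s3.S3FileIO",
    "GRAVITINO_WAREHOUSE": "s3://example-bucket/warehouse",
    "PATH": "/usr/bin",
}
overrides = resolve_overrides(example_env)
for key, value in overrides.items():
    print("{} = {}".format(key, value))
```

Variables that are not set leave the corresponding configuration item untouched, which is why `s3-region` does not appear in the result here.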

Or build it manually to add custom configuration or logic:

```shell
-sh ./dev/docker/build-docker.sh --platform linux/arm64 --type iceberg-rest-server --image apache/gravitino-iceberg-rest --tag 0.6.0
+sh ./dev/docker/build-docker.sh --platform linux/arm64 --type iceberg-rest-server --image apache/gravitino-iceberg-rest --tag 0.7.0-incubating
```

You could try Spark with Gravitino REST catalog service in our [playground](./how-to-use-the-playground.md#using-apache-iceberg-rest-service).
