async-file-io

An implementation of Apache Iceberg's FileIO that downloads files asynchronously.

Async downloads start as soon as an InputFile is created from the FileIO instance. Calling newStream on the returned InputFile blocks until that file's download completes.
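A minimal sketch of this prefetch-then-block pattern, in plain Java with no Iceberg dependency. PrefetchedInputFile, roundTrip, and the use of a temporary directory as the cache are hypothetical names and choices for illustration, not the actual AsyncFileIO classes. The constructor starts copying the remote bytes into a local cache file on a background thread, and newStream() waits on that CompletableFuture before opening the local copy.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class AsyncInputFileSketch {

  // Hypothetical stand-in for an InputFile: the download into the local cache
  // starts in the constructor, and newStream() waits for it to finish.
  static final class PrefetchedInputFile {
    private final Path cachePath;
    private final CompletableFuture<Void> download;

    PrefetchedInputFile(Supplier<InputStream> remote, Path cachePath, ExecutorService pool) {
      this.cachePath = cachePath;
      // Kick off the copy immediately on a background thread.
      this.download = CompletableFuture.runAsync(() -> {
        try (InputStream in = remote.get()) {
          Files.copy(in, cachePath);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }, pool);
    }

    // Blocks until the background copy completes, then reads the local copy.
    InputStream newStream() throws IOException {
      download.join(); // a failed download surfaces here as a CompletionException
      return Files.newInputStream(cachePath);
    }
  }

  // Runs one prefetch end to end and returns the bytes newStream() yields.
  static byte[] roundTrip(byte[] data) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      Path cache = Files.createTempDirectory("async-cache").resolve("data.bin");
      PrefetchedInputFile file =
          new PrefetchedInputFile(() -> new ByteArrayInputStream(data), cache, pool);
      try (InputStream in = file.newStream()) {
        return in.readAllBytes();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) {
    byte[] out = roundTrip("hello iceberg".getBytes(StandardCharsets.UTF_8));
    System.out.println(new String(out, StandardCharsets.UTF_8));
  }
}
```

Because join() rethrows a failed download wrapped in a CompletionException, a reader in this sketch sees the download error when it calls newStream(), not when the InputFile is created.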

The underlying ResolvingFileIO is used for newOutputFile and deleteFile.
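The split between asynchronous reads and delegated writes can be sketched as a simple wrapper. SimpleFileIO, RecordingIO, and AsyncIO are hypothetical types, far smaller than Iceberg's real FileIO interface. The sketch also assumes that reads still reach the remote file through the delegate, which this README does not state.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegationSketch {

  // Hypothetical, stripped-down stand-in for Iceberg's FileIO; paths stand in for file handles.
  interface SimpleFileIO {
    String newInputFile(String path);
    String newOutputFile(String path);
    void deleteFile(String path);
  }

  // Plays the role of ResolvingFileIO and records every call it receives.
  static final class RecordingIO implements SimpleFileIO {
    final List<String> calls = new ArrayList<>();

    @Override public String newInputFile(String path) { calls.add("in:" + path); return path; }
    @Override public String newOutputFile(String path) { calls.add("out:" + path); return path; }
    @Override public void deleteFile(String path) { calls.add("delete:" + path); }
  }

  // Wraps the delegate: reads are where an async download would start,
  // while writes and deletes are forwarded unchanged.
  static final class AsyncIO implements SimpleFileIO {
    private final SimpleFileIO delegate;
    final List<String> downloadsStarted = new ArrayList<>();

    AsyncIO(SimpleFileIO delegate) { this.delegate = delegate; }

    @Override public String newInputFile(String path) {
      downloadsStarted.add(path);          // a real implementation would start a background copy here
      return delegate.newInputFile(path);  // assumption: the remote file is opened via the delegate
    }

    @Override public String newOutputFile(String path) { return delegate.newOutputFile(path); }

    @Override public void deleteFile(String path) { delegate.deleteFile(path); }
  }

  // Exercises all three operations and returns what reached the delegate.
  static List<String> demo() {
    RecordingIO resolving = new RecordingIO();
    AsyncIO io = new AsyncIO(resolving);
    io.newInputFile("s3://bucket/data/a.parquet");
    io.newOutputFile("s3://bucket/data/b.parquet");
    io.deleteFile("s3://bucket/data/c.parquet");
    return resolving.calls;
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```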

Building

To build, run the Gradle build with the included wrapper:

./gradlew build

Configuration

To use this FileIO, set the catalog's io-impl property to io.tabular.AsyncFileIO.

Here is an example of Spark configuration for a catalog named prod:

spark.sql.catalog.prod=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.prod.type=rest
spark.sql.catalog.prod.uri=https://api.tabular.io/ws
spark.sql.catalog.prod.credential=...
spark.sql.catalog.prod.warehouse=prod
spark.sql.catalog.prod.io-impl=io.tabular.AsyncFileIO
spark.sql.catalog.prod.async.cache-location=file:/tmp
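Spark hands a catalog every option under spark.sql.catalog.prod. with that prefix removed, so io-impl and async.cache-location arrive as ordinary catalog properties. Iceberg then uses io-impl to load the FileIO and, as we understand it, passes the catalog properties to the FileIO's initialize method. The hypothetical helper below only reproduces the prefix stripping on the example settings (the elided credential is left out), to show which keys the catalog actually sees.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CatalogPropertiesSketch {

  // Collects the options a Spark catalog named catalogName receives:
  // every key under spark.sql.catalog.<name>. with that prefix removed.
  static Map<String, String> catalogOptions(Map<String, String> sparkConf, String catalogName) {
    String prefix = "spark.sql.catalog." + catalogName + ".";
    Map<String, String> options = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : sparkConf.entrySet()) {
      if (e.getKey().startsWith(prefix)) {
        options.put(e.getKey().substring(prefix.length()), e.getValue());
      }
    }
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new LinkedHashMap<>();
    // The catalog class itself is not an option, so it is not copied below.
    conf.put("spark.sql.catalog.prod", "org.apache.iceberg.spark.SparkCatalog");
    conf.put("spark.sql.catalog.prod.type", "rest");
    conf.put("spark.sql.catalog.prod.uri", "https://api.tabular.io/ws");
    conf.put("spark.sql.catalog.prod.warehouse", "prod");
    conf.put("spark.sql.catalog.prod.io-impl", "io.tabular.AsyncFileIO");
    conf.put("spark.sql.catalog.prod.async.cache-location", "file:/tmp");
    System.out.println(catalogOptions(conf, "prod"));
  }
}
```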

The async.cache-location property sets where downloaded data is stored locally. It can be either a local path (for example, file:/tmp) or memory:/ to cache data in an in-memory FileIO.
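A hedged sketch of how a cache-location value could be interpreted. CacheLocationSketch, kindOf, and localDirectory are hypothetical; requiring a URI scheme and rejecting anything other than file and memory are assumptions made for the example, not documented behavior of AsyncFileIO.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

public class CacheLocationSketch {

  enum CacheKind { LOCAL_DIRECTORY, IN_MEMORY }

  // Hypothetical helper: classifies an async.cache-location value by URI scheme.
  // Requiring a scheme and rejecting everything but file and memory is an assumption.
  static CacheKind kindOf(String location) {
    String scheme = URI.create(location).getScheme();
    if (scheme == null) {
      throw new IllegalArgumentException("Cache location has no scheme: " + location);
    }
    switch (scheme.toLowerCase(Locale.ROOT)) {
      case "file":
        return CacheKind.LOCAL_DIRECTORY;
      case "memory":
        return CacheKind.IN_MEMORY;
      default:
        throw new IllegalArgumentException("Unsupported cache location: " + location);
    }
  }

  // For a file: location, the directory that downloaded copies would be written under.
  static Path localDirectory(String location) {
    if (kindOf(location) != CacheKind.LOCAL_DIRECTORY) {
      throw new IllegalArgumentException("Not a local cache location: " + location);
    }
    return Paths.get(URI.create(location));
  }

  public static void main(String[] args) {
    System.out.println(kindOf("file:/tmp") + " " + localDirectory("file:/tmp"));
    System.out.println(kindOf("memory:/"));
  }
}
```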

To configure the number of background threads, set the Java system property iceberg.worker.num-threads.
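The property is read by the JVM that runs Iceberg, so it has to be set before the worker pool is created. As far as we know, Iceberg sizes its shared worker pool once, when its ThreadPools class is first loaded, which in practice means a launch flag such as -Diceberg.worker.num-threads=16; on Spark executors that flag can be passed through spark.executor.extraJavaOptions. The sketch below, using a hypothetical WorkerThreadsSketch class, only mirrors the lookup: it reads the property and falls back to the processor count, a default assumed here that may differ from Iceberg's.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class WorkerThreadsSketch {

  static final String NUM_THREADS_PROP = "iceberg.worker.num-threads";

  // Uses the system property if it is set and parses as an int;
  // otherwise falls back to one thread per available processor (assumed default).
  static int workerThreads() {
    return Integer.getInteger(NUM_THREADS_PROP, Runtime.getRuntime().availableProcessors());
  }

  public static void main(String[] args) {
    // Same effect as launching the JVM with -Diceberg.worker.num-threads=16,
    // provided it runs before any pool reads the property.
    System.setProperty(NUM_THREADS_PROP, "16");
    ExecutorService pool = Executors.newFixedThreadPool(workerThreads());
    System.out.println("worker threads: " + workerThreads());
    pool.shutdown();
  }
}
```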
