Project status: ALPHA
A Google BigQuery sink for Google Pubsub events, written in Rust. This project originated after some frustration with writing another version of this flow using the Apache Beam SDK. Although the Apache Beam SDK is relatively straightforward, I noticed that it is either:
- Expensive (when using the Google Dataflow runner)
- Slow (when using the SparkRunner)
- Quite memory intensive in both cases
For these reasons it seemed like a good idea to implement this flow in Rust using the Google APIs directly. Note that this application currently does not have any options for horizontal scaling.
The application requires a few settings to be configured through a config file. The steps below illustrate a simple setup:
- Build the application using Cargo:

  ```sh
  cargo build --release
  ```
- Create a configuration file:

  ```toml
  # config.toml
  debug = false
  mode = "subscribe"

  [pubsub]
  project_id = "PROJECT_ID"
  topic = "projects/PROJECT_ID/topics/TOPIC"
  subscription = "projects/PROJECT_ID/subscriptions/SUBSCRIPTION"

  [bigquery]
  project_id = "PROJECT_ID"
  dataset = "DATASET_NAME"
  table = "TABLE_NAME"
  format = "CSV"
  delimiter = "\t"         # can be any ISO-8859-1 single-byte character
  quote = ""
  auto_detect = true       # auto detect schema in pubsub topic
  allow_jagged_rows = true # allow null values in last columns

  [limits]
  pubsub_max_messages = 500
  bigquery_time_limit = 90
  bigquery_max_messages = 1000
  ```
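The `[limits]` section suggests a batching strategy: messages are buffered and flushed to BigQuery once either a message count or a time limit is reached. A minimal sketch of that idea, assuming `bigquery_max_messages` and `bigquery_time_limit` map onto the count and time thresholds (the `Batcher` type and its behavior are illustrative, not the application's actual implementation):

```rust
use std::time::{Duration, Instant};

/// Buffers incoming messages and decides when a batch should be
/// flushed to BigQuery: either the buffer is full, or too much time
/// has passed since the last flush.
struct Batcher {
    buffer: Vec<String>,
    last_flush: Instant,
    max_messages: usize,  // corresponds to bigquery_max_messages
    time_limit: Duration, // corresponds to bigquery_time_limit (seconds)
}

impl Batcher {
    fn new(max_messages: usize, time_limit_secs: u64) -> Self {
        Batcher {
            buffer: Vec::new(),
            last_flush: Instant::now(),
            max_messages,
            time_limit: Duration::from_secs(time_limit_secs),
        }
    }

    /// Adds a message; returns Some(batch) when a flush is due.
    /// Note: the time limit is only checked when a message arrives,
    /// which keeps the sketch simple.
    fn push(&mut self, msg: String) -> Option<Vec<String>> {
        self.buffer.push(msg);
        if self.buffer.len() >= self.max_messages
            || self.last_flush.elapsed() >= self.time_limit
        {
            self.last_flush = Instant::now();
            Some(std::mem::take(&mut self.buffer))
        } else {
            None
        }
    }
}
```

With the example config, a batch of up to 1000 messages would be handed off to a BigQuery load whenever either threshold is hit.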
- Run the application:

  ```sh
  cd target/release
  ./pubsub_bigquery config.toml
  ```
It is possible to create a Docker container of approximately 10 MB by building a static Rust binary. You will need rust-musl-builder for this; see the rust-musl-builder project for more information.
- First create a release build using the following commands:

  ```sh
  alias rust-musl-builder='docker run --rm -it -v "$(pwd)":/home/rust/src ekidd/rust-musl-builder'
  rust-musl-builder cargo build --release
  ```
- Then create a Docker image using the supplied Dockerfile:

  ```sh
  docker build -t pubsub-bigquery:0.1 .
  ```
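The supplied Dockerfile is not reproduced here, but for a statically linked musl binary it can be very small, which is how the ~10 MB image size is achieved. A minimal sketch (the musl target path is the standard rust-musl-builder output location; whether the config file is baked into the image or mounted at runtime is an assumption):

```dockerfile
# Build from an empty base image; the statically linked binary has
# no runtime dependencies.
FROM scratch
COPY target/x86_64-unknown-linux-musl/release/pubsub_bigquery /pubsub_bigquery
COPY config.toml /config.toml
ENTRYPOINT ["/pubsub_bigquery", "/config.toml"]
```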
- And lastly, run the Docker container:

  ```sh
  docker run pubsub-bigquery:0.1
  ```