This repository has been archived by the owner on Apr 21, 2021. It is now read-only.

A Google BigQuery sink for Google Pubsub events written in Rust

deBijenkorf/pubsub_bigquery


pubsub_bigquery

Project status: ALPHA

A Google BigQuery sink for Google Pubsub events written in Rust. This project grew out of frustration with writing yet another version of this flow using the Apache Beam SDK. Although the Apache Beam SDK is relatively straightforward, I found it to be:

  • Expensive (when using the Google Dataflow runner)
  • Slow (when using the SparkRunner)
  • In both cases quite memory intensive

For these reasons it seemed like a good idea to implement this flow in Rust using the Google APIs directly. The application does not currently offer any options for horizontal scaling.

Quickstart

The application requires a few settings to be configured through a config file. The example below illustrates a simple setup:

  1. Build the application using Cargo

    cargo build --release
    
  2. Create a configuration file:

    # config.toml
    
    debug = false
    mode = "subscribe"
    
    [pubsub]
    project_id = "PROJECT_ID"
    topic = "projects/PROJECT_ID/topics/TOPIC"
    subscription = "projects/PROJECT_ID/subscriptions/SUBSCRIPTION"
    
    [bigquery]
    project_id = "PROJECT_ID"
    dataset = "DATASET_NAME"
    table = "TABLE_NAME"
    format = "CSV"
    delimiter = "\t" # can be any ISO-8859-1 single-byte character
    quote = ""
    auto_detect = true # auto detect schema in pubsub topic
    allow_jagged_rows = true # allow null values in last columns
    
    [limits]
    pubsub_max_messages = 500
    bigquery_time_limit = 90
    bigquery_max_messages = 1000
    
  3. Run the application

    cd target/release
    ./pubsub_bigquery config.toml
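
The `[limits]` section suggests that messages are buffered and written to BigQuery in batches. The following is a minimal sketch of that idea using only the standard library, not the project's actual implementation. It assumes that `bigquery_max_messages` caps the batch size and that `bigquery_time_limit` is a maximum batch age in seconds; the `Limits` and `Batcher` types and their methods are hypothetical names introduced for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical mirror of part of the `[limits]` section.
/// The meaning of each field is an assumption, not confirmed by the project.
struct Limits {
    bigquery_max_messages: usize,
    bigquery_time_limit: Duration,
}

/// Buffers messages and hands back a batch once a limit is reached.
struct Batcher {
    limits: Limits,
    buffer: Vec<String>,
    started: Instant,
}

impl Batcher {
    fn new(limits: Limits, now: Instant) -> Self {
        Batcher { limits, buffer: Vec::new(), started: now }
    }

    /// Adds a message and returns the batch if it is due for loading.
    fn push(&mut self, message: String, now: Instant) -> Option<Vec<String>> {
        if self.buffer.is_empty() {
            // The age of a batch is measured from its first message.
            self.started = now;
        }
        self.buffer.push(message);
        self.flush_if_due(now)
    }

    /// Returns the buffered messages when the batch is full or too old.
    fn flush_if_due(&mut self, now: Instant) -> Option<Vec<String>> {
        let full = self.buffer.len() >= self.limits.bigquery_max_messages;
        let expired = !self.buffer.is_empty()
            && now.saturating_duration_since(self.started) >= self.limits.bigquery_time_limit;
        if full || expired {
            // Leave an empty buffer behind and hand the batch to the caller.
            Some(std::mem::take(&mut self.buffer))
        } else {
            None
        }
    }
}

fn main() {
    let limits = Limits {
        bigquery_max_messages: 3,
        bigquery_time_limit: Duration::from_secs(90),
    };
    let start = Instant::now();
    let mut batcher = Batcher::new(limits, start);

    // Seven messages with a batch size of three: two full batches, one message left over.
    for i in 0..7 {
        if let Some(batch) = batcher.push(format!("event-{}", i), start) {
            println!("loading {} rows (batch full)", batch.len());
        }
    }

    // Simulate the time limit passing for the remaining message.
    let later = start + Duration::from_secs(91);
    if let Some(batch) = batcher.flush_if_due(later) {
        println!("loading {} rows (time limit reached)", batch.len());
    }
}
```

Passing `now` in explicitly keeps the sketch deterministic; a real consumer loop would presumably call `flush_if_due(Instant::now())` between Pubsub pulls so that a quiet topic still gets its pending rows loaded.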
    

Creating a Docker image

It's possible to create a Docker image of approximately 10 MB by building a static Rust binary. This requires rust-musl-builder; more information can be found in the rust-musl-builder documentation.

  1. First create a release using the following command:

    alias rust-musl-builder='docker run --rm -it -v "$(pwd)":/home/rust/src ekidd/rust-musl-builder'
    rust-musl-builder cargo build --release
    
  2. Then create a Docker image using the supplied Dockerfile:

    docker build -t pubsub-bigquery:0.1 .
    
  3. And lastly, run the Docker container:

    docker run pubsub-bigquery:0.1
    
