A Terraform module which deploys the Snowplow Postgres Loader application on Google Cloud, running on top of Compute Engine. If you want to use a custom image for this deployment you will need to ensure it is based on Ubuntu 20.04.
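For instance, a custom image can be wired in through the `ubuntu_20_04_source_image` input documented below; a minimal sketch (the image self-link is hypothetical):

```hcl
module "postgres_loader_enriched" {
  source = "snowplow-devops/postgres-loader-pubsub-ce/google"

  # Hypothetical self-link to an image you have built on top of Ubuntu 20.04
  ubuntu_20_04_source_image = "projects/your-project/global/images/your-ubuntu-2004-image"

  # ... all other inputs as per the usage examples below
}
```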
WARNING: If you are upgrading from module version 0.1.x you will need to issue a manual table update - details can be found here. You will need to adjust the `ALTER TABLE` command to reference the schema that your events table is deployed within.
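As an illustrative sketch only - the linked upgrade notes are authoritative and the exact DDL there should be preferred - the command takes this shape, assuming your events table lives in the default `atomic` schema (the `load_tstamp` column shown is an assumption about the loader's upgrade path):

```sql
-- Sketch only: replace "atomic" with the schema your events table is deployed within.
-- The load_tstamp column is an assumption; take the exact DDL from the linked upgrade notes.
ALTER TABLE atomic.events ADD COLUMN load_tstamp TIMESTAMPTZ;
```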
This module by default collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us - it is only very simple information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories please set the `user_provided_id` variable to include a valid email address which we can reach you at.

To disable telemetry simply set the variable `telemetry_enabled = false`.

For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
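As a minimal sketch, both settings are plain inputs on the module block (all other required inputs elided, email address illustrative):

```hcl
module "postgres_loader_enriched" {
  source = "snowplow-devops/postgres-loader-pubsub-ce/google"

  # Opt out of telemetry entirely...
  telemetry_enabled = false

  # ...or leave it enabled and provide an email address to join the mailing list
  # user_provided_id = "you@example.com"

  # ... all other inputs as per the usage examples below
}
```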
The Postgres Loader can load both your enriched and bad data into a Postgres database - by default we use Cloud SQL, as it affords a simple and cost-effective way to get started.
To start loading "enriched" data into Postgres:
module "enriched_topic" {
source = "snowplow-devops/pubsub-topic/google"
version = "0.3.0"
name = "enriched-topic"
}
module "pipeline_db" {
source = "snowplow-devops/cloud-sql/google"
version = "0.3.0"
name = "pipeline-db"
region = var.region
db_name = local.pipeline_db_name
db_username = local.pipeline_db_username
db_password = local.pipeline_db_password
# Note: this exposes your data to the internet - take care to ensure your allowlist is strict enough
authorized_networks = local.pipeline_authorized_networks
# Note: required for higher concurrent connections count which is neccesary for loading both good and bad data at the same time
tier = "db-g1-small"
}
module "postgres_loader_enriched" {
source = "snowplow-devops/postgres-loader-pubsub-ce/google"
accept_limited_use_license = true
name = "pg-loader-enriched-server"
network = var.network
subnetwork = var.subnetwork
region = var.region
project_id = var.project_id
ssh_key_pairs = []
ssh_ip_allowlist = ["0.0.0.0/0"]
in_topic_name = module.enriched_topic.name
purpose = "ENRICHED_EVENTS"
schema_name = "atomic"
# Note: Using the connection_name will enforce the use of a Cloud SQL Proxy rather than a direct connection
# To instead use a direct connection you will need to define the `db_host` parameter instead.
db_instance_name = module.pipeline_db.connection_name
db_port = module.pipeline_db.port
db_name = local.pipeline_db_name
db_username = local.pipeline_db_username
db_password = local.pipeline_db_password
# Linking in the custom Iglu Server here
custom_iglu_resolvers = [
{
name = "Iglu Server"
priority = 0
uri = "http://your-iglu-server-endpoint/api"
api_key = var.iglu_super_api_key
vendor_prefixes = []
}
]
}
To load the "bad" data instead:
module "bad_1_topic" {
source = "snowplow-devops/pubsub-topic/google"
version = "0.3.0"
name = "bad-1-topic"
}
module "postgres_loader_bad" {
source = "snowplow-devops/postgres-loader-pubsub-ce/google"
accept_limited_use_license = true
name = "pg-loader-bad-server"
network = var.network
subnetwork = var.subnetwork
region = var.region
project_id = var.project_id
ssh_key_pairs = []
ssh_ip_allowlist = ["0.0.0.0/0"]
in_topic_name = module.bad_1_topic.name
# Note: The purpose defines what the input data set should look like
purpose = "JSON"
# Note: This schema is created automatically by the VM on launch
schema_name = "atomic_bad"
# Note: Using the connection_name will enforce the use of a Cloud SQL Proxy rather than a direct connection
# To instead use a direct connection you will need to define the `db_host` parameter instead.
db_instance_name = module.pipeline_db.connection_name
db_port = module.pipeline_db.port
db_name = local.pipeline_db_name
db_username = local.pipeline_db_username
db_password = local.pipeline_db_password
# Linking in the custom Iglu Server here
custom_iglu_resolvers = [
{
name = "Iglu Server"
priority = 0
uri = "http://your-iglu-server-endpoint/api"
api_key = var.iglu_super_api_key
vendor_prefixes = []
}
]
}
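Once applied, the module outputs documented below can be surfaced from your own configuration; a minimal sketch (the output name is illustrative):

```hcl
output "loader_instance_group_url" {
  description = "Full URL of the instance group created for the enriched loader"
  value       = module.postgres_loader_enriched.instance_group_url
}
```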
Requirements:

Name | Version |
---|---|
terraform | >= 1.0.0 |
google | >= 3.44.0 |

Providers:

Name | Version |
---|---|
google | >= 3.44.0 |
Modules:

Name | Source | Version |
---|---|---|
service | snowplow-devops/service-ce/google | 0.1.0 |
telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |

Resources:
Name | Type |
---|---|
google_compute_firewall.egress | resource |
google_compute_firewall.ingress_ssh | resource |
google_project_iam_member.sa_cloud_sql_client | resource |
google_project_iam_member.sa_logging_log_writer | resource |
google_project_iam_member.sa_pubsub_publisher | resource |
google_project_iam_member.sa_pubsub_subscriber | resource |
google_project_iam_member.sa_pubsub_viewer | resource |
google_pubsub_subscription.in | resource |
google_service_account.sa | resource |
Inputs:

Name | Description | Type | Default | Required |
---|---|---|---|---|
db_name | The name of the database to connect to | string | n/a | yes |
db_password | The password to use to connect to the database | string | n/a | yes |
db_port | The port the database is running on | number | n/a | yes |
db_username | The username to use to connect to the database | string | n/a | yes |
in_topic_name | The name of the input pubsub topic that the loader will pull data from | string | n/a | yes |
name | A name which will be prepended to the resources created | string | n/a | yes |
network | The name of the network to deploy within | string | n/a | yes |
project_id | The project ID in which the stack is being deployed | string | n/a | yes |
purpose | The type of data the loader will be pulling which can be one of ENRICHED_EVENTS or JSON (Note: JSON can be used for loading bad rows) | string | n/a | yes |
region | The name of the region to deploy within | string | n/a | yes |
schema_name | The database schema to load data into (e.g. atomic or atomic_bad) | string | n/a | yes |
accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
app_version | App version to use. This variable facilitates dev flow, the modules may not work with anything other than the default value. | string | "0.3.1" | no |
associate_public_ip_address | Whether to assign a public IP address to this instance; if false this instance must be behind a Cloud NAT to connect to the internet | bool | true | no |
custom_iglu_resolvers | The custom Iglu Resolvers that will be used by the loader to resolve and validate events | list(object({...})) | [] | no |
db_host | The hostname of the database to connect to (Note: if db_instance_name is non-empty this setting is ignored) | string | "" | no |
db_instance_name | The instance name of the CloudSQL instance to connect to (Note: if set db_host will be ignored and a proxy established instead) | string | "" | no |
db_max_connections | The maximum number of connections to the backing database | number | 10 | no |
default_iglu_resolvers | The default Iglu Resolvers that will be used by the loader to resolve and validate events | list(object({...})) | [...] | no |
gcp_logs_enabled | Whether application logs should be reported to GCP Logging | bool | true | no |
in_max_concurrent_checkpoints | The maximum number of concurrent effects for the topic checkpointing system - essentially how many concurrent acks we will make to PubSub | number | 100 | no |
java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
labels | The labels to append to this resource | map(string) | {} | no |
machine_type | The machine type to use | string | "e2-small" | no |
network_project_id | The project ID of the shared VPC in which the stack is being deployed | string | "" | no |
ssh_block_project_keys | Whether to block project wide SSH keys | bool | true | no |
ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | [...] | no |
ssh_key_pairs | The list of SSH key-pairs to add to the servers | list(object({...})) | [] | no |
subnetwork | The name of the sub-network to deploy within; if populated will override the 'network' setting | string | "" | no |
target_size | The number of servers to deploy | number | 1 | no |
telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
ubuntu_20_04_source_image | The source image to use which must be based on Ubuntu 20.04; by default the latest community version is used | string | "" | no |
user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
Outputs:

Name | Description |
---|---|
instance_group_url | The full URL of the instance group created by the manager |
manager_id | Identifier for the instance group manager |
manager_self_link | The URL for the instance group manager |
Copyright 2021-present Snowplow Analytics Ltd.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)