terraform-google-collector-pubsub-ce

A Terraform module which deploys the Snowplow Stream Collector on Google Compute Engine (CE). If you want to use a custom image for this deployment, ensure it is based on Ubuntu 20.04.
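
As a rough sketch of the custom image case, an image can be looked up with the google_compute_image data source and handed to the module through the ubuntu_20_04_source_image input. The image family name below is hypothetical, var.project_id is assumed to be declared elsewhere, and whether the module expects a self link, an image name, or a family is not stated in this README, so treat the self_link choice as an assumption to verify.

```hcl
# Look up a custom image that was built on top of Ubuntu 20.04.
# "my-collector-base" is a hypothetical image family in your own project.
data "google_compute_image" "collector_base" {
  family  = "my-collector-base"
  project = var.project_id
}

locals {
  # Assumption: the module accepts an image self link for this input.
  # Pass this value as ubuntu_20_04_source_image on the collector module.
  collector_source_image = data.google_compute_image.collector_base.self_link
}
```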

Telemetry

By default, this module collects and forwards telemetry information to Snowplow so that we can understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; the telemetry is limited to simple information about which modules and applications are deployed and active.

If you wish to subscribe to our mailing list for updates to these modules or for security advisories, please set the user_provided_id variable to include a valid email address at which we can reach you.

How do I disable it?

To disable telemetry, set the variable telemetry_enabled = false.
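
A minimal sketch of what that looks like in a module call, assuming the same variables and topic modules as the Usage section; only the telemetry_enabled line is specific to opting out.

```hcl
module "collector_pubsub" {
  source = "snowplow-devops/collector-pubsub-ce/google"

  accept_limited_use_license = true

  name       = "collector-server"
  project_id = var.project_id
  network    = var.network
  region     = var.region

  topic_project_id = var.project_id
  good_topic_name  = module.raw_topic.name
  bad_topic_name   = module.bad_1_topic.name

  # Opt out of telemetry; the documented default is true.
  telemetry_enabled = false
}
```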

What are you collecting?

For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry

Usage

A collector requires two output PubSub Topics and a Load Balancer which is deployed upstream. The Load Balancer ensures we can easily configure TLS termination later in the setup and provides a simple mechanism for setting up DNS.

module "raw_topic" {
  source  = "snowplow-devops/pubsub-topic/google"
  version = "0.3.0"

  name = "raw-topic"
}

module "bad_1_topic" {
  source  = "snowplow-devops/pubsub-topic/google"
  version = "0.3.0"

  name = "bad-1-topic"
}

module "collector_pubsub" {
  source  = "snowplow-devops/collector-pubsub-ce/google"

  accept_limited_use_license = true

  name = "collector-server"

  network    = var.network
  subnetwork = var.subnetwork
  region     = var.region

  ssh_ip_allowlist = ["0.0.0.0/0"]
  ssh_key_pairs    = []

  project_id       = var.project_id
  topic_project_id = var.project_id
  good_topic_name  = module.raw_topic.name
  bad_topic_name   = module.bad_1_topic.name
}

module "collector_lb" {
  source  = "snowplow-devops/lb/google"
  version = "0.3.0"

  name = "collector-lb"

  instance_group_named_port_http = module.collector_pubsub.named_port_http
  instance_group_url             = module.collector_pubsub.instance_group_url
  health_check_self_link         = module.collector_pubsub.health_check_self_link
}
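
The example above references several input variables without declaring them. A minimal sketch of the supporting declarations might look like the following; the descriptions and the provider block are assumptions about a typical root module, not something this module prescribes.

```hcl
variable "project_id" {
  description = "The project ID in which the stack and topics are deployed"
  type        = string
}

variable "region" {
  description = "The name of the region to deploy within"
  type        = string
}

variable "network" {
  description = "The name of the network to deploy within"
  type        = string
}

variable "subnetwork" {
  description = "The name of the sub-network to deploy within"
  type        = string
  default     = ""
}

# Assumed provider configuration for the root module.
provider "google" {
  project = var.project_id
  region  = var.region
}
```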

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| google | >= 3.44.0 |
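
These constraints can be mirrored in the root module so that an incompatible Terraform or provider version fails early. This is a sketch; the hashicorp/google source address is the standard registry address for the Google provider.

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 3.44.0"
    }
  }
}
```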

Providers

| Name | Version |
|------|---------|
| google | >= 3.44.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| service | snowplow-devops/service-ce/google | 0.1.0 |
| telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |

Resources

| Name | Type |
|------|------|
| google_compute_firewall.egress | resource |
| google_compute_firewall.ingress | resource |
| google_compute_firewall.ingress_ssh | resource |
| google_project_iam_member.sa_logging_log_writer | resource |
| google_project_iam_member.sa_pubsub_publisher | resource |
| google_project_iam_member.sa_pubsub_viewer | resource |
| google_service_account.sa | resource |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| bad_topic_name | The name of the bad pubsub topic that the collector will insert data into | string | n/a | yes |
| good_topic_name | The name of the good pubsub topic that the collector will insert data into | string | n/a | yes |
| name | A name which will be prepended to the resources created | string | n/a | yes |
| network | The name of the network to deploy within | string | n/a | yes |
| project_id | The project ID in which the stack is being deployed | string | n/a | yes |
| region | The name of the region to deploy within | string | n/a | yes |
| topic_project_id | The project ID in which the topics are deployed | string | n/a | yes |
| accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
| app_version | App version to use. This variable facilitates dev flow; the modules may not work with anything other than the default value. | string | "3.0.1" | no |
| associate_public_ip_address | Whether to assign a public IP address to this instance; if false, this instance must be behind a Cloud NAT to connect to the internet | bool | true | no |
| byte_limit | The number of bytes of events to buffer before pushing them to PubSub | number | 1000000 | no |
| cookie_domain | Optional first party cookie domain for the collector to set cookies on (e.g. acme.com) | string | "" | no |
| custom_paths | Optional custom paths that the collector will respond to; typical paths to override are '/com.snowplowanalytics.snowplow/tp2', '/com.snowplowanalytics.iglu/v1' and '/r/tp2', e.g. { "/custom/path/" : "/com.snowplowanalytics.snowplow/tp2" } | map(string) | {} | no |
| gcp_logs_enabled | Whether application logs should be reported to GCP Logging | bool | true | no |
| health_check_path | The path to bind for health checks | string | "/health" | no |
| ingress_port | The port that the collector will be bound to and expose over HTTP | number | 8080 | no |
| java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
| labels | The labels to append to this resource | map(string) | {} | no |
| machine_type | The machine type to use | string | "e2-small" | no |
| network_project_id | The project ID of the shared VPC in which the stack is being deployed | string | "" | no |
| record_limit | The number of events to buffer before pushing them to PubSub | number | 500 | no |
| ssh_block_project_keys | Whether to block project wide SSH keys | bool | true | no |
| ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | list(any) | ["0.0.0.0/0"] | no |
| ssh_key_pairs | The list of SSH key-pairs to add to the servers | list(object({ user_name = string, public_key = string })) | [] | no |
| subnetwork | The name of the sub-network to deploy within; if populated will override the 'network' setting | string | "" | no |
| target_size | The number of servers to deploy | number | 1 | no |
| telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
| time_limit_ms | The amount of time (in milliseconds) to buffer events before pushing them to PubSub | number | 500 | no |
| ubuntu_20_04_source_image | The source image to use, which must be based on Ubuntu 20.04; by default the latest community version is used | string | "" | no |
| user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
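
To illustrate how several of the optional inputs fit together, here is a sketch of a module call that overrides some defaults. It assumes the variables and topic modules from the Usage section; var.ssh_public_key is a hypothetical variable, the CIDR range is a documentation-only placeholder, and the sizing and buffer values are illustrative rather than recommendations.

```hcl
module "collector_pubsub" {
  source = "snowplow-devops/collector-pubsub-ce/google"

  accept_limited_use_license = true

  name       = "collector-server"
  project_id = var.project_id
  network    = var.network
  subnetwork = var.subnetwork
  region     = var.region

  topic_project_id = var.project_id
  good_topic_name  = module.raw_topic.name
  bad_topic_name   = module.bad_1_topic.name

  # Restrict SSH to a known range instead of the default 0.0.0.0/0.
  # 203.0.113.0/24 is a placeholder range; substitute your own.
  ssh_ip_allowlist = ["203.0.113.0/24"]
  ssh_key_pairs = [
    {
      user_name  = "ops"
      public_key = var.ssh_public_key # hypothetical variable
    }
  ]

  # Serve the standard tracker endpoint on an additional custom path.
  custom_paths = {
    "/custom/path/" = "/com.snowplowanalytics.snowplow/tp2"
  }
  cookie_domain = "acme.com"

  # Illustrative sizing and buffer settings.
  machine_type  = "e2-medium"
  target_size   = 2
  byte_limit    = 1000000
  record_limit  = 500
  time_limit_ms = 500
}
```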

Outputs

| Name | Description |
|------|-------------|
| health_check_id | Identifier for the health check on the instance group |
| health_check_self_link | The URL for the health check on the instance group |
| instance_group_url | The full URL of the instance group created by the manager |
| manager_id | Identifier for the instance group manager |
| manager_self_link | The URL for the instance group manager |
| named_port_http | The name of the port exposed by the instance group |
| named_port_value | The named port value (e.g. 8080) |
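
If other parts of a stack need these values, they can be re-exported from the root module. The sketch below assumes the module is instantiated as collector_pubsub, as in the Usage section; the output names on the left are arbitrary choices.

```hcl
output "collector_instance_group_url" {
  description = "The full URL of the collector instance group"
  value       = module.collector_pubsub.instance_group_url
}

output "collector_health_check_self_link" {
  description = "The URL for the collector health check"
  value       = module.collector_pubsub.health_check_self_link
}

output "collector_named_port_http" {
  description = "The name of the port exposed by the collector instance group"
  value       = module.collector_pubsub.named_port_http
}
```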

Copyright and license

Copyright 2021-present Snowplow Analytics Ltd.

Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)