
snowplow-devops/terraform-azurerm-transformer-event-hub-vmss

terraform-azurerm-transformer-event-hub-vmss

A Terraform module that deploys the Transformer Event Hubs service on an Azure Virtual Machine Scale Set (VMSS).

WARNING: Because scaling this application horizontally can introduce large numbers of duplicates, we lock it to a single instance. If you need more throughput, scale it "vertically" instead: change vm_sku to a larger VM size and re-apply the module. The default, Standard_B2s, should handle over 100 RPS without needing any scale-up.
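A minimal sketch of vertical scaling, assuming the module is instantiated as `transformer_service` as in the Usage example: only `vm_sku` changes before you re-apply. `Standard_B4ms` is an illustrative Azure VM size chosen for this sketch, not a sizing recommendation from this module.

```hcl
module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  # ... all other inputs unchanged from your existing configuration ...

  # Scale vertically: replace the default Standard_B2s with a larger size.
  # Standard_B4ms is an example value only; choose a size that fits your load.
  vm_sku = "Standard_B4ms"
}
```

Because the default java_opts size the JVM heap as a percentage of available RAM (-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75), a VM with more memory should get a proportionally larger heap without changing java_opts.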

Telemetry

By default, this module collects and forwards telemetry information to Snowplow to help us understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; it is very simple information about which modules and applications are deployed and active.

If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to a valid email address at which we can reach you.
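For example, assuming the module is instantiated as `transformer_service` (other inputs omitted for brevity); the address below is a hypothetical placeholder:

```hcl
module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  # ... other inputs as in the Usage example ...

  # Hypothetical placeholder: use a real mailbox that you monitor.
  user_provided_id = "data-team@example.com"
}
```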

How do I disable it?

To disable telemetry, set the variable telemetry_enabled = false.
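A minimal sketch of opting out, assuming the module is instantiated as `transformer_service` (other inputs omitted for brevity):

```hcl
module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  # ... other inputs as in the Usage example ...

  # Opt out of telemetry (the default is true).
  telemetry_enabled = false
}
```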

What are you collecting?

For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry

Usage

The Transformer reads data from an enriched input topic, transforms it, writes the result to the output storage container, and pushes a message to the queue topic for the loader. There are two types of transformation: wide row JSON and wide row Parquet, selected with the widerow_file_format input (json or parquet). Wide row JSON only converts each event to JSON format; wide row Parquet converts each event to Parquet format.
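As a sketch, switching from the default wide row JSON output to Parquet only requires setting widerow_file_format on the module. This assumes the `transformer_service` block from the Usage example that follows, with its other inputs left unchanged:

```hcl
module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  # ... other inputs as in the Usage example ...

  # "json" (default) emits wide row JSON; "parquet" emits wide row Parquet.
  widerow_file_format = "parquet"
}
```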

```hcl
module "eh_namespace" {
  source  = "snowplow-devops/event-hub-namespace/azurerm"
  version = "0.1.1"

  name                = "snowplow-pipeline"
  resource_group_name = var.resource_group_name
}

# Topic the Transformer reads enriched events from
module "enriched_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "enriched-topic"
  namespace_name      = module.eh_namespace.name
  resource_group_name = var.resource_group_name
}

# Topic the Transformer writes loader messages to
module "queue_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "queue-topic"
  namespace_name      = module.eh_namespace.name
  resource_group_name = var.resource_group_name
}

module "storage_account" {
  source  = "snowplow-devops/storage-account/azurerm"
  version = "0.1.2"

  # Azure storage account names must be 3-24 lowercase letters and digits (no hyphens)
  name                = "snowplowstorage"
  resource_group_name = var.resource_group_name
}

module "storage_container" {
  source  = "snowplow-devops/storage-container/azurerm"
  version = "0.1.1"

  name                 = "transformer-storage"
  storage_account_name = module.storage_account.name
}

module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  accept_limited_use_license = true

  name                = "transformer-server"
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id_for_servers

  # Read from the enriched topic, write loader messages to the queue topic
  enriched_topic_name           = module.enriched_eh_topic.name
  enriched_topic_kafka_password = module.enriched_eh_topic.read_only_primary_connection_string
  queue_topic_name              = module.queue_eh_topic.name
  queue_topic_kafka_password    = module.queue_eh_topic.read_write_primary_connection_string
  eh_namespace_name             = module.eh_namespace.name
  kafka_brokers                 = module.eh_namespace.broker

  # Output location and how often (in minutes) a loading-finished message is emitted
  storage_account_name   = module.storage_account.name
  storage_container_name = module.storage_container.name
  window_period_min      = 10

  # 0.0.0.0/0 allows SSH from any address; narrow ssh_ip_allowlist where possible
  ssh_public_key   = "your-public-key-here"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
```

Requirements

| Name | Version |
|------|---------|
| terraform | >= 1.0.0 |
| azuread | >= 2.39.0 |
| azurerm | >= 3.58.0 |
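One way to pin these requirements in your root module is sketched below. The `hashicorp/azuread` and `hashicorp/azurerm` registry sources are the standard HashiCorp providers; the version floors mirror the table above, and any upper bounds you add are your own choice.

```hcl
terraform {
  required_version = ">= 1.0.0"

  required_providers {
    azuread = {
      source  = "hashicorp/azuread"
      version = ">= 2.39.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.58.0"
    }
  }
}

# The azurerm provider requires a features block, even if it is empty.
provider "azurerm" {
  features {}
}
```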

Providers

| Name | Version |
|------|---------|
| azuread | >= 2.39.0 |
| azurerm | >= 3.58.0 |

Modules

| Name | Source | Version |
|------|--------|---------|
| service | snowplow-devops/service-vmss/azurerm | 0.1.1 |
| telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |

Resources

| Name | Type |
|------|------|
| azuread_application.transformer_app_registration | resource |
| azuread_application_password.transformer_app_pasword | resource |
| azuread_service_principal.transformer_sp | resource |
| azurerm_eventhub_consumer_group.enriched_topic | resource |
| azurerm_network_security_group.nsg | resource |
| azurerm_network_security_rule.egress_tcp_443 | resource |
| azurerm_network_security_rule.egress_tcp_80 | resource |
| azurerm_network_security_rule.egress_udp_123 | resource |
| azurerm_network_security_rule.ingress_tcp_22 | resource |
| azurerm_role_assignment.transformer_app_ra | resource |
| azuread_client_config.current | data source |
| azurerm_resource_group.rg | data source |
| azurerm_storage_container.sc | data source |

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| enriched_topic_kafka_password | Password for connecting to the Kafka cluster under PlainLoginModule (note: by default, the Event Hubs topic connection string for reading is expected) | `string` | n/a | yes |
| enriched_topic_name | The name of the enriched Event Hubs topic that the Transformer will pull data from | `string` | n/a | yes |
| kafka_brokers | The brokers to configure for access to the Kafka cluster (note: by default, the Event Hubs namespace broker) | `string` | n/a | yes |
| name | A name which will be prepended to the resources created | `string` | n/a | yes |
| queue_topic_kafka_password | Password for connecting to the Kafka cluster under PlainLoginModule (note: by default, the Event Hubs topic connection string for writing is expected) | `string` | n/a | yes |
| queue_topic_name | The name of the queue Event Hubs topic that the Transformer will push messages to for the loader | `string` | n/a | yes |
| resource_group_name | The name of the resource group to deploy the service into | `string` | n/a | yes |
| ssh_public_key | The SSH public key attached for access to the servers | `string` | n/a | yes |
| storage_account_name | Name of the output storage account | `string` | n/a | yes |
| storage_container_name | Name of the output storage container | `string` | n/a | yes |
| subnet_id | The subnet ID to deploy the service into | `string` | n/a | yes |
| window_period_min | Frequency, in minutes, at which to emit the loading-finished message (e.g. 5, 10, 15, 20, 30 or 60) | `number` | n/a | yes |
| accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | `bool` | `false` | no |
| app_version | Transformer app version to use. This variable facilitates the dev flow; the module may not work with anything other than the default value. | `string` | `"5.7.5"` | no |
| associate_public_ip_address | Whether to assign a public IP address to this instance | `bool` | `true` | no |
| custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Enrichment to resolve and validate events | `list(object({ name = string, priority = number, uri = string, api_key = string, vendor_prefixes = list(string) }))` | `[]` | no |
| default_iglu_resolvers | The default Iglu Resolvers that will be used by Enrichment to resolve and validate events | `list(object({ name = string, priority = number, uri = string, api_key = string, vendor_prefixes = list(string) }))` | `[{ "api_key": "", "name": "Iglu Central", "priority": 10, "uri": "http://iglucentral.com", "vendor_prefixes": [] }, { "api_key": "", "name": "Iglu Central - Mirror 01", "priority": 20, "uri": "http://mirror01.iglucentral.com", "vendor_prefixes": [] }]` | no |
| eh_namespace_name | The name of the Event Hubs namespace (note: if you are not using Event Hubs, leave this blank) | `string` | `""` | no |
| enriched_topic_kafka_username | Username for connecting to the Kafka cluster under PlainLoginModule (default: '$ConnectionString', which is used for Event Hubs) | `string` | `"$ConnectionString"` | no |
| java_opts | Custom Java options | `string` | `"-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75"` | no |
| kafka_source | The source providing the Kafka connectivity (default: azure_event_hubs) | `string` | `"azure_event_hubs"` | no |
| queue_topic_kafka_username | Username for connecting to the Kafka cluster under PlainLoginModule (default: '$ConnectionString', which is used for Event Hubs) | `string` | `"$ConnectionString"` | no |
| ssh_ip_allowlist | The list of CIDR ranges to allow SSH traffic from | `list(string)` | `["0.0.0.0/0"]` | no |
| tags | The tags to append to this resource | `map(string)` | `{}` | no |
| telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | `bool` | `true` | no |
| transformer_compression | Transformer output compression, GZIP or NONE | `string` | `"GZIP"` | no |
| user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | `string` | `""` | no |
| vm_sku | The instance type to use | `string` | `"Standard_B2s"` | no |
| widerow_file_format | The output file_format from the widerow transformation_type selected (json or parquet) | `string` | `"json"` | no |
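As an illustration of the optional networking inputs, here is a sketch that keeps the instance off the public internet and narrows SSH access. The CIDR range and tag values are hypothetical placeholders. Since the module creates egress rules on TCP 80/443 (see Resources), you may need to make sure the subnet still has an outbound path when no public IP is assigned.

```hcl
module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  # ... required inputs as in the Usage example ...

  # Do not attach a public IP address to the instance.
  associate_public_ip_address = false

  # Only allow SSH from a private range (placeholder value).
  ssh_ip_allowlist = ["10.0.0.0/16"]

  # Example tags applied to the created resources.
  tags = {
    environment = "production"
  }
}
```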

Outputs

| Name | Description |
|------|-------------|
| nsg_id | ID of the network security group attached to the Transformer Server nodes |
| vmss_id | ID of the VM scale set |
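A sketch of surfacing these outputs from a root module, assuming the module is instantiated as `transformer_service`; the root output names are hypothetical:

```hcl
output "transformer_vmss_id" {
  description = "ID of the Transformer VM scale set"
  value       = module.transformer_service.vmss_id
}

output "transformer_nsg_id" {
  description = "ID of the network security group attached to the Transformer nodes"
  value       = module.transformer_service.nsg_id
}
```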

Copyright and license

Copyright 2023-present Snowplow Analytics Ltd.

Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)