A Terraform module which deploys the Transformer EventHub service on VMSS.
WARNING: Scaling this application horizontally can introduce large numbers of duplicates, so the module locks it to a single instance. If you need more throughput you will need to scale it "vertically" by changing the vm_sku to a larger node type and re-applying the module. The default is a Standard_B2s, which should handle over 100 RPS without needing any scale-up.
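As a rough sketch of the vertical scale-up described above, the only change needed is the vm_sku input. The SKU shown here (Standard_B4ms) is an illustrative assumption, not a recommendation from the module authors; check which sizes are available in your region and subscription before applying.

```hcl
module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  # ... all other inputs stay exactly as in the usage example in this README ...

  # Assumption: any larger VM size can be substituted here; Standard_B4ms is
  # only an example of a size bigger than the default Standard_B2s.
  vm_sku = "Standard_B4ms"
}
```

Re-running `terraform apply` after the change should roll the new size out to the single instance.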
By default this module collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us; it is very simple information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the user_provided_id variable to include a valid email address at which we can reach you.
To disable telemetry, simply set the variable telemetry_enabled = false.
For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
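The two telemetry settings above could be wired into the module call roughly as follows. This is a minimal sketch: the email address is a placeholder, and you would normally pick one of the two options rather than both.

```hcl
module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  # ... all other inputs stay exactly as in the usage example in this README ...

  # Option A: keep telemetry on and opt in to updates and security advisories.
  # The address below is a placeholder; replace it with your own.
  user_provided_id = "ops@example.com"

  # Option B: switch telemetry off entirely (uncomment to use).
  # telemetry_enabled = false
}
```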
Transformer reads data from an enriched input topic, transforms it, and writes the result to cloud storage. There are two types of transformation: wide row JSON and wide row Parquet. When wide row JSON is activated, events are only converted to JSON format. When wide row Parquet is activated, events are converted to Parquet format.
module "eh_namespace" {
  source  = "snowplow-devops/event-hub-namespace/azurerm"
  version = "0.1.1"

  name                = "snowplow-pipeline"
  resource_group_name = var.resource_group_name
}

module "enriched_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "enriched-topic"
  namespace_name      = module.eh_namespace.name
  resource_group_name = var.resource_group_name
}

module "queue_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "queue-topic"
  namespace_name      = module.eh_namespace.name
  resource_group_name = var.resource_group_name
}

module "storage_account" {
  source  = "snowplow-devops/storage-account/azurerm"
  version = "0.1.2"

  name                = "snowplow-storage"
  resource_group_name = var.resource_group_name
}

module "storage_container" {
  source  = "snowplow-devops/storage-container/azurerm"
  version = "0.1.1"

  name                 = "transformer-storage"
  storage_account_name = module.storage_account.name
}

module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  accept_limited_use_license = true

  name                = "transformer-server"
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id_for_servers

  enriched_topic_name           = module.enriched_eh_topic.name
  enriched_topic_kafka_password = module.enriched_eh_topic.read_only_primary_connection_string
  queue_topic_name              = module.queue_eh_topic.name
  queue_topic_kafka_password    = module.queue_eh_topic.read_write_primary_connection_string
  eh_namespace_name             = module.eh_namespace.name
  kafka_brokers                 = module.eh_namespace.broker

  storage_account_name   = module.storage_account.name
  storage_container_name = module.storage_container.name
  window_period_min      = 10

  ssh_public_key   = "your-public-key-here"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
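The usage example above relies on the default output settings (wide row JSON, GZIP compression). A minimal sketch of switching to wide row Parquet, using only the widerow_file_format and transformer_compression inputs documented in the inputs table, might look like this:

```hcl
module "transformer_service" {
  source = "snowplow-devops/transformer-event-hub-vmss/azurerm"

  # ... all other inputs stay exactly as in the usage example above ...

  # Documented values are "json" (default) or "parquet".
  widerow_file_format = "parquet"

  # Documented values are "GZIP" (default) or "NONE".
  transformer_compression = "GZIP"
}
```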
Name | Version |
---|---|
terraform | >= 1.0.0 |
azuread | >= 2.39.0 |
azurerm | >= 3.58.0 |
Name | Version |
---|---|
azuread | >= 2.39.0 |
azurerm | >= 3.58.0 |
Name | Source | Version |
---|---|---|
service | snowplow-devops/service-vmss/azurerm | 0.1.1 |
telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |
Name | Type |
---|---|
azuread_application.transformer_app_registration | resource |
azuread_application_password.transformer_app_pasword | resource |
azuread_service_principal.transformer_sp | resource |
azurerm_eventhub_consumer_group.enriched_topic | resource |
azurerm_network_security_group.nsg | resource |
azurerm_network_security_rule.egress_tcp_443 | resource |
azurerm_network_security_rule.egress_tcp_80 | resource |
azurerm_network_security_rule.egress_udp_123 | resource |
azurerm_network_security_rule.ingress_tcp_22 | resource |
azurerm_role_assignment.transformer_app_ra | resource |
azuread_client_config.current | data source |
azurerm_resource_group.rg | data source |
azurerm_storage_container.sc | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
enriched_topic_kafka_password | Password for connection to Kafka cluster under PlainLoginModule (note: as default the EventHubs topic connection string for reading is expected) | string | n/a | yes |
enriched_topic_name | The name of the enriched Event Hubs topic that transformer will pull data from | string | n/a | yes |
kafka_brokers | The brokers to configure for access to the Kafka Cluster (note: as default the EventHubs namespace broker) | string | n/a | yes |
name | A name which will be pre-pended to the resources created | string | n/a | yes |
queue_topic_kafka_password | Password for connection to Kafka cluster under PlainLoginModule (note: as default the EventHubs topic connection string for writing is expected) | string | n/a | yes |
queue_topic_name | The name of the queue Event Hubs topic that the transformer will push messages to for the loader | string | n/a | yes |
resource_group_name | The name of the resource group to deploy the service into | string | n/a | yes |
ssh_public_key | The SSH public key attached for access to the servers | string | n/a | yes |
storage_account_name | Name of the output storage account | string | n/a | yes |
storage_container_name | Name of the output storage container | string | n/a | yes |
subnet_id | The subnet id to deploy the service into | string | n/a | yes |
window_period_min | Frequency to emit loading finished message - 5,10,15,20,30,60 etc minutes | number | n/a | yes |
accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
app_version | Transformer app version to use. This variable facilitates dev flow; the modules may not work with anything other than the default value. | string | "5.7.5" | no |
associate_public_ip_address | Whether to assign a public ip address to this instance | bool | true | no |
custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Enrichment to resolve and validate events | list(object({ | [] | no |
default_iglu_resolvers | The default Iglu Resolvers that will be used by Enrichment to resolve and validate events | list(object({ | [ | no |
eh_namespace_name | The name of the Event Hubs namespace (note: if you are not using EventHubs leave this blank) | string | "" | no |
enriched_topic_kafka_username | Username for connection to Kafka cluster under PlainLoginModule (default: '$ConnectionString' which is used for EventHubs) | string | "$ConnectionString" | no |
java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
kafka_source | The source providing the Kafka connectivity (def: azure_event_hubs) | string | "azure_event_hubs" | no |
queue_topic_kafka_username | Username for connection to Kafka cluster under PlainLoginModule (default: '$ConnectionString' which is used for EventHubs) | string | "$ConnectionString" | no |
ssh_ip_allowlist | The comma-separated list of CIDR ranges to allow SSH traffic from | list(string) | [ | no |
tags | The tags to append to this resource | map(string) | {} | no |
telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
transformer_compression | Transformer output compression, GZIP or NONE | string | "GZIP" | no |
user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
vm_sku | The instance type to use | string | "Standard_B2s" | no |
widerow_file_format | The output file_format from the widerow transformation_type selected (json or parquet) | string | "json" | no |
Name | Description |
---|---|
nsg_id | ID of the network security group attached to the Transformer Server nodes |
vmss_id | ID of the VM scale-set |
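
One way to consume the two outputs listed above is to re-export them from the calling configuration so that other stacks or tooling can look them up. This sketch assumes the module is instantiated as `transformer_service`, as in the usage example; the output names themselves are hypothetical.

```hcl
# Re-export the scale-set ID from the root configuration.
output "transformer_vmss_id" {
  description = "ID of the VM scale-set running the Transformer"
  value       = module.transformer_service.vmss_id
}

# Re-export the network security group ID attached to the Transformer nodes.
output "transformer_nsg_id" {
  description = "ID of the network security group attached to the Transformer nodes"
  value       = module.transformer_service.nsg_id
}
```

After an apply, `terraform output transformer_vmss_id` should print the scale-set ID.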
Copyright 2023-present Snowplow Analytics Ltd.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)