Snowplow is a highly customisable behavioural data platform, and blessed be that company since their code is open-source (amen).
For some reason, the only guide for implementing Snowplow on Google Cloud Platform was written in 2019, so I think it’s time for an update. Most of the content below was shamefully lifted directly from the aforementioned article.
You should have your GCP account with billing enabled.
Host your Snowplow JS tracker file somewhere
- In Search Console, register the preferred domain name that will host the JS file; once it is verified, you can delete the record that was used for verification
- Follow this guide
Tag up your site
- Do your thing in GTM
Enable the required services
Go ahead and switch on:
- Compute Engine API
- Cloud Pub/Sub API
Install Google Cloud CLI
It makes interacting with GCP easier.
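With the CLI installed, the two APIs above can also be switched on from a terminal instead of the console. A minimal sketch, assuming you pass in your own project ID; `enable_snowplow_apis` is a hypothetical helper name, not something gcloud ships with.

```shell
# Hypothetical helper: point gcloud at a project and enable the two APIs.
# Usage: enable_snowplow_apis my-project-id
enable_snowplow_apis() {
  project_id="$1"
  gcloud config set project "$project_id"
  # compute.googleapis.com is the Compute Engine API,
  # pubsub.googleapis.com is the Cloud Pub/Sub API
  gcloud services enable compute.googleapis.com pubsub.googleapis.com
}
```

Run `gcloud auth login` once beforehand so the CLI has credentials to work with.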
Service account
If you’ve used Compute Engine before, you should already have a powerful service account set up for you. Otherwise, you can quickly set one up for yourself.
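If you do need to create one, something along these lines should work. This is a sketch under assumptions: the helper name, the account name, and the two roles (Pub/Sub editor and read access to Cloud Storage objects) are my guesses at a reasonable minimum, not a list taken from Snowplow's documentation, and the loader will likely need BigQuery permissions on top.

```shell
# Hypothetical helper: create a dedicated service account and grant it roles.
# Usage: create_snowplow_sa my-project-id snowplow-sa
create_snowplow_sa() {
  project_id="$1"
  sa_name="$2"
  sa_email="${sa_name}@${project_id}.iam.gserviceaccount.com"
  gcloud iam service-accounts create "$sa_name" --display-name="Snowplow pipeline"
  # Assumed roles: adjust to whatever your pipeline actually needs
  for role in roles/pubsub.editor roles/storage.objectViewer; do
    gcloud projects add-iam-policy-binding "$project_id" \
      --member="serviceAccount:${sa_email}" --role="$role"
  done
}
```

To see what you already have, `gcloud iam service-accounts list` prints every service account in the current project.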
Set up Pub/Sub topics
Create topics called “good”, “bq-failed-inserts”, “bq-types”, “enriched-good”, and “bad”, though only the first four need subscriptions.
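The same topics and subscriptions can be created from the CLI. A minimal sketch: the topic names come from the list above, while the helper name and the `-sub` suffix on subscription names are assumptions of mine, so use whatever names your config files expect.

```shell
# Hypothetical helper: create the five topics and the four subscriptions.
create_snowplow_topics() {
  for topic in good bq-failed-inserts bq-types enriched-good bad; do
    gcloud pubsub topics create "$topic"
  done
  # Only the first four topics get a subscription ("bad" is left out);
  # the "-sub" suffix is an assumed naming scheme
  for topic in good bq-failed-inserts bq-types enriched-good; do
    gcloud pubsub subscriptions create "${topic}-sub" --topic="$topic"
  done
}
```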
Create the config files
Four files you’ll need, at the minimum. They are:
- Stream collector config
- Enricher config
- Loader config
- Iglu resolver config
Store them all in Cloud Storage, for you’ll need them later.
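Uploading the files can be done with `gsutil`, which comes with the Google Cloud CLI. A minimal sketch, assuming a bucket name and local file names of your own choosing; `upload_snowplow_configs` is a hypothetical helper.

```shell
# Hypothetical helper: create a bucket and copy the config files into it.
# Usage: upload_snowplow_configs your-bucket file1 file2 ...
upload_snowplow_configs() {
  bucket="$1"
  shift
  # "mb" makes the bucket; carry on if it already exists
  gsutil mb "gs://${bucket}" || true
  for file in "$@"; do
    gsutil cp "$file" "gs://${bucket}/"
  done
}
```

Bucket names are global across GCP, so pick one unlikely to be taken.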
Create an HTTPS endpoint that connects to the stream collector
Create your stream collector template
- Go to Compute Engine section
- Create instance template
- Choose “Set access for each API”
- Enable Cloud Pub/Sub
- Under Firewall, select “Allow HTTP traffic”
- Expand “Advanced options”
- Expand “Management”
Under “Automation”, fill in the script below:
#! /bin/bash
# Install Java plus the tools used to fetch the collector
sudo apt-get update
sudo apt-get -y install default-jre
sudo apt-get -y install unzip
sudo apt-get -y install wget
wget ""
# Copy the collector config from Cloud Storage, then start the collector
gsutil cp gs://your-bucket/your-stream-collector-config .
java -jar snowplow-stream-collector-google-pubsub-2.8.2.jar --config your-stream-collector-config
- Expand “Networking”
- In “Network tags”, add
- Click “Create”
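If you prefer the terminal, the template steps above map roughly onto one gcloud command. A sketch under assumptions: the helper name is hypothetical, the network tag is whatever you typed in the console, and the startup script is saved to a local file first. The `pubsub` scope mirrors enabling Cloud Pub/Sub, `storage-ro` is added so the script can read the config from Cloud Storage, and the `http-server` tag is what the “Allow HTTP traffic” checkbox applies.

```shell
# Hypothetical helper: create the stream collector instance template.
# Usage: create_collector_template NAME NETWORK_TAG STARTUP_SCRIPT_FILE
create_collector_template() {
  template_name="$1"
  network_tag="$2"
  startup_script="$3"
  gcloud compute instance-templates create "$template_name" \
    --scopes=pubsub,storage-ro \
    --tags="${network_tag},http-server" \
    --metadata-from-file=startup-script="$startup_script"
}
```

Machine type and image are left at gcloud's defaults here; add `--machine-type` if you want something else.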
Add firewall rule
- In VPC network section, go to “Firewall”
- In “Target tags”, type
- In “Source IPv4 ranges”, type
- Tick TCP, then type
- Click “Create”
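The firewall rule has a CLI equivalent too. A minimal sketch with a hypothetical helper; the rule name, target tag, source range, and port are all passed in as arguments, so fill them with the same values you would type into the console.

```shell
# Hypothetical helper: allow inbound TCP traffic to tagged instances.
# Usage: create_collector_firewall RULE_NAME TARGET_TAG SOURCE_RANGE TCP_PORT
create_collector_firewall() {
  gcloud compute firewall-rules create "$1" \
    --direction=INGRESS \
    --target-tags="$2" \
    --source-ranges="$3" \
    --allow="tcp:$4"
}
```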
Create a health check in Compute Engine
- Protocol: HTTP
- Port: 8080
- Request path: /health
- Check interval: 10 seconds
- Unhealthy threshold: 3 consecutive failures
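The settings above translate fairly directly into gcloud flags. A minimal sketch; the helper name and the health check name you pass in are assumptions, the values are the ones listed.

```shell
# Hypothetical helper: HTTP health check against the collector's /health path.
# Usage: create_collector_health_check NAME
create_collector_health_check() {
  gcloud compute health-checks create http "$1" \
    --port=8080 \
    --request-path=/health \
    --check-interval=10s \
    --unhealthy-threshold=3
}
```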
Create stream collector instance group
- In “Instance template”, select the template you created
- In “Health check”, select the health check you created
- Click “Create”
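From the CLI, the instance group might be created like this. A sketch under assumptions: the helper name is hypothetical, and the group size of 1 and the 300-second initial delay are placeholders I picked, not values from the steps above. The second command names port 8080 as `http`, which a load balancer backend can refer to later.

```shell
# Hypothetical helper: managed instance group with autohealing.
# Usage: create_collector_group GROUP TEMPLATE HEALTH_CHECK ZONE
create_collector_group() {
  group="$1"
  zone="$4"
  # Size and initial delay are assumed values; tune them for your traffic
  gcloud compute instance-groups managed create "$group" \
    --template="$2" --health-check="$3" --initial-delay=300 \
    --size=1 --zone="$zone"
  # Give port 8080 a name so a backend service can point at it
  gcloud compute instance-groups set-named-ports "$group" \
    --named-ports=http:8080 --zone="$zone"
}
```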
Create a load balancer
- In “Network services”, create a load balancer
- Select “HTTP(S) Load Balancing”
- On the next screen, keep the default settings
- For Protocol, change to HTTPS
- In IP address, click “CREATE IP ADDRESS”, then reserve one for yourself
- In Certificate, create a new one, choose Google-managed, then fill in the domain of your tracker
- For the backend, select the stream collector instance group you created, then enter 8080 as the port number
- Scroll down, select the Health check you’ve created
- Once done, in your tracker domain DNS configuration, add an “A” record that points to the static IP you reserved
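For reference, the console clicks above correspond roughly to the gcloud sequence below. Treat it as a sketch rather than a drop-in script: the helper name and every `collector-*` resource name are hypothetical, and it assumes the instance group already has a named port `http` mapped to 8080.

```shell
# Hypothetical helper: global HTTPS load balancer in front of the collector.
# Usage: create_collector_lb DOMAIN INSTANCE_GROUP ZONE HEALTH_CHECK
create_collector_lb() {
  domain="$1"; group="$2"; zone="$3"; health_check="$4"
  # Static IP and Google-managed certificate for the tracker domain
  gcloud compute addresses create collector-ip --ip-version=IPV4 --global
  gcloud compute ssl-certificates create collector-cert --domains="$domain" --global
  # Backend service pointing at the instance group's named port "http"
  gcloud compute backend-services create collector-backend \
    --protocol=HTTP --port-name=http --health-checks="$health_check" --global
  gcloud compute backend-services add-backend collector-backend \
    --instance-group="$group" --instance-group-zone="$zone" --global
  # URL map, HTTPS proxy, and forwarding rule on port 443
  gcloud compute url-maps create collector-map --default-service=collector-backend
  gcloud compute target-https-proxies create collector-proxy \
    --url-map=collector-map --ssl-certificates=collector-cert
  gcloud compute forwarding-rules create collector-https \
    --address=collector-ip --target-https-proxy=collector-proxy --ports=443 --global
  # Print the reserved IP to use in the DNS "A" record
  gcloud compute addresses describe collector-ip --global --format="get(address)"
}
```

A Google-managed certificate only becomes active after the “A” record points at the reserved IP, so expect a wait before HTTPS works.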