GitLab
Senior Site Reliability Engineer, Tenant Services: Geo
Job Description
Role Overview
As an SRE on the Tenant Services, Geo team, you'll keep GitLab's production systems running smoothly with a focus on Geo — GitLab's data replication and disaster recovery feature. You'll execute Dedicated customer migrations end-to-end, improve tooling and automation, and work closely with the core Geo team, Dedicated migrations, and Support to make cutovers faster, safer, and more predictable.
Responsibilities
- Execute Dedicated Geo migrations and cutovers end-to-end — planning, pre-cutover validation, execution, and post-cutover verification
- Participate in the shift and weekend coverage rotation for Dedicated cutovers across EMEA and US hours
- Join the SaaS SRE on-call rotation to respond to incidents impacting GitLab.com availability
- Handle environment preparation, data hygiene checks, replication, and Geo-related escalations from Support
- Design, build, and maintain automation, tooling, and runbooks to make migrations repeatable and reliable
- Run infrastructure using Ansible, Chef, Terraform, GitLab CI/CD, and Kubernetes
- Build and maintain monitoring, alerting, and dashboards to detect issues early and track migration SLOs
- Contribute to incident reviews, root cause analyses, and readiness reviews
- Document your work thoroughly, including runbooks, architecture decisions, and post-incident reviews
- Proactively identify and reduce toil by automating repetitive operational work
Requirements
- Experience operating highly-available distributed systems at scale in a SaaS environment
- Hands-on experience with GCP or AWS — networking, storage, and managed services
- Experience with Kubernetes and its ecosystem (Helm), including deploying and troubleshooting workloads
- Experience with IaC and configuration management tools — Terraform, Ansible, or Chef
- Strong programming skills in Go or Ruby, and scripting proficiency in Shell or Python
- Experience with observability systems — Prometheus, Grafana, logging stacks
- Practical exposure to data replication, backup/restore, or migration scenarios
- Comfortable with on-call rotations and driving follow-through on corrective actions
- Ability to engage directly with enterprise customers during migrations and incidents
- Strong written and verbal communication skills with a bias toward async documentation
Good to Have
- Experience with disaster recovery technologies
- Experience with compliance-sensitive environments (SOC 2, ISO)
- Prior work on large-scale data migrations or cutovers
- Hands-on experience with PostgreSQL or AWS RDS replication and cutover workflows
- Familiarity with multi-tenant architectures or GitLab itself
Benefits
- Flexible Paid Time Off
- Equity Compensation and Employee Stock Purchase Plan
- Growth and Development Fund
- Parental Leave
- Home Office Support
Tech Stack Required
Ansible
Chef
Terraform
GitLab CI/CD
Kubernetes