GitLab

Senior Site Reliability Engineer, Tenant Services: Geo

Remote, India
SRE
Posted 09 May 2026
Job Description

Role Overview

As an SRE on the Tenant Services, Geo team, you'll keep GitLab's production systems running smoothly with a focus on Geo — GitLab's data replication and disaster recovery feature. You'll execute Dedicated customer migrations end-to-end, improve tooling and automation, and work closely with the core Geo team, Dedicated migrations, and Support to make cutovers faster, safer, and more predictable.

Responsibilities

  • Execute Dedicated Geo migrations and cutovers end-to-end — planning, pre-cutover validation, execution, and post-cutover verification
  • Participate in the shift and weekend coverage rotation for Dedicated cutovers across EMEA and US hours
  • Join the SaaS SRE on-call rotation to respond to incidents impacting GitLab.com availability
  • Handle environment preparation, data hygiene checks, replication, and Geo-related escalations from Support
  • Design, build, and maintain automation, tooling, and runbooks to make migrations repeatable and reliable
  • Run infrastructure using Ansible, Chef, Terraform, GitLab CI/CD, and Kubernetes
  • Build and maintain monitoring, alerting, and dashboards to detect issues early and track migration SLOs
  • Contribute to incident reviews, root cause analyses, and readiness reviews
  • Document your work thoroughly, including runbooks, architecture decisions, and post-incident reviews
  • Proactively identify and reduce toil by automating repetitive operational work

Requirements

  • Experience operating highly available distributed systems at scale in a SaaS environment
  • Hands-on experience with GCP or AWS — networking, storage, and managed services
  • Experience with Kubernetes and its ecosystem (Helm), including deploying and troubleshooting workloads
  • Experience with IaC and configuration management tools — Terraform, Ansible, or Chef
  • Strong programming skills in Go or Ruby, and scripting proficiency in Shell or Python
  • Experience with observability systems — Prometheus, Grafana, logging stacks
  • Practical exposure to data replication, backup/restore, or migration scenarios
  • Comfortable with on-call rotations and driving follow-through on corrective actions
  • Ability to engage directly with enterprise customers during migrations and incidents
  • Strong written and verbal communication skills with a bias toward async documentation

Good to Have

  • Experience with disaster recovery technologies
  • Experience with compliance-sensitive environments (SOC 2, ISO)
  • Prior work on large-scale data migrations or cutovers
  • Hands-on experience with PostgreSQL or AWS RDS replication and cutover workflows
  • Familiarity with multi-tenant architectures or GitLab itself

Benefits

  • Flexible Paid Time Off
  • Equity Compensation and Employee Stock Purchase Plan
  • Growth and Development Fund
  • Parental Leave
  • Home Office Support

Tech Stack Required

  • Ansible
  • Chef
  • Terraform
  • GitLab CI/CD
  • Kubernetes