Elastic

Senior SRE - Platform (Managed Kubernetes Infrastructure)

Canada

Hybrid

SRE

Posted 5 days ago

Golang, Kubernetes, Docker, Terraform, Crossplane, AWS, GCP, Azure, Linux, Elastic Stack, Prometheus, InfluxDB

Job Description

Role Overview

As a Site Reliability Engineer within Platform Engineering, you'll help design, build, and scale Elastic's multi-cloud platform that powers internal services, Elastic Cloud Hosted, and Serverless offerings. You'll drive reliability, automation, and operational excellence while developing tools and infrastructure that support large-scale distributed systems.

Responsibilities

Lead technical initiatives focused on infrastructure automation and platform reliability.
Develop and maintain software, tooling, and automation to support platform growth.
Scale and improve multi-cloud infrastructure to meet increasing business demands.
Collaborate with engineering teams to enhance operational excellence and system reliability.
Respond to major incidents and implement long-term solutions to prevent recurring issues.
Participate in a follow-the-sun on-call rotation.
Drive continuous improvements across platform operations and customer experience.

Requirements

Strong Site Reliability Engineering (SRE) mindset with a focus on reliability and customer experience.
Software engineering background with experience building automation and infrastructure solutions.
Experience with Golang or similar programming languages.
Hands-on experience with public cloud platforms.
Experience managing Kubernetes infrastructure at scale.
Strong understanding of distributed systems and platform operations.
Excellent communication and collaboration skills.
Experience working in remote or globally distributed teams.

Preferred Qualifications

Experience operating SaaS products in public cloud environments.
Hands-on experience with Infrastructure as Code tools such as Terraform or Crossplane.
Experience managing Kubernetes environments across multiple cloud providers.
Experience with Docker and containerized applications.
Experience with monitoring and observability tools such as Elastic Stack, Prometheus, or InfluxDB.
Strong Linux system administration skills.
Experience with incident management and alerting systems.
Experience mentoring and supporting engineering teams.

Technologies

Golang, Kubernetes, Docker, Terraform, Crossplane, AWS, GCP, Azure, Linux, Elastic Stack, Prometheus, InfluxDB

Benefits

Competitive compensation package
Employee stock program eligibility
Retirement savings plan with employer matching
Comprehensive health and wellness benefits
Flexible work environment
Generous paid time off and parental leave
Volunteer and community contribution programs
