Site Reliability Engineer

Posted May 5

Pelago is the world’s leading virtual clinic for Substance Use Management. Our program provides guidance, support and treatment for members seeking to overcome their tobacco, alcohol and opioid use. From unhealthy habits to active substance use disorders, Pelago delivers a personalized solution based on individual health, habits, genetics, and goals, providing care for members wherever they might be on the substance use spectrum.

Pelago's suite of virtual services ranges from education to cognitive behavioral therapy (CBT) to comprehensive medication-assisted treatment (MAT). Pelago enables employers and health plans to deliver accessible, affordable, and effective treatment for substance misuse.

Pelago has scaled to helping hundreds of employers and health plans and has already helped more than 750,000 members manage their substance use better. We have recently closed our Series C and raised over $151m from leading global investors. If you are passionate about making an impact on the health of others, join us and make it happen!

Role Overview:

At Pelago, we run a serverless architecture on AWS, with infrastructure managed using Terraform. Our system has been built to deliver our virtual clinic for Substance Use Management, and we are looking for a talented Site Reliability Engineer to join the engineering team supporting Pelago. Because Pelago is a HIPAA-compliant, HITRUST-certified organization, our architecture must comply with information security and data privacy requirements. Experience with, and knowledge of, security best practices in the context of AWS is essential.

In this role, you will...

Maintain Pelago’s system built on AWS
Develop a deep understanding of the development workflow at Pelago
Be responsible for the planning, implementation, and growth of the AWS cloud infrastructure
Troubleshoot issues on our platform, find the root cause, and, if required, work with engineering teams to resolve them
Monitor application metrics to proactively raise issues to the relevant engineering functional team
Own the reliability, availability, latency, performance and capacity planning of the Pelago environment
Perform incident response and blameless post-mortems
Implement infrastructure as code for provisioning, configuration and deployment using Terraform
Build, release, and manage the configuration of all production systems
Conduct load testing to identify bottlenecks before they impact customers
Work alongside our developers to drive automation, maximize efficiency and improve reliability
Take occasional on-call shifts on a rotational basis

The background we're looking for...

2+ years of experience in SRE or reliability-focused production engineering roles, identifying application problems from monitoring, health checks, and application performance
A solid understanding of supporting AWS, serverless architecture, and Terraform
Experience building or maintaining platforms that adhere to security standards, on a system that has scaled to 50m+ users
Proficiency in script development and scripting languages
Strong troubleshooting background with experience in identifying and remediating issues
Team player mentality with strong communication and collaboration skills

The provided range reflects our US target salary range for this full-time position, which is part of our broader total compensation package, including stock options, comprehensive benefits, and incentive pay applicable to eligible roles. Individual pay within the range will vary based on a variety of factors like role-related experience and education, internal pay equity, and other relevant business factors. At Pelago, we are committed to an equitable and fair pay philosophy and review total compensation for our employees at least twice a year.

Pay Range

$120,000 to $170,000 USD