Site Reliability Engineer III

Posted Sep 14

Develop software and software fixes to integrate internal systems. Ensure code quality, test and distribute code updates, and monitor the health and stability of the servers.

What you'll do:

Meet or exceed Key Performance Indicators (KPIs) and SLAs; maintain an error budget and adhere to it.
Identify, evaluate, and execute preventative measures to avoid or minimize impact on the customer experience
Employ deep troubleshooting skills to improve availability, performance, and security for CR and Emburse; ensure services are designed for 24/7 availability, operational readiness, and rigor
Code and automate applications on cloud platforms
Work with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams
Work with Cloud Platform and Operations leaders to develop narratives and the backlog grooming, epic planning, and overall sprint planning processes
Ensure the platform maintains a high degree of reliability, at least four 9s (99.99%).
Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems
Own technically intricate issues that span DevOps, databases, networking, code, infrastructure, and people; drive them to satisfactory completion.
Work closely with different product stakeholders to align operational priorities and planning with the product and engineering roadmap
Prepare and present engineering-related documents to key stakeholders
Provide recommendations and feedback in design reviews and review sessions.
Mentor SRE I and SRE II engineers
Help guide more junior engineers in best practices
Conduct and assist with investigation, testing, and deployment activities; identify and mitigate risks in development activities

What we're looking for:

Bachelor’s degree in Computer Science or a STEM field required
Minimum of 7 years’ experience in an engineering role required
Deep understanding of infrastructure as code, scripting, self-healing, containers, DevOps tooling, and distributed systems highly desired
Experience working with Ansible and Terraform highly desirable
Excellent written and verbal communication skills in English
Experience with the full lifecycle of SaaS implementations as well as infrastructure as code
Excellent follow-up and project management skills
Proven ability to create and maintain new tools
Excellent troubleshooting skills
Excellent technical skills. Up to 70% of the job is hands-on in a distributed Linux environment
Strong scripting skills. OOP is a plus
Ability to liaise with other teams to help set and align priorities
Experience working with an offshore team