Site Reliability Engineer III
Develop software and software fixes to integrate internal systems. Ensure code quality, test and distribute code updates, and monitor the health and stability of the servers.
What you'll do:
- Meet and exceed Key Performance Indicators and SLAs; maintain an error budget and adhere to it.
- Identify, evaluate, and execute preventative measures to minimize and avoid impact to the customer experience
- Employ deep troubleshooting skills to improve the availability, performance, and security of CR and Emburse; ensure services are designed for 24/7 availability, operational readiness, and rigor
- Code and automate applications on cloud platforms
- Work with Engineering leadership to build shared services that meet the requirements and needs of the platform and application teams
- Work with Cloud Platform and Operations leaders to develop narratives, backlog grooming, epic planning and overall sprint planning processes
- Ensure the platform maintains a high degree of reliability, at least four 9s (99.99%).
- Define non-functional requirements as part of the product lifecycle to influence the new designs, standards, and methods for scalable, highly available distributed systems
- Own technically intricate issues that cross between DevOps, Databases, Networking, Code, Infrastructure and people; drive them to satisfactory completion.
- Work closely with different product stakeholders to align operational priorities and planning with the product and engineering roadmap
- Prepare and present engineering-related documents to key stakeholders
- Provide recommendations and feedback in design reviews and review sessions.
- Mentor SRE I and SRE II engineers
- Help guide more junior engineers in best practices
- Conduct and assist with investigation, test, and deployment activities; identify and mitigate risks in development activities
What we're looking for:
- Bachelor’s degree in Computer Science or a STEM field required
- Minimum of 7 years’ experience in an engineering role required
- Deep understanding of infrastructure as code, scripting, self-healing, containers, DevOps tooling, and distributed systems highly desired
- Experience working with Ansible and Terraform highly desirable
- Excellent written and verbal communication skills in English
- Experience with the full lifecycle of SaaS implementations as well as infrastructure as code
- Excellent follow-up and project management skills
- Proven ability to create and maintain new tools
- Excellent troubleshooting skills
- Excellent technical skills. Up to 70% of the job is hands-on in a distributed Linux environment
- Strong scripting skills. Object-oriented programming (OOP) experience is a plus
- Ability to liaise with other teams to help set and align priorities
- Experience working with an offshore team