Sr DevOps Engineer
Description
Are you ready to join an established (but also growing and innovating!) tech company that’s making waves in the eLearning space? Are you ready to improve the continuing education process for millions of nurses, physicians, surgeons, attorneys, and other professionals across the country? We’re looking for a talented and experienced fully remote Senior DevOps Engineer for our Path LMS team.
A Little About Our Architecture
- Ruby / Rails
- PostgreSQL / Redis / MongoDB
- Sidekiq
- Memcache
- Elasticsearch
- AWS / Terraform
- Docker / Kubernetes / Qovery
- Datadog, Sentry, Coralogix for logging/error handling
- Atlassian suite of productivity and project management tools (Jira, Opsgenie, etc.)
- GitHub for source control
- Heroku PaaS for app, DB, etc.
What You’ll Be Doing
We have some significant challenges in scaling our application due to the unique traffic flow of multiple, large live-streamed events occurring nearly daily. We understand that the root of many of these scaling issues is buried within our code base, but you’ll be able to help us scale in the meantime to give us plenty of runway to fix the code. We also need help migrating our cloud infrastructure and apps off Heroku to AWS, utilizing Terraform and Qovery, with assistance from our MAP Partner.
Beyond just our immediate needs, you’ll help us scale and fine-tune our architecture, infrastructure, and monitoring. You’ll use your skills to help us expand our capabilities in automating and developing tooling to support our developers as we grow our team even more. We have multiple exciting initiatives on the horizon (like deploying Kubernetes, database scaling, and streamlining our CI/CD infrastructure), and we need a passionate and experienced pro who can help Blue Sky scale and improve through infrastructure and automation.
Here’s a rough run-down of responsibilities:
- Migrate our cloud-based infrastructure for existing and new applications, including our core application and supporting microservices, to AWS.
- Develop automated solutions to monitor and alert on performance & stability in our cloud systems.
- Run the DevOps Guild to partner with engineers to implement and improve the current DevOps processes and identify cross-project improvements.
- Champion, implement, and enforce CI/CD best practices, and monitor for failures.
- Help set standards for services and software to streamline test and release cycles and improve system maintenance.
- Support our SDLC through automation, tooling, and monitoring and help build and maintain comprehensive documentation of our infrastructure and tools.
- Constantly review and update our infrastructure to ensure we are scalable and can handle end-user demand.
- Collaborate with the support team and engineers to troubleshoot production alerts, addressing them in the short term and preventing them in the long term.
- Constantly update alert thresholds to help identify problems and reduce noise.
- Lead and coach the team on how to better monitor solutions so that we have a full understanding of how features and systems are performing.
- Create and modify dashboards to show overall platform health.
- Ensure frameworks and dependencies are up to date and have correct open-source licenses.
- Participate in project planning meetings to share your point of view on system options, impact, risk, and costs vs. benefits. Communicate current operational requirements and development predictions.
- Organize and participate in on-call duties for production issues. Don’t worry – you won’t be on-call all the time for all the things. We simply ask that you be included in the rotation. We activate our on-call system maybe twice a year as most incidents occur during business hours.
Requirements
- 5+ years of experience in DevOps, specifically for web-based SaaS products
- 3+ years of experience with AWS
- 1+ years of experience with a modern Cloud Infrastructure Platform (Qovery, etc.)
- Proficiency with Kubernetes
- 3+ years of experience with Docker
- 3+ years of experience with CircleCI or other industry-leading CI/CD
- Experience working with and scaling PostgreSQL, Redis, Memcache, and MongoDB
- Strong understanding of Information Technology operations, infrastructure, and application architecture principles
- Strong experience with observability tools, APM tools, and cloud monitoring tools
- Experience migrating between cloud providers and/or building multi-cloud environments
- Experience developing and executing disaster recovery plans and various methods to thwart or minimize the impact of outages and potential incidents.
- Concrete understanding of DevSecOps principles
The Nice-To-Haves (but not required):
- Experience deploying and scaling monolithic web applications built using the Ruby on Rails framework.
- Experience deploying, monitoring, and operating applications deployed on the Heroku PaaS.
- Experience with database administration, specifically PostgreSQL and maybe a little MongoDB
- Experience monitoring and scaling background services built with Sidekiq Pro and using Redis as a FIFO queue for jobs.
Our Recruitment Process
- 15-minute Initial Call
- 20-minute take-home skills test
- 30-minute Call with Recruiter (project, benefits, etc.)
- Interviews directly with the client (the number of interviews may vary depending on the project, and may include a code assessment)
- Final Offer
Benefits
- Work Remote Monday - Friday, 40 hours a week (no weekends)
- Vacation: 10 business days a year
- Holidays: 5 National Holidays a year
- Company Holidays: 5 Company Holidays a year (Christmas Eve, Christmas Day, New Year's Eve, New Year's Day, Zipdev Day)
- Parental Leave
- Health Care Reimbursement
- Active Lifestyle Reimbursement
- Quarterly Home Office Reimbursement
- Payroll Deduction Purchase Plans
- Longevity Bonus
- Continuous Learning Bonus
- Access to Training and Professional Development Platforms
- Did we mention it's REMOTE?!!