Our Grafana Cloud pipeline moves millions of data points, log lines, and traces per second from our customers' environments into a highly available, low-latency stack that processes and stores the data, and serves it to dashboards and alerting tools. We aim to grow this to hundreds of millions per second, and it's critical that as we grow, we improve our performance, increase our reliability, and do it all more efficiently.
Cloud roles at Grafana Labs require engineers who have a passion for performance and reliability and who enjoy taking projects from conception to production. Grafana Cloud hosts services in Kubernetes. The Cloud Platform team owns and maintains the platform that delivers Kubernetes and its required complementary services, including our release and deployment tools and services, to Grafana Engineering. The team also designs, implements and maintains the virtual network infrastructure.
Because we deploy production services, we have on-call rotations to ensure the health of the system. We dogfood our own services, so being on call is an important way to understand our system and how to use the products we create.
Our culture is remote-first, and our engineering organization is largely remote. We provide guidance and meet regularly over video calls, so we need people who can work independently and communicate well. Even if you are located near one of our small offices, working from home is both common and encouraged. Our teams also plan in-person team-building meetups and gather to attend industry conferences.
We care deeply about open source, and our projects are generally open source. Check them out: https://github.com/grafana.
About the role:
We are looking for an experienced software or site reliability engineer to join the Grafana Labs R&D team. We are hiring for the Cloud Platform team that provides the platform on which Grafana Cloud delivers its services.
Responsibilities:
- Maintain and improve Grafana Labs’ provisioning, release and deployment tools and processes for infrastructure and services
- Provision and administer the core infrastructure platform, Kubernetes
- Provision and administer the required Cloud Service Provider resources
- Maintain and improve Grafana Labs’ monitoring tools and practices to maximise system uptime and health
- Work with other engineering teams to help them plan and optimise their use of Cloud Service Provider resources
Requirements:
- Commercial experience as a site reliability, network and/or software engineer in Cloud and Linux environments, especially with distributed architectures
- Programming experience - we use Go, Jsonnet, Python and Shell
- Experience with containers and orchestration - we use Docker and Kubernetes
- Proficiency with infrastructure as code and/or configuration management - we use Terraform and Tanka/Jsonnet
- Experience with dashboards and monitoring tools like Grafana and Prometheus
Nice to have:
- Commercial experience in designing and managing networking in a Virtual Private Cloud
- Commercial experience of network services, including load balancers, firewalls and DNS
- Experience working in remote and/or distributed business environments, demonstrating self-motivation and communication skills
- Commercial experience in Cloud Service Provider utilisation management, and pricing and discount structures
What we offer:
- Flexible hours
- The equipment you need to get the job done
- Generous vacation policy of 30 days per annum with national holidays in your country of residence on top
- Grafana operates in 32+ countries. We try to operate as one team and focus on global benefits that our whole team can enjoy. Inevitably there are some regional variations, and we discuss the benefits offered in your country of residence during our interview process.
- We offer a competitive healthcare plan (Medical, Dental & Vision) for our US-based employees via our co-employer JustWorks.
- We offer a 4% employer contribution match on our 401(k)/pension plans or a one-time 4% salary increase after 6 months' tenure, depending on your location
Our hiring process:
- Video chat with one of our Talent Managers (30 mins)
- Video chat with 2 Hiring Managers (30 mins)
- Live Coding Interview with 2 Engineers (60 mins)
- Systems Design focused interview (45 mins)
About Grafana Labs: There are more than 950,000 active installations of Grafana around the globe, monitoring everything from beehives to climate change in the Alps. The instantly recognizable dashboards have been spotted everywhere from a NASA launch and Minecraft HQ to Wimbledon and the Tour de France. Grafana Labs also helps companies including Bloomberg, JPMorgan Chase, and eBay manage their observability strategies with full-stack offerings that can be run fully managed with Grafana Cloud, or self-managed with Grafana Enterprise Stack. The Grafana stack has grown to include four other open source projects, Grafana Loki (for logs), Grafana Tempo (for traces), Grafana Mimir (for metrics), and Grafana OnCall (for on-call management).
Benefits: For more information about the perks and benefits of working at Grafana, please check out our careers page.
A note about COVID-19: All Grafanistas who wish to attend in-person events or travel for Grafana Labs must be fully vaccinated.
Equal Opportunity Employer: At Grafana Labs we’re building a company where a diverse mix of talented people want to come, stay, and do their best work. We know that our company runs on the hard work and the dedication of our passionate and creative employees.
We will recruit, train, compensate and promote regardless of race, religion, colour, national origin, gender, disability, age, veteran status, and all the other fascinating characteristics that make us different and unique. We believe that equality and diversity build a strong organisation and we’re working hard to make sure that’s the foundation of our organisation as we grow.
Job Type: Full-time