Senior Site Reliability Engineer

Posted Aug 1

Be Part of Building the Future

Dremio is the SQL Lakehouse company, enabling companies to leverage open data architectures. Dremio's SQL Lakehouse Platform simplifies data engineering and eliminates the need to copy and move data to proprietary data warehouses or create cubes, aggregation tables and BI extracts, providing flexibility and control for data architects and data engineers, and self-service for data consumers. Founded in 2015, Dremio is headquartered in Santa Clara, CA. Investors include Cisco Investments, Insight Partners, Lightspeed Venture Partners, Norwest Venture Partners, Redpoint Ventures, and Sapphire Ventures. For more information, visit www.dremio.com. Connect with Dremio on GitHub, LinkedIn, Twitter, and Facebook.

If you, like us, say "bring it on" to exciting challenges that really do change the world, we have endless opportunities where you can make your mark.

About the role

We're looking for a Senior SRE who will take on exciting technical challenges by analyzing, troubleshooting, and designing vital services, platforms, and infrastructure while always thinking about reliability, scalability, resilience, security, and performance. We believe that our role is to be continually learning: improving our understanding of and ability to safely operate our service and provide the best possible experience for our users. Psychological safety and a blameless culture are critical components of our SRE culture.

What you'll be doing

  • Evangelize and advocate for reliability practices across our organization
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and production readiness reviews
  • Debug and optimize code and automate routine tasks to reduce toil
  • Analyze and optimize our core product by developing and implementing reliability and performance practices
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity
  • Be on-call for production services
  • Practice sustainable incident response and blameless retrospectives

What we're looking for

  • 5+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering
  • BS/MS/PhD in Computer Science or related field
  • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines
  • Experience with monitoring/alerting (Prometheus, Thanos, VictoriaMetrics, Grafana, vmrules)
  • Moderate to advanced experience in Java, C, C++, Python, Go, or other programming languages
  • You are interested in designing, analyzing and troubleshooting large-scale distributed systems
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
  • You have a great ability to debug and optimize code and automate routine tasks
  • You have a solid background in software development and architecting resilient and reliable applications
  • You are a good communicator and comfortable working with other engineers across the organization

Bonus points if you have

  • Experience being on-call for an internet-facing production system
  • Expertise in k8s, Helm, YAML, GitOps, Argo CD, distributed tracing (Lightstep, Honeycomb, OpenTelemetry), and k8s resource management (e.g. Kubecost)

What we offer

  • Medical, dental and vision insurance
  • 401(k) Plan
  • Short-term and long-term disability and life insurance
  • Pre-IPO stock options
  • Flexible PTO
  • 16 hours of volunteer time off
  • 12 company-paid holidays, including Juneteenth
  • Remote work options
  • Monthly "Get Stuff Done" (GSD) Days
  • Paid parental leave
  • Employee Assistance Program (EAP)
  • Quarterly swag surprise

**Certain benefits are available only to full-time Dremio employees and may not be the same across all locations.

What we value

At Dremio, we hold ourselves to high standards when it comes to People, Thinking, and Action. Our Gnarlies (that's what we call our employees) communicate with clarity, drive accountability, and are respectful towards each other. We confront brutal facts and focus on results while operating with a sense of urgency and building a "flywheel". People who like to jump in and drive momentum will thrive in our #GnarlyLife.

Dremio is an equal opportunity employer supporting workforce diversity. We do not discriminate on the basis of race, religion, color, national origin, gender identity, sexual orientation, age, marital status, protected veteran status, disability status, or any other unlawful factor.

Dremio is committed to providing any necessary accommodations for individuals with disabilities within our application and interview process. To request accommodation due to a disability, please inform your recruiter.

Dremio has policies in place to protect the personal information that employees and applicants disclose to us. Please review the privacy notice for details.