Site Reliability Engineer (Middle / Senior)

Posted Feb 25

Bitquery is an API-first product company dedicated to solving blockchain data problems using ground-truth, on-chain data. Bitquery extracts and presents valuable data via APIs. These APIs deliver solutions to multiple verticals such as Decentralized Finance (DeFi), DEX Arbitrage Analytics, and Crypto Surveillance & Forensics across all major blockchains, including Bitcoin, Ethereum, EOS, and Tezos.

We are an international company that develops software for analyzing decentralized data (40+ chains). Bitquery is a distributed team. We are currently looking for a full-time SRE to further develop, monitor, and support our infrastructure and to automate various processes. The role may also include on-call duty in shifts.

Roles & Responsibilities:

  • Ensuring the smooth operation of software, environments and company services
  • Analyzing and improving the performance and availability of products
  • Identifying bottlenecks in the architecture and infrastructure
  • Improving system alerting and incident management
  • Improving SLI-based monitoring systems (Prometheus, Icinga, Grafana, etc.)
  • Defining SLIs in line with the main business requirements
  • Defining SLOs for services and for the infrastructure as a whole
  • Minimizing system recovery time and data loss (RTO and RPO)
  • Analyzing incidents in the production environment
  • Capacity management

Requirements

  • 5+ years of work experience implementing, troubleshooting, and supporting infrastructure software and distributed systems
  • At least 2 years of development experience in one or more languages (Golang, Python, Ruby)
  • More than 2 years of experience with virtualization and containerization technologies (containerd, Docker, k8s)
  • Experience setting up CI pipelines of varying complexity (Jenkins) with CD to different environments
  • Experience building and maintaining fault-tolerant systems with logging, monitoring, and alerting coverage
  • Understanding of the "infrastructure as code" principle and the ability to test it (Ansible, Terraform)
  • Knowledge of network security principles (IPsec, WAF, IPS)

Our Tech Stack:

  • Infrastructure: Bare-metal / AWS
  • Databases: ClickHouse / MySQL
  • SCM: Git / GitHub
  • Message broker: Kafka
  • Repository: Nexus
  • CI/CD: Jenkins
  • Monitoring: Icinga 2, Grafana, Prometheus, VictoriaMetrics, ELK
  • Orchestration: k8s, Ansible, Terraform
  • Containers: LXC, Docker
  • Scripting: Python, Golang, Ruby, Groovy
  • OS: Debian/Ubuntu
  • Others: Docker Compose, IPsec

Benefits

  • Opportunity to work and collaborate with a truly global team spread across 5 countries
  • Work from anywhere in the world
  • Choose your own work hours
  • Yearly trip with the Bitquery team to a remote destination
  • A promise to finish the interview process within 1-2 weeks

As a startup, we make decisions and move fairly fast while giving candidates a great interview experience. We have a flat hierarchy in which we empower individuals and give them the opportunity to deliver results in their own working style. Come join a great culture and build Bitquery with us.