Site Reliability Engineer (Middle / Senior)
Bitquery is an API-first product company dedicated to solving blockchain data problems using ground-truth, on-chain data. Bitquery extracts this data and presents it via APIs. These APIs deliver solutions to multiple verticals, such as Decentralized Finance (DeFi), DEX Arbitrage Analytics, and Crypto Surveillance & Forensics, across all major blockchains, including Bitcoin, Ethereum, EOS, and Tezos.
We are an international company that develops software for analyzing decentralized data (40+ chains). Bitquery is a distributed team. We are currently looking for a full-time SRE to further develop, monitor, and support the infrastructure and to automate various processes. The role may also include on-call duty in shifts.
Roles & Responsibilities:
- Ensuring the smooth operation of software, environments and company services
- Analyzing and improving the performance and availability of products
- Identifying bottlenecks in the architecture and infrastructure
- Improving system alerting and incident management
- Improving SLI-based monitoring systems (Prometheus, Icinga, Grafana, etc.)
- Formalizing SLIs in line with the main business requirements
- Defining SLOs for services and for the infrastructure as a whole
- Minimizing recovery time and data loss (RTO and RPO)
- Analyzing incidents in the production environment
- Capacity management
Requirements
- 5+ years of work experience implementing, troubleshooting, and supporting infrastructure software and distributed systems
- At least 2 years of development experience in one or more languages (Golang, Python, Ruby)
- More than 2 years of experience with virtualization and containerization technologies (containerd, Docker, k8s)
- Experience setting up CI pipelines of varying complexity (Jenkins) with CD to different environments
- Experience in creating and maintaining fault-tolerant systems with logging, monitoring, and alerting
- Understanding of the "infrastructure as code" principle and the ability to test such code (Ansible, Terraform)
- Knowledge of network security principles (IPsec, WAF, IPS)
Our Tech Stack:
- Infrastructure: Bare-metal / AWS
- Databases: ClickHouse / MySQL
- SCM: Git / GitHub
- Message broker: Kafka
- Repository: Nexus
- CI/CD: Jenkins
- Monitoring: Icinga 2, Grafana, Prometheus, VictoriaMetrics, ELK
- Orchestration: k8s, Ansible, Terraform
- Containers: LXC, Docker
- Scripting: Python, Golang, Ruby, Groovy
- OS: Debian/Ubuntu
- Others: Docker Compose, IPsec
Benefits
- Opportunity to work & collaborate with a truly global team spread across 5 countries
- Work from anywhere in the world
- Choose your own work hours
- Yearly trip with Bitquery team to any remote destination
- A promise to finish the interview process within 1-2 weeks
As a startup, we make decisions and move fast, while giving candidates a great interview experience. We have a flat hierarchy in which we empower individuals and give them the opportunity to deliver results in their own working style. Come and join a great culture and build Bitquery with us.