As a Site Reliability Engineer, you will be responsible for the operation and maintenance of our client’s hybrid infrastructure in three data centres and on Amazon Web Services. This spans hundreds of database instances, CentOS-based virtualization, Kubernetes production clusters, and more. You will join a team that follows an infrastructure-as-code approach and uses modern infrastructure management technologies at large scale in mission-critical production environments.
Skills:
- 3-5 years of experience in a DevOps / SRE role
- Strong Linux skills and a solid grasp of Linux OS fundamentals
- Experience with automation and CI/CD (Jenkins and Ansible)
- Experience with cloud infrastructure management tools: CloudFormation, Terraform
- Experience working with Cassandra/MySQL/Postgres
- Experience with scalable networking technologies such as load balancers (HAProxy)
- Familiarity with Python/Bash/Golang or other scripting and programming languages
- Deep understanding of the Kubernetes architecture
- Strong monitoring experience (Grafana/Prometheus/CheckMK)
- Solid troubleshooting skills and networking knowledge
- Extensive knowledge of AWS is a plus
- RHCE, RHCA, and Kubernetes certifications are beneficial, but not required
- Detail-oriented, self-driven, with excellent communication skills
- Good oral and written English