We are looking for a smart and passionate Site Reliability Engineer (m/f/d) to join our team. This is a remote position open to candidates based in either Germany or Portugal.
The Reliability Engineering team is one of the engineering pillars and is responsible for critical components such as data ingestion, monitoring, quality and retrieval. Positioned at the foundation of the technology stack, the team also leads several initiatives that affect the whole engineering organization, such as CI/CD pipelines, automation, engineering workflows and the adoption of new technologies in general.
- You manage and monitor a multi-datacenter environment using an Infrastructure as Code approach
- You build automation to prevent problem recurrence and to reduce deployment times and errors
- You participate in the design of distributed systems architectures
- You ensure scalability, availability and performance of our software stack
- You ensure correctness and availability of the data
- You engage in service capacity planning and demand forecasting
- You maintain security, backup, and redundancy strategies
Your Profile
- BSc degree in Computer Science or a related technical field, or equivalent practical experience
- Minimum 3 years of professional experience
- Solid knowledge of Unix/Linux systems and their internals
- Strong experience with at least one programming language (e.g., Python, Go)
- Professional experience with containers and their orchestration (e.g., Docker, Swarm, Kubernetes, Marathon/Mesos, etc.)
- Knowledge of networking theory (OSI layers, NAT), protocols (TCP/IP, UDP, Ethernet, DNS) and networking tools (e.g., tcpdump, iptables, netstat)
- Ability to design large-scale distributed systems
- Experience managing distributed storage (e.g., Hadoop/HDFS, MinIO, Ceph, Gluster, etc.)
- Experience with logging and monitoring systems (e.g., Elastic Stack, Prometheus, Grafana, etc.)
- Experience with automation software (e.g., Ansible, Puppet, Jenkins, Tekton)
- Experience with cloud technologies (e.g., AWS, GCP, Azure)
- Experience managing complex backup solutions and disaster recovery plans
- Be part of an exciting and ambitious start-up that puts its people at the heart of its business.
- Be part of a diverse, international, cross-disciplinary team of highly motivated, hands-on experts who tackle unique challenges with a positive spirit and lots of fun.
- A flexible work schedule, a dynamic environment where everyone can have a substantial impact, career development programs and additional company holidays.
Does our job offer, Software Engineer Site Reliability - Remote / Python (m/f/d), sound interesting? Then we look forward to receiving your application via Campusjäger by Workwise.
With our partner Campusjäger, you can apply for this job in just a few minutes without a cover letter and track the status of your application live.