Site Reliability Engineer

Location

Remote, Switzerland

Job Type

Description

Cureous is a Swiss open-science startup, creating a groundbreaking new human health research platform to support the investigation of new treatments for chronic conditions.

Our mission is to enable patients and researchers to collaboratively research and develop new, safe and cost-effective treatments for chronic conditions that become part of practiced health care and improve the lives of millions.

We are a close-knit team working remotely from locations across Europe. We live and breathe through Slack, Trello and Quip. We co-locate when necessary to tackle hard problems, and on a regular basis to touch base as a team and have fun. You should be willing to travel for about a week per quarter.

As our Site Reliability Engineer, you will have the crucial role of ensuring that our cloud-based infrastructure, API and database services remain secure, fast and highly available. As a central player in our team, you will collaborate with our Backend Lead on architecture challenges and with the support team on optimal processes and tools. Comfortable with scripting, you not only troubleshoot issues but also design and automate long-term solutions.

Roles and responsibilities:

Ensure a high level of security, performance, and availability of our infrastructure and services
Configure and extend monitoring, logging, and reporting solutions
Automate and document our software deployment and infrastructure tasks (e.g. setting up a new node)
Participate in the design of our system architecture and maintain security, backup, disaster recovery, and redundancy strategies
Analyze user feedback and develop processes and tools for second-level support

Requirements:

Fluent English
BSc/MSc degree in computer science or a related field, or equivalent experience
Experience with cloud-based infrastructure services (at least 2 years)
Understanding of TCP/IP and LAN/WAN networking technologies and troubleshooting techniques
Experience with IT infrastructure automation tools (e.g. Ansible, CloudFormation, Terraform, Chef, Puppet)
Good knowledge of Linux/Ubuntu operating systems
Scripting proficiency (e.g. Python, Shell)
Experience with IT systems metrics analysis, alerting and reporting (e.g. Prometheus, Nagios, Icinga)
Good understanding of IT security concerns
Autonomy and accountability, especially in a remote working setup

Experience in these specific domains is a plus:

PostgreSQL administration
Go
