Site Reliability Engineer

Cureous
Published: September 20, 2018
Location
Remote, Switzerland

Description

Cureous is a Swiss open-science startup, creating a groundbreaking new human health research platform to support the investigation of new treatments for chronic conditions.

Our mission is to enable patients and researchers to collaboratively research and develop new, safe and cost-effective treatments for chronic conditions that become part of practiced health care and improve the lives of millions.

We are a closely knit team working remotely from locations across Europe. We live and breathe through Slack, Trello and Quip. We co-locate when necessary, to tackle hard problems, and on a regular basis to touch base as a team and have fun. You should be willing to travel for about a week per quarter.

As our Site Reliability Engineer, you will have the crucial role of ensuring that our cloud-based infrastructure, API and database services remain secure, fast and highly available. As a central player in our team, you will collaborate with our Backend Lead on architecture challenges and with the support team on optimal processes and tools. Comfortable with scripting, you not only troubleshoot issues but also design and automate solutions for the long term.

Roles and responsibilities:

  • Ensure a high level of security, performance, and availability of our infrastructure and services
  • Configure and extend monitoring, logging, and reporting solutions
  • Automate and document our software deployment and infrastructure tasks (e.g. setting up a new node)
  • Participate in the design of our system architecture and maintain security, backup, disaster recovery, and redundancy strategies
  • Analyze user feedback and develop processes and tools for second-level support

Requirements:

  • Fluent English
  • BSc/MSc degree in computer science or a related field, or equivalent experience
  • Experience with cloud-based infrastructure services (at least 2 years)
  • Understanding of TCP/IP and LAN/WAN networking technologies and troubleshooting techniques
  • Experience with IT infrastructure automation tools (e.g. Ansible, CloudFormation, Terraform, Chef, Puppet)
  • Good knowledge of the Linux operating system (e.g. Ubuntu)
  • Scripting proficiency (e.g. Python, Shell)
  • Experience with IT systems metrics analysis, alerting and reporting (e.g. Prometheus, Nagios, Icinga)
  • Good understanding of IT security concerns
  • Autonomy and accountability, especially in a remote working setup

Experience in these specific domains is a plus:

  • PostgreSQL administration
  • Go
