Cloud Kafka SRE (Lead)

This job posting expired and applications are no longer accepted. Published: September 27, 2021
Munich, Germany

Who are we?

We are on a mission to democratize streaming data, helping businesses operate in real time. Our technology allows organizations to practice DataOps, a new way of consuming, processing and moving data.

Our product makes working with cutting-edge open-source technologies more accessible to users and operationalizes them for enterprises. Our customers, who range from household names like Daimler-Benz and Adidas to cloud-native startups, use Lenses to help run their strategic streaming and software applications: think IoT, fraud-detection AI engines and cutting-edge consumer-facing microservices.

Looking to make a difference? Are you someone who gets excited about building things? Keen to see your ideas and knowledge shine as they directly impact our product? We are proud of our engineering culture.

During a steep growth period, we have a consistent and opinionated set of best practices that are helping us to scale newly minted technologies like Apache Kafka into enterprise-ready software.

Are you ready to work with Kafka, Kubernetes, Streaming SQL, and other modern technologies on-prem and across clouds? If the answer is ‘yes’ and you are curious, open-minded, hungry, and eager to learn, then this is the place for you!

What does your day to day look like?

This is a brand new team, reporting to the Director of Streaming Cloud, which will be responsible for operating and growing the DataOps Cloud for streaming data.

You will:

  • Ensure the stability of tenants' Lenses and Kafka deployments by constantly monitoring, alerting on, and proactively managing thousands of data pipelines running via Lenses.
  • Deploy and maintain instances of Lenses and Apache Kafka on demand.
  • Be part of a team that owns the health of our product and ensures its end-to-end availability and performance according to defined SLOs.
  • Apply your strong troubleshooting skills and strategic disaster recovery thinking.
  • Apply software engineering and SRE principles to design, write and deliver software that improves the availability, scalability and efficiency of our product.
  • Work extensively with Strimzi, contributing back to the project where required, and develop and maintain other Kubernetes operators for Lenses and ancillary services.
  • Constantly improve our monitoring, metrics and KPIs, and define and implement missing SLOs.
  • Drive blameless lessons-learned reviews, implement processes and automation to prevent problems from recurring, and document the acquired knowledge and share it among all teams.

What you bring to the table

We are going to spend many years working together. It’s important that we are compatible:

  • You are metrics-driven, constantly working towards goals and KPIs, improving all the time.
  • You love being an SRE and are eager to learn and become better.
  • You are not afraid to ask questions or to give and receive feedback.
  • You have experience in most of our primary technology stack (Kafka or Pulsar is a must).
  • You have strong problem-solving skills, which for us means you can not only solve a problem but also explain how you did it and why your solution is correct.
  • You are security, performance, and best-practice conscious.
  • You are able to take responsibility and own projects (implementation or even design).
  • You like working in a team and you are able to push through language and cultural barriers to work with engineers all over the world.
  • You want to automate everything.
  • No task is beneath or above you.
  • Documentation, support, and even little tasks are things we all do often. This applies even to our CEO!

You are passionate about:

  • Technology
  • Customer Success
  • Trying new tactics and messaging
  • Being part of a team
  • Sharing success with others
  • Inspiring people

Primary Technology Stack

Experience operating distributed streaming services such as Apache Kafka or Pulsar is required.

The most common parts of our current stack include:

  • Docker & Kubernetes/OpenShift, including operator patterns
  • ArgoCD
  • Azure & Amazon AWS
  • Scripting
  • Go language
  • Strimzi (big plus)
  • Apache Kafka, Pulsar or other distributed streaming technologies (must)
  • Helm/Kustomize
  • Java/Scala/Spring Boot
  • Datadog/Prometheus/Grafana

