We are looking for a skilled Kafka Operations Administrator with hands-on experience operating Apache Kafka in production. The ideal candidate will manage, maintain, and upgrade Kafka clusters in production environments, with a strong focus on high availability, disaster recovery, failover, and overall reliability.
Key Requirements
- Production-grade Apache Kafka operations experience: managing, maintaining, and upgrading Kafka clusters in production environments with a focus on high availability, disaster recovery, failover, and overall reliability.
- Kafka ecosystem tooling experience: Kafka Connect, Schema Registry.
- Proficiency in installing and configuring monitoring systems using Grafana (building dashboards), Prometheus, JMX metrics and Splunk.
- Automation and orchestration experience: Terraform, Ansible, Helm, Kubernetes (EKS/AKS/GKE) or equivalent.
- Scripting and tooling experience: Python or Bash for automation and runbooks.
- Strong Linux system administration experience, including troubleshooting, automation and scripting for efficient infrastructure management.
- Knowledge of networking concepts across on-prem VMs and cloud environments, ensuring seamless integration and communication between services.
- Strong understanding of topic management and security best practices for streaming platforms: TLS, ACLs, RBAC, encryption at rest/in transit.
- Experience with 24x7 on-call rotations, JVM tuning, GC analysis, network and disk I/O diagnostics, and incident/postmortem documentation.
- Experience with TCP/IP, routing, switching, and firewall configuration relevant to Kafka operations.
Good to Have
- Deep Kafka performance tuning and capacity planning experience.
- Knowledge of message delivery semantics and guarantees (at-least-once, exactly-once).
- Cloud-native security/compliance experience (IAM, VPC, KMS, Security Groups).
- Certifications: Confluent Certified Administrator, AWS/Azure/GCP certifications.
- Experience with Apache Kafka in KRaft mode, including setup, configuration, troubleshooting, and cluster management.
- Experience with containerization and container orchestration tools: Docker, Kubernetes.
- Experience with CI/CD pipelines and Git-based workflows.
- Experience building custom Kafka Connect libraries and understanding of data serialization formats (e.g., Avro, JSON).