We are seeking a talented Spark Engineer to join our data engineering team. In this role, you will design, implement, and optimize large-scale data processing pipelines using Apache Spark and other big data technologies. You will work with cross-functional teams to process and analyze vast amounts of data, enabling data-driven decision-making across the organization.
Key Responsibilities
- Design, implement, and optimize scalable data processing pipelines using Apache Spark (batch and streaming), ensuring high performance and reliability.
- Work with large datasets, implementing data transformations, aggregations, and other complex processing tasks across distributed environments.
- Integrate Spark with other big data technologies such as Hadoop, Hive, and Kafka, and with cloud platforms such as AWS, GCP, and Azure.
- Work closely with data scientists, analysts, and software engineers to deliver end-to-end data solutions.
- Troubleshoot, profile, and optimize Spark jobs for maximum performance and scalability.
Required Qualifications
- 8+ years of hands-on experience with Apache Spark, developing in PySpark, Scala, or Java.
- Strong understanding of big data technologies (Hadoop, Kafka, Hive).
- Experience with distributed computing and parallel processing frameworks.
- Proficiency in Python or Scala for Spark application development.
- Knowledge of cloud platforms (AWS, GCP, or Azure) and cloud-based data storage (e.g., S3, Google Cloud Storage).
- Experience with SQL and database technologies (relational and NoSQL).
- Strong debugging, troubleshooting, and performance tuning skills.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.