We are seeking a highly skilled Sr. PySpark Engineer with extensive experience in big data, distributed computing, and cloud data platforms. The ideal candidate will lead data engineering efforts, design scalable ETL pipelines, and mentor junior team members.
Required Skills & Experience
- 10+ years of experience in big data and distributed computing.
- Strong hands-on expertise with PySpark, Apache Spark, and Python.
- Proficiency in SQL and NoSQL databases, including relational systems and data warehouses such as DB2, PostgreSQL, and Snowflake.
- Solid experience in data modeling, ETL workflows, and pipeline design.
- Experience with workflow schedulers such as Airflow (an illustrative scheduling sketch follows this list).
- Hands-on experience with AWS cloud-based data platforms.
- Familiarity with DevOps, CI/CD pipelines, Docker, and Kubernetes is a plus.
- Strong problem-solving, team leadership, and mentoring skills.
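For context on the scheduler requirement above, here is a minimal sketch of how a PySpark job might be orchestrated with Airflow. It assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG ID, script path, and connection ID are hypothetical placeholders, not details of this role.

```python
# Minimal Airflow DAG sketch (illustrative only): schedules a daily PySpark job
# via spark-submit. Assumes Airflow 2.4+ and the Spark provider package.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="orders_daily_etl",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                    # Airflow 2.4+ keyword; older versions use schedule_interval
    catchup=False,
) as dag:
    run_etl = SparkSubmitOperator(
        task_id="run_pyspark_etl",
        application="/opt/jobs/orders_etl.py",   # placeholder path to the PySpark script
        conn_id="spark_default",                 # default Spark connection ID
        conf={"spark.executor.memory": "4g"},
    )
```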
Key Responsibilities
- Design, develop, and maintain large-scale PySpark ETL pipelines (a minimal illustrative sketch follows this list).
- Optimize and troubleshoot big data workflows for performance and reliability.
- Collaborate with cross-functional teams to implement data-driven solutions.
- Lead and mentor junior data engineers, ensuring adherence to best practices.
- Work with cloud-based data platforms and DevOps pipelines to enable automated deployment.
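As a rough illustration of the first responsibility above, the sketch below shows a minimal PySpark ETL job: extract raw CSV data, apply basic cleansing and aggregation, and load partitioned Parquet output. The bucket, paths, and column names are hypothetical placeholders, not systems referenced by this posting.

```python
# Minimal PySpark ETL sketch (illustrative only): read raw CSV, clean and
# aggregate it, and write partitioned Parquet for downstream consumers.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("orders-daily-etl")   # hypothetical job name
    .getOrCreate()
)

# Extract: load raw order events (path is a placeholder).
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Transform: deduplicate, cast, filter invalid rows, derive a date column.
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("order_ts"))
)

# Aggregate: daily revenue and order counts per region.
daily_revenue = (
    clean.groupBy("order_date", "region")
         .agg(F.sum("amount").alias("revenue"),
              F.countDistinct("order_id").alias("orders"))
)

# Load: write partitioned Parquet output (path is a placeholder).
daily_revenue.write.mode("overwrite").partitionBy("order_date") \
    .parquet("s3://example-bucket/curated/daily_revenue/")

spark.stop()
```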