This role blends advanced Data Science, Machine Learning, and Generative AI with robust Data Engineering, particularly using Spark and AWS, to deliver scalable, production-grade analytical products. You will design end-to-end data solutions—from data ingestion to model deployment—while partnering with engineering and product teams to translate insights into actionable outcomes.
Roles & Responsibilities
- Lead the full ML development lifecycle: problem framing, hypothesis formulation, feature engineering, model development, validation, deployment, and monitoring.
- Develop, test, and optimize machine learning models, including:
  - Supervised and unsupervised learning
  - Statistical modeling and forecasting
  - Natural language processing (NLP)
  - Generative AI techniques for automation and insight extraction
  - Graph/network analytics for modeling network behaviors and relationships
- Build advanced anomaly detection, predictive maintenance, and risk scoring models for network security and operational efficiency.
- Conduct large-scale exploratory data analysis (EDA) to identify trends, data quality issues, and opportunities for automation.
- Define and implement model evaluation and A/B testing strategies.
- Collaborate with ML engineering teams to operationalize models using MLOps best practices.
- Communicate complex analytical findings through clear narratives, visualizations, and presentations tailored to technical and non-technical audiences.
Data Engineering & ETL
- Design, develop, and maintain scalable, fault-tolerant ETL pipelines using Spark to support analytics and machine learning workloads.
- Implement monitoring, alerting, and automated recovery mechanisms to ensure data pipeline reliability.
- Build robust feature pipelines that enable real-time and batch ML processing.
- Integrate data from a wide range of sources: APIs, flat files, relational databases, and distributed file systems (HDFS/S3).
- Support continuous integration and continuous delivery (CI/CD) workflows for data and ML components.
Required Qualifications
- Strong communication and presentation skills, with the ability to translate analytics into business value.
- Expertise in programming languages commonly used in data science: Python (primary), Scala or Java (preferred for ETL/engineering).
- Proven experience with Spark and large-scale distributed data processing.
- Deep understanding of statistical modeling, hypothesis testing, experimental design, causality, and multicollinearity.
- Strong SQL skills and experience with relational and NoSQL databases.
- Expertise across a wide range of ML methodologies: regression, classification, clustering, time-series forecasting, Bayesian methods, NLP and text analytics, and graph analytics.
- Experience with data preprocessing, feature engineering, and EDA.
- Familiarity with data architectures such as data lakes, warehouses, and marts.
- Demonstrated ability to continuously learn, adapt, and share knowledge.
Preferred Qualifications
- Experience with AWS services (S3, EMR, Lambda, Glue, SageMaker).
- Prior exposure to Generative AI, LLMs, prompt engineering, or building AI-driven automation systems.
- Experience with Linux-based systems.
- Background in text mining, document classification, or large-scale unstructured data processing.
- Bachelor’s degree in Computer Science, Data Science, Statistics, Mathematics, Physics, Engineering, Operations Research, or a related field.
- Master’s degree with 6+ years or Bachelor’s degree with 8+ years of relevant work experience.