Big Data (PySpark) Tech Lead

Long Finch Technologies

Irving, TX / Jacksonville, FL / Jersey City, NJ

Posted On: Dec 04, 2024

Job Overview

Job Type: Full-time
Experience: 10 - 25 Years
Salary: $120,000 - $150,000 Per Year
Work Arrangement: On-Site
Travel Requirement: 0%

Required Skills

  • Big Data
  • Hadoop
  • SQL
  • PySpark
  • ETL
  • Unix
  • Apache Kafka
Job Description
Roles and Responsibilities
  • Design, build, and unit test applications using the Spark framework with Python.
  • Develop PySpark-based applications for both batch and streaming data processing (a minimal batch sketch follows this list).
  • Optimize the performance of Spark applications on Hadoop by tuning the SparkContext, Spark SQL, DataFrames, and pair RDDs. Choose the right native Hadoop file formats (Avro, Parquet, ORC) and compression codecs for optimal data access.
  • Design and develop real-time data applications using Apache Kafka and Spark Streaming to support dynamic data processing needs (see the streaming sketch after this list).
  • Develop and execute data pipeline testing processes, validating business rules and ensuring data quality (see the testing sketch after this list).
  • Build integrated solutions using Unix shell scripting, RDBMS, Hive, HDFS File System, and HDFS file types. Implement data tokenization libraries for column-level obfuscation and integration with Hive and Spark.
  • Process and manage large volumes of structured and unstructured data, integrating data from multiple sources to create cohesive data solutions.
  • Create and maintain automated integration and regression testing frameworks using Jenkins, integrated with Bitbucket and/or Git repositories.
  • Participate actively in the Agile development process, communicate issues and bugs during scrum meetings, and document project developments.
  • Develop and review comprehensive technical documentation for all delivered artifacts.
  • Solve complex data-driven scenarios, troubleshoot defects, and address production issues effectively.
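As an illustration of the batch work described above, here is a minimal, hypothetical PySpark sketch that reads Parquet from HDFS, applies DataFrame transformations, and writes compressed, partitioned Parquet back out. The paths, column names, and configuration values are assumptions for illustration only, not details from this posting.

```python
# Hypothetical batch ETL sketch; paths and columns are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("batch-etl-sketch")
    .config("spark.sql.shuffle.partitions", "200")  # tune for cluster size
    .getOrCreate()
)

# Read source data from HDFS (placeholder path)
raw = spark.read.parquet("hdfs:///data/raw/transactions")

# Example transformation: filter, derive a date column, aggregate
daily = (
    raw.filter(F.col("amount") > 0)
       .withColumn("txn_date", F.to_date("txn_ts"))
       .groupBy("txn_date", "account_id")
       .agg(F.sum("amount").alias("daily_total"))
)

# Write columnar output with a splittable compression codec
(daily.write
      .mode("overwrite")
      .option("compression", "snappy")
      .partitionBy("txn_date")
      .parquet("hdfs:///data/curated/daily_totals"))

spark.stop()
```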
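For the real-time responsibility, here is a minimal Spark Structured Streaming sketch that consumes a Kafka topic, parses JSON payloads, and lands micro-batches in HDFS. It assumes the spark-sql-kafka connector is on the classpath; the broker address, topic name, schema, and paths are placeholders.

```python
# Hypothetical Kafka -> Spark Structured Streaming sketch; all names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

payload_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

events = (
    spark.readStream
         .format("kafka")
         .option("kafka.bootstrap.servers", "broker1:9092")
         .option("subscribe", "transactions")
         .option("startingOffsets", "latest")
         .load()
)

# Kafka delivers key/value as binary; decode and parse the JSON value
parsed = (
    events.select(F.from_json(F.col("value").cast("string"), payload_schema).alias("e"))
          .select("e.*")
)

query = (
    parsed.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/streaming/transactions")
          .option("checkpointLocation", "hdfs:///checkpoints/transactions")
          .outputMode("append")
          .start()
)

query.awaitTermination()
```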
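And for the pipeline-testing responsibility, a small pytest-style data-quality check against a pipeline output, validating a business rule and a null-key constraint. The table path, columns, and rules are illustrative assumptions.

```python
# Hypothetical data-quality test sketch; path, columns, and rules are illustrative.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[2]").appName("dq-tests").getOrCreate()

def test_daily_totals_quality(spark):
    df = spark.read.parquet("hdfs:///data/curated/daily_totals")  # placeholder path

    # Business rule: aggregated totals must be non-negative
    assert df.filter("daily_total < 0").count() == 0

    # Data quality: grouping keys must never be null
    assert df.filter("account_id IS NULL OR txn_date IS NULL").count() == 0
```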

Position Requirements
  • 10+ years of experience in data management, data lakes, and data warehouse development.
  • 6+ years of experience with Hadoop, Hive, Sqoop, SQL, and Teradata.
  • 6+ years of hands-on experience with PySpark (Python and Spark) and Unix.
  • Knowledge of industry-leading ETL processes is a plus.
  • Experience in the banking domain is highly desirable.
  • Expertise in optimizing Spark applications and data access.
  • Proven experience in building real-time data solutions using Apache Kafka and Spark Streaming.
  • Ability to work with various data storage and processing technologies, including HDFS, Hive, and RDBMS.
  • Strong experience in creating automated testing frameworks and continuous integration using Jenkins, Bitbucket, and/or Git.
  • Demonstrated ability to triage complex data issues and production problems.
  • Strong written and verbal communication skills for documentation and collaboration with team members and stakeholders.

Job ID: LF240511


Posted By

Andy

HR Manager