Data Engineer Consultant - Spark/Python
Job Description
Required Skills:
• Excellent knowledge of Apache Spark and strong Python programming experience
• Deep technical understanding of distributed computing and broad awareness of the differences between Spark versions
• Strong knowledge of UNIX operating system concepts and shell scripting
• Hands-on experience using Spark & Python
• Deep experience developing data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations
• External certification (foundational or advanced) in one of the cloud platforms: AWS, GCP, Azure, Snowflake, or Databricks
• Experience deploying and operationalizing code; knowledge of scheduling tools such as Airflow, Control-M, etc. is preferred
• Experience creating visualizations in Tableau, Power BI, Qlik, Looker, or another reporting tool
• Good knowledge of Hadoop, Hive, and Cloudera/Hortonworks Data Platform
• Exposure to Jenkins or an equivalent CI/CD tool and Git repositories
• Experience handling CDC (change data capture) operations on large volumes of data
• Understanding of and hands-on delivery experience with the Agile model
• Experience in Spark-related performance tuning
• Well versed in design documents such as HLD (high-level design), TDD (technical design document), etc.
• Well versed in historical data loads and overall framework concepts
• Experience participating in different kinds of testing, such as Unit Testing, System Testing, User Acceptance Testing, etc.
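As a concrete illustration of the CDC and data-merge work listed above, here is a minimal, hypothetical sketch in plain Python: it applies insert/update/delete change records to a target snapshot keyed by `id`. The record layout and operation codes (`I`/`U`/`D`) are assumptions for illustration only; a production pipeline would express the same logic as a PySpark join or a Delta Lake MERGE.

```python
# Minimal CDC (change data capture) upsert sketch.
# Assumptions (illustrative only): each change record carries an "op" field
# ("I" = insert, "U" = update, "D" = delete) and a primary key "id".

def apply_cdc(target, changes):
    """Apply a batch of change records to a target snapshot keyed by id."""
    snapshot = {row["id"]: row for row in target}
    for change in changes:
        op, key = change["op"], change["id"]
        if op in ("I", "U"):
            # Upsert: drop the op marker and keep the latest version of the row.
            snapshot[key] = {k: v for k, v in change.items() if k != "op"}
        elif op == "D":
            # Delete is a no-op if the key is already absent.
            snapshot.pop(key, None)
    return list(snapshot.values())

target = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
changes = [
    {"op": "U", "id": 1, "name": "alicia"},  # update existing row
    {"op": "I", "id": 3, "name": "carol"},   # insert new row
    {"op": "D", "id": 2},                    # delete existing row
]
result = apply_cdc(target, changes)
```

At scale, the same pattern is typically implemented by windowing change records by key and timestamp, keeping only the latest change per key, and merging it into the target table.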
Preferred Skills
• Exposure to PySpark, Cloudera/Hortonworks, Hadoop, and Hive
• Exposure to AWS S3/EC2 and Apache Airflow
(ref:hirist.tech)