Country Delight - Data Engineer - Spark/Python
Position Overview
We are seeking a talented and experienced Data Engineer with expertise in Apache Spark, Python/Java, and distributed systems. The ideal candidate will be skilled in building and managing data pipelines on AWS.
Key Responsibilities
• Design, develop, and implement data pipelines for ingesting, transforming, and loading data at scale.
• Utilize Apache Spark for data processing and analysis.
• Utilize AWS services (S3, Redshift, EMR, Glue) to build and manage efficient data pipelines.
• Optimize data pipelines for performance and scalability, considering factors such as partitioning, bucketing, and caching (see the sketch after this list).
• Write efficient and maintainable Python code.
• Implement and manage distributed systems for data processing.
• Collaborate with cross-functional teams to understand data requirements and deliver optimal solutions.
• Ensure data quality and integrity throughout the data lifecycle.
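For illustration, a minimal PySpark sketch of the kind of ingest-transform-load pipeline described above; the bucket paths, column names, and job name are placeholders rather than references to any actual Country Delight system:

    # Minimal PySpark ETL sketch: ingest from S3, transform, write partitioned output.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-etl").getOrCreate()

    # Ingest: read raw JSON events from S3 (hypothetical path).
    orders = spark.read.json("s3://example-raw-bucket/orders/")

    # Transform: keep delivered orders and derive a date column to partition on.
    daily = (
        orders
        .filter(F.col("status") == "DELIVERED")
        .withColumn("order_date", F.to_date("created_at"))
    )

    # Cache only if the DataFrame feeds multiple downstream actions.
    daily.cache()

    # Load: write Parquet partitioned by date for efficient downstream scans.
    (daily.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-curated-bucket/orders_daily/"))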
Requirements
• Proven experience designing and developing data pipelines using Apache Spark and Python/Java.
• Strong knowledge of distributed systems; experience with distributed computing concepts (Hadoop, YARN) is a plus.
• In-depth knowledge of AWS cloud services for data engineering (S3, Redshift, EMR, Glue) and proficiency in building pipelines with them.
• Strong programming skills in Python (Pandas, NumPy, Scikit-learn are a plus).
• Experience with data pipeline orchestration tools (Airflow, Luigi) is a plus (see the sketch after this list).
• Excellent problem-solving and analytical skills.
• Ability to work independently and as part of a team.
• Strong communication and collaboration skills.
• Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience).
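For illustration, a minimal Airflow DAG sketch showing how a Spark job like the one above might be scheduled daily; the DAG id, schedule, and script path are placeholders:

    # Minimal Airflow DAG sketch (Airflow 2.4+; older versions use schedule_interval=).
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="orders_etl_daily",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",   # run once per day
        catchup=False,       # do not backfill missed runs
    ) as dag:
        # Submit the (hypothetical) Spark script from the sketch above.
        run_spark_etl = BashOperator(
            task_id="run_spark_etl",
            bash_command="spark-submit --deploy-mode cluster etl/orders_etl.py",
        )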
Preferred Qualifications
• Experience with additional AWS services (e.g., AWS Glue, AWS Lambda, Amazon Redshift).
• Familiarity with data warehousing concepts and ETL processes (data modeling).
• Knowledge of data governance and best practices.
• Good understanding of object-oriented programming (OOP) concepts (see the sketch after this list).
• Hands-on experience with SQL database design.
• Experience with Python, SQL, and data visualization/exploration tools.
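For illustration, a minimal sketch of OOP concepts (abstraction, inheritance, polymorphism) applied to pipeline code; all class and field names here are hypothetical:

    # Minimal OOP sketch: a common Step interface with interchangeable transforms.
    from abc import ABC, abstractmethod

    class Step(ABC):
        """One pipeline stage: takes a record, returns a transformed record."""
        @abstractmethod
        def run(self, record: dict) -> dict: ...

    class NormalizeKeys(Step):
        # Inheritance: a concrete Step that trims and lowercases field names.
        def run(self, record: dict) -> dict:
            return {k.strip().lower(): v for k, v in record.items()}

    class AddDefaults(Step):
        # Encapsulation: default values live inside the object that applies them.
        def __init__(self, defaults: dict) -> None:
            self.defaults = defaults

        def run(self, record: dict) -> dict:
            return {**self.defaults, **record}

    class Pipeline:
        """Polymorphism: runs any sequence of Step objects via the same interface."""
        def __init__(self, steps: list[Step]) -> None:
            self.steps = steps

        def run(self, record: dict) -> dict:
            for step in self.steps:
                record = step.run(record)
            return record

    # Usage: each step is small, explicit, and unit-testable in isolation.
    pipeline = Pipeline([NormalizeKeys(), AddDefaults({"source": "app"})])
    print(pipeline.run({" City ": "Gurugram"}))  # {'source': 'app', 'city': 'Gurugram'}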