PySpark Data Engineer
Are you passionate about big data and ready to take your expertise to the next level? We are seeking an experienced PySpark Data Engineer ready to join our client's team and contribute to building world-class data infrastructure. If you have a strong technical background, a knack for problem-solving, and experience building scalable data solutions, this role is for you!
About our client:
Our client is an award-winning design & development agency specializing in websites, custom software, and digital campaigns.
Job description:
As a PySpark Data Engineer, you will play a crucial role in designing, developing, and optimizing data pipelines that support our business’s data analytics and engineering needs. This role involves using PySpark, SQL, and other tools to build scalable data solutions and drive efficiencies in our data processing environments.
Key Responsibilities:
· Design, develop, test, deploy, maintain, and optimize scalable data pipelines and integration solutions using PySpark.
· Build and refine data pipelines and analysis tools to support business needs and enhance data-driven insights.
· Collaborate closely with cross-functional teams to understand data requirements and integrate data solutions into our Enterprise Data Platform.
· Conduct requirements gathering, author technical design documents, and create testing plans and scripts to ensure robust data solutions.
· Provide analytical expertise by writing complex SQL queries, optimizing performance, and implementing user-defined functions, views, indexes, and more.
· Facilitate process mapping workshops and support the implementation of standard operating procedures.
Must-Have Competencies:
· Experience: 5+ years in PySpark development, with experience building data pipelines and analysis tools.
· Technical Skills:
  · Strong knowledge of PySpark and SQL, including ETL experience and exposure to data warehousing environments.
  · Proficiency in Python and familiarity with commonly used Python libraries.
  · Demonstrated ability to write and optimize complex queries and work with advanced database functions.
· Additional Skills: A solid understanding of data integration, Databricks, and/or Apache Spark is a plus.
Good-to-Have Competencies:
· Experience with Power BI, including semantic modeling and DAX.
· Familiarity with ETL processes and various database systems such as SQL Server, Oracle, and PostgreSQL.
· Knowledge of cloud platforms such as AWS, Azure, or GCP.
Desired Experience Range:
5 to 8 years
What our client offers:
· Innovative Environment: Collaborate in a forward-thinking workspace that values learning and growth.
· Long-term contract
· Flexible Work Options: 8 hours per day, nominally 8 am to 4 pm; working hours are flexible.
· Career Development: Continuous learning opportunities, certifications, and skill-building resources.
· Competitive Package: Attractive salary and benefits package.