The Role
We are looking for a highly skilled Data Engineer with strong expertise in Python programming, data processing, and analytical problem-solving. This role combines analysis, engineering, and hands-on data manipulation: you will derive actionable insights, build efficient pipelines, and support data-driven decision-making across teams.
Responsibilities:
Data Exploration & Analysis:
- Analyze large and complex datasets to extract meaningful insights that inform business decisions.
- Identify data trends, anomalies, and opportunities for improvement within datasets and communicate findings clearly to stakeholders.
- Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
Data Pipeline Development:
- Design, develop, and maintain robust data pipelines for efficient data ingestion, transformation, and storage.
- Optimize and automate data workflows to improve data availability, quality, and processing efficiency.
- Implement ETL (Extract, Transform, Load) processes to support analytics and reporting needs.
Data Modeling & Feature Engineering:
- Build, validate, and maintain data models to support machine learning and statistical analysis needs.
- Engineer and preprocess features for machine learning algorithms and ensure data quality and consistency.
- Develop scalable solutions for feature storage, retrieval, and real-time model serving.
Programming & Scripting:
- Write efficient, scalable, and well-documented Python code to support data engineering and analysis tasks.
- Participate in code reviews, optimize code performance, and follow best practices for coding and version control.
- Use Python libraries (e.g., Pandas, NumPy, SQLAlchemy) to streamline data workflows and support analysis.
Performance Optimization & Troubleshooting:
- Monitor, troubleshoot, and enhance the performance of data systems and pipelines.
- Address data integrity and pipeline issues promptly to ensure reliable data availability and system uptime.
- Implement monitoring and logging to detect and resolve issues before they affect downstream users.
Collaboration & Communication:
- Work closely with data scientists, analysts, and other engineers to develop cohesive data solutions.
- Explain complex technical issues in non-technical language so stakeholders can follow them clearly.
- Contribute to documentation, data standards, and best practices to foster a data-centric culture.
Job Requirements:
- Technical Skills: Strong proficiency in Python and familiarity with data processing libraries (e.g., Pandas, NumPy, PySpark). Experience with SQL for data extraction and manipulation.
- Data Engineering Knowledge: Experience in designing, building, and managing data pipelines, ETL workflows, and data warehousing solutions.
- Statistical & Analytical Skills: Ability to apply statistical methods for data analysis and familiarity with machine learning concepts.
- Problem-Solving Mindset: Proven ability to troubleshoot complex data issues and continuously improve workflows for efficiency and accuracy.
- Communication: Ability to convey data insights clearly to technical and non-technical stakeholders alike.
- Bonus: Experience with cloud platforms (e.g., AWS, GCP), containerization (e.g., Docker), and orchestration tools (e.g., Airflow).
Preferred Education & Experience:
- Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, Mathematics, or a related field.
- 3+ years of experience in a data science or data engineering role.
Benefits:
- Compensation commensurate with experience
- Unlimited vacation
- Ongoing education and training
- Bonuses and profit-sharing