The Role

We are looking for a highly skilled Data Scientist/Engineer with strong expertise in Python programming, data processing, and analytical problem-solving. This role requires a blend of analytical skill, engineering capability, and hands-on data manipulation experience to derive actionable insights, build efficient pipelines, and support data-driven decision-making across teams.

Responsibilities:

Data Exploration & Analysis:

  • Analyze large and complex datasets to extract meaningful insights and drive decision-making processes.
  • Identify data trends, anomalies, and opportunities for improvement within datasets and communicate findings clearly to stakeholders.
  • Collaborate with cross-functional teams to understand business requirements and transform them into technical solutions.

Data Pipeline Development:

  • Design, develop, and maintain robust data pipelines for efficient data ingestion, transformation, and storage.
  • Optimize and automate data workflows to improve data availability, quality, and processing efficiency.
  • Implement ETL (Extract, Transform, Load) processes to support analytics and reporting needs.

Data Modeling & Feature Engineering:

  • Build, validate, and maintain data models to support machine learning and statistical analysis needs.
  • Engineer and preprocess features for machine learning algorithms and ensure data quality and consistency.
  • Develop scalable solutions for feature storage, retrieval, and real-time model serving.

Programming & Scripting:

  • Write efficient, scalable, and well-documented Python code to support data engineering and analysis tasks.
  • Collaborate on code reviews, optimize code performance, and apply best practices in coding and version control.
  • Use Python libraries (e.g., Pandas, NumPy, SQLAlchemy) to streamline data workflows and support analysis.

Performance Optimization & Troubleshooting:

  • Monitor, troubleshoot, and enhance the performance of data systems and pipelines.
  • Address data integrity and pipeline issues promptly to ensure reliable data availability and system uptime.
  • Implement monitoring and logging to preemptively detect and resolve issues.

Collaboration & Communication:

  • Work closely with data scientists, analysts, and other engineers to develop cohesive data solutions.
  • Translate complex technical issues into non-technical language for clear communication with stakeholders.
  • Contribute to documentation, data standards, and best practices to foster a data-centric culture.

Job Requirements:

  • Technical Skills: Strong proficiency in Python and familiarity with data processing libraries (e.g., Pandas, NumPy, PySpark). Experience with SQL for data extraction and manipulation.
  • Data Engineering Knowledge: Experience in designing, building, and managing data pipelines, ETL workflows, and data warehousing solutions.
  • Statistical & Analytical Skills: Ability to apply statistical methods for data analysis and familiarity with machine learning concepts.
  • Problem-Solving Mindset: Proven ability to troubleshoot complex data issues and continuously improve workflows for efficiency and accuracy.
  • Communication: Effective communication skills to convey data insights to technical and non-technical stakeholders alike.
  • Bonus: Experience with cloud platforms (e.g., AWS, GCP), containerization (e.g., Docker), and orchestration tools (e.g., Airflow).

Preferred Education & Experience:

  • Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, Mathematics, or a related field.
  • 3+ years of experience in a data science or data engineering role.