Data Roles: Analyst, Data Scientist, Data Engineer, Analytics Engineer, MLE, and Data Product Manager
A podcast-backed guide to common data roles, how their responsibilities differ, how to choose a target role, and what portfolio evidence each role needs.
Data roles are easier to compare by ownership than by title. In the DataTalks.Club archive, an analyst explains what happened. A data scientist turns ambiguous questions into evidence, experiments, and models.
A data engineer makes data reliable enough for other people to use. An analytics engineer turns raw data into trusted business models. A machine learning engineer ships model-backed systems. A data product manager decides which data capability should exist and how success will be measured.
The boundaries move with company size and maturity. The overview episode Data Team Roles Explained lays out how work flows across the team: product managers stay close to users, analysts quantify problems and KPIs, and data scientists predict and evaluate. Data engineers prepare usable data, while machine learning engineers help scale model-backed services.
In How to Build and Scale ML Teams, Dat Tran adds the startup version. Early teams often need generalists first. Specialists become easier to justify as the product and data platform mature.
Choosing a target role means comparing adjacent responsibilities and building portfolio evidence that matches the job you want. For deeper archive pages, start with Data Analyst Role, Data Scientist Role, and Data Engineer Role. Then compare Analytics Engineering, Machine Learning Engineer Role, and Data Product Management.
Common Data Roles
The data analyst role is the best entry point for people who like SQL, metrics, dashboards, and stakeholder questions. Analysts help teams understand what happened, why a metric moved, and what decision should follow.
The role definition episode puts analysts close to product managers: analysts know the company’s data and can quantify whether the team should solve a problem (Data Team Roles Explained). The broader archive adds product analytics, experiments, funnels, cohorts, and dashboard communication to that definition (Data Analyst Role, Product Analytics).
The data scientist role is broader and less consistently defined. The same title can mean prediction, experimentation, decision science, product analytics, or applied machine learning. In the role definition episode, the simplest split is that analysts explain what happened while data scientists predict and help integrate predictions into products.
In CRISP-DM, the data science workflow starts with business understanding, then moves through data understanding, data preparation, modeling, evaluation, and deployment. That makes the job less about a notebook and more about tested evidence for a decision (Data Scientist Role).
The data engineer role owns dependable data movement. Data engineers ingest, store, transform, and orchestrate datasets, then test, document, and operate them for downstream teams.
The role definition episode describes data engineers as the people who make user-generated data available in usable form. In Big Data Engineer vs Data Scientist, Roksolana Diachuk grounds the engineering side in ETL and storage. Spark performance, monitoring, and schema work also sit on that side (Data Engineer Role, Data Engineering).
The analytics engineer role sits between analyst and data engineer. The useful definition is more precise than “half analyst, half engineer.” Analytics engineers build reusable SQL models and tests. They also own documentation, semantic definitions, and BI-ready marts.
Victoria Perez Mola grounds the role in modeling, data quality, dbt, and Looker in Master Analytics Engineering. Juan Manuel Perafan adds that the role makes business reality visible in safe data systems in Foundations of the Analytics Engineer Role (Analytics Engineering).
The machine learning engineer role begins when models need to become reliable software. MLEs package models, build serving and inference paths, test deployments, and monitor model behavior.
The role definition episode describes MLEs as the people who help data scientists scale model-backed services. Ben Wilson adds the maintainability lens in Practical Machine Learning Engineering for Production. Good ML engineering favors modular systems the team can test and operate (Machine Learning Engineer Role, Machine Learning System Design).
The data product manager role owns product judgment around data capabilities. A capability might be a dashboard, a metric layer, a recommender, a data platform, or an MLOps platform. In Product Designer to Data Product Manager, Sara Menefee describes the role through customer discovery and hypothesis formation, plus data literacy, launch work, quality, and documentation.
In Build & Scale Data Products for AI, Greg Coquillo adds roadmaps, customer journey mapping, success metrics, and problem-first AI product work (Data Product Management, Data Products).
Role Boundaries
Analyst versus data scientist is usually a split between explanation and prediction, but the boundary is fuzzy. Analysts often own SQL, dashboards, and metrics. Cohort analysis and experiment readouts can sit there too.
Data scientists add heavier modeling, experiment design, predictive features, and uncertainty analysis. In Hiring Data Scientists and Analysts, Alicja Notowska treats title ambiguity as a hiring reality. The actual responsibilities matter more than the label (Data Analyst Role, Data Scientist Role).
Data scientist versus data engineer is a split between decision logic and data systems. Data scientists own framing and feature reasoning. They also own modeling, evaluation, and interpretation. Data engineers own ingestion and storage. They also own orchestration, schemas, quality checks, and platform reliability.
Roksolana Diachuk places data cleaning, feature engineering, and model cycles near data science. ETL, storage, Spark optimization, monitoring, and schema changes stay closer to data engineering (Data Engineer vs Data Scientist).
Data analyst versus analytics engineer is a split between interpreting questions and maintaining reusable analytical assets. Analysts answer product and business questions. Analytics engineers encode trusted models, metric definitions, tests, and documentation so other people don’t rebuild the same logic in every dashboard.
Nikola Maksimovic shows the overlap in From Marketing to Analytics Engineering. Marketing reporting and SQL can sit on the path from analyst-like work into analytics engineering, and Looker, product analytics, dbt, and A/B testing can join the same path (Data Analyst vs Analytics Engineer).
Data engineer versus analytics engineer is a split between platform paths and business-facing models. Data engineers tend to own ingestion, orchestration, raw storage, and runtime reliability.
Analytics engineers depend on those paths. On top of them they add modeled domains, semantic definitions, BI-ready marts, tests, and documentation.
The boundary is clear in modern-stack discussions such as ETL, ELT, and the Modern Data Stack. There, Natalie Kwong places ingestion and storage first, ELT transformations next, and data marts after them (Modern Data Stack).
Data scientist versus machine learning engineer is a split between model reasoning and production ownership. Data scientists usually own the problem, data, features, model choice, evaluation, and interpretation.
ML engineers own packaging, serving, deployment, scalability, maintainability, and runtime behavior. The split moves in small teams, but the handoff is visible whenever a model becomes a batch job, an API, a monitored service, or a product feature (Machine Learning Infrastructure, MLOps).
Data product manager versus every technical role is a split between product decision and implementation. The data product manager decides which user problem matters, which outcome proves success, which constraints set the roadmap, and how adoption will happen. Technical leads and contributors decide how to build the solution. Geo Jolly makes that split concrete in ML Product Manager and MLOps Platform Strategy. The PM defines the problem and target outcome, while the engineering team defines the solution (ML Product Manager Role).
Choosing a Target Data Role
Choose the role by the work you want to own every week:
- If you like business questions, SQL, dashboards, and metric movement, start with the data analyst role.
- If you like prediction, experiments, ambiguity, and model evaluation, compare data science careers with the data scientist role.
- If you like systems, data movement, and reliability, use the data engineering roadmap.
- If you like SQL and business meaning and want more software rigor than dashboard work usually offers, analytics engineering is a strong target. The Analytics Engineering Roadmap keeps the sequence practical: start with SQL and modeling, then add dbt-style projects, tests, documentation, and BI consumption.
- If you already write production software and want to move toward models, use Software Engineer to Machine Learning and the Machine Learning Engineer Role.
- If you like discovery, stakeholder translation, roadmaps, adoption, and measurement, consider data product management. The path rewards product sense and enough data literacy to make credible tradeoffs. It doesn’t require replacing the data engineer, data scientist, or ML engineer, but it does require understanding what their work costs and what users need from the result (Data Product Adoption, Metrics).
Team stage matters too. Dat Tran’s team-building episode argues that early startups often need T-shaped generalists who can move across product, data engineering, and ML, while later teams can afford more specialization (How to Build and Scale ML Teams). A first data hire may do analyst, engineer, scientist, and product work in the same month. A mature platform team may split those same responsibilities across several people.
For career changers, the archive’s repeated advice is to translate prior work into role evidence. Ksenia Legostay turned project management and KPI work into data science evidence in Project Manager to Data Scientist. Nikola Maksimovic turned marketing funnels and reporting into analytics engineering evidence. Santiago Valdarrama turned software engineering into ML system work in Software Engineering to Machine Learning. The target role decides which old skill is an asset and which gap you need to close (Career Transitions in Data).
Portfolio Evidence by Role
A data analyst portfolio should prove SQL, metric thinking, visualization, and recommendation quality. A good project starts with a business or product question, shows data checks, cohort or funnel logic, and dashboard design, and finishes with a written decision.
Product analytics projects should explain events, segments, and metric definitions, and they should name their caveats (Product Analytics, Experimentation).
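Funnel logic and data checks are the parts of an analyst project most worth making explicit. Below is a minimal sketch in Python. It assumes a hypothetical event log of `(user_id, event_name)` pairs and a hypothetical three-step funnel. A real project would more likely express the same logic in SQL against its own event schema.

```python
from collections import defaultdict

# Hypothetical event log of (user_id, event_name) pairs; names are illustrative only.
EVENTS = [
    ("u1", "visit"), ("u1", "signup"), ("u1", "purchase"),
    ("u2", "visit"), ("u2", "signup"),
    ("u3", "visit"),
    ("u4", "visit"), ("u4", "purchase"),  # purchase with no signup
]

# Hypothetical ordered funnel steps.
FUNNEL = ["visit", "signup", "purchase"]


def steps_by_user(events):
    """Collect the set of events each user triggered."""
    seen = defaultdict(set)
    for user, name in events:
        seen[user].add(name)
    return seen


def funnel_counts(events, steps):
    """Count users who reached each step AND every step before it."""
    seen = steps_by_user(events)
    return [
        sum(1 for names in seen.values() if set(steps[: i + 1]) <= names)
        for i in range(len(steps))
    ]


def step_conversion(counts):
    """Step-to-step conversion rates; None when the earlier step has no users."""
    return [b / a if a else None for a, b in zip(counts, counts[1:])]


def skipped_step_users(events, steps):
    """Data check: users who reached a later step without an earlier one."""
    flagged = []
    for user, names in steps_by_user(events).items():
        reached = [step in names for step in steps]
        last = max((i for i, hit in enumerate(reached) if hit), default=-1)
        if not all(reached[: last + 1]):
            flagged.append(user)
    return sorted(flagged)


counts = funnel_counts(EVENTS, FUNNEL)
print(counts, step_conversion(counts), skipped_step_users(EVENTS, FUNNEL))
# [4, 2, 1] [0.5, 0.5] ['u4']
```

The skipped-step check is the kind of caveat a written readout should name. Whether `u4` counts as a conversion is a metric-definition decision, not a coding detail.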
A data scientist portfolio should prove problem framing and baselines, then show data preparation, modeling, evaluation, and communication.
CRISP-DM starts the project with business understanding, which makes it a good portfolio template.
Luke Whipps gives the hiring version: projects should connect the claimed tech stack to concrete work and business impact.
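Baselines are what make an evaluation claim checkable. The following minimal sketch compares a predict-the-mean baseline with an ordinary least squares line, using mean absolute error on a holdout split. The one-feature data and the function names are hypothetical and not taken from any episode.

```python
from statistics import fmean

# Hypothetical (feature, target) pairs; illustrative only.
TRAIN = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
HOLDOUT = [(6.0, 12.2), (7.0, 13.8)]


def fit_mean_baseline(rows):
    """Baseline: always predict the mean training target."""
    mean_y = fmean(y for _, y in rows)
    return lambda x: mean_y


def fit_least_squares(rows):
    """One-feature linear regression fit by ordinary least squares."""
    mx = fmean(x for x, _ in rows)
    my = fmean(y for _, y in rows)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x, _ in rows)
    return lambda x: (my - slope * mx) + slope * x


def mae(model, rows):
    """Mean absolute error of a fitted model on held-out rows."""
    return fmean(abs(model(x) - y) for x, y in rows)


baseline = fit_mean_baseline(TRAIN)
linear = fit_least_squares(TRAIN)
print(f"baseline MAE={mae(baseline, HOLDOUT):.3f}  linear MAE={mae(linear, HOLDOUT):.3f}")
```

A model that can’t beat the baseline on held-out data is still a finding worth writing up. Stating the comparison explicitly keeps the case study honest either way.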
A data engineering portfolio should prove that a pipeline can run without manual notebook work. Build one path from source data to modeled tables, add tests or quality checks, document the schema, and make reruns inspectable.
Jeff Katz emphasizes SQL, Python, cloud fundamentals, Docker, Airflow, and warehouses in his data engineering career and job-prep episodes (Build a Data Engineering Career, Data Engineering Job Prep, Data Engineering Portfolio Projects).
An analytics engineering portfolio should prove reusable modeling. Show raw source assumptions and staging models, then show marts, tests, and documentation. Add one BI or query layer that consumes the shared model. Explain table grain and metric definitions.
Victoria Perez Mola’s role episode and Juan Manuel Perafan’s foundations episode both reward modeling and tests over dashboard-only proof (Analytics Engineering Portfolio Projects).
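The staging-to-mart layering and grain tests can be sketched without dbt. This minimal sketch assumes Python’s `sqlite3` and a hypothetical `raw_orders` source with cryptic column names. A staging view only renames and casts. A mart view fixes the grain at one row per customer and defines revenue once. Two checks mirror the idea behind dbt’s built-in `unique` and `not_null` tests. In a real dbt project these would be SQL model files plus tests declared in YAML.

```python
import sqlite3


def load_example_raw(conn):
    """Hypothetical raw source with cryptic column names, as landed by ingestion."""
    conn.execute("CREATE TABLE raw_orders (id TEXT, cust TEXT, amt_cents INTEGER, status TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
        [
            ("o1", "c1", 2000, "completed"),
            ("o2", "c1", 1550, "completed"),
            ("o3", "c2", 900, "refunded"),
            ("o4", "c2", 1200, "completed"),
        ],
    )


def build_models(conn):
    """Create the staging view and the customer-grain mart view."""
    conn.executescript(
        """
        -- Staging: rename and cast only, no business logic.
        CREATE VIEW IF NOT EXISTS stg_orders AS
        SELECT id AS order_id,
               cust AS customer_id,
               amt_cents / 100.0 AS amount,
               status
        FROM raw_orders;

        -- Mart: grain is one row per customer; "revenue" is defined here once.
        CREATE VIEW IF NOT EXISTS mart_customer_revenue AS
        SELECT customer_id,
               COUNT(*) AS completed_orders,
               SUM(amount) AS revenue
        FROM stg_orders
        WHERE status = 'completed'
        GROUP BY customer_id;
        """
    )


# Model and column names are interpolated directly, so only pass trusted identifiers.
def count_duplicates(conn, model, column):
    """Similar in spirit to dbt's `unique` test: 0 means the column is unique."""
    return conn.execute(
        f"SELECT COUNT(*) - COUNT(DISTINCT {column}) FROM {model}"
    ).fetchone()[0]


def count_nulls(conn, model, column):
    """Similar in spirit to dbt's `not_null` test: 0 means no NULLs."""
    return conn.execute(
        f"SELECT COUNT(*) FROM {model} WHERE {column} IS NULL"
    ).fetchone()[0]
```

Running the uniqueness check on `stg_orders` instead returns 2, because staging sits at order grain. Stating each model’s grain is what makes that result expected rather than alarming.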
A machine learning engineering portfolio should prove that a model-backed system can run, change, and fail predictably. A training script plus batch scoring job can be stronger than an advanced model with no run path. Add an API or CLI, Docker, tests, and evaluation. Add monitoring notes and a simple deployment path.
Ben Wilson’s production ML discussion and Nadia Nahar’s Software Engineering for ML both make requirements, modular code, tests, and deployment gaps part of ML engineering evidence (Machine Learning Portfolio Projects).
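Below is a minimal sketch of the training script plus batch scoring job. It assumes a hypothetical churn dataset with one feature (`days_inactive`) and a deliberately simple threshold model. The point is the run path rather than the model. Training writes a versioned JSON artifact. Scoring reloads that artifact, validates its inputs, and fails loudly on bad records. The feature name and versioning scheme are illustrative assumptions.

```python
import json
from pathlib import Path
from statistics import fmean

MODEL_VERSION = "0.1.0"  # hypothetical versioning scheme
FEATURE = "days_inactive"  # hypothetical single feature


def train(rows, artifact_path):
    """Fit a one-feature threshold classifier and save it as a JSON artifact."""
    pos = [r[FEATURE] for r in rows if r["churned"]]
    neg = [r[FEATURE] for r in rows if not r["churned"]]
    if not pos or not neg:
        raise ValueError("training data needs both classes")
    threshold = (fmean(pos) + fmean(neg)) / 2  # midpoint between class means
    artifact = {"version": MODEL_VERSION, "feature": FEATURE, "threshold": threshold}
    Path(artifact_path).write_text(json.dumps(artifact))
    return artifact


def score_batch(records, artifact_path):
    """Batch scoring: load the artifact, validate inputs, return one prediction per record."""
    artifact = json.loads(Path(artifact_path).read_text())
    feature = artifact["feature"]
    missing = [i for i, r in enumerate(records) if feature not in r]
    if missing:
        # Fail loudly and early instead of emitting partial output.
        raise KeyError(f"records {missing} lack feature {feature!r}")
    return [
        {
            "id": r["id"],
            "churn": r[feature] >= artifact["threshold"],
            "model_version": artifact["version"],
        }
        for r in records
    ]
```

A CLI or small API could wrap `score_batch`, and Docker could pin the runtime. The model can stay simple as long as those pieces visibly work.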
A data product manager portfolio should prove product judgment around a data capability. Write a short case study that names the user and workflow. Add the pain point and candidate solution. Include tradeoffs, a roadmap, and a success metric. Cover launch risk and adoption too.
Greg Coquillo’s roadmap and success-metrics discussion supports that structure, while Geo Jolly’s ML platform episode adds the internal-product version. Interview users and prioritize platform gaps. Define adoption metrics and explain why a capability should ship now (Data Product Management, ML Product Manager Role).
Learning Paths and Next Steps
Start with the role page closest to the work you want, then use one roadmap and one portfolio page. For analyst or product analytics paths, read Data Analyst Role, Data Analyst Careers, and Product Analytics. For data science, use Data Scientist Role, Data Scientist Interview Roadmap, and Machine Learning Portfolio Projects.
For data engineering, use Data Engineer Role, Data Engineering Roadmap, and Data Engineering Portfolio Projects. For analytics engineering, use Analytics Engineering, Analytics Engineering Roadmap, and Analytics Engineering Portfolio Projects.
For ML engineering, use Machine Learning Engineer Role, Machine Learning System Design, and MLOps Roadmap.
For product paths, start with Data Product Management and Data Products. Then use Data Product Adoption and Data Product Manager Role.
The practical sequence is the same across roles. Choose one target, build one project that proves the target responsibility, and write the case study in the language of that role. Recruiters and hiring managers shouldn’t have to infer whether you want analytics, data science, data engineering, ML engineering, or data product work. Your project, resume, and interview story should make that choice visible.