Analytics Engineering Portfolio Projects
Archive-backed guidance for analytics engineering portfolio projects that prove SQL modeling, metric ownership, dbt-style tests, documentation, BI readiness, and stakeholder judgment.
Definition
An analytics engineering portfolio project should prove that you can turn messy source data into reusable models and shared metric definitions. The strongest projects show more than SQL or a dashboard. They show table grain, modeled layers, tests, documentation, and BI consumption, along with the business question behind the model.
The DataTalks.Club analytics-engineering episodes set that bar. Victoria Perez Mola describes daily work that spans data modeling, data quality, pipelines, and Looker exposure. She also describes dbt handling SQL transformations and tests, with documentation and dependency graphs in the same workflow (Master Analytics Engineering at 4:05-10:04).
Juan Manuel Perafan adds that analytics engineering should make the data match business reality. Engineering discipline then makes the result safer (Foundations of the Analytics Engineer Role at 11:03 and 46:34).
This topic covers portfolios aimed at analytics engineering, BI-heavy data roles, and analyst-to-engineer transitions. For ingestion, orchestration, and platform-heavy work, use Data Engineering Portfolio Projects. For learning order, use Analytics Engineering Roadmap.
Link Map
Start with these role and project pages:
- Analytics Engineering for the role definition.
- Analytics Engineering Roadmap for sequencing SQL, modeling, dbt, and testing.
- Data Analyst vs Analytics Engineer for the boundary between dashboard interpretation and reusable model ownership.
- Marketing to Analytics Engineering for a transition path through BI, Looker, SQL, and dbt. It also covers product analytics and funnels.
- Modern Data Stack, ETL vs ELT, and dbt for the stack behind project choices.
- Product Analytics, Event Tracking, Metrics, and Data Activation for growth and product-oriented projects.
- Data Quality and Observability and DataOps for tests, alerts, lineage, and run behavior.
The main podcast anchors are:
- Master Analytics Engineering with Victoria Perez Mola for data modeling, Looker, dbt, and tests. It also covers docs, DAGs, and role-fit signals.
- From Marketing to Analytics Engineering with Nikola Maksimovic for BI projects, dbt migration, LookML, and product analytics. It also covers A/B testing and wide-versus-narrow modeling tradeoffs.
- Foundations of the Analytics Engineer Role with Juan Manuel Perafan for business-reality modeling, testing dashboards, and software-engineering rigor.
- ETL, ELT, and the Modern Data Stack with Natalie Kwong for ELT, data marts, warehouses, and dbt. It also covers orchestration, CDC, and reverse flows.
- Data-Led Growth Stack with Arpit Choudhury for tracking plans, event semantics, warehouse transformations, and BI. It also covers reverse ETL.
- Building and Scaling a Data Team with Tammy Liang for business health dashboards, dbt, documentation, and testing. It also covers monitoring and adoption.
- Build a Data Engineering Career with Jeff Katz for SQL mastery, OLTP versus OLAP, and modeling practice. The analytics-engineering module uses dbt, Snowflake, Mode, and Fivetran.
Common Definition
Across the archive, a good analytics engineering portfolio project starts with a repeated business question and ends with a trusted analytical surface. The repository should make the path easy to review. Show raw source assumptions and staging models. Add intermediate logic, marts, tests, and docs. Then show a dashboard or query layer that consumes the shared models.
Victoria Perez Mola grounds the role in SQL models that analysts and data scientists can use. Looker is the consumption layer, and dbt is the transformation layer (4:05-8:59). That makes a dashboard-only project weak unless the dashboard sits on reusable models. It also makes a dbt-only project weak unless the models answer a business question and expose definitions to consumers.
Juan Manuel Perafan pushes the definition beyond “between analyst and engineer.” He frames the work as making data reflect business reality, then adding robustness and software-engineering discipline (7:56-16:25 and 46:34). A portfolio should therefore explain why the model represents the business correctly. It should state what one row means and which joins preserve the grain. It should also name accepted edge cases and caveats for stakeholders.
Natalie Kwong places the same work inside ELT. Data arrives first, then analysts or analytics engineers transform it with SQL and dbt. They publish data marts and consumption tables (7:57-18:47 and 31:31). That evidence favors projects that show both source ingestion assumptions and warehouse-side transformations, even when the portfolio isn’t a full data-engineering project.
Guest Differences
Guests differ on whether the portfolio should sell a distinct job title or a way of working. Victoria Perez Mola describes a recognizable analytics-engineer role with modeling, quality, Looker, and dbt. The role also collaborates with analysts, data scientists, and backend engineers (14:34-20:52 and 33:02).
Juan Manuel Perafan is more cautious about defining the role only by the gap between analysts and engineers. His evidence points portfolio builders toward the craft: modeling business reality and testing dashboards. Rigor in data workflows matters too (7:56-11:03 and 38:41-46:34).
Nikola Maksimovic shows a transition version of the portfolio, and the proof didn’t start as a public repository. It started with marketing reporting, BI-team conversations, and Looker work. SQL practice and BI projects happened alongside marketing work. The later role included dbt migration, LookML, product analytics, and A/B testing (7:18-23:12 and 38:27).
That evidence supports portfolios that turn domain knowledge into modeled metrics instead of treating domain context as background.
Arpit Choudhury widens the project boundary toward growth systems. His episode connects tracking plans, event collection, warehouse transformations, and BI to activation work. Reverse ETL then sends modeled data to support, sales, and engagement tools (13:34-30:03 and 37:25-46:13). That makes activation projects legitimate analytics-engineering evidence when you document event ownership, data meaning, and downstream consequences.
Project Evidence
A strong project makes the analytical promise visible. Start with a consumer such as a growth manager, finance stakeholder, operations team, or product manager. Then show how the modeled data supports that decision.
Tammy Liang gives a useful adoption test. Her first data-team work focused on business-health monitoring, streamlined reporting, and rebuilding trust. The stack later included Stitch, GCP, dbt, and Data Studio. A Notion wiki documented the work. The team then added testing, monitoring, and workshops for adoption (7:22-22:32 and 35:38-49:00).
For a portfolio, the README shouldn’t stop at “here’s the dashboard.” It should show who uses the dashboard and what changed from the old spreadsheet or duplicated query. It should also show how another analyst finds the definitions.
The minimum evidence set is:
- Business question: name the decision and the metric owner. Nikola Maksimovic's path from performance marketing into BI and product analytics shows why: funnels, retention, RFM analysis, and A/B testing gave the modeling work a target (2:53 and 38:27-41:50).
- Source semantics: document source tables or events, update cadence, known defects, and ownership. Arpit Choudhury grounds this in tracking plans with events, properties, and ownership (13:34-20:47).
- Modeled layers: separate sources, staging logic, intermediate joins, and marts. Natalie Kwong distinguishes warehouses, transformations, and data marts inside ELT (10:00-18:47).
- Metric definitions: state grain, dimensions, facts, filters, and accepted business logic, because Juan Manuel Perafan ties this to making data resemble business reality (11:03-20:21).
- Tests and docs: add non-null checks, unique checks, accepted-values checks, relationship checks, and freshness checks. Add custom tests where they match the data rules. Victoria Perez Mola discusses dbt tests, upstream checks, warnings, errors, docs, and profiling tools (36:44-38:53 and 50:46).
- BI or activation surface: publish a dashboard, semantic layer, notebook, or reverse-ETL segment that consumes only modeled data. Arpit Choudhury connects warehouse transformations to BI and operational activation (28:52-37:25).
- Run behavior: explain which failures block the build, which warnings need review, and how consumers learn about data issues. Barr Moses frames freshness, volume, distribution, and schema as reliability signals. She also covers lineage, ownership, and SLAs (16:38-35:24 and 58:51).
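The test and run-behavior items above can be sketched without any particular tool. The example below is a minimal illustration in Python with the standard-library sqlite3 module, not dbt itself. The table names, columns, severities, and the 24-hour freshness window are all hypothetical. Each check is a query that returns violating rows, which mirrors how dbt-style tests are usually described, and each check carries a severity that decides whether it blocks the build or only warns.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customers (customer_id INTEGER, plan TEXT);
CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER,
                         status TEXT, loaded_at TEXT);
INSERT INTO dim_customers VALUES (1, 'free'), (2, 'pro');
INSERT INTO fct_orders VALUES
  (100, 1, 'paid',     '2024-01-02T00:00:00+00:00'),
  (101, 2, 'refunded', '2024-01-02T06:00:00+00:00'),
  (102, 3, 'paid',     '2024-01-02T07:00:00+00:00');
""")

# Each check is (name, severity, query that returns the violating rows).
CHECKS = [
    ("order_id not null", "error",
     "SELECT * FROM fct_orders WHERE order_id IS NULL"),
    ("order_id unique", "error",
     "SELECT order_id FROM fct_orders GROUP BY order_id HAVING COUNT(*) > 1"),
    ("status accepted values", "error",
     "SELECT * FROM fct_orders WHERE status NOT IN ('paid', 'refunded')"),
    ("customer relationship", "warn",
     "SELECT o.order_id FROM fct_orders o "
     "LEFT JOIN dim_customers c ON o.customer_id = c.customer_id "
     "WHERE c.customer_id IS NULL"),
]

# Map each check name to its severity and number of violating rows.
results = {
    name: (severity, len(conn.execute(sql).fetchall()))
    for name, severity, sql in CHECKS
}


def is_fresh(now, max_age):
    """Freshness: the newest loaded_at must be within max_age of now."""
    latest = conn.execute("SELECT MAX(loaded_at) FROM fct_orders").fetchone()[0]
    return now - datetime.fromisoformat(latest) <= max_age


# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
fresh = is_fresh(now, timedelta(hours=24))

# Run behavior: errors block the build, warnings go to a human for review.
blocking = [n for n, (sev, count) in results.items() if sev == "error" and count > 0]
warnings = [n for n, (sev, count) in results.items() if sev == "warn" and count > 0]
build_passes = fresh and not blocking
```

In this sample data, order 102 points at a customer that does not exist in dim_customers, so the relationship check lands in `warnings` while the build still passes. Whether that rule should warn or block is exactly the kind of decision the README should state.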
Project Types
A metric mart and dashboard project is the clearest portfolio option. Pick a domain with repeatable decisions. Good domains include subscriptions and ecommerce. Marketing spend and SaaS usage also work. Logistics and finance can work too.
Build source models first, then add staging tables, facts, and dimensions. Define KPIs and add tests. Publish one documented dashboard that uses only the modeled layer.
This matches Victoria Perez Mola on modeling and Looker exposure. It also matches Tammy Liang on dashboards, documentation, and adoption (Master Analytics Engineering, Building and Scaling a Data Team).
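As a rough sketch of the layering described above, the example below builds a raw table, a staging view, and a mart view in an in-memory SQLite database. It is only an illustration. The table names, the sample rows, and the "revenue counts paid orders only" rule are assumptions, and a real project would more likely express the same layers as dbt models in a warehouse.

```python
import sqlite3

# Hypothetical raw data with the usual mess: text types, casing, whitespace.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_orders (id TEXT, customer TEXT, amount_cents TEXT, status TEXT);
INSERT INTO raw_orders VALUES
  ('1', ' Acme ', '1200', 'PAID'),
  ('2', 'acme',   '800',  'paid'),
  ('3', 'Globex', '500',  'refunded');

-- Staging: rename, cast, and clean only. No business logic lives here.
CREATE VIEW stg_orders AS
SELECT CAST(id AS INTEGER)                   AS order_id,
       LOWER(TRIM(customer))                 AS customer_name,
       CAST(amount_cents AS INTEGER) / 100.0 AS amount,
       LOWER(status)                         AS status
FROM raw_orders;

-- Mart: one row per customer. Assumed rule: revenue counts paid orders only.
CREATE VIEW mart_customer_revenue AS
SELECT customer_name,
       COUNT(*)    AS paid_orders,
       SUM(amount) AS revenue
FROM stg_orders
WHERE status = 'paid'
GROUP BY customer_name;
""")

# A dashboard or notebook would read only from the mart, never from raw_orders.
rows = conn.execute(
    "SELECT customer_name, paid_orders, revenue "
    "FROM mart_customer_revenue ORDER BY customer_name"
).fetchall()
```

Keeping the cleaning in staging and the metric rule in the mart means a reviewer can find the revenue definition in one place instead of hunting through chart settings.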
A dbt migration or refactor project works when the starting point is messy SQL, duplicated dashboard logic, or spreadsheet-defined metrics. Refactor the logic into model layers and add tests, docs, lineage, and a deployment note. Use reusable macros only where they remove duplication.
Nikola Maksimovic grounds this in a real dbt migration, LookML reporting, wide-versus-narrow tables, and incrementalization tradeoffs (18:34-33:46). Christopher Bergh adds the DataOps standard. He covers version control and tests. He also covers CI/CD, runbooks, documentation, and end-to-end versioning (33:47-51:21).
A product analytics project should start with events, not charts. Write a tracking plan, then simulate or instrument events. Model user journeys and publish activation, retention, funnel, or experiment metrics.
Arpit Choudhury names signup and project-created events as SaaS examples. Invite and invoice events fit there too. He then connects collection and storage with transformation, analysis, and activation (13:34-30:03). Nikola Maksimovic shows why marketing and product domain knowledge matter for funnels, retention, RFM, and A/B testing (38:27-41:50).
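To make the events-first idea concrete, here is a small sketch that computes funnel counts from an event log. The event names, user IDs, and dates are hypothetical, loosely modeled on the SaaS examples above. The ordering rule is also a simplifying assumption: it uses only the first time each user fired each event and counts a step when that first occurrence is not earlier than the previous step.

```python
from collections import defaultdict

# Hypothetical tracking-plan steps, in funnel order.
FUNNEL = ["signed_up", "project_created", "invite_sent"]

# Hypothetical events: (user_id, event_name, ISO date string).
EVENTS = [
    ("u1", "signed_up", "2024-01-01"),
    ("u1", "project_created", "2024-01-02"),
    ("u1", "invite_sent", "2024-01-03"),
    ("u2", "signed_up", "2024-01-01"),
    ("u2", "project_created", "2024-01-05"),
    ("u3", "signed_up", "2024-01-02"),
    ("u4", "project_created", "2024-01-02"),  # no signup, so never enters the funnel
]


def funnel_counts(events, steps):
    """Count users reaching each step, requiring the steps to occur in order."""
    # First occurrence of each funnel event per user.
    first_seen = defaultdict(dict)
    for user, name, ts in events:
        if name in steps:
            prev = first_seen[user].get(name)
            if prev is None or ts < prev:
                first_seen[user][name] = ts

    counts = [0] * len(steps)
    for seen in first_seen.values():
        last_ts = ""  # ISO date strings compare correctly as plain text
        for i, step in enumerate(steps):
            ts = seen.get(step)
            if ts is None or ts < last_ts:
                break  # step missing or out of order: the user drops here
            counts[i] += 1
            last_ts = ts
    return dict(zip(steps, counts))


counts = funnel_counts(EVENTS, FUNNEL)
```

A portfolio write-up would state this ordering rule next to the metric, since a different rule (for example, any occurrence after the previous step) can change the numbers.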
A reverse ETL or activation project is useful when the portfolio needs to show operational consequences. Model a customer or account segment in the warehouse. Then push it to a mock CRM, support tool, or marketing destination. Document ownership, refresh cadence, and privacy assumptions. Also explain the consequence of a wrong segment.
Arpit Choudhury covers reverse ETL and product-led activation in Data-Led Growth Stack at 37:25-56:08. Natalie Kwong covers warehouse tables flowing back into operational systems in ETL, ELT, and the Modern Data Stack at 35:42.
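An activation flow of this kind can be mocked end to end in a few lines. The sketch below is illustrative only. The `Account` fields, the segment rule (pro plan, consented, inactive for 30 days or more), and the `MockCRM` class are all hypothetical stand-ins, not a real reverse-ETL tool or CRM API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Account:
    """Hypothetical modeled account row, as it might come from a mart."""
    account_id: str
    plan: str
    last_active: date
    marketing_consent: bool


def at_risk_segment(accounts, as_of, inactive_days=30):
    """Assumed rule: consenting pro accounts inactive for inactive_days or more."""
    return sorted(
        a.account_id
        for a in accounts
        if a.plan == "pro"
        and a.marketing_consent  # privacy assumption: only sync consenting accounts
        and (as_of - a.last_active).days >= inactive_days
    )


class MockCRM:
    """Stand-in destination that records which accounts carry which segment."""

    def __init__(self):
        self.segments = {}

    def sync_segment(self, name, account_ids):
        # Full refresh: replace the segment so stale members are removed.
        self.segments[name] = list(account_ids)


accounts = [
    Account("a1", "pro", date(2024, 1, 15), True),
    Account("a2", "pro", date(2024, 2, 25), True),    # recently active
    Account("a3", "pro", date(2023, 12, 1), False),   # no consent
    Account("a4", "free", date(2023, 11, 1), True),   # wrong plan
]

crm = MockCRM()
segment = at_risk_segment(accounts, as_of=date(2024, 3, 1))
crm.sync_segment("at_risk_pro", segment)
```

The full-refresh choice is one possible design. It keeps the destination consistent with the warehouse, at the cost of resending the whole segment on every run.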
A hiring-focused fundamentals project should go deep on SQL and modeling before adding tools. Jeff Katz places an analytics-engineering module around dbt, Snowflake, Mode, and Fivetran. He also emphasizes SQL mastery and window functions.
Katz also treats OLTP versus OLAP and sample database modeling practice as fundamentals (36:18-45:14). Junior candidates can win with a smaller project. Strong grain definitions, tests, docs, and SQL explanations can beat a broad stack that hides the modeling decisions.
Portfolio Proof
Review the project as if another analyst must maintain it next month.
They should be able to find these answers from the repository, dashboard, and docs:
- Row grain: state what one row represents and which joins preserve or change that grain, because Juan Manuel Perafan ties this modeling question to representing business reality (11:03-20:21).
- Source fields: name fields from user events, backend systems, ads platforms, finance tools, and manual inputs. Arpit Choudhury makes source awareness and tracking-plan ownership part of data-led work (10:45-20:47).
- Model layers: explain why the models are layered this way and where business logic lives, as Natalie Kwong separates warehouse storage, transformations, and marts in the modern stack (15:30-18:47).
- Failure rules: state which assumptions fail the build and which assumptions warn a human. Victoria Perez Mola discusses dbt checks, upstream checks, warnings, and errors (38:53).
- Documentation: make owners, purpose, caveats, columns, dependencies, and example queries findable. Tammy Liang uses a Notion wiki, dashboard checks, and workshops to get data work adopted outside the data team (22:32 and 49:00).
- Consumption: make the dashboard or activation flow use shared models instead of embedded duplicate metric logic. Nikola Maksimovic connects Looker, LookML, dbt migration, and product analytics in the same BI stack (20:34-23:12).
- Reconciliation: explain how a stakeholder would reconcile changed numbers after a migration or source fix. Barr Moses connects schema changes, lineage, ownership, and SLAs to data reliability (19:10-35:24 and 58:51).
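The row-grain question in the list above is easy to demonstrate. The sketch below uses hypothetical `orders` and `order_items` tables in an in-memory SQLite database to show how a join to a finer-grained table changes the grain and inflates a sum. It also shows one common remedy: aggregate the finer table to the order grain before joining.

```python
import sqlite3

# Hypothetical tables: orders has one row per order,
# order_items has one row per item within an order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, amount REAL);
CREATE TABLE order_items (order_id INTEGER, sku TEXT);
INSERT INTO orders VALUES (1, 50.0), (2, 30.0);
INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (2, 'C');
""")


def scalar(sql):
    """Run a query and return its single value."""
    return conn.execute(sql).fetchone()[0]


base_rows = scalar("SELECT COUNT(*) FROM orders")

# Joining to items changes the grain from one row per order to one row per item.
joined_rows = scalar(
    "SELECT COUNT(*) FROM orders o "
    "JOIN order_items i ON o.order_id = i.order_id"
)

# Summing an order-level amount after that join double-counts order 1.
naive_revenue = scalar(
    "SELECT SUM(o.amount) FROM orders o "
    "JOIN order_items i ON o.order_id = i.order_id"
)

# Aggregating items to one row per order first preserves the order grain.
safe_revenue = scalar(
    "SELECT SUM(o.amount) FROM orders o "
    "JOIN (SELECT order_id, COUNT(*) AS item_count "
    "      FROM order_items GROUP BY order_id) i "
    "ON o.order_id = i.order_id"
)

grain_preserved = base_rows == joined_rows
```

Here the plain join returns three rows for two orders, so `grain_preserved` is false and the naive revenue is 130.0 instead of 80.0. A row-count comparison like this is a cheap check to run in a project whenever a join is supposed to keep the grain unchanged.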
Anti-Patterns
Avoid a dashboard built directly from raw tables with metric logic hidden in charts. Victoria Perez Mola places analytics-engineering value in modeled data, dbt transformations, and Looker exposure, not in isolated charts (4:05-8:59).
Avoid a dbt repository with many models but no business definitions, tests, owners, or BI consumer. Juan Manuel Perafan argues that the work should map business reality and make the data safer. Tammy Liang shows that adoption, documentation, and trust matter after the models exist (Foundations episode, Building and Scaling a Data Team).
Avoid copying a public template without explaining grain, joins, slowly changing attributes, or incremental logic. Nikola Maksimovic grounds the role in practical data-modeling tradeoffs during a dbt migration, including wide versus narrow tables and incrementalization (30:28-33:46).
Avoid final KPI screenshots without source caveats, data-quality checks, or reconciliation notes. Barr Moses shows how silent failures, schema changes, freshness problems, and gaps in lineage or ownership break trust when teams only look at the final output (13:40-29:00).
Avoid treating analytics engineering as “SQL plus dashboard.” The archive keeps returning to software practices, tests, docs, lineage, version control, warehouse transformations, and adoption (Master Analytics Engineering, Mastering DataOps, Analytics Engineering).
Related Pages
Use these pages for the role, stack, and adjacent portfolio context:
- Analytics Engineering
- Analytics Engineering Roadmap
- Data Analyst vs Analytics Engineer
- Marketing to Analytics Engineering
- Data Engineering Portfolio Projects
- Product Analytics
- Event Tracking
- Metrics
- Data Activation
- Modern Data Stack
- ETL vs ELT
- dbt
- Data Quality and Observability
- DataOps
- Data Product Management
- Job Search