
Analytics Engineering Portfolio Projects

Archive-backed guidance for analytics engineering portfolio projects that prove SQL modeling, metric ownership, dbt-style tests, documentation, BI readiness, and stakeholder judgment.

Definition

An analytics engineering portfolio project should prove that you can turn messy source data into reusable models and shared metric definitions. The strongest projects show more than SQL or a dashboard. They make table grain, modeled layers, tests, documentation, and BI consumption visible, along with the business question behind the model.

The DataTalks.Club analytics-engineering episodes set that bar. Victoria Perez Mola describes daily work built around data modeling, data quality, pipelines, and exposing data in Looker. In her workflow, dbt handles SQL transformations and tests, and documentation and dependency graphs belong to the same workflow (Master Analytics Engineering at 4:05-10:04).

Juan Manuel Perafan adds that analytics engineering should make the data match business reality, and that engineering discipline then makes the result safer (Foundations of the Analytics Engineer Role at 11:03 and 46:34).

This topic covers portfolios aimed at analytics engineering, BI-heavy data roles, and analyst-to-engineer transitions. For ingestion, orchestration, and platform-heavy work, use Data Engineering Portfolio Projects. For learning order, use Analytics Engineering Roadmap.

Start with these role and project pages:

The main podcast anchors are:

Common Definition

Across the archive, a good analytics engineering portfolio project starts with a repeated business question and ends with a trusted analytical surface. The repository should make the path between them easy to review. That path runs from raw source assumptions and staging models, through intermediate logic, marts, tests, and docs, to a dashboard or query layer that consumes the shared models.
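The layered path can be sketched in a few lines. This is a minimal example, assuming Python's built-in `sqlite3` module as a stand-in warehouse. The table names (`raw_orders`, `stg_orders`, `fct_daily_revenue`) and the sample rows are hypothetical. In a dbt project each layer would be its own SQL model file. What matters is that the dashboard query reads only the mart, never the raw table.

```python
import sqlite3


def build_layers(conn: sqlite3.Connection) -> None:
    """Build hypothetical raw -> staging -> mart layers in one SQLite database."""
    cur = conn.cursor()
    # Raw layer: loaded as-is, with inconsistent casing and amounts stored as text cents.
    cur.execute("CREATE TABLE raw_orders (id TEXT, status TEXT, amount_cents TEXT, ordered_at TEXT)")
    cur.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
        [
            ("1", "COMPLETE", "1250", "2024-01-01"),
            ("2", "complete", "800", "2024-01-01"),
            ("3", "Cancelled", "500", "2024-01-02"),
            ("4", "complete", "2000", "2024-01-02"),
        ],
    )
    # Staging layer: rename, cast, and normalize; one row per order.
    cur.execute("""
        CREATE VIEW stg_orders AS
        SELECT CAST(id AS INTEGER)                   AS order_id,
               LOWER(status)                         AS status,
               CAST(amount_cents AS INTEGER) / 100.0 AS amount,
               DATE(ordered_at)                      AS order_date
        FROM raw_orders
    """)
    # Mart layer: the shared metric definition ("completed revenue") lives here, once.
    cur.execute("""
        CREATE VIEW fct_daily_revenue AS
        SELECT order_date, COUNT(*) AS completed_orders, SUM(amount) AS revenue
        FROM stg_orders
        WHERE status = 'complete'
        GROUP BY order_date
    """)


def dashboard_query(conn: sqlite3.Connection) -> list:
    """What a BI tool would run: it touches only the mart."""
    return conn.execute(
        "SELECT order_date, completed_orders, revenue FROM fct_daily_revenue ORDER BY order_date"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_layers(conn)
print(dashboard_query(conn))
```

Because the status cleanup and the "complete only" rule live in staging and the mart, a second dashboard built on `fct_daily_revenue` inherits the same definition instead of re-implementing it in chart logic.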

Victoria Perez Mola grounds the role in SQL models that analysts and data scientists can use. Looker is the consumption layer, and dbt is the transformation layer (4:05-8:59). That makes a dashboard-only project weak unless the dashboard sits on reusable models. It also makes a dbt-only project weak unless the models answer a business question and expose definitions to consumers.

Juan Manuel Perafan pushes the definition beyond “between analyst and engineer.” He frames the work as making data reflect business reality, then adding robustness and software-engineering discipline (7:56-16:25 and 46:34). A portfolio should therefore explain why the model represents the business correctly. It should state what one row means and which joins preserve the grain. It should also name accepted edge cases and caveats for stakeholders.
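One way to make the grain claim testable is sketched below with `sqlite3` and hypothetical `orders` and `order_items` tables. The first check asserts that the declared key is unique and non-null. The second checks whether a join multiplies rows. The helper names `grain_violations` and `join_fans_out` are illustrative and do not come from any library.

```python
import sqlite3


# Note: identifiers are interpolated into SQL, so only pass trusted table/column names.
def grain_violations(conn, table: str, key_cols: list) -> int:
    """Count duplicated key groups plus rows with NULL key parts (0 means the grain holds)."""
    keys = ", ".join(key_cols)
    null_filter = " OR ".join(f"{c} IS NULL" for c in key_cols)
    dupes = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT {keys} FROM {table} GROUP BY {keys} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    nulls = conn.execute(f"SELECT COUNT(*) FROM {table} WHERE {null_filter}").fetchone()[0]
    return dupes + nulls


def join_fans_out(conn, base: str, join_sql: str) -> bool:
    """True if the joined relation has more rows than `base`, i.e. the join changed the grain."""
    base_rows = conn.execute(f"SELECT COUNT(*) FROM {base}").fetchone()[0]
    joined_rows = conn.execute(f"SELECT COUNT(*) FROM {join_sql}").fetchone()[0]
    return joined_rows > base_rows


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 11);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT);
    INSERT INTO order_items VALUES (1, 'A'), (1, 'B'), (2, 'A');
""")

# "One row per order" holds for orders, but not for order_items keyed on order_id alone.
print(grain_violations(conn, "orders", ["order_id"]))
print(grain_violations(conn, "order_items", ["order_id"]))
# Joining items onto orders fans out order 1, so order-level sums would double count.
print(join_fans_out(conn, "orders", "orders JOIN order_items USING (order_id)"))
```

A README can then state the grain ("one row per order") next to the check that proves it. It can also explain why order-level metrics are computed before joining to line items.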

Natalie Kwong places the same work inside ELT. Data is loaded first. Then analysts or analytics engineers transform it with SQL and dbt and publish data marts and consumption tables (7:57-18:47 and 31:31). That evidence favors projects that show both source ingestion assumptions and warehouse-side transformations, even when the portfolio isn’t a full data-engineering project.

Guest Differences

Guests differ on whether the portfolio should sell a distinct job title or a way of working. Victoria Perez Mola describes a recognizable analytics-engineer role with modeling, quality, Looker, and dbt. The role also collaborates with analysts, data scientists, and backend engineers (14:34-20:52 and 33:02).

Juan Manuel Perafan is more cautious about defining the role only by the gap between analysts and engineers. His evidence points portfolio builders toward the craft: modeling business reality, testing dashboards, and bringing rigor to data workflows (7:56-11:03 and 38:41-46:34).

Nikola Maksimovic shows a transition version of the portfolio, where the proof didn’t start as a public repository. It started with marketing reporting, conversations with the BI team, and Looker work, with SQL practice and BI projects done alongside the marketing job. The later role included dbt migration, LookML, product analytics, and A/B testing (7:18-23:12 and 38:27).

That evidence supports portfolios that turn domain knowledge into modeled metrics instead of treating domain context as background.

Arpit Choudhury widens the project boundary toward growth systems. His episode connects tracking plans, event collection, warehouse transformations, and BI to activation work. Reverse ETL then sends modeled data to support, sales, and engagement tools (13:34-30:03 and 37:25-46:13). That makes activation projects legitimate analytics-engineering evidence when you document event ownership, data meaning, and downstream consequences.

Project Evidence

A strong project makes the analytical promise visible. Start with a consumer such as a growth manager, finance stakeholder, operations team, or product manager. Then show how the modeled data supports that decision.

Tammy Liang gives a useful adoption test. Her first data-team work focused on business-health monitoring, streamlined reporting, and rebuilding trust. The stack later included Stitch, GCP, dbt, and Data Studio, with a Notion wiki documenting the work. The team then added testing, monitoring, and workshops to drive adoption (7:22-22:32 and 35:38-49:00).

For a portfolio, the README shouldn’t stop at “here’s the dashboard.” It should show who uses the dashboard and what changed from the old spreadsheet or duplicated query. It should also show how another analyst finds the definitions.

The minimum evidence set is:

Project Types

A metric mart and dashboard project is the clearest portfolio option. Pick a domain with repeatable decisions, such as subscriptions, ecommerce, marketing spend, SaaS usage, logistics, or finance.

Declare sources first, then build staging models, facts, and dimensions. Define KPIs and add tests. Publish one documented dashboard that uses only the modeled layer.
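dbt ships four built-in generic tests: `unique`, `not_null`, `accepted_values`, and `relationships`. The sketch below imitates the last two in plain Python over `sqlite3`, so the idea is visible without a dbt install. The `fct_orders` and `dim_customers` tables and the function names are hypothetical. Each check returns the failing rows, mirroring dbt's convention that a test passes when its query returns nothing.

```python
import sqlite3


def accepted_values(conn, table: str, column: str, allowed: list) -> list:
    """Rows whose value falls outside `allowed`; an empty list means the test passes."""
    placeholders = ", ".join("?" for _ in allowed)
    return conn.execute(
        f"SELECT {column} FROM {table} WHERE {column} NOT IN ({placeholders})", allowed
    ).fetchall()


def relationships(conn, child: str, fk: str, parent: str, pk: str) -> list:
    """Non-null foreign keys in `child` with no matching row in `parent` (orphans)."""
    return conn.execute(
        f"""SELECT c.{fk} FROM {child} AS c
            LEFT JOIN {parent} AS p ON c.{fk} = p.{pk}
            WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL"""
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customers (customer_id INTEGER, plan TEXT);
    INSERT INTO dim_customers VALUES (10, 'basic'), (11, 'pro');
    CREATE TABLE fct_orders (order_id INTEGER, customer_id INTEGER, status TEXT);
    INSERT INTO fct_orders VALUES (1, 10, 'complete'), (2, 12, 'refunded'), (3, 11, 'complete');
""")

# 'refunded' was never agreed as a status, and customer 12 is missing from the dimension.
print(accepted_values(conn, "fct_orders", "status", ["complete", "cancelled"]))
print(relationships(conn, "fct_orders", "customer_id", "dim_customers", "customer_id"))
```

A failing test is a prompt for a documented decision. Either "refunded" joins the KPI definition or the staging model maps it. Either way, the dashboard's numbers stop depending on an unstated assumption.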

This matches Victoria Perez Mola on modeling and Looker exposure. It also matches Tammy Liang on dashboards, documentation, and adoption (Master Analytics Engineering, Building and Scaling a Data Team).

A dbt migration or refactor project works when the starting point is messy SQL, duplicated dashboard logic, or spreadsheet-defined metrics. Refactor the logic into model layers and add tests, docs, lineage, and a deployment note. Use reusable macros only where they remove duplication.
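The macro point can be illustrated in Python rather than dbt's Jinja. A hypothetical `cents_to_dollars` helper plays the role of a macro, so two staging models that previously hand-copied the same conversion now render it from one place. The table names are assumptions for illustration. Whether a macro earns its keep depends on how many models actually repeat the logic.

```python
import sqlite3


def cents_to_dollars(column: str, scale: int = 2) -> str:
    """Macro-like helper: render one shared SQL expression instead of copying it per model."""
    return f"ROUND(CAST({column} AS REAL) / 100, {scale})"


# Two hypothetical staging models that used to duplicate the conversion by hand.
STG_PAYMENTS = f"SELECT id, {cents_to_dollars('amount_cents')} AS amount FROM raw_payments"
STG_REFUNDS = f"SELECT id, {cents_to_dollars('refund_cents')} AS amount FROM raw_refunds"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_payments (id INTEGER, amount_cents INTEGER);
    INSERT INTO raw_payments VALUES (1, 1999), (2, 500);
    CREATE TABLE raw_refunds (id INTEGER, refund_cents INTEGER);
    INSERT INTO raw_refunds VALUES (1, 250);
""")
print(conn.execute(STG_PAYMENTS).fetchall())
print(conn.execute(STG_REFUNDS).fetchall())
```

If the rounding rule changes, one edit updates every model that uses the helper, which is the duplication a refactor project should point to.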

Nikola Maksimovic grounds this in a real dbt migration, LookML reporting, wide-versus-narrow tables, and incrementalization tradeoffs (18:34-33:46). Christopher Bergh adds the DataOps standard: version control, tests, CI/CD, runbooks, documentation, and end-to-end versioning (33:47-51:21).
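The incrementalization tradeoff can be shown with a high-water mark. Each run processes only source rows newer than the latest timestamp already in the target, instead of rebuilding the table. This `sqlite3` sketch is loosely analogous to a dbt incremental model and is an illustration, not dbt's implementation. The table names are hypothetical. It also shows the usual caveat: a late-arriving row stamped earlier than the watermark is skipped until a full refresh.

```python
import sqlite3


def run_incremental(conn) -> int:
    """Insert source rows newer than the target's high-water mark; return rows inserted."""
    cur = conn.execute(
        """INSERT INTO fct_events (event_id, loaded_at)
           SELECT event_id, loaded_at FROM src_events
           WHERE loaded_at > (SELECT COALESCE(MAX(loaded_at), '') FROM fct_events)"""
    )
    conn.commit()
    return cur.rowcount


def full_refresh(conn) -> int:
    """Rebuild the target from scratch: costlier on large tables, but picks up late rows."""
    conn.execute("DELETE FROM fct_events")
    cur = conn.execute("INSERT INTO fct_events SELECT event_id, loaded_at FROM src_events")
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_events (event_id INTEGER, loaded_at TEXT);
    CREATE TABLE fct_events (event_id INTEGER, loaded_at TEXT);
    INSERT INTO src_events VALUES (1, '2024-01-01T10:00'), (2, '2024-01-01T11:00');
""")

first = run_incremental(conn)   # empty target: loads everything
conn.execute("INSERT INTO src_events VALUES (3, '2024-01-01T12:00')")
second = run_incremental(conn)  # only the newer row
conn.execute("INSERT INTO src_events VALUES (4, '2024-01-01T09:00')")
late = run_incremental(conn)    # late row is older than the watermark, so it is missed
rebuilt = full_refresh(conn)    # a full refresh recovers it
print(first, second, late, rebuilt)
```

Writing this tradeoff into the README shows the reviewer that you know when incremental logic is safe and how you would detect the rows it misses.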

A product analytics project should start with events, not charts. Write a tracking plan, then simulate or instrument events. Model user journeys and publish activation, retention, funnel, or experiment metrics.

Arpit Choudhury names signup and project-created events as SaaS examples. Invite and invoice events fit there too. He then connects collection and storage with transformation, analysis, and activation (13:34-30:03). Nikola Maksimovic shows why marketing and product domain knowledge matter for funnels, retention, RFM, and A/B testing (38:27-41:50).
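“Events before charts” can be sketched in plain Python. A hypothetical tracking plan declares each event and its required properties. Simulated events are validated against the plan, and a simple signup-to-invite funnel is computed only from valid events. The event names echo the SaaS examples above. The properties, users, and funnel definition are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical tracking plan: event name -> required properties (every event needs user_id).
TRACKING_PLAN = {
    "signup": {"user_id", "plan"},
    "project_created": {"user_id", "project_id"},
    "invite_sent": {"user_id", "invitee_email"},
}


def validate(event: dict) -> list:
    """Return a list of problems; an empty list means the event matches the plan."""
    name = event.get("event")
    if name not in TRACKING_PLAN:
        return [f"unplanned event: {name}"]
    missing = TRACKING_PLAN[name] - set(event.get("properties", {}))
    return [f"{name} missing {prop}" for prop in sorted(missing)]


def funnel(events: list, steps: list) -> list:
    """Users reaching each step who also reached every earlier step (event order not enforced)."""
    seen = defaultdict(set)
    for e in events:
        if not validate(e):  # invalid events are excluded from metrics
            seen[e["event"]].add(e["properties"]["user_id"])
    counts, reached = [], None
    for step in steps:
        reached = seen[step] if reached is None else reached & seen[step]
        counts.append((step, len(reached)))
    return counts


events = [
    {"event": "signup", "properties": {"user_id": 1, "plan": "free"}},
    {"event": "signup", "properties": {"user_id": 2, "plan": "free"}},
    {"event": "signup", "properties": {"user_id": 3, "plan": "pro"}},
    {"event": "project_created", "properties": {"user_id": 1, "project_id": "p1"}},
    {"event": "project_created", "properties": {"user_id": 3}},  # missing project_id
    {"event": "invite_sent", "properties": {"user_id": 1, "invitee_email": "a@example.com"}},
    {"event": "page_view", "properties": {"user_id": 2}},  # not in the plan
]

for e in events:
    if validate(e):
        print(validate(e))
print(funnel(events, ["signup", "project_created", "invite_sent"]))
```

The validation output belongs in the project's docs. It explains why user 3 is absent from the activation step and who owns fixing the instrumentation.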

A reverse ETL or activation project is useful when the portfolio needs to show operational consequences. Model a customer or account segment in the warehouse. Then push it to a mock CRM, support tool, or marketing destination. Document ownership, refresh cadence, and privacy assumptions. Also explain the consequence of a wrong segment.

Arpit Choudhury covers reverse ETL and product-led activation in Data-Led Growth Stack at 37:25-56:08. Natalie Kwong covers warehouse tables flowing back into operational systems in ETL, ELT, and the Modern Data Stack at 35:42.
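The activation project described above can be sketched as a diff-based sync. The segment is modeled in the "warehouse", and only accounts whose value differs from the destination are pushed. Everything here is hypothetical: the segment rule, the `MockCRM` class standing in for a real CRM API, and the field names. Real reverse ETL tools add batching, retries, rate limits, and field mapping. This sketch only shows the contract worth documenting.

```python
from dataclasses import dataclass, field


@dataclass
class MockCRM:
    """Hypothetical stand-in for a CRM destination; stores properties per account."""
    records: dict = field(default_factory=dict)
    calls: int = 0

    def upsert(self, account_id: str, props: dict) -> None:
        self.calls += 1
        self.records[account_id] = {**self.records.get(account_id, {}), **props}


def model_segment(accounts: list) -> dict:
    """Hypothetical segment rule: 3+ active users and no open invoice."""
    return {
        a["account_id"]: {"expansion_candidate": a["active_users"] >= 3 and not a["open_invoice"]}
        for a in accounts
    }


def sync(segment: dict, crm: MockCRM) -> int:
    """Push only accounts whose modeled values differ from the destination; return push count."""
    pushed = 0
    for account_id, props in segment.items():
        current = crm.records.get(account_id, {})
        if any(current.get(k) != v for k, v in props.items()):
            crm.upsert(account_id, props)
            pushed += 1
    return pushed


accounts = [
    {"account_id": "a1", "active_users": 5, "open_invoice": False},
    {"account_id": "a2", "active_users": 1, "open_invoice": False},
    {"account_id": "a3", "active_users": 4, "open_invoice": True},
]
crm = MockCRM()
first_push = sync(model_segment(accounts), crm)   # first sync writes every account
accounts[1]["active_users"] = 3                   # a2 grows into the segment
second_push = sync(model_segment(accounts), crm)  # only a2 changes
print(first_push, second_push, crm.records)
```

The README can then state who owns the segment rule, how often `sync` runs, and what a wrong `expansion_candidate` flag would cost the sales team.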

A hiring-focused fundamentals project should go deep on SQL and modeling before adding tools. Jeff Katz describes an analytics-engineering module built around dbt, Snowflake, Mode, and Fivetran, while emphasizing SQL mastery and window functions.

Katz also treats OLTP versus OLAP and modeling practice on sample databases as fundamentals (36:18-45:14). Junior candidates can win with a smaller project. Strong grain definitions, tests, docs, and SQL explanations can beat a broad stack that hides the modeling decisions.
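As an example of the SQL depth worth explaining in a fundamentals project, the sketch below uses a window function to keep the latest row per customer from a hypothetical change log, a common staging step. It runs through Python's `sqlite3`. This assumes the bundled SQLite is version 3.25 or newer, the first release with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer_changes (customer_id INTEGER, plan TEXT, changed_at TEXT);
    INSERT INTO customer_changes VALUES
        (10, 'free', '2024-01-01'),
        (10, 'pro',  '2024-02-15'),
        (11, 'free', '2024-01-20'),
        (10, 'team', '2024-03-01');
""")

LATEST_PER_CUSTOMER = """
    SELECT customer_id, plan, changed_at
    FROM (
        SELECT customer_id, plan, changed_at,
               -- number each customer's rows from newest to oldest
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY changed_at DESC) AS rn
        FROM customer_changes
    )
    WHERE rn = 1  -- keep exactly one row per customer: the output grain
    ORDER BY customer_id
"""

current_plans = conn.execute(LATEST_PER_CUSTOMER).fetchall()
print(current_plans)
```

A good write-up would also note the edge case this query leaves open. If two changes share the same `changed_at`, `ROW_NUMBER` picks one arbitrarily, so the model needs a tie-breaker column or a documented assumption.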

Portfolio Proof

Review the project as if another analyst must maintain it next month.

They should be able to find these answers from the repository, dashboard, and docs:

Anti-Patterns

Avoid a dashboard built directly from raw tables with metric logic hidden in charts. Victoria Perez Mola places analytics-engineering value in modeled data, dbt transformations, and Looker exposure, not in isolated charts (4:05-8:59).

Avoid a dbt repository with many models but no business definitions, tests, owners, or BI consumer. Juan Manuel Perafan argues that the work should map business reality and make the data safer. Tammy Liang shows that adoption, documentation, and trust matter after the models exist (Foundations episode, Building and Scaling a Data Team).

Avoid copying a public template without explaining grain, joins, slowly changing attributes, or incremental logic. Nikola Maksimovic grounds the role in practical data-modeling tradeoffs during a dbt migration, including wide versus narrow tables and incrementalization (30:28-33:46).

Avoid final KPI screenshots without source caveats, data-quality checks, or reconciliation notes. Barr Moses shows how silent failures, schema changes, freshness problems, and gaps in lineage and ownership break trust when teams only look at the final output (13:40-29:00).
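Reconciliation notes and data-quality checks are easier to trust when they are executable. This sketch compares a hypothetical source total with the mart total within a relative tolerance. It also flags stale data against an assumed 24-hour freshness window. The thresholds, table names, and helper names are illustrative, not a standard.

```python
import sqlite3
from datetime import datetime, timedelta


def reconcile(conn, source_sql: str, mart_sql: str, tolerance: float = 0.005) -> dict:
    """Compare two totals; ok if the relative gap is within `tolerance` (0.5% by default)."""
    source = conn.execute(source_sql).fetchone()[0] or 0
    mart = conn.execute(mart_sql).fetchone()[0] or 0
    gap = abs(source - mart) / source if source else float(mart != 0)
    return {"source": source, "mart": mart, "relative_gap": gap, "ok": gap <= tolerance}


def is_fresh(conn, table: str, ts_col: str, now: datetime, max_age: timedelta) -> bool:
    """True if the newest ISO timestamp in `table` is no older than `max_age` at `now`."""
    latest = conn.execute(f"SELECT MAX({ts_col}) FROM {table}").fetchone()[0]
    return latest is not None and now - datetime.fromisoformat(latest) <= max_age


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_payments (amount REAL, loaded_at TEXT);
    INSERT INTO raw_payments VALUES (100.0, '2024-03-01T08:00:00'), (50.0, '2024-03-01T09:00:00');
    CREATE TABLE fct_revenue (revenue REAL);
    INSERT INTO fct_revenue VALUES (100.0);  -- one payment silently dropped downstream
""")

check = reconcile(conn, "SELECT SUM(amount) FROM raw_payments", "SELECT SUM(revenue) FROM fct_revenue")
fresh = is_fresh(conn, "raw_payments", "loaded_at",
                 now=datetime(2024, 3, 2, 8, 0), max_age=timedelta(hours=24))
print(check)
print(fresh)
```

Here the data is fresh but the mart has silently lost a third of the revenue. That is exactly the failure a screenshot of the final KPI would hide.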

Avoid treating analytics engineering as “SQL plus dashboard.” The archive keeps returning to software practices such as version control, tests, docs, and lineage, and to warehouse transformations and adoption (Master Analytics Engineering, Mastering DataOps, Analytics Engineering).

Use these pages for the role, stack, and adjacent portfolio context: