A producer's curriculum · verified 2 Jul 2026

Orchestration Lessons

What every instrument of the data stack can and cannot do: its range, its cost model, and when it mixes. No implementation required. Every link checked live.


Core track is ~14 h; depth shelves take it to ~28 h. Sections appear in playing order: 0 · 2 · 1 · 3 · 4 · 5 · 6. Section 4 is the hinge between the data stack and the AI layer.

Section 0 · Play first

Foundations

Core ~1 h of video · budget 2.5 h with pauses

How to hear music at all. Why the ERP can't be queried directly, what a lakehouse actually is, and why open table formats prevent vendor lock-in.

  1. What is a database in under 4 minutes
    Video · Linux Academy · 4 min · 2019

    Say what a database is (organized, queryable, multi-user) versus a spreadsheet or loose files.

  2. SQL Explained in 100 Seconds
    Video · Fireship · 2 min · 2021

    Explain SQL as a declarative language for asking questions of tables, and why every stack tool speaks it.

  3. 7 Database Paradigms
    Video · Fireship · 10 min · 2020

    Hear "we use Postgres / Mongo / a graph DB" and judge fit to workload.

  4. OLAP vs OLTP
    Video · IBM Technology · 5 min · 2022

    The most load-bearing concept in the curriculum: why analysts can't query the live ERP, and why a separate analytical store exists.

  5. What is Object Storage?
    Video · IBM Technology · 10 min · 2021

    Say why the modern stack parks everything in cheap object storage, and the tradeoffs.

  6. Data Lakehouses Explained
    Video · IBM Technology · 9 min · 2023

    Narrate 30 years of analytical storage: warehouse, lake, lakehouse, and why the last isn't a buzzword mashup.

  7. Data Lake vs. Warehouse vs. Lakehouse: Which One to Choose?
    Video · IBM Technology · 8 min · 2025

    Ask a client "warehouse, lake, or lakehouse, and why?" and evaluate the answer.

  8. Apache Iceberg: What It Is and Why Everyone's Talking About It
    Video · Confluent Developer · 14 min · 2025

    Explain what Iceberg/Delta adds to a lake (transactions, schema evolution, time travel) and why open formats kill lock-in.

Depth shelf

  1. What is Apache Iceberg?
    Video · IBM Technology · 13 min · 2024

    A second angle: what broke in Hive-era data lakes.

  2. Apache Iceberg Overview (Jan 2024 Edition)
    Talk · Dremio · Alex Merced · 63 min · 2024

    The full ecosystem tour: catalogs, engines, lock-in risk, at conceptual altitude.

  3. Apache Iceberg Explained + Iceberg vs Delta Lake
    Read · DataCamp · free · ~16 min

    Fills the "files vs tables" gap no video covers well.

  4. CMU Intro to Database Systems · Lecture 1
    Lecture · Andy Pavlo · 84 min · 2024

    Optional academic grounding. Stop after lecture 1; the rest is engineering.

Section 2 · Play second

The Store and Its Engines

Core ~2.5 h

Where the tiers diverge most. The single most valuable idea here: most companies' data fits on one machine, and the industry sold them distributed systems anyway.

  1. Databases vs Data Warehouses vs Data Lakes
    Video · Seattle Data Guy · 15 min · 2022

    Say why the app's Postgres isn't a warehouse, and when "lake" is marketing.

  2. Big Data is Dead
    Talk · Jordan Tigani · Data Council · 26 min · 2023

    From a founding BigQuery engineer: push back when a vendor sizes a $10M-$50M company for petabytes. Written version.

  3. DuckDB and MotherDuck for Beginners: Your Ultimate Guide
    Video · MotherDuck · 36 min · 2025

    Describe when DuckDB/MotherDuck covers a mid-size company entirely, and its ceiling (concurrency, enterprise governance).

  4. What Is Snowflake: How Snowflake Credits Work and More
    Video · Seattle Data Guy · 20 min · 2025

    Explain 60-second minimum billing, idle warehouses, and why Snowflake bills surprise CFOs.

  5. How to Get Started with BigQuery
    Demo · Google Cloud Tech · 17 min · 2022

    Contrast per-query-scan vs per-second-compute pricing, and each one's failure mode.

  6. Intro to Databricks: What Is Databricks
    Video · Seattle Data Guy · 12 min · 2022

    Say what Databricks adds over a warehouse, and why a BI-only company doesn't need it.

  7. Microsoft Fabric Explained in Less Than 10 Minutes
    Video · Guy in a Cube · 9 min · 2025

    Identify when a company picks Fabric for fit versus licensing inertia.

  8. Snowflake vs Databricks, and the Battle for Iceberg
    Video · Seattle Data Guy · 10 min · 2024

    Hold your own in a "Snowflake or Databricks" boardroom conversation; open table formats are erasing the moats.
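
The billing contrast in items 4 and 5 can be sketched with back-of-envelope arithmetic. This is a minimal sketch, not a pricing calculator: the function names and every rate below are hypothetical placeholders, and real prices vary by vendor, edition and region. It only illustrates the two mechanics named above, a 60-second minimum on per-second compute versus a charge per volume scanned.

```python
def compute_style_cost(run_seconds, credits_per_hour, price_per_credit, min_seconds=60):
    """Per-second compute billing with a minimum charge each time the warehouse starts."""
    billed_seconds = max(run_seconds, min_seconds)
    return billed_seconds / 3600 * credits_per_hour * price_per_credit


def scan_style_cost(tib_scanned, price_per_tib):
    """Per-query billing that depends only on how much data the query reads."""
    return tib_scanned * price_per_tib


# Hypothetical rates, chosen only to make the arithmetic readable.
CREDITS_PER_HOUR = 1.0
PRICE_PER_CREDIT = 3.00
PRICE_PER_TIB = 6.00

# Compute-style failure mode 1: a 5-second query is billed like a 60-second one.
five_second_query = compute_style_cost(5, CREDITS_PER_HOUR, PRICE_PER_CREDIT)
full_minute_query = compute_style_cost(60, CREDITS_PER_HOUR, PRICE_PER_CREDIT)

# Compute-style failure mode 2: an idle hour costs the same as a busy hour.
idle_hour = compute_style_cost(3600, CREDITS_PER_HOUR, PRICE_PER_CREDIT)

# Scan-style failure mode: one unfiltered query over a 2 TiB table.
unfiltered_scan = scan_style_cost(2.0, PRICE_PER_TIB)

print(five_second_query, full_minute_query, idle_hour, unfiltered_scan)
```

The questions to carry into a client meeting follow from the shapes of the two functions: with compute pricing, ask how often warehouses wake up and how long they idle; with scan pricing, ask who is allowed to run unfiltered queries over the largest tables.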

Depth shelf

  1. Master Databricks and Apache Spark
    Playlist · Bryan Cafferky · first 4-5 lessons ≈ 60 min · 2021-2023

    Distinguish Spark-the-engine from Databricks-the-business. Concept lessons only.

  2. The Death of Big Data · Jordan Tigani
    Podcast · MAD Podcast · Matt Turck · 59 min · 2024

    The BigQuery-insider war stories the talk omits.

  3. Big Data is Dead: Long Live Hot Data
    Talk · MotherDuck · 25 min · 2024

    The sequel: what matters is the hot recent slice, not total volume.

  4. PostgreSQL as a Data Warehouse: When It Works, When It Doesn't
    Read · Definite · ~10 min · 2025

    Row vs column storage and the ~1-2 TB comfort zone. No good video exists for this.

  5. DuckDB in 100 Seconds + MotherDuck in 100 Seconds (by a duck)
    Video · Fireship · MotherDuck · 4 min total

    Two-minute warm-ups before a client meeting.

Section 1 · Play third

Extraction: ETL and ELT

Core ~55 min

Getting the music out of the 20 systems. Connectors, batch versus streaming, and the loop back into operations.

  1. What is ETL?
    Video · IBM Technology · 5 min · 2021

    Define ETL and explain why data must move before analysis.

  2. ETL vs ELT: Powering Data Pipelines for AI and Analytics
    Video · IBM Technology · 7 min · 2025

    Say why the industry flipped to load-then-transform, and spot a vendor selling legacy architecture.

  3. What Is The Modern Data Stack
    Video · Seattle Data Guy · 9 min · 2022

    Draw a client's future stack and name each tool's category. Anchors Section 6 too; watch once.

  4. How Fivetran Works: A Look Under the Hood
    Video · Fivetran · 2 min · 2023

    Explain what you pay Fivetran for: schema drift, incremental syncs, zero maintenance, and its price.

  5. Low Code Data Ingestion Tools Compared: Fivetran, Airbyte and More
    Video · The Data and AI Guy · 12 min · 2025

    Tell a client when Fivetran's per-row pricing bites and when self-hosted Airbyte makes sense.

  6. Stream vs Batch Processing Explained with Examples
    Video · Andreas Kretz · 9 min · 2021

    Push back on "real-time" dashboard demands; batch is right for ~90% of mid-market reporting.

  7. What is Reverse ETL? Explained in 3 Minutes
    Video · Hightouch · 4 min · 2022

    Explain "warehouse data pushed back into the CRM" and name Hightouch and Census.

  8. What Is A Reverse ETL, And Why Is The Modern Data Stack Obsessed With It?
    Video · Seattle Data Guy · 9 min · 2022

    The skeptical counterweight: judge whether a client needs reverse ETL or just a cleaner CRM.
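
The load-then-transform flip from item 2 can be shown in miniature. The sketch below assumes an in-memory SQLite database as a stand-in for the warehouse; the table names and sample rows are invented for illustration. Raw records land untouched first, and the cleanup then runs as SQL inside the store, which is the step a tool like dbt later formalizes.

```python
import sqlite3

# Hypothetical records as they might arrive from a source system: all text, messy.
raw_orders = [
    ("1001", "2026-01-03", " 120.50 ", "acme"),
    ("1002", "2026-01-03", "80", "ACME"),
    ("1003", "2026-01-04", "42.25", "Globex"),
]

con = sqlite3.connect(":memory:")  # stand-in for the analytical store

# E + L: land the data exactly as received, with no cleanup on the way in.
con.execute(
    "CREATE TABLE raw_orders (order_id TEXT, order_date TEXT, amount TEXT, customer TEXT)"
)
con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_orders)

# T: transform inside the store, in SQL, after loading.
con.execute(
    """
    CREATE TABLE orders_clean AS
    SELECT CAST(order_id AS INTEGER)   AS order_id,
           order_date,
           CAST(TRIM(amount) AS REAL)  AS amount,
           UPPER(TRIM(customer))       AS customer
    FROM raw_orders
    """
)

totals = con.execute(
    "SELECT customer, SUM(amount) FROM orders_clean GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Because the raw table is kept, a changed business rule means rerunning the SQL, not re-extracting from the source system.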

Depth shelf

  1. dlt: the Python data extraction and loading tool
    Video · BugBytes · 24 min · 2025

    Watch for concepts, skim the code: a free library can replace a five-figure Fivetran bill if you have one engineer to maintain it.

  2. If Extraction from SAP were easy…
    Read · David Richert · Snowflake blog · ~15 min · 2022

    Why SAP extraction is hard (licensing walls, 100k+ cryptic tables, delta logic). No video equivalent exists.

  3. Fivetran + dbt Labs Merger Explained
    Video · Kestra · 3 min · 2025

    Market context: the extraction and transformation leaders are now one company.

Section 3 · Play fourth

Transformation and Modeling

Core ~1 h 20 min

The arrangement: dbt as the portable notation, and the Kimball discipline of translating a business into a queryable model. This is composition theory.

  1. What is dbt?
    Video · dbt Labs · 2 min · 2024

    Say "dbt is the T in ELT: SQL transformations with software-engineering discipline."

  2. Intro to Data Build Tool (dbt)
    Video · Kahan Data Solutions · 15 min · 2020 · timeless

    Recognize a dbt project; know what models, tests and docs mean there.

  3. dbt vs Stored Procedures: 3 Key Differences
    Video · Kahan Data Solutions · 10 min · 2025

    Explain to a CFO why stored-procedure spaghetti is a liability, and what version control, testing and lineage buy.

  4. What is Apache Airflow? For Beginners
    Video · Data with Marc · 12 min · 2023

    Explain orchestration as "the thing that runs pipeline steps in order and complains on failure."

  5. Dagster Data Orchestration: 10 Minute Walkthrough
    Demo · Dagster · 10 min · 2023

    Say why newer teams pick Dagster while enterprises stay on Airflow.

  6. Data Modeling Tutorial: Star Schema (the Kimball Approach)
    Video · Kahan Data Solutions · 17 min · 2023

    Sketch a star schema for a client's sales process; use "fact" and "dimension" correctly.

  7. Master Dimensional Modeling · Lesson 01: Why Use a Dimensional Model?
    Video · Bryan Cafferky · 10 min · 2024

    Argue the business case for dimensional modeling. Full 4-lesson playlist.

  8. What is Master Data Management
    Video · IBM Technology · 4 min · 2022

    Diagnose "the same customer exists five times" as an MDM problem, not a BI problem.
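
For item 6, a star schema for a sales process might look like the following. This is a minimal sketch on in-memory SQLite with invented table names and rows, not a template from any of the videos: one fact table records what happened (the measures), and each dimension table records the context used to slice it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Dimensions: descriptive context, one row per customer / per day.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, iso_date TEXT, month TEXT);

    -- Fact: one row per sale, holding keys to the dimensions plus numeric measures.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        quantity     INTEGER,
        amount       REAL
    );

    INSERT INTO dim_customer VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
    INSERT INTO dim_date VALUES (20260105, '2026-01-05', '2026-01'),
                                (20260210, '2026-02-10', '2026-02');
    INSERT INTO fact_sales VALUES (1, 20260105, 2, 100.0),
                                  (2, 20260105, 1, 40.0),
                                  (1, 20260210, 3, 150.0);
    """
)

# A typical business question: sales by region and month.
sales_by_region_month = con.execute(
    """
    SELECT c.region, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_date     AS d ON d.date_key = f.date_key
    GROUP BY c.region, d.month
    ORDER BY c.region, d.month
    """
).fetchall()
print(sales_by_region_month)
```

The test of the design is that a new question ("by customer, by quarter") means changing the GROUP BY, not the tables.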

Depth shelf

  1. Entity Resolution Explained Step by Step
    Video · Senzing · 14 min · 2022

    Why fuzzy-matching customers is hard and "dedupe it in Excel" fails. 2-minute definition.

  2. Airflow vs Dagster: The Full Breakdown
    Video · The Data and AI Guy · 15 min · 2023

    Task-centric-mature vs asset-centric-modern, with tradeoffs.

  3. Coalesce 2025 Opening Keynote: Rewrite
    Keynote · dbt Labs · first ~40 min carry it · 2025

    State of the transformation layer: post-Fivetran-merger, AI era.

  4. Comparing 3 Types of Data Modeling: Normalized vs Star Schema vs Data Vault
    Video · Kahan Data Solutions · 4 min · 2023

    So "we use Data Vault" doesn't bluff you.

  5. BI Data Modeling Explained
    Video · Data with Baraa · 2026

    Highly visual modeling explainer from an ex-Mercedes-Benz data platform lead. His free full courses (30 h SQL, data warehouse project) are the best "watch someone actually build a warehouse" material when a lab build is on the agenda.

Section 4 · Play fifth · the hinge

The Semantic Layer

Core ~2.5 h

Where the data stack meets the AI layer, and where the consulting differentiation lives. Memorize one number: raw text-to-SQL plateaus near 64% accuracy; semantic-layer queries hit 98-100%.

  1. What is a Semantic Layer? [Ask an Expert]
    Video · GoodData·AI · 20 min · 2023

    Define it in one sentence and explain the "revenue defined five ways" problem it kills.

  2. Semantic Layer vs Text-to-SQL: 2026 Benchmark Update
    Read · dbt Labs · 11 min · Apr 2026

    Cite the exact accuracy gap when someone says "the model can just write the SQL." The decisive evidence; no video carries it.

  3. The New-Look dbt Semantic Layer, Powered by MetricFlow
    Talk · dbt Labs · Coalesce · 30 min · 2023

    Judge whether a team's "metrics in the BI tool" setup is a governance liability.

  4. Building Agentic Analytics with Cube
    Demo · Cube founders · 39 min · Apr 2026

    Articulate Cube vs dbt Semantic Layer, and what "agents query the model, not the tables" means operationally.

  5. AI/BI Dashboards and Genie: End-to-End Demo
    Demo · Databricks · 21 min · Apr 2025

    Explain a Genie space, why it needs curated datasets, and how the approach differs from dbt and Cube.

  6. Demo: Snowflake Cortex Analyst
    Demo · Snowflake Developers · 27 min · skip the code parts · 2024

    The killer diligence question: "who maintains the semantic model?" Accuracy comes from curation, not the LLM.
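
What "agents query the model, not the tables" means can be shown with a toy. The sketch below is an assumption-laden illustration, not the dbt Semantic Layer or Cube API: the `METRICS` registry, `ALLOWED_DIMENSIONS` and `compile_metric` are hypothetical names. The point is only that "revenue" is defined once, by a human, and every caller (dashboard or agent) gets SQL compiled from that single definition.

```python
import sqlite3

# One governed definition per metric. Hypothetical toy registry for illustration only.
METRICS = {
    "revenue": {
        "table": "orders",
        "expression": "SUM(amount)",
        "filter": "status = 'paid'",  # the business rule lives here, once
    },
}
ALLOWED_DIMENSIONS = {"region"}


def compile_metric(metric: str, dimension: str) -> str:
    """Turn a (metric, dimension) request into SQL; callers never write SQL themselves."""
    if metric not in METRICS:
        raise KeyError(f"unknown metric: {metric}")
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"dimension not exposed: {dimension}")
    m = METRICS[metric]
    return (
        f"SELECT {dimension}, {m['expression']} AS {metric} "
        f"FROM {m['table']} WHERE {m['filter']} "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (region TEXT, amount REAL, status TEXT)")
con.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("North", 100.0, "paid"), ("North", 50.0, "refunded"), ("South", 70.0, "paid")],
)

revenue_by_region = con.execute(compile_metric("revenue", "region")).fetchall()
print(revenue_by_region)
```

The refunded order never reaches the total because the filter is part of the definition, not something each query author has to remember. That is also why the diligence question above is about who maintains the registry.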

Depth shelf

  1. The Semantic Layer and AI Agents · David Jayatillake
    Podcast · MLOps.community #343 · 51 min · Jan 2026

    The strategic case, defensible against "context windows will fix it."

  2. Building an AI-Ready Semantic Layer in Snowflake Cortex
    Video · Aimpoint Digital · 9 min · May 2026

    Third-party reality check on the modeling effort involved.

  3. What is a Semantic Layer?
    Read · Databricks blog · ~8 min

    Spot how each vendor bends the definition toward its own architecture.

Study the original: Palantir

  1. Palantir Ontology Overview
    Video · Palantir official · Chad Wahlquist

    The $300B version of the intersection layer, explained by the company that built it: the business as objects, links and actions, not tables.

  2. Ontology docs: Overview + Why create an Ontology?
    Read · Palantir Foundry docs · ~20 min

    The design pattern to imitate at one-hundredth the price. Their sharpest line: the Ontology "represents the decisions in an enterprise, not simply the data."

  3. AIPCon 6 customer demos (Mar 2025 playlist)
    Playlist · Palantir official · pick 2-3 demos ≈ 60 min

    Real enterprises demoing AI over their ontology: use cases across manufacturing, healthcare and logistics. Watch a couple and translate each to a mid-size company. Index of editions: palantir.com/aipcon.

Section 5 · Play sixth

The AI Layer

Core ~2 h · kept thin: only what is data-specific

The release. MCP over data stores and the AI-analyst pattern: a frontier model exploring schemas, writing SQL, hitting errors, and self-correcting.

  1. MCP: Understand It, Set It Up, Use It
    Webinar · MotherDuck · ~45 min · Feb 2026

    Explain what an MCP server over a database exposes, and why remote managed MCP beats laptop-local for client rollouts.

  2. The MCP Sessions Vol. 1: Sports Analytics
    Livestream · MotherDuck · ~60 min · Jan 2026

    The best live footage of the AI-analyst loop over 50M+ rows, warts included. Describe it from having watched it, not from marketing copy.

  3. Inside Meta's Home-Grown AI Analytics Agent
    Read · Analytics at Meta · ~10 min · Mar 2026

    The three ingredients beyond a frontier model: curated context, execution loop, verifiable output. Judge vendor demos against them.

  4. Databricks AI/BI Genie: Complete Features and Hands-On Demo
    Demo · DataBeli · independent · 10 min · Sep 2025

    A non-Databricks second opinion: compare Genie's curated-space approach with the open MCP approach.
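
The AI-analyst pattern this section opens with (explore the schema, write SQL, hit an error, self-correct) reduces to a short loop. The sketch below makes heavy assumptions: `fake_model` is a scripted, hypothetical stand-in for a frontier model, SQLite stands in for the data store, and no MCP server is involved. It shows only the control flow: the database error is fed back as context for the next attempt.

```python
import sqlite3


def fake_model(question, schema, last_error):
    """Hypothetical stand-in for a model call: scripted to fail once, then correct itself."""
    if last_error is None:
        return "SELECT SUM(total) FROM sales"  # first guess uses a column that does not exist
    return "SELECT SUM(amount) FROM sales"     # second guess, after seeing the error


def describe_schema(con):
    """Schema exploration: list each table with its CREATE statement."""
    return con.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchall()


def answer(con, question, max_attempts=3):
    """Ask, execute, and retry with the error message until a query runs or attempts run out."""
    schema = describe_schema(con)
    last_error = None
    attempts = []
    for _ in range(max_attempts):
        sql = fake_model(question, schema, last_error)
        attempts.append(sql)
        try:
            return con.execute(sql).fetchall(), attempts
        except sqlite3.Error as exc:
            last_error = str(exc)  # the error text becomes input to the next attempt
    raise RuntimeError(f"no working query after {max_attempts} attempts: {last_error}")


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (amount REAL)")
con.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (32.0,)])

result, attempts = answer(con, "What are total sales?")
print(result, attempts)
```

Note what the loop does not guarantee: a query that runs is not necessarily a query that answers the question, which is where the curated context and verifiable output from the Meta read come in.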

Depth shelf

  1. I Stress-Tested Cube's New AI Analytics Agent. Here's What Happened.
    Video · Joe Reis · 20 min · Jan 2026

    The skeptical counterweight; reuse his stress-test questions in any AI-analyst sales demo.

  2. Postgres MCP Guide
    Read · Tailscale · ~10 min

    Postgres and Snowflake MCP have no quality video yet; the pattern transfers 1:1 from the MotherDuck sessions.

Section 6 · Play last · the finale

Ensembles and Anti-Patterns

Core ~3.5 h

Which combinations mix, which are the saxophone in the orchestra, and how each billing model punishes each usage pattern. Lands hardest after the tool sections.

  1. Future of the Modern Analytics Stack · Tristan Handy (dbt CEO)
    Podcast · MAD Podcast · Matt Turck · 48 min · Feb 2024

    The man who coined the movement conceding the term outlived its usefulness: why MDS was a zero-interest-rate phenomenon.

  2. The End of the Modern Data Stack · Benn Stancil
    Conversation · dbt Labs · 46 min · 2024

    The core critique: too many tools, integration tax, too little value for mid-size companies.

  3. The 2025 MAD Landscape with Matt Turck
    Podcast · Joe Reis · 49 min · Nov 2025

    Situate any vendor a client mentions on the map. The landscape itself.

  4. Do You REALLY Need a Data Warehouse
    Video · Seattle Data Guy · 15 min · 2023

    Give a client three concrete signals that they do (or don't yet) need a warehouse.

  5. Build a Data Stack That Lasts
    Video · Seattle Data Guy · 11 min · 2024

    Sketch a defensible stack for a $10M vs a $50M company and defend each component.

  6. Snowflake Costs Out of Control? Cut 84% Off Your Bill
    Video · Alex Kargin · 21 min · 2025

    Audit-question a client's Snowflake bill without touching a console.

  7. The Real Cost of Cloud Data Warehouses
    Video · ClickHouse · vendor, bias noted · 19 min · Dec 2025

    Predict which billing model (credits, scan pricing, DBUs) punishes which usage pattern; discount the self-serving conclusion.

Depth shelf

  1. Matt Turck: The 2024 MAD Landscape
    Podcast · Joe Reis · 58 min · 2024

    Heavier on the Databricks vs Snowflake vs Fabric platform war.

  2. HOW MUCH?! Microsoft Fabric Licensing and Pricing Explained + Your Fabric Capacity Strategy Determines Everything
    Video · Learn MS Fabric with Will · Guy in a Cube · 18 min total · 2023 / 2026

    The capacity-unit model; prices partly stale, mechanics current.

  3. Big Data is Dead, Analytics is Alive
    Podcast · The Changelog · Jordan Tigani · 50 min · 2024

    Connects "small data" to "smaller stack" in one narrative.

The permanent shelf

Foundational Books

Four books, each the canon of its layer. Read alongside the videos, not instead of them.

  1. The Data Warehouse Toolkit, 3rd Edition
    Book · Ralph Kimball & Margy Ross · Wiley · 2013

    The craft of translating a business into a queryable model. The single highest-leverage book for the logical side of data. Read the first ~6 chapters; the rest is a reference by industry.

  2. Fundamentals of Data Engineering
    Book · Joe Reis & Matt Housley · O'Reilly · 2022

    The conceptual map of the whole territory, written for architecture thinking rather than tool operation. The spine text of this curriculum.

  3. Data Management at Scale, 2nd Edition
    Book · Piethein Strengholt · O'Reilly · 2023

    Enterprise-wide data architecture and systems integration: domains, data products, master data. The logic of connecting 20 systems.

  4. AI Engineering
    Book · Chip Huyen · O'Reilly · 2024

    The canon of the AI layer: model selection, evaluation, RAG, agents. Most-read book on the O'Reilly platform in 2025.