Welcome to CDI Data Science Foundations in Python

Welcome to Complex Data Insights (CDI), a practical learning pathway designed to teach real data skills through clear explanations, structured lessons, and hands-on practice.

This guide is Python-first and Quarto-first: all code is written in Python, and every chapter is authored and rendered with Quarto.

Each chapter is written as a .qmd file. When you render the book, Quarto executes the Python chunks and embeds the results (tables, figures, and printed output) directly into the published pages. This keeps the guide reproducible and consistent.
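As a rough illustration of what such a chunk might contain, the sketch below shows the body of a single executable Python cell. The lines beginning with #| are Quarto cell options written as Python comments. The label summary-demo, the summarize helper, and the sample values are hypothetical and are not taken from the guide. In a .qmd file, this body would sit inside a code fence marked {python}.

```python
#| label: summary-demo
#| echo: true

# Hypothetical sample values, used only to produce some printed output.
measurements = [5.1, 4.9, 4.7, 4.6, 5.0]


def summarize(values):
    """Return count, mean, min, and max for a non-empty list of numbers."""
    return {
        "count": len(values),
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize(measurements)

# Anything printed here is captured at render time and embedded in the page.
for name, value in summary.items():
    print(f"{name:<6}{value:.2f}")
```

Because the output is produced at render time, changing the values and re-rendering updates the published page without any manual copying.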


What You Will Learn

Across this guide, you will learn how to:

  • Think analytically about data problems
  • Load, clean, and structure datasets
  • Explore data through visualization
  • Apply essential statistical reasoning
  • Prepare data for machine learning
  • Work with realistic datasets
  • Build reproducible, end-to-end workflows

Each lesson builds on the previous one, forming a clear progression.


Course Structure

The CDI Data Science Foundations path is divided into two tracks.

Foundations Track (Lessons 01–06)

This track introduces the core foundations of data science with Python, including:

  • Environment setup and workflow
  • Loading and inspecting datasets
  • Data cleaning fundamentals
  • Introductory data wrangling
  • Visualization basics
  • Summary statistics and insight generation

These lessons are designed to build confidence with essential tools and concepts.

Extended Track (Premium)

The extended track builds on the foundations and moves into applied, real-world data science, including:

  • Intermediate and advanced data wrangling
  • Working with realistic, messy datasets
  • Advanced exploratory data analysis
  • Feature engineering
  • Machine learning (classification and regression)
  • Model evaluation and tuning
  • End-to-end projects and deployment workflows

About Figures and Reproducibility

All figures in this guide are generated programmatically during rendering. This ensures that plots are reproducible, consistent across lessons, and rendered reliably in the published materials.


References and Citations

When a lesson draws on external sources, in-line citations appear within the text, with the corresponding references listed at the end of that lesson page. A complete list of references used throughout the guide is also collected in the References section at the end of the guide.


How to Use This Guide

This guide is designed to be:

  • Self-paced
  • Hands-on
  • Modular
  • Accessible

Lesson Workflow

Each lesson follows a consistent structure:

  1. Concept explanation
  2. Guided code examples
  3. Output interpretation
  4. Short exercises
  5. Key takeaways
  6. Next steps

Data and Project Structure

Throughout the guide:

  • Datasets are stored in a consistent data/ directory
  • Early lessons generate data/iris.csv automatically
  • Additional datasets are introduced as needed in later lessons
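The "generate the file if it is missing" pattern behind the data/ directory can be sketched as follows. This is a minimal illustration and not the guide's actual generation code, which is not shown on this page. The helper names ensure_csv and load_rows, the column names, and the three hand-typed sample rows are assumptions made for this example; the real lessons may build the full dataset from a library instead.

```python
import csv
from pathlib import Path

# Assumption: illustrative column names and a few hand-typed rows only,
# not the full dataset that the lessons generate.
HEADER = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
SAMPLE_ROWS = [
    [5.1, 3.5, 1.4, 0.2, "setosa"],
    [7.0, 3.2, 4.7, 1.4, "versicolor"],
    [6.3, 3.3, 6.0, 2.5, "virginica"],
]


def ensure_csv(path, header, rows):
    """Write a CSV file only if it does not exist yet, then return its path."""
    path = Path(path)
    if path.exists():
        return path  # leave an existing file untouched
    path.parent.mkdir(parents=True, exist_ok=True)  # create data/ if needed
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def load_rows(path):
    """Read the CSV back as a list of dictionaries keyed by column name."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A lesson could then call ensure_csv("data/iris.csv", HEADER, SAMPLE_ROWS) once at the top and read the file with load_rows. Repeated renders reuse the existing file instead of rewriting it.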

About This Guide

This guide follows a human-first, carefully curated approach. AI tools were used as assistants during drafting and refinement, with all final content reviewed and approved by a human.