Welcome to CDI Data Science Foundations in Python

Welcome to Complex Data Insights (CDI) — a practical learning pathway designed to teach real data skills through clear explanations, structured lessons, and hands-on practice. This guide focuses entirely on Python, the most widely used language in data analytics, machine learning, and scientific computing.

Whether you are new to data science or returning to data work, this guide will help you build skills progressively — from foundational concepts to professional workflows.


What You Will Learn

Across this guide, you will learn how to:

  • Think analytically about data problems
  • Load, clean, and structure datasets
  • Explore data through visualization
  • Apply essential statistical reasoning
  • Prepare data for machine learning
  • Work with real-world datasets
  • Build reproducible, end-to-end workflows

Each lesson builds on the previous one, forming a clear and logical progression.


Course Structure

The CDI Data Science Foundations path is divided into two parts.

Free Track (Lessons 01–06)

The free track introduces the core foundations of data science with Python, including:

  • Environment setup and workflow
  • Loading and inspecting datasets
  • Data cleaning fundamentals
  • Introductory data wrangling
  • Visualization basics
  • Summary statistics and insight generation

These lessons build confidence with essential data tools and concepts.


Premium Track (Lessons 07–19)

The premium track builds on the foundations and moves into applied, real-world data science, including:

  • Intermediate and advanced data wrangling
  • Working with realistic, messy datasets
  • Advanced exploratory data analysis
  • Feature engineering
  • Machine learning (classification and regression)
  • Model evaluation and tuning
  • End-to-end projects and deployment workflows

Note: Files labeled 06x* and 19x* are transition or completion pages, not lessons.


How to Use This Guide

This guide is designed to be:

  • Self-paced — learn at your own speed
  • Hands-on — practice with real code and datasets
  • Modular — revisit lessons as needed
  • Accessible — explained step by step, with context

Lesson Workflow

Each lesson follows a consistent structure:

  1. Concept explanation
  2. Guided code examples
  3. Output interpretation
  4. Short exercises
  5. Key takeaways
  6. Next steps

Data and Project Structure

Throughout the guide:

  • All datasets are stored in a data/ directory
  • Lesson 01 generates the first dataset (iris.csv)
  • Additional datasets are introduced as needed in later lessons

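Keeping every dataset in one predictable location means each lesson's code can find its inputs through the same relative paths, which supports reproducibility and clarity.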
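As an illustration, the snippet below is a minimal sketch of how lesson code might locate files in the data/ directory. It assumes the code runs from the project root. The helper name `data_path` and the `DATA_DIR` constant are hypothetical and are not part of the course materials, so the lessons themselves may organize paths differently.

```python
from pathlib import Path

# Assumption: the project root is the current working directory.
# Adjust PROJECT_ROOT if your notebooks live in a subfolder.
PROJECT_ROOT = Path.cwd()
DATA_DIR = PROJECT_ROOT / "data"  # hypothetical constant for the data/ folder


def data_path(filename: str, data_dir: Path = DATA_DIR) -> Path:
    """Return the path to a dataset file inside the data/ directory.

    Creates the directory if it does not exist yet, so that a lesson
    which writes a dataset (such as iris.csv) has somewhere to put it.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir / filename


if __name__ == "__main__":
    iris_file = data_path("iris.csv")
    if iris_file.exists():
        print(f"Found {iris_file}")
    else:
        print(f"{iris_file} not found yet; run Lesson 01 to generate it.")
```

Routing every file path through a single helper means that if the data folder ever moves, only one line needs to change rather than every lesson.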


About This Guide

This guide follows a human-first, carefully curated approach. AI tools were used as assistants during drafting and refinement, with all final content reviewed and approved by a human.