Welcome to CDI Data Science Foundations in Python
Welcome to Complex Data Insights (CDI), a practical learning pathway designed to teach real data skills through clear explanations, structured lessons, and hands-on practice.
This guide is Python-first and Quarto-first.
Each chapter is written as a .qmd file. When you render the book, Quarto executes the Python chunks and embeds the results (tables, figures, and printed output) directly into the published pages. This keeps the guide reproducible and consistent.
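As a rough illustration of what such a chunk might contain, the sketch below shows the body of one executable Python cell. In a .qmd file it would sit inside a code fence whose opening line carries {python}. The lines starting with #| are Quarto cell options. The label and the sample values are made up for this example and are not taken from any lesson.

```python
#| label: petal-summary
#| echo: true

import statistics

# Illustrative values only (hypothetical, not read from a course dataset).
petal_lengths_cm = [1.4, 1.4, 1.3, 1.5, 1.4]

mean_length = statistics.mean(petal_lengths_cm)
longest = max(petal_lengths_cm)

# Printed output like this is what Quarto embeds below the code on render.
print(f"n = {len(petal_lengths_cm)}, mean = {mean_length:.2f} cm, max = {longest:.1f} cm")
```

Because the numbers are computed at render time rather than pasted in, editing the data and re-rendering updates the published page automatically.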
What You Will Learn
Across this guide, you will learn how to:
- Think analytically about data problems
- Load, clean, and structure datasets
- Explore data through visualization
- Apply essential statistical reasoning
- Prepare data for machine learning
- Work with realistic datasets
- Build reproducible, end-to-end workflows
Each lesson builds on the previous one, forming a clear progression.
Course Structure
The CDI Data Science Foundations path is divided into two parts.
Foundations Track (Lessons 01–06)
This track introduces the core foundations of data science with Python, including:
- Environment setup and workflow
- Loading and inspecting datasets
- Data cleaning fundamentals
- Introductory data wrangling
- Visualization basics
- Summary statistics and insight generation
These lessons are designed to build confidence with essential tools and concepts.
About Figures and Reproducibility
All figures in this guide are generated programmatically during rendering. This ensures that plots are reproducible, consistent across lessons, and rendered reliably in the published materials.
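A minimal sketch of what a programmatically generated figure could look like is shown below. It assumes Matplotlib as the plotting library, which this page does not name, and uses seeded synthetic data so that every render draws the same points. The function name make_scatter, the seed, and the data are hypothetical.

```python
import random

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt


def make_scatter(seed=42, n=50):
    """Draw a scatter plot from seeded synthetic data (illustrative only)."""
    rng = random.Random(seed)  # fixed seed: identical data on every render
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2 * xi + rng.gauss(0, 0.5) for xi in x]

    fig, ax = plt.subplots(figsize=(5, 3.5))
    ax.scatter(x, y, s=15)
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_title("Synthetic example (seeded)")
    fig.tight_layout()
    return fig, x, y


fig, x, y = make_scatter()
```

Keeping the data generation and the plotting call in one function means the figure depends only on its inputs, which is what makes it repeatable from one render to the next.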
References and Citations
When a lesson draws on external sources, in-line citations appear within the text, with the corresponding references listed at the end of that lesson page. A complete list of references used throughout the guide is also collected in the References section at the end of the guide.
How to Use This Guide
This guide is designed to be:
- Self-paced
- Hands-on
- Modular
- Accessible
Lesson Workflow
Each lesson follows a consistent structure:
- Concept explanation
- Guided code examples
- Output interpretation
- Short exercises
- Key takeaways
- Next steps
Data and Project Structure
Throughout the guide:
- Datasets are stored in a consistent data/ directory
- Early lessons generate data/iris.csv automatically
- Additional datasets are introduced as needed in later lessons
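One way a lesson could create data/iris.csv on demand is sketched below. This is not necessarily how the guide does it: the helper names ensure_csv and iris_table are hypothetical, and scikit-learn's load_iris is an assumed source for the data. The helper writes the file only when it is missing, so repeated renders reuse the same copy.

```python
import csv
from pathlib import Path

DATA_DIR = Path("data")  # assumed to be relative to the project root


def ensure_csv(path, make_table):
    """Create the CSV at `path` only if it is missing, then return the path.

    `make_table` is a zero-argument callable returning (header, rows).
    """
    path = Path(path)
    if path.exists():
        return path  # keep the existing file untouched
    path.parent.mkdir(parents=True, exist_ok=True)
    header, rows = make_table()
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def iris_table():
    """Build the iris table from scikit-learn (an assumed source)."""
    from sklearn.datasets import load_iris  # imported lazily; only needed here

    iris = load_iris()
    header = [*iris.feature_names, "species"]
    rows = [
        [*measurements, iris.target_names[label]]
        for measurements, label in zip(iris.data.tolist(), iris.target.tolist())
    ]
    return header, rows
```

Calling ensure_csv(DATA_DIR / "iris.csv", iris_table) at the top of a lesson would create the file on the first render and leave it alone afterwards.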