Appendix

  • ID: DS-APP
  • Type: Reference
  • Audience: Public
  • Theme: Environment, structure, and reproducibility reference

This appendix collects technical reference material that supports the Foundations Track.

It documents the project structure, environment setup, rendering workflow, and reusable patterns used throughout the guide.


Project Structure Overview

A standard CDI Quarto-first project looks like this:

data-science/
├── index.qmd
├── 00-preface.qmd
├── 01-setting-up-environment.qmd
├── ...
├── data/
├── figures/
├── cdi_viz/
├── scripts/
│   └── bash/
├── docs/
├── _quarto.yml
└── requirements.txt
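A folder that is missing from this layout is easy to catch with a quick programmatic check. The sketch below assumes it is run from the project root and checks only the directories and files named in the tree. `missing_project_paths` is a hypothetical helper for illustration and is not part of the CDI tooling.

```python
from pathlib import Path

# Directories and files taken from the standard layout above.
EXPECTED_DIRS = ["data", "figures", "cdi_viz", "scripts/bash", "docs"]
EXPECTED_FILES = ["index.qmd", "_quarto.yml", "requirements.txt"]


def missing_project_paths(root="."):
    """Return expected paths that do not exist under root (hypothetical helper)."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_project_paths()
    print("Layout OK" if not problems else f"Missing: {problems}")
```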

Key Directories

  • data/
    Stores raw and cleaned datasets.

  • figures/
    Stores saved plots generated during rendering.

  • cdi_viz/
    Contains reusable visualization utilities (e.g., theme.py).

  • scripts/bash/
    Contains helper scripts such as setup-env.sh and build.sh.

  • docs/
    Contains the rendered Quarto website (GitHub Pages output).


Environment Setup Reference

This project uses a local virtual environment (.venv) for reproducibility.

Create environment manually

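python3 -m venv .venv              # create the environment in ./.venv
source .venv/bin/activate          # activate it (macOS/Linux; on Windows: .venv\Scripts\activate)
pip install -r requirements.txt    # install the project's dependencies into .venv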
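Once the environment is active, you can confirm from Python that the interpreter in use is the one inside .venv. In an environment created with the standard venv module, sys.prefix points at the environment while sys.base_prefix still points at the base installation. The helper names below are illustrative only.

```python
import sys
from pathlib import Path


def in_virtualenv():
    """True when running inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


def active_env_name():
    """Directory name of the active environment (e.g. '.venv'), or None."""
    return Path(sys.prefix).name if in_virtualenv() else None


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Using virtual environment: {sys.prefix}")
    else:
        print("Not in a virtual environment; run: source .venv/bin/activate")
```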

Render the book

quarto render

Or use the helper script:

bash scripts/bash/build.sh

Quarto Workflow Summary

The workflow for this guide is:

Python code
    ↓
Quarto chapter (.qmd)
    ↓
quarto render
    ↓
docs/ (static site)

All figures and outputs are generated during rendering.


Reusable Pandas Patterns

Select columns

df[["col1", "col2"]]

Filter rows

df[df["col"] > value]
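Here are the select and filter patterns applied to a small made-up table. The column names and thresholds are illustrative only. Selecting with a list of names returns a DataFrame, and a boolean mask keeps only the rows where the condition is True.

```python
import pandas as pd

# Small illustrative dataset (made-up values).
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "region": ["north", "south", "north", "south"],
    "population": [12, 45, 30, 51],
})

# Select columns: a list of names returns a DataFrame.
subset = df[["city", "population"]]

# Filter rows: keep rows where the condition is True.
large = df[df["population"] > 30]

# Combine conditions with & (and) or | (or), each wrapped in parentheses.
large_north = df[(df["population"] > 10) & (df["region"] == "north")]

print(subset)
print(large)
print(large_north)
```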

Group and aggregate

df.groupby("group_col").agg(
    metric=("value_col", "mean")
)
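With named aggregation (new_name=(column, function)), you set the output column names directly, and a single call can compute several metrics. A worked example on made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "population": [12, 45, 30, 51],
})

# One row per group; each keyword becomes an output column.
summary = df.groupby("region").agg(
    mean_pop=("population", "mean"),
    total_pop=("population", "sum"),
    n_values=("population", "count"),
)
print(summary)
```

The group labels become the index. Chain .reset_index() if you want "region" back as an ordinary column, for example before plotting.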

Handle missing values (median imputation, numeric columns)

df["col"] = df["col"].fillna(df["col"].median())
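Median imputation applies only to numeric columns. It is also worth recording how many values were filled, so the choice can be justified later. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"income": [30.0, np.nan, 50.0, 40.0, np.nan]})

n_missing = df["income"].isna().sum()   # count gaps before filling
median = df["income"].median()          # NaN values are skipped by default
df["income"] = df["income"].fillna(median)

print(f"Filled {n_missing} missing values with median {median}")
```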

Convert dtype

df["category_col"] = df["category_col"].astype("category")
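Converting a column of repeated labels to category stores each distinct label once, plus an integer code per row. The .cat accessor exposes both. A small illustration on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "north", "north", "south"]})

df["region"] = df["region"].astype("category")

print(df["region"].dtype)                 # category
print(list(df["region"].cat.categories))  # distinct labels, stored once
print(df["region"].cat.codes.tolist())    # one integer code per row
```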

Plot Saving Pattern

All plots are saved using the CDI visualization utility:

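import matplotlib.pyplot as plt
from cdi_viz.theme import show_and_save_mpl

fig, ax = plt.subplots()  # build any Matplotlib figure
ax.plot([1, 2, 3], [2, 4, 8])
show_and_save_mpl(fig)    # show the figure and save it into figures/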

This ensures figures are saved consistently into the figures/ directory.
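The real show_and_save_mpl lives in cdi_viz/theme.py, and its exact behavior is not documented here. As a rough illustration of the saving half of the idea, a helper like the hypothetical save_figure_sketch below writes each figure to figures/ under a predictable file name. Its name, parameters, and PNG/dpi defaults are assumptions and do not describe the CDI implementation.

```python
from pathlib import Path

from matplotlib.figure import Figure


def save_figure_sketch(fig, name, out_dir="figures", dpi=150):
    """Hypothetical stand-in for show_and_save_mpl: save fig as <out_dir>/<name>.png."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)  # create figures/ on first use
    file_path = out_path / f"{name}.png"
    fig.savefig(file_path, dpi=dpi, bbox_inches="tight")
    return file_path


if __name__ == "__main__":
    # Figure() works without pyplot, so this also runs on headless machines.
    fig = Figure()
    ax = fig.subplots()
    ax.plot([1, 2, 3], [2, 4, 8])
    ax.set_title("Example")
    print(f"Saved to {save_figure_sketch(fig, 'example-line')}")
```

An explicit, stable file name means each render overwrites the same file rather than accumulating new copies in figures/.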


Reproducibility Checklist

Before finalizing any chapter:

  • Confirm no missing values remain (unless justified)
  • Confirm no unintended duplicates
  • Confirm correct data types
  • Confirm figures render correctly
  • Rebuild the book (quarto render)
  • Open docs/index.html to verify output
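The first three checklist items can be partly automated. The hypothetical data_quality_report below, which is not part of cdi_viz, summarizes missing values, duplicate rows, and dtypes so you can compare them against what the chapter expects. Deciding whether any remaining missing values are justified is still a manual call.

```python
import numpy as np
import pandas as pd


def data_quality_report(df):
    """Summarize missing values, duplicate rows, and dtypes (hypothetical helper)."""
    missing = df.isna().sum()
    return {
        "missing_by_column": missing[missing > 0].to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


# Made-up data with one missing value and one fully duplicated row.
df = pd.DataFrame({
    "city": ["A", "B", "B", "C"],
    "population": [12.0, 45.0, 45.0, np.nan],
})
report = data_quality_report(df)
print(report)
```

One option is to add a line such as assert report["duplicate_rows"] == 0 to a chapter. A violation then fails the render instead of relying on visual inspection.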

Closing Note

The goal of this Foundations Track is not just to teach syntax.

It is to teach structure, discipline, and reproducibility.

The same workflow you used here can be applied to:

  • New datasets
  • New domains
  • Larger analytical projects
  • Future CDI guides