Preface

Published: Jun 2026

  • ID: DS-000
  • Type: Preface
  • Audience: Beginner to Intermediate
  • Theme: From tidy data to reusable analytical thinking

System Role

The Complex Data Insights (CDI) Data Science Foundations System is a reusable parent system for tidy-table analysis across CDI pathways.

It provides a Python-based workflow for inspecting, cleaning, wrangling, visualizing, summarizing, and interpreting tidy, analysis-ready tables.

Any CDI pathway that produces structured tables can reuse this system, including:

  • Omics result tables
  • Clinical and medical cohort tables
  • AI evaluation tables
  • Decision-support tables

Tidy, analysis-ready table
        ↓
Data Science Foundations System
        ↓
Inspection, cleaning, wrangling, visualization, summary, and interpretation

Why This System Exists

The Data Science Foundations System exists to provide a reusable analytical foundation for the wider Complex Data Insights ecosystem.

Across CDI pathways, the raw data may look very different:

  • sequencing files in omics
  • abundance tables in microbiome analysis
  • protein result tables in proteomics
  • cohort tables in clinical and medical data
  • evaluation tables in AI and decision systems

But once pathway-specific processing is complete, many workflows converge on the same starting point:

a tidy, analysis-ready table.

This system begins at that point.

It teaches the reusable skills needed to inspect, clean, transform, visualize, summarize, and interpret structured data tables.

The goal is not only to learn Python.

The goal is to develop a repeatable way of thinking about data so that the same foundations can be reused across CDI pathways.
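To make "tidy, analysis-ready table" concrete, here is a minimal sketch in Python using pandas (an assumption; this preface does not name a specific library). The gene names, sample IDs, and values are hypothetical. It reshapes a "wide" table, where sample identity is hidden in the column headers, into tidy form, where each row is one observation, each column is one variable, and each cell holds one value.

```python
import pandas as pd

# Hypothetical "wide" table: one column per sample, so the sample
# variable is hidden in the column headers rather than stored as data.
wide = pd.DataFrame(
    {
        "gene": ["GENE_A", "GENE_B"],
        "S1": [10.2, 3.1],
        "S2": [12.8, 2.7],
    }
)

# Reshape to tidy (long) form: one row per gene-sample observation,
# with "sample_id" and "expression" as explicit variables.
tidy = wide.melt(id_vars="gene", var_name="sample_id", value_name="expression")

print(tidy)
print(tidy.shape)  # (rows, columns)
```

In practice, pathway-specific processing would usually hand over a table already in the tidy shape; the reshape is shown only to contrast the two layouts.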


The Role of Data Science Foundations in CDI

Within the CDI architecture, this system acts as the shared parent layer for table-based analysis.

CDI pathway-specific processing
        ↓
Tidy, analysis-ready tables
        ↓
Data Science Foundations System
        ↓
Analysis, visualization, summary, and interpretation

For example:

RNA-Seq differential expression table
        ↓
Data Science Foundations System

Microbiome diversity or abundance table
        ↓
Data Science Foundations System

Proteomics differential protein table
        ↓
Data Science Foundations System

Clinical cohort table
        ↓
Data Science Foundations System

AI evaluation table
        ↓
Data Science Foundations System

This makes the system reusable across domains.

It does not replace domain-specific systems. Instead, it supports them once they produce structured tables that are ready for analysis.
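The claim that one system can serve many pathways rests on a simple idea: once tables are tidy, the same code applies to all of them. A minimal sketch, assuming pandas, in which a hypothetical helper `inspect_table` is run unchanged on two hypothetical tables, one shaped like a differential expression result and one like a clinical cohort:

```python
import pandas as pd


def inspect_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return a per-column overview: dtype, missing count, distinct values.

    Hypothetical helper: nothing in it depends on which pathway
    produced the table, only on the table being tidy.
    """
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "n_missing": df.isna().sum(),
            "n_unique": df.nunique(),  # NaN is not counted as a value
        }
    )


# Hypothetical differential expression table (omics pathway).
de_table = pd.DataFrame(
    {
        "gene": ["GENE_A", "GENE_B", "GENE_C"],
        "log2_fold_change": [1.8, -0.4, None],
        "padj": [0.001, 0.42, 0.03],
    }
)

# Hypothetical clinical cohort table (clinical pathway).
cohort_table = pd.DataFrame(
    {
        "patient_id": [101, 102, 103, 104],
        "age": [54, 61, None, 47],
        "group": ["A", "B", "A", "B"],
    }
)

# The same function, applied without changes to both domains.
for name, table in [("differential expression", de_table), ("clinical cohort", cohort_table)]:
    print(f"--- {name} ---")
    print(inspect_table(table))
```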


What You Will Learn

This system introduces a practical workflow for working with tidy data in Python.

You will learn how to:

  • set up a basic Python analysis environment
  • load and inspect datasets
  • understand rows, columns, variables, and values
  • identify missing values and inconsistent data
  • clean and prepare data for analysis
  • filter, select, group, and summarize data
  • create basic visualizations
  • interpret patterns carefully
  • communicate findings clearly
  • understand how analysis connects to later modeling workflows

The focus is on building confidence with the core operations that appear repeatedly in real analytical work.
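Several of the skills listed above (loading, inspecting, handling missing values and inconsistent labels, filtering, selecting, grouping, and summarizing) fit in a few lines of pandas. The sketch below is illustrative only: the CSV content, column names, and the 5.0 threshold are hypothetical, and the CSV is read from an in-memory string so the example runs without a file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file; with a file on disk,
# pass its path to pd.read_csv instead of the StringIO wrapper.
csv_text = """sample_id,group,measurement
S1,control,4.2
S2,control,
S3,treated,6.1
S4,Treated,5.7
S5,treated,6.4
"""

# Load and inspect.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.isna().sum())  # missing values per column

# Clean: normalize inconsistent labels ("Treated" vs "treated")
# and drop rows where the measurement is missing.
df["group"] = df["group"].str.strip().str.lower()
clean = df.dropna(subset=["measurement"])

# Filter and select: rows above a hypothetical threshold, chosen columns only.
high = clean.loc[clean["measurement"] > 5.0, ["sample_id", "group", "measurement"]]
print(high)

# Group and summarize: count and mean measurement per group.
summary = clean.groupby("group")["measurement"].agg(["count", "mean"])
print(summary)
```

Note that the summary reports the count next to the mean; a group mean based on one row should be read very differently from one based on many.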


What This System Does Not Cover

This is a foundations system, so it intentionally stops short of advanced modeling and production workflows.

It does not focus on:

  • advanced machine learning
  • model deployment
  • APIs
  • monitoring and drift
  • deep learning
  • production software engineering

Those topics belong in the Applied Data Science System and later CDI AI and decision systems.

Here, the priority is to build the reusable base: tidy data, inspection, cleaning, transformation, visualization, summary, and interpretation.


How This Guide Is Structured

Each chapter follows a practical pattern:

  1. Concept
    What the chapter is about and why it matters.

  2. Workflow
    The steps used to work through the data.

  3. Code
    A practical Python implementation.

  4. Interpretation
    What the output means.

  5. Summary
    The key ideas to retain.

This structure is intentional.

Data science is not only about writing code. It is about knowing what the code is doing, why it matters, and how to interpret the result responsibly.


How to Approach This System

Work through the chapters in order if you are new to structured data analysis.

If you already have experience, you can use the system as a reference for CDI-style table-based workflows.

As you work through the guide, focus on three habits:

  • inspect the data before analyzing it
  • document the choices you make
  • interpret outputs in the context of the original question

These habits are simple, but they are central to reliable data work.
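The first two habits can be made concrete with very little code. In this sketch, assuming pandas, the cohort values, the `decisions` list, and the 120-year plausibility cutoff are all hypothetical; the point is that each cleaning step is preceded by an inspection and followed by a written note of what was removed and why.

```python
import pandas as pd

# Hypothetical cohort table with one missing age and one implausible age.
cohort = pd.DataFrame(
    {
        "patient_id": [1, 2, 3, 4],
        "age": [34.0, None, 212.0, 58.0],
    }
)

decisions = []  # a simple running record of analysis choices

# Habit 1: inspect before analyzing.
n_missing = int(cohort["age"].isna().sum())
n_implausible = int((cohort["age"] > 120).sum())  # NaN compares as False

# Habit 2: document each choice, with its reason and the rows it affects.
cohort_clean = cohort.dropna(subset=["age"])
decisions.append(f"Dropped {n_missing} row(s) with missing age.")

cohort_clean = cohort_clean[cohort_clean["age"] <= 120]
decisions.append(
    f"Dropped {n_implausible} row(s) with age > 120 (assumed data-entry errors)."
)

for note in decisions:
    print(note)
print(f"Rows remaining: {len(cohort_clean)} of {len(cohort)}")
```

Keeping the row counts next to each decision also supports the third habit: when interpreting results, you can state how much of the original table they rest on.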


The CDI Foundations Workflow

The core workflow in this system is:

flowchart TB
  A[Question or Objective] --> B[Load Data]
  B --> C[Inspect Structure]
  C --> D[Clean & Prepare]
  D --> E[Wrangle & Transform]
  E --> F[Visualize Patterns]
  F --> G[Summarize Results]
  G --> H[Interpret Findings]
  H --> I[Communicate Clearly]

This workflow is intentionally simple.

It is the foundation that later supports more advanced work such as modeling, evaluation, decision-making, and domain-specific interpretation.
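The workflow above can be sketched as a chain of small functions, one per step. This is a minimal illustration assuming pandas; the evaluation table and the function names (`load`, `inspect_structure`, `clean`, `wrangle`, `summarize`, `interpret`) are hypothetical, and the visualization step is left as a comment because it depends on the plotting library chosen.

```python
import io

import pandas as pd

# Hypothetical AI evaluation table standing in for a real tidy input.
RAW_CSV = """model,task,accuracy
model_a,task_1,0.81
model_a,task_2,0.77
model_b,task_1,0.85
model_b,task_2,
"""


def load(text: str) -> pd.DataFrame:
    return pd.read_csv(io.StringIO(text))


def inspect_structure(df: pd.DataFrame) -> None:
    print(df.shape)
    print(df.isna().sum())


def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Documented choice: drop rows without an accuracy value.
    return df.dropna(subset=["accuracy"])


def wrangle(df: pd.DataFrame) -> pd.DataFrame:
    # Derived variable: accuracy expressed as a percentage.
    return df.assign(accuracy_pct=df["accuracy"] * 100)


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    # One row per model: how many tasks were scored, and the mean score.
    return df.groupby("model")["accuracy_pct"].agg(n_tasks="count", mean_pct="mean")


def interpret(summary: pd.DataFrame) -> list[str]:
    # Report each mean with the number of tasks behind it,
    # so a mean resting on fewer tasks is not over-read.
    return [
        f"{model}: mean accuracy {row['mean_pct']:.1f}% over {int(row['n_tasks'])} task(s)"
        for model, row in summary.iterrows()
    ]


# Run the steps in order.
table = load(RAW_CSV)
inspect_structure(table)
wrangled = wrangle(clean(table))
# Visualize: omitted here because it depends on the plotting library chosen.
summary = summarize(wrangled)
for line in interpret(summary):
    print(line)  # Communicate: plain-language statements of the result
```

Keeping each step in its own function makes every decision easy to locate and revise, and the same chain can be pointed at a different tidy table by changing only the input and the column names.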


Looking Ahead

By the end of this system, you should be able to take a tidy, analysis-ready table and move through a complete foundational workflow:

load → inspect → clean → wrangle → visualize → summarize → interpret → communicate

That workflow becomes a reusable base for the rest of the CDI ecosystem.