In many datasets, scatter plots are among the fastest ways to detect group structure.
Grouped Histogram
A grouped histogram helps compare distributions across categories.
Code
fig, ax = plt.subplots(figsize=(8, 5.5))
sns.histplot(
    data=df,
    x="petal_length",
    hue="species",
    bins=15,
    kde=True,
    ax=ax
)
ax.set_title("Petal Length Distribution by Species")
ax.set_xlabel("Petal Length")
ax.set_ylabel("Count")
plt.show()
Iris — Petal Length Distribution by Species
Interpretation
This plot helps compare whether groups differ in:
location
spread
overlap
shape of the distribution
If one group’s values occupy a clearly different range, that variable may be useful for distinguishing between groups.
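The same comparison of location, spread, and overlap can be cross-checked numerically. The following is a minimal sketch, not part of the original walkthrough: the helper names summarize_by_group and ranges_overlap are hypothetical, and the toy values are made up for illustration rather than taken from the Iris data.

```python
import pandas as pd


def summarize_by_group(df: pd.DataFrame, value_col: str, group_col: str) -> pd.DataFrame:
    """Return location (mean), spread (std), and range (min, max) per group."""
    return df.groupby(group_col)[value_col].agg(["mean", "std", "min", "max"])


def ranges_overlap(summary: pd.DataFrame, group_a: str, group_b: str) -> bool:
    """True if the [min, max] intervals of the two groups share any values."""
    a, b = summary.loc[group_a], summary.loc[group_b]
    return bool(a["min"] <= b["max"] and b["min"] <= a["max"])


# Toy values for illustration only; these are NOT real Iris measurements.
toy = pd.DataFrame({
    "species": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "petal_length": [1.0, 1.5, 2.0, 4.0, 4.5, 5.0, 4.8, 5.5, 6.0],
})

summary = summarize_by_group(toy, "petal_length", "species")
print(summary)
print(ranges_overlap(summary, "a", "b"))  # False: group a sits in its own range
print(ranges_overlap(summary, "b", "c"))  # True: groups b and c share values
```

With the chapter's data, the call would presumably be summarize_by_group(df, "petal_length", "species"). A min/max range check is a coarse test of overlap; the histogram still shows how much of each distribution actually overlaps.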
Pairplot Overview
Pairplots provide a compact view of multiple variable relationships at once.
Code
g = sns.pairplot(
    df,
    hue="species",
    corner=True,
    plot_kws={"alpha": 0.7}
)
g.fig.suptitle("Iris — Pairplot by Species", y=1.02)
plt.show()
Iris — Pairplot by Species
Interpretation
Pairplots help answer broader questions such as:
which variables best separate species?
which features appear strongly related?
which measurements appear redundant?
where do groups overlap, and where do they separate clearly?
This is often one of the most useful first multivariate views of a dataset.
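Questions about strongly related or redundant measurements can also be checked against a correlation matrix. The sketch below assumes Pearson correlation and a cutoff of 0.9, both chosen here for illustration; the helper name strongly_related_pairs is hypothetical, and the toy values are made up rather than taken from the Iris data.

```python
import pandas as pd


def strongly_related_pairs(df: pd.DataFrame, threshold: float = 0.9):
    """List (col_a, col_b, r) for numeric column pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes("number").corr()  # non-numeric columns are ignored
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # visit each unordered pair once
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs.append((a, b, round(float(r), 3)))
    return pairs


# Toy values for illustration only; these are NOT real Iris measurements.
toy = pd.DataFrame({
    "petal_length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "petal_width": [0.5, 1.0, 1.5, 2.0, 2.5],  # exactly proportional to petal_length
    "sepal_width": [3.0, 1.0, 4.0, 1.0, 5.0],
    "species": ["a", "a", "b", "b", "c"],
})

print(strongly_related_pairs(toy))
```

A high correlation suggests two measurements carry similar information, but it only captures linear relationships, so the pairplot remains the better place to spot curved or group-specific patterns.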
Reading Visual Evidence Carefully
A plot is only useful if it is interpreted carefully.
When reading a figure, ask:
what question does this plot help answer?
what pattern is visible?
how strong is the pattern?
is there overlap, uncertainty, or ambiguity?
does this align with earlier summaries?
Visualization should support reasoning, not replace it.
Visualization Principles
Strong foundational plots share a few key qualities:
clear titles and labels
readable axes
purposeful use of color
minimal clutter
plot choice matched to the question
A histogram is useful for one-variable distributions.
A boxplot is useful for comparing distributions across groups.
A scatter plot is useful for relationships between two variables.
A pairplot is useful for quick multivariate exploration.
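The pairing of question and plot type above can be written down as a small lookup. This is only a sketch of the rule of thumb stated here: the function name suggest_plot and its arguments are hypothetical, and real plot choice depends on more than the number of variables.

```python
def suggest_plot(n_numeric: int, grouped: bool = False) -> str:
    """Map the shape of a question to one of the four plot types named above."""
    if n_numeric < 1:
        raise ValueError("need at least one numeric variable")
    if n_numeric == 1:
        # One variable: compare across groups with a boxplot, else a histogram.
        return "boxplot" if grouped else "histogram"
    if n_numeric == 2:
        return "scatter plot"
    return "pairplot"  # three or more variables: quick multivariate view


print(suggest_plot(1))                # histogram
print(suggest_plot(1, grouped=True))  # boxplot
print(suggest_plot(2))                # scatter plot
print(suggest_plot(4))                # pairplot
```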
Validation Through Visualization
Plots can also function as validation tools.
Use them to check:
whether distributions look plausible
whether outliers need review
whether differences between groups are substantial or the groups mostly overlap
whether patterns are consistent with earlier cleaning and wrangling steps
Visualization often reveals issues that summary tables alone can miss.
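Points that a plot flags for review can be pulled out programmatically. The sketch below makes two assumptions that are not from the original text: outliers are defined by the common 1.5 × IQR rule applied within each group, and a physical measurement counts as implausible if it is missing or not positive. The helper names iqr_outliers and implausible_rows are hypothetical, and the toy values are made up for illustration.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, value_col: str, group_col: str, k: float = 1.5) -> pd.DataFrame:
    """Return rows whose value lies outside [Q1 - k*IQR, Q3 + k*IQR] within its group."""
    grouped = df.groupby(group_col)[value_col]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    mask = (df[value_col] < q1 - k * iqr) | (df[value_col] > q3 + k * iqr)
    return df[mask]


def implausible_rows(df: pd.DataFrame, value_cols: list) -> pd.DataFrame:
    """Return rows with a missing or non-positive value in any measurement column."""
    values = df[value_cols]
    mask = values.isna().any(axis=1) | (values <= 0).any(axis=1)
    return df[mask]


# Toy values for illustration only; these are NOT real Iris measurements.
toy = pd.DataFrame({
    "species": ["a"] * 5 + ["b"] * 5,
    "petal_length": [1.0, 1.1, 1.2, 1.3, 9.0, 4.0, 4.2, 4.4, 4.6, -1.0],
})

print(iqr_outliers(toy, "petal_length", "species"))  # flags the 9.0 and -1.0 rows
print(implausible_rows(toy, ["petal_length"]))       # flags the -1.0 row
```

A flagged row is a candidate for review, not an error by definition; whether to keep, correct, or drop it is a judgment that depends on how the data were collected.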
Summary
you used foundational plot types to explore the Iris dataset
you compared distributions within and across species
you examined relationships between numeric variables
you used visual patterns to support interpretation
you practiced moving from chart creation to analytical reasoning
These are core visualization habits that carry forward into more advanced analysis.
Exercise
Try the following:
Create a scatter plot of sepal_width versus petal_width
Plot a histogram of petal_width
Create a boxplot of petal_width by species
Write one sentence describing which feature seems most useful for separating species
Solution
Code
# 1. Scatter plot of sepal_width versus petal_width
fig, ax = plt.subplots(figsize=(8, 5.5))
sns.scatterplot(
    data=df,
    x="sepal_width",
    y="petal_width",
    hue="species",
    s=70,
    alpha=0.8,
    ax=ax
)
ax.set_title("Sepal Width vs Petal Width")
ax.set_xlabel("Sepal Width")
ax.set_ylabel("Petal Width")
plt.show()

# 2. Histogram of petal_width
fig, ax = plt.subplots(figsize=(8, 5.5))
sns.histplot(
    data=df,
    x="petal_width",
    bins=12,
    kde=True,
    ax=ax
)
ax.set_title("Distribution of Petal Width")
ax.set_xlabel("Petal Width")
ax.set_ylabel("Count")
plt.show()

# 3. Boxplot of petal_width by species
fig, ax = plt.subplots(figsize=(8, 5.5))
sns.boxplot(
    data=df,
    x="species",
    y="petal_width",
    ax=ax
)
ax.set_title("Petal Width by Species")
ax.set_xlabel("Species")
ax.set_ylabel("Petal Width")
plt.show()

# 4. One-sentence interpretation
print("Your interpretation:")
print("Petal-related features appear more useful than sepal-related features for separating species because their group differences are more distinct.")
Your interpretation:
Petal-related features appear more useful than sepal-related features for separating species because their group differences are more distinct.
CDI Insight
Visualization is not about producing more plots.
It is about choosing the right view of the data to support understanding.
A clear plot reduces uncertainty. A poor plot can introduce it.