Goal: Master Seaborn, a high-level Python visualization library built on Matplotlib, to create sophisticated statistical plots. Learn to visualize distributions, relationships, and categorical data for deeper insights in AI/ML workflows.
1. What is Seaborn?
Seaborn simplifies complex data visualization by providing:
Statistical Plotting: Built-in functions for distributions, correlations, and regressions.
Aesthetic Defaults: Attractive themes and color palettes.
Pandas Integration: Directly plot from DataFrames.
Multi-Plot Grids: Create complex visualizations with minimal code.
Why Use Seaborn Over Matplotlib?
Fewer lines of code for complex plots.
Better handling of DataFrames and categorical data.
Advanced tools for exploring relationships (e.g., heatmaps, pair plots).
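To make the "fewer lines of code" point concrete, here is a minimal side-by-side sketch of the same grouped scatter plot in both libraries. The toy DataFrame and its column names (height, weight, group) are invented for illustration and are not part of any dataset used in this guide.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, invented for illustration only.
toy = pd.DataFrame({
    "height": [150, 160, 170, 180, 155, 165, 175, 185],
    "weight": [50, 58, 68, 80, 52, 61, 72, 86],
    "group": ["a", "a", "a", "a", "b", "b", "b", "b"],
})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(9, 4))

# Matplotlib: loop over the groups and build labels and legend by hand.
for name, part in toy.groupby("group"):
    ax_mpl.scatter(part["height"], part["weight"], label=name)
ax_mpl.set_xlabel("height")
ax_mpl.set_ylabel("weight")
ax_mpl.legend(title="group")

# Seaborn: one call; grouping, colors, axis labels, and legend come from the DataFrame.
sns.scatterplot(data=toy, x="height", y="weight", hue="group", ax=ax_sns)

fig.tight_layout()
```

Both panels show the same data; the Seaborn version reads the column names directly, which is what makes it shorter.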
2. Installation & Setup
pip install seaborn
Import libraries:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
3. Key Features
1. Statistical Plots
Distributions: Histograms, KDE plots, rug plots.
Regression: Visualize trends with confidence intervals.
Categorical Data: Box plots, violin plots, bar plots.
2. Built-in Themes
Set aesthetics globally:
sns.set_theme(style="whitegrid", palette="pastel", font_scale=1.2)
Themes: darkgrid, whitegrid, dark, white, ticks.
3. Color Palettes
Use predefined or custom palettes:
sns.color_palette("husl", 8) # HUSL palette with 8 colors
sns.set_palette("viridis") # Set global palette
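A palette is simply a list of RGB tuples, so it can be inspected, converted to hex codes, or built from your own colors. The following is a small sketch; the three hex codes and the toy DataFrame are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Predefined palette: 8 evenly spaced hues, returned as (r, g, b) tuples.
husl = sns.color_palette("husl", 8)
hex_codes = husl.as_hex()  # the same colors as "#rrggbb" strings

# Custom palette from arbitrary hex codes (chosen only for this example).
custom = sns.color_palette(["#1b9e77", "#d95f02", "#7570b3"])

# Hypothetical toy data with three categories, one per custom color.
toy = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 3, 6, 5, 7],
    "label": ["a", "b", "c", "a", "b", "c"],
})

fig, ax = plt.subplots()
# Passing palette= to a single plot overrides the global palette for that plot only.
sns.scatterplot(data=toy, x="x", y="y", hue="label", palette=custom, ax=ax)
```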
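4. Advanced Visualization Techniques
The examples in this section assume df is a Titanic-style DataFrame with capitalized column names (Age, Fare, Pclass, Sex, Survived), as in the Kaggle version of the dataset. The copy returned by sns.load_dataset("titanic") uses lowercase names instead.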
1. Distribution Plots
KDE (Kernel Density Estimate) Plot:
sns.kdeplot(data=df, x="Age", hue="Survived", fill=True)
plt.title("Age Distribution by Survival")
Joint Plot:
sns.jointplot(data=df, x="Age", y="Fare", kind="hex")
2. Categorical Plots
Box Plot:
sns.boxplot(data=df, x="Pclass", y="Fare", hue="Survived")
Violin Plot:
sns.violinplot(data=df, x="Pclass", y="Age", split=True, hue="Sex")
Swarm Plot:
sns.swarmplot(data=df, x="Pclass", y="Age", hue="Sex", dodge=True)
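A box plot summarizes each category by its quartiles, so it can help to compute those numbers once and compare them with the picture. This is a sketch on made-up fares, not the real Titanic values.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up fares for three passenger classes (illustration only).
toy = pd.DataFrame({
    "Pclass": [1] * 5 + [2] * 5 + [3] * 5,
    "Fare": [80, 60, 100, 75, 90, 20, 26, 30, 13, 21, 7, 8, 10, 7, 15],
})

# The box spans the 25th to 75th percentile; the line inside is the median.
quartiles = toy.groupby("Pclass")["Fare"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)

fig, ax = plt.subplots()
sns.boxplot(data=toy, x="Pclass", y="Fare", ax=ax)
```

The middle column of the printed table should match the median line drawn in each box.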
3. Matrix Plots
Heatmap:
corr = df.corr(numeric_only=True) # Skip non-numeric columns (required in pandas 2.x)
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
Cluster Map:
sns.clustermap(corr, cmap="viridis", standard_scale=1)
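A correlation matrix is symmetric, so half of the heatmap is redundant. One common refinement is to mask the upper triangle. The sketch below does this on synthetic columns (a, b, c, d and a text column label, all invented here).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic columns: b follows a, c moves against a, d is unrelated noise.
rng = np.random.default_rng(1)
base = rng.normal(size=200)
toy = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.5, size=200),
    "c": -base + rng.normal(scale=1.0, size=200),
    "d": rng.normal(size=200),
    "label": ["x", "y"] * 100,  # non-numeric column, excluded below
})

corr = toy.corr(numeric_only=True)

# True above the diagonal: those cells are hidden, the diagonal stays visible.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)

fig, ax = plt.subplots()
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            cmap="coolwarm", vmin=-1, vmax=1, square=True, ax=ax)
```

Fixing vmin and vmax at -1 and 1 keeps the diverging colormap centered on zero correlation.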
4. Pair Plots
Visualize pairwise relationships:
sns.pairplot(df, hue="Survived", diag_kind="kde")
5. FacetGrid
Multi-plot grids based on data subsets:
g = sns.FacetGrid(df, col="Pclass", row="Sex", hue="Survived")
g.map(sns.scatterplot, "Age", "Fare").add_legend()
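The grid layout follows directly from the data: one row per level of the row variable and one column per level of the col variable. Here is a small sketch with a synthetic DataFrame (random ages and fares, not real passengers) that checks the resulting grid shape.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for df: 24 rows covering every Sex x Pclass combination.
rng = np.random.default_rng(2)
toy = pd.DataFrame({
    "Pclass": [1, 2, 3] * 8,
    "Sex": (["male"] * 3 + ["female"] * 3) * 4,
    "Age": rng.uniform(1, 70, 24),
    "Fare": rng.uniform(5, 100, 24),
    "Survived": [0, 1, 1, 0] * 6,
})

# 2 levels of Sex -> 2 rows, 3 levels of Pclass -> 3 columns.
g = sns.FacetGrid(toy, row="Sex", col="Pclass", hue="Survived", height=2.5)
g.map_dataframe(sns.scatterplot, x="Age", y="Fare")
g.set_axis_labels("Age", "Fare")
g.add_legend()  # meaningful here because hue is set

print(g.axes.shape)
```

g.axes is a NumPy array of Matplotlib axes, so individual panels can be customized by indexing into it.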
5. Customizing Plots
Titles & Labels
plt.title("Custom Title", fontsize=14, fontweight="bold")
plt.xlabel("X-Axis Label", fontsize=12)
plt.ylabel("Y-Axis Label", fontsize=12)
Annotations
Highlight specific data points:
ax = sns.barplot(data=df, x="Pclass", y="Fare")
ax.bar_label(ax.containers[0], fmt="%.1f") # Add values to bars
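Bar labels annotate every bar. To call out a single point instead, Matplotlib's ax.annotate works on any Seaborn axes. The sketch below marks the highest fare in a tiny made-up table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values for illustration.
toy = pd.DataFrame({
    "Age": [22, 38, 26, 35, 54, 2],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
})

fig, ax = plt.subplots()
sns.scatterplot(data=toy, x="Age", y="Fare", ax=ax)

# Locate the row with the highest fare and point an arrow at it.
top = toy.loc[toy["Fare"].idxmax()]
ax.annotate(
    f"max fare: {top['Fare']:.2f}",
    xy=(top["Age"], top["Fare"]),   # the data point being highlighted
    xytext=(10, -15),               # label offset from the point, in points
    textcoords="offset points",
    arrowprops={"arrowstyle": "->"},
)
```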
Themes & Context
sns.set_context("paper") # Options: "paper", "notebook" (default), "talk", "poster"
6. Real-World Use Cases in AI/ML
Feature Analysis:
Visualize feature distributions (sns.histplot).
Identify correlations (sns.heatmap).
Model Evaluation:
Plot ROC curves (sns.lineplot).
Compare metrics across classes (sns.barplot).
Clustering:
Visualize clusters with pair plots (sns.pairplot).
Time Series:
Plot trends with confidence intervals (sns.lineplot).
7. Best Practices
Choose the Right Plot:
Relationships: Scatter plots, line plots.
Distributions: Histograms, KDE plots.
Categorical Data: Box plots, bar plots.
Simplify: Avoid clutter (e.g., too many hues or categories).
Color Wisely: Use palettes that are colorblind-friendly (e.g., "colorblind").
8. Practice Exercise
Load the Titanic dataset (sns.load_dataset("titanic")).
Create a pair plot colored by survival.
Plot a heatmap of correlations between numeric features.
Use FacetGrid to visualize age distributions by class and gender.
Solution:
# 1. Load data
titanic = sns.load_dataset("titanic")
# 2. Pair plot
sns.pairplot(titanic, hue="survived", vars=["age", "fare", "pclass"])
# 3. Heatmap
corr = titanic.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="coolwarm")
# 4. FacetGrid
g = sns.FacetGrid(titanic, col="pclass", row="sex")
g.map(sns.histplot, "age", kde=True) # No hue is set, so there is no legend to add
Key Takeaways
Seaborn simplifies statistical visualization with minimal code.
Use FacetGrid and pair plots to explore multi-dimensional relationships.
Customize themes and palettes for professional-quality visuals.