Boxplots in Matplotlib
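Boxplots, also known as box-and-whisker plots, display the distribution of a dataset using five summary statistics: minimum, first quartile (Q1), median, third quartile (Q3), and maximum. With Matplotlib's default settings, the whiskers stop at the most extreme data points that lie within 1.5 × IQR of the box (IQR = Q3 − Q1), and any points beyond them are drawn individually as outliers. This makes boxplots helpful for identifying outliers and understanding data variability.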
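The statistics behind a boxplot can be worked out by hand. The sketch below is a minimal illustration, assuming NumPy is installed and using the 1.5 × IQR rule for the whiskers. The helper name five_number_summary and the extra value 20 are made up for this example and are not part of Matplotlib or of the datasets used later.

```python
import numpy as np


def five_number_summary(values, whis=1.5):
    """Return quartiles, whisker ends, and outliers for a 1-D dataset."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1

    # Points outside these fences are treated as outliers
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    outliers = x[(x < lower_fence) | (x > upper_fence)]

    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "whisker_low": float(inside.min()),
        "whisker_high": float(inside.max()),
        "outliers": sorted(outliers.tolist()),
    }


# Illustrative values: ten small numbers plus one made-up extreme value (20)
summary = five_number_summary([7, 8, 5, 6, 9, 10, 6, 8, 7, 5, 20])
print(summary)
```

Here the value 20 lies above the upper fence, so it is reported as an outlier and the upper whisker ends at the largest remaining value.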
Creating Boxplots
The boxplot() function in Matplotlib is used to create boxplots.
Example: Basic Boxplot
import matplotlib.pyplot as plt
# Data
data = [7, 8, 5, 6, 9, 10, 6, 8, 7, 5]
# Create boxplot
plt.boxplot(data)
# Add title and labels
plt.title("Basic Boxplot")
plt.ylabel("Values")
# Display the plot
plt.show()
Customizing Boxplots
Matplotlib provides several parameters to customize boxplots:
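Parameter | Description | Example Value |
---|---|---|
notch | Draw a notch showing the confidence interval around the median | True |
vert | Orientation of the boxplot: True for vertical (the default), False for horizontal; Matplotlib 3.10+ also accepts orientation='horizontal' | True, False |
patch_artist | Draw the box as a filled patch so it can take a fill color (needed for facecolor in boxprops) | True |
boxprops | Style properties of the box | dict(color='blue') |
whiskerprops | Style properties of the whiskers | dict(color='red') |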
Example: Customized Boxplot
import matplotlib.pyplot as plt
# Data
data = [7, 8, 5, 6, 9, 10, 6, 8, 7, 5]
# Create customized boxplot
plt.boxplot(data, notch=True, patch_artist=True, boxprops=dict(facecolor='lightblue'), whiskerprops=dict(color='green'))
# Add title and labels
plt.title("Customized Boxplot")
plt.ylabel("Values")
# Display the plot
plt.show()
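Other parts of the plot can be styled in the same way. The following sketch assumes the medianprops, capprops, flierprops, and showmeans keyword arguments of boxplot(), which are not covered in the table above; the colors are arbitrary choices for illustration. It also keeps the dictionary that boxplot() returns, which holds the drawn artists under keys such as 'boxes', 'medians', and 'means'.

```python
import matplotlib.pyplot as plt

# Data
data = [7, 8, 5, 6, 9, 10, 6, 8, 7, 5]

# Style the median line, caps, and outlier markers, and mark the mean
bp = plt.boxplot(
    data,
    patch_artist=True,
    showmeans=True,
    boxprops=dict(facecolor="lightblue"),
    medianprops=dict(color="darkred", linewidth=2),
    capprops=dict(color="green"),
    flierprops=dict(marker="x", markeredgecolor="gray"),
)

# Add title and labels
plt.title("Boxplot with Styled Median, Caps, and Mean")
plt.ylabel("Values")

# The returned dictionary exposes the artists for later inspection or changes
print(sorted(bp.keys()))
```

Call plt.show() afterwards to display the figure.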
Boxplots with Multiple Datasets
Passing a list of datasets to boxplot() draws one box per dataset side by side, which makes it easy to compare their distributions.
Example: Multiple Boxplots
import matplotlib.pyplot as plt
# Data
data1 = [7, 8, 5, 6, 9, 10, 6, 8, 7, 5]
data2 = [6, 7, 8, 5, 6, 7, 6, 5, 7, 6]
# Create boxplots (Matplotlib 3.9+ prefers tick_labels over the deprecated labels keyword)
plt.boxplot([data1, data2], labels=["Dataset 1", "Dataset 2"], patch_artist=True, boxprops=dict(facecolor='lightblue'))
# Add title and labels
plt.title("Boxplots for Multiple Datasets")
plt.ylabel("Values")
# Display the plot
plt.show()
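Because boxprops applies the same style to every box, giving each dataset its own color takes one more step. One possible approach, sketched below, is to loop over the 'boxes' entry of the dictionary returned by boxplot() and set each fill color separately. The color list is an arbitrary choice, and the tick labels are set with plt.xticks() on the assumption that the boxes sit at the default positions 1 and 2.

```python
import matplotlib.pyplot as plt

# Data
data1 = [7, 8, 5, 6, 9, 10, 6, 8, 7, 5]
data2 = [6, 7, 8, 5, 6, 7, 6, 5, 7, 6]
colors = ["lightblue", "lightgreen"]  # illustrative colors, one per dataset

# patch_artist=True makes each box a filled patch that accepts a face color
bp = plt.boxplot([data1, data2], patch_artist=True)
for box, color in zip(bp["boxes"], colors):
    box.set_facecolor(color)

# Label the boxes at their default x positions (1 and 2)
plt.xticks([1, 2], ["Dataset 1", "Dataset 2"])
plt.title("Boxplots with One Color per Dataset")
plt.ylabel("Values")
```

Call plt.show() afterwards to display the figure.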
Practical Examples
Example 1: Exam Scores
import matplotlib.pyplot as plt
# Data
scores = [55, 65, 70, 75, 80, 85, 90, 95, 100]
# Create boxplot
plt.boxplot(scores, notch=True, patch_artist=True, boxprops=dict(facecolor='lightgreen'))
# Add title and labels
plt.title("Exam Scores Distribution")
plt.ylabel("Scores")
# Display the plot
plt.show()
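With only nine scores, the notch in this plot can look odd: it reaches past the ends of the box, so the box appears folded back on itself. By default the notch is computed as roughly median ± 1.57 × IQR / √n, so small samples produce wide notches. The sketch below checks this with matplotlib.cbook.boxplot_stats, which returns the statistics boxplot() uses for drawing; the variable name notch_exceeds_box is made up for this example.

```python
from matplotlib import cbook

# Data
scores = [55, 65, 70, 75, 80, 85, 90, 95, 100]

# boxplot_stats returns one dictionary of statistics per dataset
stats = cbook.boxplot_stats(scores)[0]

print("Box:  ", stats["q1"], "to", stats["q3"])
print("Notch:", stats["cilo"], "to", stats["cihi"])

# True when the notch extends beyond either quartile
notch_exceeds_box = stats["cilo"] < stats["q1"] or stats["cihi"] > stats["q3"]
print("Notch wider than box:", bool(notch_exceeds_box))
```

When the notch is wider than the box, dropping notch=True or collecting more data usually gives a clearer plot.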
Example 2: Monthly Rainfall
import matplotlib.pyplot as plt
# Data
rainfall = [100, 120, 85, 90, 150, 130, 110, 140, 95, 105, 125, 115]
# Create boxplot
plt.boxplot(rainfall, notch=True, patch_artist=True, boxprops=dict(facecolor='lightcoral'))
# Add title and labels
plt.title("Monthly Rainfall Distribution")
plt.ylabel("Rainfall (mm)")
# Display the plot
plt.show()
Try It Yourself
Problem 1: Analyze Heights of Students
Create a boxplot to analyze the height distribution of students in your class.
Sample Solution
import matplotlib.pyplot as plt
# Data
heights = [150, 160, 165, 170, 155, 180, 175, 165, 158, 162]
# Create boxplot
plt.boxplot(heights, notch=True, patch_artist=True, boxprops=dict(facecolor='skyblue'))
# Add title and labels
plt.title("Height Distribution")
plt.ylabel("Height (cm)")
# Display the plot
plt.show()
Problem 2: Compare Sales Data
Visualize the distribution of sales data for two products using boxplots.
Sample Solution
import matplotlib.pyplot as plt
# Data
product_a = [200, 300, 400, 150, 250, 350, 450, 300, 220, 310]
product_b = [180, 280, 380, 130, 230, 330, 430, 290, 200, 300]
# Create boxplots (Matplotlib 3.9+ prefers tick_labels over the deprecated labels keyword)
plt.boxplot([product_a, product_b], labels=["Product A", "Product B"], notch=True, patch_artist=True, boxprops=dict(facecolor='lightyellow'))
# Add title and labels
plt.title("Sales Data Distribution")
plt.ylabel("Sales (Units)")
# Display the plot
plt.show()
Boxplots give a compact view of a dataset's median, spread, and outliers. Use options such as notch, patch_artist, boxprops, and whiskerprops to make them clear and easy to compare.