Histograms in Matplotlib
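Histograms show the frequency distribution of a dataset. The range of values is divided into intervals called bins, and the height of each bar is the number of values that fall into that bin. Histograms help reveal the shape, spread, and central tendency of the data.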
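To make binning concrete, here is a minimal pure-Python sketch of how values can be counted into bins. The helper name `count_in_bins` is hypothetical and is not part of Matplotlib. It assumes sorted bin edges and follows the convention that NumPy and Matplotlib use: every bin includes its left edge and excludes its right edge, except the last bin, which includes both.

```python
def count_in_bins(values, edges):
    """Hypothetical helper: count how many values fall into each bin.

    Bin i covers [edges[i], edges[i + 1]); the last bin also includes
    its right edge. Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    last = len(edges) - 2  # index of the final bin
    for v in values:
        for i in range(len(edges) - 1):
            left, right = edges[i], edges[i + 1]
            if left <= v < right or (i == last and v == right):
                counts[i] += 1
                break  # each value belongs to at most one bin
    return counts


data = [22, 87, 5, 42, 88, 30, 56, 78, 95, 42, 67, 89, 42]
edges = [0, 20, 40, 60, 80, 100]
counts = count_in_bins(data, edges)  # counts == [1, 2, 4, 2, 4]

# A text "histogram": one row of # marks per bin
for left, right, c in zip(edges, edges[1:], counts):
    print(f"{left:>3}-{right:<3} {'#' * c}")
```

Each `#` stands for one value, so the longest rows correspond to the tallest bars Matplotlib would draw.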
Creating Histograms
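The hist() function in Matplotlib (called as plt.hist() after importing matplotlib.pyplot as plt) creates histograms. It groups the data into bins, counts the values in each bin, and draws one bar per bin.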
Example: Basic Histogram
import matplotlib.pyplot as plt
import numpy as np
# Data
data = [22, 87, 5, 42, 88, 30, 56, 78, 95, 42, 67, 89, 42]
# Create histogram
plt.hist(data, bins=5, color='blue', edgecolor='black')
# Add title and labels
plt.title("Basic Histogram")
plt.xlabel("Value Range")
plt.ylabel("Frequency")
# Display the plot
plt.show()
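Besides drawing the bars, `plt.hist()` returns three values: the bar heights (counts), the bin edges, and the drawn bar artists. The sketch below reuses the data from the basic example. It selects the non-interactive `Agg` backend (an assumption, so that the script runs without a display) and checks the result against `np.histogram()`, which performs the same binning without drawing anything.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np

data = [22, 87, 5, 42, 88, 30, 56, 78, 95, 42, 67, 89, 42]

# counts: bar heights, edges: bin boundaries, patches: the drawn bars
counts, edges, patches = plt.hist(data, bins=5, color='blue', edgecolor='black')

for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:g} to {right:g}: {int(c)} values")

# np.histogram computes the same counts and edges without plotting
np_counts, np_edges = np.histogram(data, bins=5)
print(np.array_equal(counts, np_counts), np.allclose(edges, np_edges))
plt.close()
```

With `bins=5`, the range from 5 to 95 is split into five bins that are each 18 units wide.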
Customizing Histograms
Matplotlib provides several parameters to customize histograms:
| Parameter | Description | Example Value |
|---|---|---|
| bins | Number of equal-width bins (integer) or a sequence of bin edges | 10, [0, 20, 40] |
| color | Color of the bars | 'blue', 'green' |
| edgecolor | Color of the bar edges | 'black' |
| alpha | Transparency, from 0 (fully transparent) to 1 (fully opaque) | 0.5 |
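The two forms of `bins` behave differently. An integer requests that many equal-width bins spanning the minimum to the maximum of the data. A sequence is used as explicit bin edges, and values outside the first and last edge are not counted. The sketch below demonstrates this with `np.histogram()`, which accepts `bins` in the same forms as `plt.hist()`, and reuses the data list from the basic example.

```python
import numpy as np

data = [22, 87, 5, 42, 88, 30, 56, 78, 95, 42, 67, 89, 42]

# Integer: 4 equal-width bins from min(data) to max(data)
int_counts, int_edges = np.histogram(data, bins=4)

# Sequence: explicit edges; only values from 0 to 40 are counted
seq_counts, seq_edges = np.histogram(data, bins=[0, 20, 40])

print("integer bins:", int_counts, int_edges)
print("edge bins:", seq_counts, seq_edges)
print("values counted with edges [0, 20, 40]:", seq_counts.sum(), "of", len(data))
```

Only 3 of the 13 values are counted with the edges `[0, 20, 40]`. This makes explicit edges useful for focusing on a range, but easy to misuse if they do not cover all the data.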
Example: Customized Histogram
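import matplotlib.pyplot as plt
import numpy as np
# Data: 1000 samples from a normal distribution (mean 50, standard deviation 10)
data = np.random.normal(50, 10, 1000)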
# Create customized histogram
plt.hist(data, bins=20, color='green', edgecolor='black', alpha=0.7)
# Add title and labels
plt.title("Customized Histogram")
plt.xlabel("Value Range")
plt.ylabel("Frequency")
# Display the plot
plt.show()
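Because `np.random.normal` draws new numbers on every run, the histogram changes slightly each time. The following minimal sketch assumes that a seeded NumPy `Generator` is acceptable in place of the legacy `np.random.normal` call (the seed 42 is arbitrary). It makes the data reproducible and also demonstrates `density=True`, a parameter accepted by both `np.histogram()` and `plt.hist()` that rescales the bar heights so that the total bar area is 1.

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded generator: same data on every run
data = rng.normal(50, 10, 1000)   # mean 50, standard deviation 10

counts, edges = np.histogram(data, bins=20)
density, _ = np.histogram(data, bins=20, density=True)

widths = np.diff(edges)
print("total count:", counts.sum())               # one count per sample
print("total area:", np.sum(density * widths))    # approximately 1.0

# Re-seeding reproduces exactly the same samples
again = np.random.default_rng(42).normal(50, 10, 1000)
print(np.array_equal(data, again))
```

With `plt.hist(data, bins=20, density=True)`, the y-axis shows probability density rather than frequency, so the y-axis label should be changed accordingly.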
Comparing Multiple Histograms
Example: Overlapping Histograms
import matplotlib.pyplot as plt
import numpy as np
# Data
data1 = np.random.normal(60, 10, 1000)
data2 = np.random.normal(50, 15, 1000)
# Create overlapping histograms
plt.hist(data1, bins=20, alpha=0.5, label="Dataset 1", color='blue', edgecolor='black')
plt.hist(data2, bins=20, alpha=0.5, label="Dataset 2", color='orange', edgecolor='black')
# Add title, labels, and legend
plt.title("Overlapping Histograms")
plt.xlabel("Value Range")
plt.ylabel("Frequency")
plt.legend()
# Display the plot
plt.show()
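With `bins=20`, each `plt.hist()` call computes its own edges from its own dataset, so the bars of the two histograms do not line up exactly. One way to compare them bin for bin, sketched here as one option among several, is to compute a single set of edges from both datasets with `np.histogram_bin_edges()` and pass the same edges to both calls. The seeded generator is an added assumption for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)
data1 = rng.normal(60, 10, 1000)
data2 = rng.normal(50, 15, 1000)

# Separate edges: each dataset spans a different range, so the edges differ
_, edges1 = np.histogram(data1, bins=20)
_, edges2 = np.histogram(data2, bins=20)
print("edges match:", np.allclose(edges1, edges2))

# Shared edges: 20 equal-width bins covering both datasets
shared = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=20)
counts1, _ = np.histogram(data1, bins=shared)
counts2, _ = np.histogram(data2, bins=shared)
print(len(shared) - 1, "bins;", counts1.sum(), counts2.sum())

# In the plot, pass the same edges to both calls:
# plt.hist(data1, bins=shared, alpha=0.5, label="Dataset 1")
# plt.hist(data2, bins=shared, alpha=0.5, label="Dataset 2")
```

Because the shared edges span both datasets, every sample is counted and each bar of Dataset 1 sits exactly over the corresponding bar of Dataset 2.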
Example: Side-by-Side Histograms
import matplotlib.pyplot as plt
# Data
data1 = [22, 87, 5, 42, 88, 30, 56]
data2 = [32, 57, 15, 72, 48, 50, 66]
# Define bin edges
bins = [0, 20, 40, 60, 80, 100]
# Pass both datasets as a list; their bars are drawn side by side in each bin
plt.hist([data1, data2], bins=bins, label=["Dataset 1", "Dataset 2"], color=['blue', 'green'], edgecolor='black')
# Add title, labels, and legend
plt.title("Side-by-Side Histograms")
plt.xlabel("Value Range")
plt.ylabel("Frequency")
plt.legend()
# Display the plot
plt.show()
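When `plt.hist()` receives a list of datasets, it bins each dataset with the same edges and places the bars next to each other inside every bin. The counts behind the chart can be checked with `np.histogram()`, calling it once per dataset with the same edges:

```python
import numpy as np

data1 = [22, 87, 5, 42, 88, 30, 56]
data2 = [32, 57, 15, 72, 48, 50, 66]
bins = [0, 20, 40, 60, 80, 100]

counts1, _ = np.histogram(data1, bins=bins)
counts2, _ = np.histogram(data2, bins=bins)

# One row per bin: the two bar heights drawn side by side
for left, right, c1, c2 in zip(bins[:-1], bins[1:], counts1, counts2):
    print(f"{left:>3}-{right:<3}  Dataset 1: {c1}  Dataset 2: {c2}")
```

Adding `stacked=True` to the `plt.hist()` call places the bars on top of each other instead of side by side.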
Practical Examples
Example 1: Student Test Scores
import matplotlib.pyplot as plt
# Data
scores = [56, 78, 45, 89, 90, 65, 76, 88, 92, 55, 69, 80, 77]
# Create histogram
plt.hist(scores, bins=5, color='purple', edgecolor='black')
# Add title and labels
plt.title("Student Test Scores")
plt.xlabel("Score Range")
plt.ylabel("Frequency")
# Display the plot
plt.show()
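With `bins=5`, Matplotlib splits the span from the lowest score (45) to the highest (92) into five equal parts, so the bin edges land on values such as 54.4 that are hard to read. The sketch below assumes that 10-point grade bands are the goal and passes explicit edges instead. The counts are computed with `np.histogram()`, and the same edges can be passed to `plt.hist()` as `bins=band_edges`.

```python
import numpy as np

scores = [56, 78, 45, 89, 90, 65, 76, 88, 92, 55, 69, 80, 77]

# Equal-width bins between the minimum and maximum score
auto_counts, auto_edges = np.histogram(scores, bins=5)
print("bins=5 edges:", np.round(auto_edges, 1))

# Explicit 10-point grade bands: 40, 50, ..., 100
band_edges = list(range(40, 101, 10))
band_counts, _ = np.histogram(scores, bins=band_edges)
for left, c in zip(band_edges, band_counts):
    print(f"{left}-{left + 10}: {c} student(s)")
```

Round-numbered edges make the chart easier to read, especially when the bins correspond to categories such as grades.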
Example 2: Monthly Rainfall Data
import matplotlib.pyplot as plt
# Data
rainfall = [100, 120, 85, 90, 150, 130, 110, 140, 95, 105, 125, 115]
# Create histogram
plt.hist(rainfall, bins=6, color='cyan', edgecolor='black', alpha=0.6)
# Add title and labels
plt.title("Monthly Rainfall Distribution")
plt.xlabel("Rainfall (mm)")
plt.ylabel("Frequency")
# Display the plot
plt.show()
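With only 12 monthly values and 6 bins, reading the exact counts next to the chart can be helpful. This sketch uses `np.histogram()` to list each rainfall range and the number of months that fall into it. The ranges are the same ones that `plt.hist(rainfall, bins=6)` draws.

```python
import numpy as np

rainfall = [100, 120, 85, 90, 150, 130, 110, 140, 95, 105, 125, 115]

counts, edges = np.histogram(rainfall, bins=6)

# Each bin spans (150 - 85) / 6, roughly 10.8 mm
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f} mm: {c} month(s)")
```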
Try It Yourself
Problem 1: Analyze Heights of Students
Input the heights of students in your class and create a histogram to analyze the height distribution.
import matplotlib.pyplot as plt
# Data
heights = [150, 160, 165, 170, 155, 180, 175, 165, 158, 162]
# Create histogram
plt.hist(heights, bins=5, color='orange', edgecolor='black')
# Add title and labels
plt.title("Height Distribution")
plt.xlabel("Height (cm)")
plt.ylabel("Frequency")
# Display the plot
plt.show()
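A histogram shows central tendency only roughly. One way to make it explicit, sketched here, is to compute the mean and median with NumPy and compare them with the position of the tallest bar. These values can then be drawn on the chart, for example as vertical lines with `plt.axvline()`.

```python
import numpy as np

heights = [150, 160, 165, 170, 155, 180, 175, 165, 158, 162]

mean_height = np.mean(heights)
median_height = np.median(heights)

# Same binning as plt.hist(heights, bins=5)
counts, edges = np.histogram(heights, bins=5)
tallest = np.argmax(counts)  # index of the bin with the most students

print(f"mean: {mean_height:.1f} cm, median: {median_height:.1f} cm")
print(f"most common range: {edges[tallest]:g} to {edges[tallest + 1]:g} cm")
```

Here the mean and median both fall inside the tallest bin, which suggests a fairly symmetric distribution for this small sample.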
Problem 2: Analyze Product Sales
Visualize the sales data of 10 products using a histogram. Group the data into 4 intervals.
import matplotlib.pyplot as plt
# Data
sales = [200, 300, 400, 150, 250, 350, 450, 300, 220, 310]
# Create histogram
plt.hist(sales, bins=4, color='blue', edgecolor='black')
# Add title and labels
plt.title("Product Sales Distribution")
plt.xlabel("Sales (Units)")
plt.ylabel("Frequency")
# Display the plot
plt.show()
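With `bins=4`, the sales range from 150 to 450 is split into four bins, each 75 units wide. Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. As a result, the two products that sold exactly 300 units are counted in the 300 to 375 bin, not in the 225 to 300 bin. The following quick check uses `np.histogram()`, which produces the same edges as `plt.hist()`:

```python
import numpy as np

sales = [200, 300, 400, 150, 250, 350, 450, 300, 220, 310]

counts, edges = np.histogram(sales, bins=4)

for i, (left, right) in enumerate(zip(edges[:-1], edges[1:])):
    closing = "]" if i == len(counts) - 1 else ")"  # only the last bin is closed
    print(f"[{left:g}, {right:g}{closing}: {counts[i]} product(s)")
```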
Histograms are an essential tool for understanding how data is distributed. Experiment with the number of bins, explicit bin edges, colors, and transparency to find the view that best reveals the patterns in your data.