Histograms in Matplotlib

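A histogram shows the frequency distribution of a numeric dataset: values are grouped into intervals (bins), and the height of each bar is the number of values that fall in that bin. Histograms help reveal the shape, spread, and central tendency of the data.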


Creating Histograms

The hist() function in Matplotlib's pyplot module creates a histogram from a sequence of values.

Example: Basic Histogram

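# These imports are reused by every example on this page
import matplotlib.pyplot as plt
import numpy as np  # needed by the later examples that generate random data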
 
# Data
data = [22, 87, 5, 42, 88, 30, 56, 78, 95, 42, 67, 89, 42]
 
# Create histogram
plt.hist(data, bins=5, color='blue', edgecolor='black')
 
# Add title and labels
plt.title("Basic Histogram")
plt.xlabel("Value Range")
plt.ylabel("Frequency")
 
# Display the plot
plt.show()
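
To see what bins=5 actually does, it can help to look at the values hist() returns. The sketch below reuses the data from the basic example and draws on an explicit Axes object; the variable names (counts, edges, patches) are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

data = [22, 87, 5, 42, 88, 30, 56, 78, 95, 42, 67, 89, 42]

fig, ax = plt.subplots()

# hist() returns the per-bin counts, the bin edges, and the drawn bars
counts, edges, patches = ax.hist(data, bins=5, color='blue', edgecolor='black')

# 5 equal-width bins between min(data)=5 and max(data)=95 need 6 edges
print(edges)   # edges at 5, 23, 41, 59, 77, 95
print(counts)  # counts per bin: 2, 1, 4, 1, 5

plt.close(fig)  # release the figure; call plt.show() before this to view it
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. That is why the value 95 is counted in the final bin.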

Customizing Histograms

Matplotlib provides several parameters to customize histograms:

Parameter    Description                                           Example Value
bins         Number of equal-width bins, or a list of bin edges    10, [0, 20, 40]
color        Fill color of the bars                                'blue', 'green'
edgecolor    Color of the bar outlines                             'black'
alpha        Opacity, from 0 (transparent) to 1 (opaque)           0.5
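
The two forms of bins behave differently, which the following sketch tries to illustrate. It uses np.histogram to compute the counts without drawing anything, on the assumption that plt.hist() bins data the same way; the dataset is the one from the basic example, and the longer edge list is an illustrative choice.

```python
import numpy as np

data = [22, 87, 5, 42, 88, 30, 56, 78, 95, 42, 67, 89, 42]

# Integer: that many equal-width bins between min(data) and max(data)
counts_auto, edges_auto = np.histogram(data, bins=5)

# List: explicit bin edges, here five bins of width 20 covering 0 to 100
counts_fixed, edges_fixed = np.histogram(data, bins=[0, 20, 40, 60, 80, 100])

# Values outside the given edges are silently left out of the counts
counts_partial, _ = np.histogram(data, bins=[0, 20, 40])

print(counts_auto)     # all 13 values, edges chosen from the data range
print(counts_fixed)    # all 13 values, round edges chosen by hand
print(counts_partial)  # only the values between 0 and 40 are counted
```

Explicit edges are worth the extra typing when the bins should line up with meaningful boundaries, or when several plots need to share the same bins.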

Example: Customized Histogram

# Data
data = np.random.normal(50, 10, 1000)  # 1000 samples from a normal distribution (mean 50, std 10)
 
# Create customized histogram
plt.hist(data, bins=20, color='green', edgecolor='black', alpha=0.7)
 
# Add title and labels
plt.title("Customized Histogram")
plt.xlabel("Value Range")
plt.ylabel("Frequency")
 
# Display the plot
plt.show()
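
Because np.random.normal draws new values on every run, the plot above looks slightly different each time. One way to make it repeatable is a seeded generator, sketched below. The seed value 42 and the name SEED are arbitrary assumptions, not part of the original example.

```python
import numpy as np

SEED = 42  # arbitrary illustrative seed; any fixed integer works

# A seeded Generator produces the same samples on every run
rng = np.random.default_rng(SEED)
data = rng.normal(loc=50, scale=10, size=1000)

# Bin the samples the same way as the example above (20 equal-width bins)
counts, edges = np.histogram(data, bins=20)

print(f"mean={data.mean():.2f}, std={data.std():.2f}")  # close to 50 and 10
print(counts.sum())  # every sample lands in some bin
```

The resulting data array can be passed to plt.hist() exactly as in the customized example.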

Comparing Multiple Histograms

Example: Overlapping Histograms

# Data
data1 = np.random.normal(60, 10, 1000)
data2 = np.random.normal(50, 15, 1000)
 
# Create overlapping histograms
plt.hist(data1, bins=20, alpha=0.5, label="Dataset 1", color='blue', edgecolor='black')
plt.hist(data2, bins=20, alpha=0.5, label="Dataset 2", color='orange', edgecolor='black')
 
# Add title, labels, and legend
plt.title("Overlapping Histograms")
plt.xlabel("Value Range")
plt.ylabel("Frequency")
plt.legend()
 
# Display the plot
plt.show()
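
In the overlapping example, each hist() call with bins=20 derives its bin edges from its own dataset, so the two sets of bars generally have different widths and positions. A possible refinement, sketched here, is to compute one set of edges from the combined data and pass it to both calls. The seed and the name shared_edges are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
data1 = rng.normal(60, 10, 1000)
data2 = rng.normal(50, 15, 1000)

# One set of 20 equal-width bins spanning both datasets
shared_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=20)

fig, ax = plt.subplots()
counts1, _, _ = ax.hist(data1, bins=shared_edges, alpha=0.5,
                        label="Dataset 1", color='blue', edgecolor='black')
counts2, _, _ = ax.hist(data2, bins=shared_edges, alpha=0.5,
                        label="Dataset 2", color='orange', edgecolor='black')

ax.set_title("Overlapping Histograms (shared bins)")
ax.set_xlabel("Value Range")
ax.set_ylabel("Frequency")
ax.legend()

plt.close(fig)  # release the figure; call plt.show() before this to view it
```

With shared edges, bars at the same x position cover the same interval, so their heights can be compared directly.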

Example: Side-by-Side Histograms

# Data
data1 = [22, 87, 5, 42, 88, 30, 56]
data2 = [32, 57, 15, 72, 48, 50, 66]
 
# Define bin edges
bins = [0, 20, 40, 60, 80, 100]
 
# Create side-by-side histograms
plt.hist([data1, data2], bins=bins, label=["Dataset 1", "Dataset 2"], color=['blue', 'green'], edgecolor='black')
 
# Add title, labels, and legend
plt.title("Side-by-Side Histograms")
plt.xlabel("Value Range")
plt.ylabel("Frequency")
plt.legend()
 
# Display the plot
plt.show()
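
As a cross-check on the side-by-side plot, the bar heights can be computed directly. This sketch uses np.histogram on each dataset with the same bin edges; it only counts values and does not draw anything.

```python
import numpy as np

data1 = [22, 87, 5, 42, 88, 30, 56]
data2 = [32, 57, 15, 72, 48, 50, 66]
bins = [0, 20, 40, 60, 80, 100]

# Count how many values of each dataset fall into each shared bin
counts1, _ = np.histogram(data1, bins=bins)
counts2, _ = np.histogram(data2, bins=bins)

for left, right, c1, c2 in zip(bins[:-1], bins[1:], counts1, counts2):
    print(f"{left:>3}-{right:<3}  Dataset 1: {c1}  Dataset 2: {c2}")
```

Dataset 1 has no values between 60 and 80, and Dataset 2 has none between 80 and 100, so those bars are missing from the plot.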

Practical Examples

Example 1: Student Test Scores

# Data
scores = [56, 78, 45, 89, 90, 65, 76, 88, 92, 55, 69, 80, 77]
 
# Create histogram
plt.hist(scores, bins=5, color='purple', edgecolor='black')
 
# Add title and labels
plt.title("Student Test Scores")
plt.xlabel("Score Range")
plt.ylabel("Frequency")
 
# Display the plot
plt.show()
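
With bins=5, the bin edges for these scores fall at 45, 54.4, 63.8, 73.2, 82.6, and 92, which are awkward to read as score ranges. A variation, sketched below, uses bins of width 10 starting at 40. The starting point and width are illustrative assumptions, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

scores = [56, 78, 45, 89, 90, 65, 76, 88, 92, 55, 69, 80, 77]

# Round bin edges: 40, 50, ..., 100
edges = np.arange(40, 101, 10)

fig, ax = plt.subplots()
counts, _, _ = ax.hist(scores, bins=edges, color='purple', edgecolor='black')

ax.set_xticks(edges)  # put a tick on every bin boundary
ax.set_title("Student Test Scores")
ax.set_xlabel("Score Range")
ax.set_ylabel("Frequency")

print(counts)  # counts for 40-50, 50-60, ..., 90-100: 1, 2, 2, 3, 3, 2

plt.close(fig)  # release the figure; call plt.show() before this to view it
```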

Example 2: Monthly Rainfall Data

# Data
rainfall = [100, 120, 85, 90, 150, 130, 110, 140, 95, 105, 125, 115]
 
# Create histogram
plt.hist(rainfall, bins=6, color='cyan', edgecolor='black', alpha=0.6)
 
# Add title and labels
plt.title("Monthly Rainfall Distribution")
plt.xlabel("Rainfall (mm)")
plt.ylabel("Frequency")
 
# Display the plot
plt.show()

Try It Yourself

Problem 1: Analyze Heights of Students

Collect the heights of the students in your class and create a histogram to analyze how the heights are distributed.

# Data
heights = [150, 160, 165, 170, 155, 180, 175, 165, 158, 162]
 
# Create histogram
plt.hist(heights, bins=5, color='orange', edgecolor='black')
 
# Add title and labels
plt.title("Height Distribution")
plt.xlabel("Height (cm)")
plt.ylabel("Frequency")
 
# Display the plot
plt.show()

Problem 2: Analyze Product Sales

Visualize the sales data of 10 products using a histogram. Group the data into 4 intervals.

# Data
sales = [200, 300, 400, 150, 250, 350, 450, 300, 220, 310]
 
# Create histogram
plt.hist(sales, bins=4, color='blue', edgecolor='black')
 
# Add title and labels
plt.title("Product Sales Distribution")
plt.xlabel("Sales (Units)")
plt.ylabel("Frequency")
 
# Display the plot
plt.show()

Histograms are essential for understanding data distribution. Experiment with different customization options to create insightful visualizations.

