Real-World Examples with Pandas
Pandas is a widely used Python library for real-world data analysis, with tools to clean, aggregate, and visualize data. This page walks through practical use cases and case studies that show Pandas in action.
Pandas in Data Analysis
Example 1: Analyzing Sales Data
Scenario: A company wants to analyze its sales data to find trends and identify top-performing products.
import pandas as pd
# Sample sales data
data = {
"Product": ["A", "B", "C", "A", "B", "C"],
"Region": ["North", "South", "East", "West", "North", "South"],
"Sales": [150, 200, 300, 100, 250, 400]
}
df = pd.DataFrame(data)
# Analyze total sales by product
product_sales = df.groupby("Product")["Sales"].sum()
print("Total Sales by Product:\n", product_sales)
# Analyze total sales by region
region_sales = df.groupby("Region")["Sales"].sum()
print("\nTotal Sales by Region:\n", region_sales)
Output:
Total Sales by Product:
Product
A 250
B 450
C 700
Name: Sales, dtype: int64
Total Sales by Region:
Region
East 300
North 400
South 600
West 100
Name: Sales, dtype: int64
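The scenario also asks for the top-performing products, which the totals above do not rank. A minimal sketch of one way to do that, assuming the same sample data; the names `ranked` and `top_product` are illustrative and not part of the original example:

```python
import pandas as pd

# Same sample sales data as in the example above
df = pd.DataFrame({
    "Product": ["A", "B", "C", "A", "B", "C"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Sales": [150, 200, 300, 100, 250, 400],
})

# Rank products by total sales, highest first
ranked = df.groupby("Product")["Sales"].sum().sort_values(ascending=False)

# idxmax returns the index label (here, the product) with the largest total
top_product = ranked.idxmax()

print(ranked)
print("Top product:", top_product)
```

With this data, product C leads with 700 units of sales, followed by B and A.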
Example 2: Handling Missing Data in Weather Reports
Scenario: A meteorological department needs to clean and analyze temperature data with missing values.
import pandas as pd
# Sample temperature data with missing values
data = {
"City": ["Delhi", "Mumbai", "Chennai", "Kolkata"],
"Temperature": [40, None, 35, None]
}
df = pd.DataFrame(data)
# Fill missing values with the average temperature
df["Temperature"] = df["Temperature"].fillna(df["Temperature"].mean())
print(df)
Output:
City Temperature
0 Delhi 40.0
1 Mumbai 37.5
2 Chennai 35.0
3 Kolkata 37.5
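Before filling gaps, it often helps to know how many values are missing and to remember which rows were imputed. The following is a sketch under that assumption, using the same sample data; the `Imputed` column and the `missing_count` variable are hypothetical additions, not part of the original example:

```python
import pandas as pd

# Same sample temperature data as in the example above
df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Chennai", "Kolkata"],
    "Temperature": [40, None, 35, None],
})

# Count missing readings before any imputation
missing_count = df["Temperature"].isna().sum()

# Flag the rows that are about to be filled, so imputed values
# stay distinguishable from measured ones
df["Imputed"] = df["Temperature"].isna()

# Fill missing values with the mean of the observed temperatures
df["Temperature"] = df["Temperature"].fillna(df["Temperature"].mean())

print("Missing readings:", missing_count)
print(df)
```

Keeping the flag is a design choice: a mean-filled value looks like a real reading, so later analysis may want to exclude or down-weight those rows.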
Case Studies
Case Study 1: Customer Segmentation for Marketing
Problem: A retail company wants to segment its customers based on their purchasing behavior.
import pandas as pd
# Sample customer data
data = {
"CustomerID": [1, 2, 3, 4],
"Purchases": [500, 300, 700, 200],
"Region": ["North", "South", "North", "East"]
}
df = pd.DataFrame(data)
# Segment customers into high, medium, and low spenders
df["Segment"] = pd.cut(df["Purchases"], bins=[0, 300, 600, 1000], labels=["Low", "Medium", "High"])
print(df)
Output:
CustomerID Purchases Region Segment
0 1 500 North Medium
1 2 300 South Low
2 3 700 North High
3 4 200 East Low
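Note that `pd.cut` uses right-inclusive bins by default, which is why a purchase total of exactly 300 lands in "Low" rather than "Medium". Once customers are labeled, a per-segment summary is a natural next step. A minimal sketch, assuming the same sample data; `segment_summary` is an illustrative name:

```python
import pandas as pd

# Same sample customer data as in the case study above
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Purchases": [500, 300, 700, 200],
    "Region": ["North", "South", "North", "East"],
})

# Bins are (0, 300], (300, 600], (600, 1000]
df["Segment"] = pd.cut(
    df["Purchases"],
    bins=[0, 300, 600, 1000],
    labels=["Low", "Medium", "High"],
)

# Count customers and total purchases per segment;
# observed=True keeps only segments that actually occur in the data
segment_summary = df.groupby("Segment", observed=True)["Purchases"].agg(["count", "sum"])

print(segment_summary)
```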
Case Study 2: Time Series Analysis for Energy Usage
Problem: An energy company wants to analyze hourly energy usage data to detect peaks and valleys.
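import pandas as pd
# Sample energy usage data
data = {
# Lowercase "h" is the hourly alias; uppercase "H" is deprecated in recent pandas releases
"Time": pd.date_range("2023-01-01", periods=6, freq="h"),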
"Usage": [100, 120, 150, 90, 80, 200]
}
df = pd.DataFrame(data)
# Resample data to daily total usage
daily_usage = df.resample("D", on="Time").sum()
print(daily_usage)
Output:
Usage
Time
2023-01-01 740
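The daily total does not by itself show where the peaks and valleys are. One simple way to locate them, sketched here on the same sample data, is to index the readings by time and ask for the timestamps of the largest and smallest values; the variable names are illustrative:

```python
import pandas as pd

# Same sample energy usage data as in the case study above
df = pd.DataFrame({
    "Time": pd.date_range("2023-01-01", periods=6, freq="h"),
    "Usage": [100, 120, 150, 90, 80, 200],
})

# Index the readings by timestamp so idxmax/idxmin return times
usage = df.set_index("Time")["Usage"]

peak_time = usage.idxmax()    # timestamp of the highest reading
valley_time = usage.idxmin()  # timestamp of the lowest reading

print("Peak:", peak_time, usage.max())
print("Valley:", valley_time, usage.min())
```

For this sample, the peak of 200 falls at 05:00 and the valley of 80 at 04:00. This only finds the single global maximum and minimum; detecting multiple local peaks would need a different approach.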
Try It Yourself
Problem 1: Analyze Movie Ratings
Create a DataFrame containing movie names, their genres, and ratings. Group the movies by genre and find the average rating for each genre.
import pandas as pd
# Sample movie data
data = {
"Movie": ["Movie1", "Movie2", "Movie3", "Movie4"],
"Genre": ["Action", "Comedy", "Action", "Drama"],
"Rating": [4.5, 3.8, 4.7, 4.0]
}
df = pd.DataFrame(data)
# Group by genre and calculate average rating
genre_ratings = df.groupby("Genre")["Rating"].mean()
print(genre_ratings)
Problem 2: Analyze Employee Salaries
Create a DataFrame with employee IDs, departments, and salaries. Calculate the total and average salary for each department.
import pandas as pd
# Sample employee data
data = {
"EmployeeID": [1, 2, 3, 4],
"Department": ["HR", "Finance", "HR", "IT"],
"Salary": [50000, 70000, 55000, 65000]
}
df = pd.DataFrame(data)
# Group by department and calculate total and average salary
total_salary = df.groupby("Department")["Salary"].sum()
avg_salary = df.groupby("Department")["Salary"].mean()
print("Total Salary by Department:\n", total_salary)
print("\nAverage Salary by Department:\n", avg_salary)
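The two separate `groupby` calls can also be combined into a single pass. As an alternative sketch, assuming the same sample data, named aggregation returns both statistics in one table; the column names `total` and `average` are chosen here for illustration:

```python
import pandas as pd

# Same sample employee data as in the solution above
df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4],
    "Department": ["HR", "Finance", "HR", "IT"],
    "Salary": [50000, 70000, 55000, 65000],
})

# One groupby pass; each keyword names an output column
# and gives the aggregation to apply to Salary
salary_summary = df.groupby("Department")["Salary"].agg(total="sum", average="mean")

print(salary_summary)
```

The result has one row per department, which makes it easier to compare total and average salary side by side.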