Real-World Examples with Pandas

Pandas is widely used for real-world data analysis because it provides tools to load, clean, aggregate, and reshape tabular data, and it works with plotting libraries for visualization. This page walks through practical examples and short case studies that show these features on small sample datasets.

Pandas in Data Analysis

Example 1: Analyzing Sales Data

Scenario: A company wants to analyze its sales data to find trends and identify top-performing products.

import pandas as pd
 
# Sample sales data
data = {
    "Product": ["A", "B", "C", "A", "B", "C"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Sales": [150, 200, 300, 100, 250, 400]
}
df = pd.DataFrame(data)
 
# Analyze total sales by product
product_sales = df.groupby("Product")["Sales"].sum()
print("Total Sales by Product:")
print(product_sales)
 
# Analyze total sales by region
region_sales = df.groupby("Region")["Sales"].sum()
print("\nTotal Sales by Region:")
print(region_sales)

Output:

Total Sales by Product:
Product
A    250
B    450
C    700
Name: Sales, dtype: int64

Total Sales by Region:
Region
East     300
North    400
South    600
West     100
Name: Sales, dtype: int64
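
The scenario also asks for the top-performing products, which the grouped totals alone do not rank. A minimal sketch of one way to do that, reusing the same sample data; the variable names `ranked` and `top_product` are illustrative and not part of the original example.

```python
import pandas as pd

# Same sample sales data as in Example 1
df = pd.DataFrame({
    "Product": ["A", "B", "C", "A", "B", "C"],
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Sales": [150, 200, 300, 100, 250, 400],
})

product_sales = df.groupby("Product")["Sales"].sum()

# Rank products from highest to lowest total sales
ranked = product_sales.sort_values(ascending=False)

# idxmax() returns the index label (here, the product) with the largest total
top_product = product_sales.idxmax()

print(ranked)
print("Top product:", top_product)
```

On this sample, product C ranks first with total sales of 700.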

Example 2: Handling Missing Data in Weather Reports

Scenario: A meteorological department needs to clean and analyze temperature data with missing values.

import pandas as pd
 
# Sample temperature data with missing values
data = {
    "City": ["Delhi", "Mumbai", "Chennai", "Kolkata"],
    "Temperature": [40, None, 35, None]
}
df = pd.DataFrame(data)
 
# Fill missing values with the average temperature
df["Temperature"] = df["Temperature"].fillna(df["Temperature"].mean())
print(df)

Output:

      City  Temperature
0    Delhi         40.0
1   Mumbai         37.5
2  Chennai         35.0
3  Kolkata         37.5
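
Before filling gaps, it often helps to count them and to record which rows were filled, since imputed readings are estimates rather than measurements. The sketch below assumes the same sample data; the `Imputed` column name is hypothetical and chosen only for illustration.

```python
import pandas as pd

# Same sample temperature data as in Example 2
df = pd.DataFrame({
    "City": ["Delhi", "Mumbai", "Chennai", "Kolkata"],
    "Temperature": [40, None, 35, None],
})

# Count missing readings before changing anything
missing_count = df["Temperature"].isna().sum()

# mean() skips NaN by default, so only 40 and 35 contribute
mean_temp = df["Temperature"].mean()

# Flag the rows that are about to be filled (hypothetical column name)
df["Imputed"] = df["Temperature"].isna()
df["Temperature"] = df["Temperature"].fillna(mean_temp)

print("Missing readings:", missing_count)
print(df)
```

Keeping the flag column makes it possible to exclude or re-estimate the filled rows later.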

Case Studies

Case Study 1: Customer Segmentation for Marketing

Problem: A retail company wants to segment its customers based on their purchasing behavior.

import pandas as pd
 
# Sample customer data
data = {
    "CustomerID": [1, 2, 3, 4],
    "Purchases": [500, 300, 700, 200],
    "Region": ["North", "South", "North", "East"]
}
df = pd.DataFrame(data)
 
# Segment customers into high, medium, and low spenders
df["Segment"] = pd.cut(df["Purchases"], bins=[0, 300, 600, 1000], labels=["Low", "Medium", "High"])
print(df)

Output:

   CustomerID  Purchases Region Segment
0           1        500  North  Medium
1           2        300  South     Low
2           3        700  North    High
3           4        200   East     Low
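
Note that the customer with exactly 300 in purchases lands in "Low". This is because `pd.cut` closes each bin on the right by default, so the bins are (0, 300], (300, 600], and (600, 1000]. The sketch below contrasts this with `right=False` and counts customers per segment; the `SegmentLeftClosed` column name is illustrative only.

```python
import pandas as pd

# Same sample customer data as in Case Study 1
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Purchases": [500, 300, 700, 200],
    "Region": ["North", "South", "North", "East"],
})

bins = [0, 300, 600, 1000]
labels = ["Low", "Medium", "High"]

# Default right=True: bins are (0, 300], (300, 600], (600, 1000]
df["Segment"] = pd.cut(df["Purchases"], bins=bins, labels=labels)

# right=False: bins are [0, 300), [300, 600), [600, 1000)
df["SegmentLeftClosed"] = pd.cut(
    df["Purchases"], bins=bins, labels=labels, right=False
)

# Number of customers per segment, ordered Low, Medium, High
segment_counts = df["Segment"].value_counts().sort_index()

print(df)
print(segment_counts)
```

Which boundary convention is appropriate depends on how the business defines each tier, so it is worth stating explicitly.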

Case Study 2: Time Series Analysis for Energy Usage

Problem: An energy company wants to analyze hourly energy usage data to detect peaks and valleys.

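import pandas as pd
 
# Sample energy usage data (hourly readings)
# Lowercase "h" is the hourly alias; uppercase "H" is deprecated in recent pandas releases
data = {
    "Time": pd.date_range("2023-01-01", periods=6, freq="h"),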
    "Usage": [100, 120, 150, 90, 80, 200]
}
df = pd.DataFrame(data)
 
# Resample data to daily total usage
daily_usage = df.resample("D", on="Time").sum()
print(daily_usage)

Output:

            Usage
Time             
2023-01-01    740
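
The daily total does not by itself show where the peaks and valleys are. A minimal sketch of locating them in the hourly readings, assuming the same sample data; `peak_time` and `valley_time` are illustrative names.

```python
import pandas as pd

# Same sample energy usage data as in Case Study 2
# ("h" is the hourly frequency alias used by recent pandas releases)
df = pd.DataFrame({
    "Time": pd.date_range("2023-01-01", periods=6, freq="h"),
    "Usage": [100, 120, 150, 90, 80, 200],
})

# Index the readings by timestamp so idxmax/idxmin return times
usage = df.set_index("Time")["Usage"]

peak_time = usage.idxmax()    # timestamp of the highest reading
valley_time = usage.idxmin()  # timestamp of the lowest reading

print("Peak:", peak_time, usage.max())
print("Valley:", valley_time, usage.min())
```

On this sample, the peak (200) falls at 05:00 and the valley (80) at 04:00 on 2023-01-01.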

Try It Yourself

Problem 1: Analyze Movie Ratings

Create a DataFrame containing movie names, their genres, and ratings. Group the movies by genre and find the average rating for each genre.

import pandas as pd
 
# Sample movie data
data = {
    "Movie": ["Movie1", "Movie2", "Movie3", "Movie4"],
    "Genre": ["Action", "Comedy", "Action", "Drama"],
    "Rating": [4.5, 3.8, 4.7, 4.0]
}
df = pd.DataFrame(data)
 
# Group by genre and calculate average rating
genre_ratings = df.groupby("Genre")["Rating"].mean()
print(genre_ratings)

Problem 2: Analyze Employee Salaries

Create a DataFrame with employee IDs, departments, and salaries. Calculate the total and average salary for each department.

import pandas as pd
 
# Sample employee data
data = {
    "EmployeeID": [1, 2, 3, 4],
    "Department": ["HR", "Finance", "HR", "IT"],
    "Salary": [50000, 70000, 55000, 65000]
}
df = pd.DataFrame(data)
 
# Group by department and calculate total and average salary
total_salary = df.groupby("Department")["Salary"].sum()
avg_salary = df.groupby("Department")["Salary"].mean()
 
print("Total Salary by Department:\n", total_salary)
print("\nAverage Salary by Department:\n", avg_salary)
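
The two separate `groupby` calls can also be combined. As a sketch of an alternative, `agg` accepts a list of aggregation names and returns one table with a column per statistic; `salary_summary` is an illustrative name.

```python
import pandas as pd

# Same sample employee data as in Problem 2
df = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4],
    "Department": ["HR", "Finance", "HR", "IT"],
    "Salary": [50000, 70000, 55000, 65000],
})

# One groupby pass producing both statistics as columns
salary_summary = df.groupby("Department")["Salary"].agg(["sum", "mean"])

print(salary_summary)
```

The result has one row per department with `sum` and `mean` columns, which is convenient when the two figures are read together.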
