Working with Time Series in Pandas
Pandas provides robust tools for working with time series data, making it ideal for analyzing temporal patterns and trends. This page covers creating time-based indices, resampling, frequency conversion, and handling date/time data.
Creating Time-Based Indices
Time-based indices allow you to organize and analyze data with timestamps effectively.
Example: Creating a Time-Based Index
import pandas as pd
# Create a time-based index
time_index = pd.date_range(start="2023-01-01", periods=5, freq="D")
data = [100, 200, 150, 300, 400]
df = pd.DataFrame(data, index=time_index, columns=["Sales"])
print(df)
Output:
Sales
2023-01-01 100
2023-01-02 200
2023-01-03 150
2023-01-04 300
2023-01-05 400
Resampling and Frequency Conversion
Resampling involves converting a time series from one frequency to another (e.g., daily to monthly). It is useful for aggregating or interpolating data.
Example: Downsampling
# Downsample to monthly frequency (sum of daily sales)
monthly_sales = df.resample("M").sum()
print(monthly_sales)
Output:
Sales
2023-01-31 1150
Example: Upsampling
# Upsample to hourly frequency (fill missing values with forward fill)
hourly_sales = df.resample("H").ffill()
print(hourly_sales.head())
Output (partial):
Sales
2023-01-01 00:00:00 100
2023-01-01 01:00:00 100
2023-01-01 02:00:00 100
2023-01-01 03:00:00 100
2023-01-01 04:00:00 100
Handling Date/Time Data
Pandas makes it easy to parse and manipulate date/time data.
Parsing Date Strings
# Parse a column of date strings
data = {"Date": ["2023-01-01", "2023-01-02", "2023-01-03"], "Sales": [100, 200, 150]}
df = pd.DataFrame(data)
# Convert 'Date' column to datetime
df["Date"] = pd.to_datetime(df["Date"])
print(df.dtypes)
Output:
Date datetime64[ns]
Sales int64
dtype: object
Extracting Components
# Extract year, month, and day
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Day"] = df["Date"].dt.day
print(df)
Output:
Date Sales Year Month Day
0 2023-01-01 100 2023 1 1
1 2023-01-02 200 2023 1 2
2 2023-01-03 150 2023 1 3
Filtering by Date
# Filter rows by date range
filtered_df = df[(df["Date"] >= "2023-01-02") & (df["Date"] <= "2023-01-03")]
print(filtered_df)
Output:
Date Sales Year Month Day
1 2023-01-02 200 2023 1 2
2 2023-01-03 150 2023 1 3
Try It Yourself
Problem 1: Create a Time Series
Create a DataFrame with daily temperatures for a week starting from “2023-01-01”. Set the date as the index and calculate the weekly average temperature.
Show Code
import pandas as pd
# Create time series data
dates = pd.date_range(start="2023-01-01", periods=7, freq="D")
temperatures = [30, 32, 31, 29, 28, 27, 26]
df = pd.DataFrame(temperatures, index=dates, columns=["Temperature"])
# Calculate weekly average
weekly_avg = df.resample("W").mean()
print(weekly_avg)
Problem 2: Filter Time Series
Create a DataFrame with monthly sales data for the year 2023. Filter the data to show sales for Q2 (April to June).
Show Code
import pandas as pd
# Create monthly sales data
months = pd.date_range(start="2023-01-01", periods=12, freq="M")
sales = [500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600]
df = pd.DataFrame(sales, index=months, columns=["Sales"])
# Filter for Q2
q2_sales = df["2023-04-01":"2023-06-30"]
print(q2_sales)