Data Selection and Filtering in Pandas
Pandas provides several ways to select and filter data in Series and DataFrames: column access, position-based and label-based row selection, and boolean conditions.
Selecting Rows and Columns
1. Selecting Columns
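Access columns using bracket notation or dot notation. Dot notation only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; bracket notation always works.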
import pandas as pd
# Sample DataFrame
data = {
'Name': ['Anika', 'Rahul', 'Sneha'],
'Age': [25, 30, 22],
'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
# Selecting a column
print(df['Name']) # Bracket notation
print(df.Name) # Dot notation
Output (both statements print the same Series):
0 Anika
1 Rahul
2 Sneha
Name: Name, dtype: object
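Bracket notation also accepts a list of column names, which returns a DataFrame rather than a Series. The sketch below reuses the sample data above; the 'Home City' column is a hypothetical addition, included only to show a name that dot notation cannot reach.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
})

# A list of names selects several columns and returns a DataFrame
subset = df[['Name', 'Age']]
print(type(subset).__name__)      # DataFrame
print(list(subset.columns))       # ['Name', 'Age']

# A single name returns a Series
print(type(df['Name']).__name__)  # Series

# A column name containing a space requires bracket notation
# ('Home City' is a hypothetical column added for illustration)
df['Home City'] = df['City']
print(df['Home City'].tolist())   # ['Delhi', 'Mumbai', 'Bangalore']
```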
2. Selecting Rows
Select rows using slicing, .iloc, or .loc.
Using Slicing
# Select the first two rows by slicing (the end position, 2, is excluded)
print(df[0:2])
Output:
Name Age City
0 Anika 25 Delhi
1 Rahul 30 Mumbai
Using .iloc (Position-Based)
# Select rows by position
print(df.iloc[1])
Output:
Name Rahul
Age 30
City Mumbai
Name: 1, dtype: object
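Beyond a single row, .iloc accepts slices, lists of positions, and a second argument for column positions. A minimal sketch on the same sample data; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
})

# Rows at positions 0 and 1 (the end position, 2, is excluded)
first_two = df.iloc[0:2]

# A single value: row position 1, column position 0
name = df.iloc[1, 0]
print(name)            # Rahul

# Negative positions count from the end
last = df.iloc[-1]
print(last['City'])    # Bangalore

# Rows 0 and 2, columns 0 and 2 (Name and City)
picked = df.iloc[[0, 2], [0, 2]]
print(picked.shape)    # (2, 2)
```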
Using .loc (Label-Based)
# Select the row whose index label is 0 (the default integer index)
print(df.loc[0])
Output:
Name Anika
Age 25
City Delhi
Name: 0, dtype: object
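With the default index, labels and positions coincide, so .loc[0] and .iloc[0] return the same row. The difference becomes visible with a custom index. A minimal sketch, assuming hypothetical string labels 'a', 'b', 'c' that are not part of the original example:

```python
import pandas as pd

# Same sample data, but with hypothetical string labels as the index
df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}, index=['a', 'b', 'c'])

# .loc looks up the label 'b'; .iloc looks up position 1
row_by_label = df.loc['b']
row_by_position = df.iloc[1]
print(row_by_label['Name'], row_by_position['Name'])  # Rahul Rahul

# Label slices include BOTH endpoints, unlike positional slices
a_to_b = df.loc['a':'b']
print(len(a_to_b))     # 2

# Rows and a column selected by label in one call
ages = df.loc[['a', 'c'], 'Age']
print(ages.tolist())   # [25, 22]
```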
Filtering Data with Conditions
Single Condition
# Filter rows where Age > 25
filtered = df[df['Age'] > 25]
print(filtered)
Output:
Name Age City
1 Rahul 30 Mumbai
Multiple Conditions
Use & for AND and | for OR. Enclose each condition in parentheses, because & and | bind more tightly than comparison operators such as > and ==.
# Filter rows where Age > 25 and City is 'Mumbai'
filtered = df[(df['Age'] > 25) & (df['City'] == 'Mumbai')]
print(filtered)
Output:
Name Age City
1 Rahul 30 Mumbai
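The same pattern extends to OR and to negation with ~. Python's own and, or, and not keywords do not work element-wise on a Series and raise a ValueError. A minimal sketch on the sample data; the isin call is shown as one common alternative to chaining several == comparisons with |.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
})

# OR: Age > 25, or City is 'Delhi'
either = df[(df['Age'] > 25) | (df['City'] == 'Delhi')]
print(either['Name'].tolist())       # ['Anika', 'Rahul']

# NOT: every row whose City is not 'Mumbai'
not_mumbai = df[~(df['City'] == 'Mumbai')]
print(not_mumbai['Name'].tolist())   # ['Anika', 'Sneha']

# Membership test against a list of values
in_cities = df[df['City'].isin(['Delhi', 'Bangalore'])]
print(in_cities['Name'].tolist())    # ['Anika', 'Sneha']
```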
Boolean Indexing
Boolean indexing selects rows using a Series of True/False values, called a mask: rows where the mask is True are kept, and the rest are dropped.
# Create a boolean mask
mask = df['Age'] > 25
print(mask)
# Use the mask to filter data
filtered = df[mask]
print(filtered)
Output:
0 False
1 True
2 False
Name: Age, dtype: bool
Name Age City
1 Rahul 30 Mumbai
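Because a mask is an ordinary boolean Series, it can be counted, inverted, and passed to .loc to pick columns at the same time. A minimal sketch on the sample data; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
})

mask = df['Age'] > 25

# True counts as 1, so sum() gives the number of matching rows
match_count = int(mask.sum())
print(match_count)              # 1

# .loc takes the mask for rows and a label for the column
names = df.loc[mask, 'Name']
print(names.tolist())           # ['Rahul']

# ~ inverts the mask and selects the remaining rows
others = df[~mask]
print(others['Name'].tolist())  # ['Anika', 'Sneha']
```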
Try It Yourself
Problem 1: Select Specific Data
Given the following DataFrame:
import pandas as pd
data = {
'Product': ['Laptop', 'Phone', 'Tablet'],
'Price': [80000, 30000, 20000],
'Stock': [50, 150, 100]
}
df = pd.DataFrame(data)
- Select the Price column.
- Filter products with a price greater than 25000.
Solution:
# Select the Price column
print(df['Price'])
# Filter products with price > 25000
filtered = df[df['Price'] > 25000]
print(filtered)
Problem 2: Use Boolean Indexing
Given the same DataFrame:
- Create a mask for products with stock greater than 100.
- Use the mask to filter and display the result.
Solution:
# Create a mask
mask = df['Stock'] > 100
print(mask)
# Filter using the mask
filtered = df[mask]
print(filtered)