
Data Selection and Filtering in Pandas

Pandas provides several ways to select and filter data in Series and DataFrames: bracket and dot notation for columns; slicing, .iloc, and .loc for rows; and boolean conditions for filtering.


Selecting Rows and Columns

1. Selecting Columns

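Access a column with bracket notation or dot notation. Dot notation works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method, so bracket notation is the safer default.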

import pandas as pd
 
# Sample DataFrame
data = {
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
 
# Selecting a column
print(df['Name'])  # Bracket notation
print(df.Name)     # Dot notation

Output (each print statement produces the same Series):

0    Anika
1    Rahul
2    Sneha
Name: Name, dtype: object
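
Bracket notation also accepts a list of column names, which returns a DataFrame instead of a Series. The sketch below reuses the sample data from above; the 'Home City' column is a hypothetical addition, included only to show a name that dot notation cannot reach.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# A list of column names returns a DataFrame with those columns, in that order
subset = df[['Name', 'City']]
print(subset)

# Hypothetical column with a space in its name: df.Home City is not valid
# Python, so bracket notation is the only way to select it
df['Home City'] = df['City']
print(df['Home City'])
```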

2. Selecting Rows

Select rows by slicing, by integer position with .iloc, or by index label with .loc.

Using Slicing

# Select rows by slicing
print(df[0:2])

Output:

    Name  Age    City
0  Anika   25   Delhi
1  Rahul   30  Mumbai
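
Row slices follow the usual Python slice rules: the start is included, the stop is excluded, and negative values and steps are allowed. A minimal sketch, assuming the same sample DataFrame with its default integer index (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

head_two = df[:2]       # first two rows (positions 0 and 1)
last_one = df[-1:]      # last row only
every_other = df[::2]   # every second row, starting from the first

print(head_two)
print(last_one)
print(every_other)
```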

Using .iloc (Position-Based)

# Select the row at position 1 (the second row)
print(df.iloc[1])

Output:

Name     Rahul
Age         30
City    Mumbai
Name: 1, dtype: object
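
.iloc is not limited to a single row. It also accepts lists and slices of positions, and a second indexer for columns. The following sketch assumes the same sample DataFrame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

rows_0_and_2 = df.iloc[[0, 2]]   # rows at positions 0 and 2
top_left = df.iloc[0:2, 0:2]     # first two rows, first two columns
single_value = df.iloc[1, 0]     # row position 1, column position 0

print(rows_0_and_2)
print(top_left)
print(single_value)
```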

Using .loc (Label-Based)

# Select the row whose index label is 0 (the default index uses integer labels)
print(df.loc[0])

Output:

Name     Anika
Age         25
City     Delhi
Name: 0, dtype: object
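
With the default index, labels and positions coincide, so the difference between .loc and .iloc is easy to miss. The sketch below makes it visible by using the Name column as the index (an assumption made for illustration; df_named is a hypothetical variable). Note that label slices in .loc include both endpoints.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# Use the names as row labels instead of 0, 1, 2
df_named = df.set_index('Name')

rahul = df_named.loc['Rahul']                  # one row, selected by label
ages = df_named.loc['Anika':'Rahul', 'Age']    # label slice: both ends included
sneha_city = df_named.loc['Sneha', 'City']     # single value by row and column label

print(rahul)
print(ages)
print(sneha_city)
```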

Filtering Data with Conditions

Single Condition

# Filter rows where Age > 25
filtered = df[df['Age'] > 25]
print(filtered)

Output:

    Name  Age    City
1  Rahul   30  Mumbai
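
A condition can also be passed to .loc, which makes it possible to filter rows and pick columns in one step. A minimal sketch, assuming the same sample DataFrame (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# Rows where Age > 25, keeping only the Name column (returns a Series)
names_over_25 = df.loc[df['Age'] > 25, 'Name']

# Rows where Age < 30, keeping two columns (returns a DataFrame)
under_30 = df.loc[df['Age'] < 30, ['Name', 'City']]

print(names_over_25)
print(under_30)
```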

Multiple Conditions

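Combine conditions with & (AND), | (OR), and ~ (NOT). Wrap each condition in parentheses, because these operators bind more tightly than comparisons such as > and ==. Python's and, or, and not keywords do not work element-wise on a Series.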

# Filter rows where Age > 25 and City is 'Mumbai'
filtered = df[(df['Age'] > 25) & (df['City'] == 'Mumbai')]
print(filtered)

Output:

    Name  Age    City
1  Rahul   30  Mumbai
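
The AND example above has OR and NOT counterparts, and Series.isin offers a compact way to match against several values. A short sketch on the same sample DataFrame (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

# OR: rows where Age < 25 or City is 'Delhi'
either = df[(df['Age'] < 25) | (df['City'] == 'Delhi')]

# NOT: rows where City is not 'Mumbai'
not_mumbai = df[~(df['City'] == 'Mumbai')]

# isin: rows where City matches any value in the list
in_cities = df[df['City'].isin(['Delhi', 'Bangalore'])]

print(either)
print(not_mumbai)
print(in_cities)
```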

Boolean Indexing

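Boolean indexing selects rows with a boolean mask: a Series of True/False values, one per row. Indexing a DataFrame with the mask keeps only the rows where the mask is True. The condition-based filters above use this mechanism; here the mask is stored in a variable first.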

# Create a boolean mask
mask = df['Age'] > 25
print(mask)
 
# Use the mask to filter data
filtered = df[mask]
print(filtered)

Output:

0    False
1     True
2    False
Name: Age, dtype: bool

    Name  Age    City
1  Rahul   30  Mumbai
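
Storing masks in variables makes them reusable: they can be counted, combined, and given descriptive names. The following is a minimal sketch on the same sample DataFrame; the mask names and thresholds are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

is_over_21 = df['Age'] > 21
is_not_delhi = df['City'] != 'Delhi'

# True counts as 1, so summing a mask counts the matching rows
match_count = int(is_over_21.sum())

# Named masks combine with & and | like inline conditions
both = df[is_over_21 & is_not_delhi]

print(match_count)
print(both)
```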

Try It Yourself

Problem 1: Select Specific Data

Given the following DataFrame:

import pandas as pd
 
data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [80000, 30000, 20000],
    'Stock': [50, 150, 100]
}
df = pd.DataFrame(data)

  1. Select the Price column.
  2. Filter products with a price greater than 25000.

Solution:

# Select the Price column
print(df['Price'])
 
# Filter products with price > 25000
filtered = df[df['Price'] > 25000]
print(filtered)

Problem 2: Use Boolean Indexing

Given the same DataFrame:

  1. Create a mask for products with stock greater than 100.
  2. Use the mask to filter and display the result.

Solution:

# Create a mask
mask = df['Stock'] > 100
print(mask)
 
# Filter using the mask
filtered = df[mask]
print(filtered)
