Data Structures in Pandas
Pandas is built around two primary data structures, Series and DataFrame. A third, Panel, existed in older releases but has since been removed. Each structure targets a specific shape of data.
1. Series: One-Dimensional Data
A Series is a one-dimensional labeled array capable of holding data of any type (integer, float, string, etc.). The labels (known as the index) provide easy access to data.
Creating a Series
import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
Output:
0    10
1    20
2    30
3    40
dtype: int64
Adding an Index
# Create a Series with a custom index
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)
Output:
a    10
b    20
c    30
d    40
dtype: int64
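A Series can also be built from a dictionary, in which case the keys become the index labels. The short sketch below illustrates this along with label-based access; the fruit names and prices are made-up values used only for illustration.

```python
import pandas as pd

# Hypothetical data: the dict keys become the index labels
prices = pd.Series({'apple': 1.5, 'banana': 0.25, 'cherry': 3.0})

print(prices.dtype)        # float64
print(prices['banana'])    # 0.25
print(list(prices.index))  # ['apple', 'banana', 'cherry']
```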
2. DataFrame: Two-Dimensional Data
A DataFrame is a two-dimensional table-like data structure with labeled axes (rows and columns). It is the most commonly used structure in Pandas.
Creating a DataFrame
# Create a DataFrame from a dictionary
data = {
'Name': ['Anika', 'Rahul', 'Sneha'],
'Age': [25, 30, 22],
'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age       City
0  Anika   25      Delhi
1  Rahul   30     Mumbai
2  Sneha   22  Bangalore
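To make the "labeled axes" idea concrete, the following sketch rebuilds the same table and inspects its row labels, column labels, and shape. It also shows that a single column is itself a Series.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

print(df.shape)          # (3, 3): three rows, three columns
print(list(df.columns))  # ['Name', 'Age', 'City']
print(list(df.index))    # [0, 1, 2], the default row labels

# Each column of a DataFrame is a Series that shares the row index
ages = df['Age']
print(isinstance(ages, pd.Series))  # True
```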
3. Panel: Three-Dimensional Data (Removed)
A Panel was a three-dimensional data structure in Pandas. It was deprecated in version 0.20.0 and removed entirely in version 0.25.0, so it is not available in current releases. Multi-dimensional data is now handled with hierarchical indexing (MultiIndex) or with libraries such as NumPy and xarray.
Alternative: Using MultiIndex DataFrames
# Multi-dimensional data using MultiIndex
data = {
('Math', 'Term1'): [90, 85, 80],
('Math', 'Term2'): [88, 89, 84],
('Science', 'Term1'): [92, 87, 85],
('Science', 'Term2'): [90, 91, 86]
}
df = pd.DataFrame(data, index=['Anika', 'Rahul', 'Sneha'])
print(df)
Output:
        Math         Science
       Term1  Term2    Term1  Term2
Anika     90     88       92     90
Rahul     85     89       87     91
Sneha     80     84       85     86
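Once the columns form a MultiIndex, data can be selected at either level. The sketch below rebuilds the same table under the illustrative name scores and shows three common selections: one subject, one single cell, and one term across all subjects.

```python
import pandas as pd

scores = pd.DataFrame(
    {
        ('Math', 'Term1'): [90, 85, 80],
        ('Math', 'Term2'): [88, 89, 84],
        ('Science', 'Term1'): [92, 87, 85],
        ('Science', 'Term2'): [90, 91, 86],
    },
    index=['Anika', 'Rahul', 'Sneha'],
)

# Outer level: all terms for one subject (columns Term1, Term2)
math = scores['Math']

# Both levels: a single cell, addressed by row label and column tuple
rahul_science_term2 = scores.loc['Rahul', ('Science', 'Term2')]
print(rahul_science_term2)  # 91

# Inner level: one term across all subjects (columns Math, Science)
term1 = scores.xs('Term1', axis=1, level=1)
print(term1)
```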
Indexing and Slicing in Pandas
Pandas provides robust indexing and slicing capabilities for Series and DataFrames.
1. Indexing in a Series
# Accessing elements by index
series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(series['b']) # Output: 20
2. Slicing in a Series
# Slice by index label (unlike positional slicing, the end label 'd' is included)
print(series['b':'d'])
Output:
b    20
c    30
d    40
dtype: int64
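Label slices and positional slices treat the end point differently, which is a frequent source of off-by-one mistakes. The minimal comparison below uses the same Series: a label slice includes the end label, while a positional slice excludes the end position.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'd' is included
by_label = series.loc['b':'d']
print(list(by_label))     # [20, 30, 40]

# Position-based slice: the end position 3 is excluded
by_position = series.iloc[1:3]
print(list(by_position))  # [20, 30]
```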
3. Indexing in a DataFrame
# Accessing a column
data = {'Name': ['Anika', 'Rahul'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df['Name'])
Output:
0    Anika
1    Rahul
Name: Name, dtype: object
4. Slicing Rows in a DataFrame
# Select rows by position with a slice (the end position is excluded)
print(df[0:1])
Output:
    Name  Age
0  Anika   25
5. Using .loc and .iloc
.loc: Access by labels.
.iloc: Access by integer positions.
# Using loc for label-based indexing
print(df.loc[0])
# Using iloc for position-based indexing
print(df.iloc[0])
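With the default index, labels and positions coincide, so both calls above return the same row. The difference becomes visible once the index is not 0, 1, 2. The sketch below assumes hypothetical ID labels 101 and 102 for the rows.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Anika', 'Rahul'], 'Age': [25, 30]},
    index=[101, 102],  # hypothetical ID labels, chosen for illustration
)

# .loc looks up the label 101
print(df.loc[101, 'Name'])  # Anika

# .iloc looks up the first row by position, whatever its label is
print(df.iloc[0, 0])        # Anika

# The label 0 does not exist in this index, so .loc[0] raises KeyError
try:
    df.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False
print(label_zero_found)     # False
```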
Try It Yourself
Problem 1: Create and Index a Series
Create a Pandas Series for the marks of 3 students in Math (Anika: 90, Rahul: 85, Sneha: 88). Display the marks of Rahul.
import pandas as pd
marks = pd.Series([90, 85, 88], index=['Anika', 'Rahul', 'Sneha'])
print("Rahul's Marks:", marks['Rahul'])
Problem 2: Create and Slice a DataFrame
Create a DataFrame for 3 products with columns Product, Price, and Stock. Display the details of the second product.
import pandas as pd
data = {
'Product': ['Laptop', 'Phone', 'Tablet'],
'Price': [80000, 30000, 20000],
'Stock': [50, 150, 100]
}
df = pd.DataFrame(data)
print("Second Product Details:\n", df.iloc[1])