Data Structures in Pandas

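Pandas offers two primary data structures for handling and manipulating data: the one-dimensional Series and the two-dimensional DataFrame. A third structure, the three-dimensional Panel, existed in older versions but is no longer part of the library. Each is designed for a specific use case.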


1. Series: One-Dimensional Data

A Series is a one-dimensional labeled array capable of holding data of any type (integer, float, string, etc.). The labels (known as the index) provide easy access to data.

Creating a Series

import pandas as pd
 
# Create a Series from a list
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)

Output:

0    10
1    20
2    30
3    40
dtype: int64

Adding an Index

# Create a Series with a custom index
series = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(series)

Output:

a    10
b    20
c    30
d    40
dtype: int64
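
To make the relationship between labels and values concrete, the following minimal sketch pulls the index and the values out of the Series above, and builds a second Series from a dictionary. The variable names (labels, values, from_dict) are illustrative choices, not part of the tutorial.

```python
import pandas as pd

# Same data as above, with a custom index
series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

labels = list(series.index)   # the index labels
values = series.to_numpy()    # the underlying values as a NumPy array
print(labels)
print(values)

# When a Series is built from a dictionary, the keys become the index labels
from_dict = pd.Series({'a': 10, 'b': 20})
print(from_dict['b'])
```

Building from a dictionary is convenient when the labels and values already travel together in the source data.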

2. DataFrame: Two-Dimensional Data

A DataFrame is a two-dimensional table-like data structure with labeled axes (rows and columns). It is the most commonly used structure in Pandas.

Creating a DataFrame

# Create a DataFrame from a dictionary
data = {
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age       City
0  Anika   25      Delhi
1  Rahul   30     Mumbai
2  Sneha   22  Bangalore
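
The "labeled axes" mentioned above can be inspected directly. This short sketch, which rebuilds the same DataFrame, reads the row labels, the column labels, and the shape, and shows that a single column is itself a Series. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore'],
})

row_labels = list(df.index)        # default integer labels: 0, 1, 2
column_labels = list(df.columns)   # taken from the dictionary keys
shape = df.shape                   # (number of rows, number of columns)
age_column = df['Age']             # a single column is a Series

print(row_labels, column_labels, shape)
print(type(age_column).__name__)
```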

3. Panel: Three-Dimensional Data (Removed)

A Panel was a three-dimensional data structure in Pandas. It was deprecated in version 0.20.0 and removed in version 0.25.0, so it is not available in current releases. Multi-dimensional data is now handled with hierarchical indexing (MultiIndex) or with libraries such as NumPy and xarray.

Alternative: Using MultiIndex DataFrames

# Tuple keys of the form (subject, term) create two-level MultiIndex columns
data = {
    ('Math', 'Term1'): [90, 85, 80],
    ('Math', 'Term2'): [88, 89, 84],
    ('Science', 'Term1'): [92, 87, 85],
    ('Science', 'Term2'): [90, 91, 86]
}
df = pd.DataFrame(data, index=['Anika', 'Rahul', 'Sneha'])
print(df)

Output:

        Math         Science      
       Term1 Term2   Term1 Term2 
Anika     90    88      92    90 
Rahul     85    89      87    91 
Sneha     80    84      85    86 
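
Once the columns are hierarchical, data can be selected at either level. The sketch below rebuilds the same DataFrame and shows three common selections; it is one possible way to query such a frame, and the variable names are illustrative.

```python
import pandas as pd

data = {
    ('Math', 'Term1'): [90, 85, 80],
    ('Math', 'Term2'): [88, 89, 84],
    ('Science', 'Term1'): [92, 87, 85],
    ('Science', 'Term2'): [90, 91, 86],
}
df = pd.DataFrame(data, index=['Anika', 'Rahul', 'Sneha'])

# Select every column under the top-level label 'Math'
math = df['Math']

# Select a single cell with a row label and a (subject, term) tuple
rahul_science_term2 = df.loc['Rahul', ('Science', 'Term2')]

# Take a cross-section on the second column level: Term1 for all subjects
term1 = df.xs('Term1', axis=1, level=1)

print(math)
print(rahul_science_term2)
print(term1)
```

Selecting by the outer level returns a smaller DataFrame whose columns are the inner level, which is what makes this layout a workable substitute for a third dimension.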

Indexing and Slicing in Pandas

Pandas provides robust indexing and slicing capabilities for Series and DataFrames.

1. Indexing in a Series

# Accessing elements by index
series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(series['b'])  # Output: 20

2. Slicing in a Series

# Slice by label; unlike positional slices, both endpoints are included
print(series['b':'d'])

Output:

b    20
c    30
d    40
dtype: int64
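
Label-based and position-based slices treat the end point differently, which is a frequent source of off-by-one mistakes. The following minimal sketch contrasts the two on the same Series, using .loc and .iloc to make the intent explicit.

```python
import pandas as pd

series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'd' is included
by_label = series.loc['b':'d']

# Position-based slice: the end position 3 is excluded, as in a Python list
by_position = series.iloc[1:3]

print(by_label.tolist())
print(by_position.tolist())
```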

3. Indexing in a DataFrame

# Accessing a column
data = {'Name': ['Anika', 'Rahul'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df['Name'])

Output:

0    Anika
1    Rahul
Name: Name, dtype: object

4. Slicing Rows in a DataFrame

# Slice rows by position; the end position is excluded, so this returns only the first row
print(df[0:1])

Output:

    Name  Age
0  Anika   25

5. Using .loc and .iloc

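  • .loc: Access by labels.
  • .iloc: Access by integer positions.

With the default integer index, the label 0 and the position 0 refer to the same row, so both calls below return the first row.

# Using loc for label-based indexing (0 is the row's label)
print(df.loc[0])

# Using iloc for position-based indexing (0 is the row's position)
print(df.iloc[0])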
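
The difference between .loc and .iloc only becomes visible when the row labels are not 0, 1, 2, and so on. The sketch below assumes a hypothetical index of 101 and 102 (these labels are made up for illustration) to show that labels and positions are then separate things.

```python
import pandas as pd

# Hypothetical row labels 101 and 102, chosen only for illustration
df = pd.DataFrame(
    {'Name': ['Anika', 'Rahul'], 'Age': [25, 30]},
    index=[101, 102],
)

by_label = df.loc[101]      # the row whose label is 101
by_position = df.iloc[0]    # the first row, whatever its label is

# Both accessors also accept a column selector
age_by_label = df.loc[102, 'Age']   # row label, column label
age_by_position = df.iloc[1, 1]     # row position, column position

# No row carries the label 0 here, so .loc[0] raises KeyError
try:
    df.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_label['Name'], by_position['Name'])
print(age_by_label, age_by_position, label_zero_found)
```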

Try It Yourself

Problem 1: Create and Index a Series

Create a Pandas Series for the marks of 3 students in Math (Anika: 90, Rahul: 85, Sneha: 88). Display the marks of Rahul.

Solution:

import pandas as pd
 
marks = pd.Series([90, 85, 88], index=['Anika', 'Rahul', 'Sneha'])
print("Rahul's Marks:", marks['Rahul'])

Problem 2: Create and Slice a DataFrame

Create a DataFrame for 3 products with columns Product, Price, and Stock. Display the details of the second product.

Solution:

import pandas as pd
 
data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [80000, 30000, 20000],
    'Stock': [50, 150, 100]
}
df = pd.DataFrame(data)
# sep="" avoids the stray space that print would otherwise insert after the newline
print("Second Product Details:\n", df.iloc[1], sep="")
