Introduction to Pandas

Pandas is one of the most powerful and widely used Python libraries for data manipulation and analysis. Its user-friendly data structures and rich set of functions make it a favorite among data scientists and analysts.

What is Pandas?

Pandas is an open-source Python library designed for:

Manipulating and analyzing structured data.
Handling datasets that fit in memory efficiently.
Providing easy-to-use data structures such as Series (1D) and DataFrame (2D).

It builds on NumPy and seamlessly integrates with other popular libraries like Matplotlib and Scikit-learn.
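As a minimal sketch of the NumPy integration, the snippet below applies a NumPy function to a column and converts a DataFrame to a plain array. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from a real dataset
df = pd.DataFrame({"x": [1.0, 4.0, 9.0], "y": [4.0, 5.0, 6.0]})

# NumPy functions accept pandas objects and return pandas objects
roots = np.sqrt(df["x"])

# Convert to a plain NumPy array when another library expects one
arr = df.to_numpy()

print(type(roots).__name__)
print(arr.shape)
```

Libraries such as Matplotlib and Scikit-learn generally accept either the DataFrame itself or the array returned by to_numpy().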

Key Features and Use Cases

Key Features:

Data Structures: Offers Series and DataFrame for handling one-dimensional and two-dimensional data.
Indexing: Enables intuitive indexing and selection of data.
Data Cleaning: Simplifies handling missing data and duplicates.
Data Aggregation: Provides powerful GroupBy operations for summarizing data.
File Handling: Supports reading from and writing to multiple file formats (CSV, Excel, SQL, JSON, etc.).
Time Series: Offers tools for working with time-series data.
Visualization: Includes basic plotting methods built on Matplotlib.
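Several of the features listed above can be sketched in a few lines. The example below uses a small, made-up sales table (the region and amount columns are assumptions for illustration) to show indexing, cleaning, and a GroupBy aggregation.

```python
import pandas as pd

# Hypothetical sales records: one missing amount and one duplicated row
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "South"],
    "amount": [100.0, None, 250.0, 80.0, 80.0],
})

# Indexing: a boolean mask selects rows, a label selects the column
north = sales.loc[sales["region"] == "North", "amount"]

# Data cleaning: drop exact duplicate rows, then fill missing amounts with 0
clean = sales.drop_duplicates().fillna({"amount": 0.0})

# Data aggregation: total amount per region
totals = clean.groupby("region")["amount"].sum()
print(totals)
```

Filling missing values with 0 is only one possible choice; depending on the data, dropping those rows with dropna() may be more appropriate.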

Use Cases:

Data Wrangling: Cleaning and reshaping raw data for analysis.
Exploratory Data Analysis (EDA): Summarizing datasets to uncover patterns.
Data Aggregation: Summarizing sales, user behavior, and other metrics.
Time Series Analysis: Stock market data, IoT data, etc.
ETL Pipelines: Extracting, transforming, and loading data efficiently.
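To make two of these use cases concrete, here is a small sketch of a quick EDA summary and a time-series calculation. The daily readings are invented for illustration, and the 3-day window is an arbitrary choice.

```python
import pandas as pd

# Hypothetical daily sensor readings indexed by date
dates = pd.date_range("2024-01-01", periods=6, freq="D")
readings = pd.Series([10, 12, 11, 15, 14, 18], index=dates)

# Exploratory data analysis: count, mean, std, min, quartiles, max
summary = readings.describe()
print(summary)

# Time series analysis: 3-day rolling mean (the first two values are NaN
# because a full window is not yet available)
rolling = readings.rolling(window=3).mean()
print(rolling)
```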

Installing Pandas

To install Pandas, use the following command:

pip install pandas

If you’re using the Anaconda distribution, Pandas comes pre-installed (Miniconda does not include it). To update:

conda update pandas

Verifying Installation

After installation, verify the version using:

import pandas as pd
print(pd.__version__)

Importing and Basic Usage

To use Pandas, import it as pd (a convention in the Python community):

import pandas as pd

Creating a Series

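A Series is a one-dimensional labeled array: a sequence of values paired with an index. If no index is given, Pandas assigns integer labels starting at 0.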

# Create a Series
s = pd.Series([10, 20, 30, 40])
print(s)

Output:

0    10
1    20
2    30
3    40
dtype: int64
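A Series can also carry custom labels instead of the default integers. The sketch below reuses the same values with an assumed set of string labels and shows label-based access, position-based access, and element-wise arithmetic.

```python
import pandas as pd

# Same values as above, with hypothetical string labels as the index
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s["b"])      # label-based access
print(s.iloc[0])   # position-based access

# Arithmetic applies to every element and keeps the index
doubled = s * 2
print(doubled)
```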

Creating a DataFrame

A DataFrame is a two-dimensional, table-like data structure with labeled rows and columns. Each column is a Series.

# Create a DataFrame
data = {
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print(df)

Output:

    Name  Age       City
0  Anika   25      Delhi
1  Rahul   30     Mumbai
2  Sneha   22  Bangalore
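Once a DataFrame exists, columns can be selected, rows filtered, and new columns derived. The following sketch rebuilds the same table so it runs on its own; the AgeNextYear column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
})

# Selecting a single column returns a Series
ages = df['Age']

# A boolean condition filters rows
over_24 = df[df['Age'] > 24]
print(over_24)

# Assigning to a new column name adds a derived column (hypothetical name)
df['AgeNextYear'] = df['Age'] + 1
print(df.shape)
```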

Reading a CSV File

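Pandas makes it easy to load datasets from files. The read_csv function reads a CSV file into a DataFrame. The example below assumes a file named data.csv exists in the current working directory.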

# Reading a CSV file
df = pd.read_csv('data.csv')
print(df.head())  # Display the first 5 rows
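If no CSV file is at hand, read_csv also accepts a file-like object, so the same call can be tried on an in-memory string. This is a sketch with made-up CSV content that mirrors the earlier table.

```python
import io
import pandas as pd

# Hypothetical CSV content held in a string instead of a file on disk
csv_text = """Name,Age,City
Anika,25,Delhi
Rahul,30,Mumbai
Sneha,22,Bangalore
"""

# io.StringIO wraps the string so read_csv can treat it like a file
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())    # first rows (all 3 here)
print(df.dtypes)    # column types inferred by Pandas
```

The first line of the CSV is used as the column header by default.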

Try It Yourself

Problem 1: Create a Series

Create a Pandas Series from a list of integers [1, 2, 3, 4, 5] and display it.

import pandas as pd
 
# Create a Series
series = pd.Series([1, 2, 3, 4, 5])
print(series)

Problem 2: Create a DataFrame

Create a DataFrame with columns Product, Price, and Stock. Add sample data for 3 products and display the DataFrame.

import pandas as pd
 
# Create a DataFrame
data = {
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [80000, 30000, 20000],
    'Stock': [50, 150, 100]
}
df = pd.DataFrame(data)
print(df)
