Attributes and Methods of Pandas Data Structures
Pandas provides several attributes and methods for both Series and DataFrame objects. These tools help you understand and manipulate data efficiently.
Commonly Used Attributes
Attributes for Both Series and DataFrame
Attribute | Description |
---|---|
index | Returns the index (row labels) of the object. |
columns | Returns the column labels (only for DataFrame). |
dtypes | Returns the data type of each column (for a Series, its single data type). |
shape | Returns the dimensions as a tuple: (rows, columns) for a DataFrame, (rows,) for a Series. |
size | Returns the total number of elements. |
values | Returns the underlying data as a NumPy array. |
ndim | Returns the number of dimensions. |
head() | Method, not an attribute: returns the first n rows (default: 5). |
tail() | Method, not an attribute: returns the last n rows (default: 5). |
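Unlike the other entries in the table, head() and tail() are called with parentheses and accept an optional row count. A minimal sketch, using a made-up Series named s that is not part of the original examples:

```python
import pandas as pd

# Hypothetical Series of ten integers, used only for illustration
s = pd.Series(range(10))

first_five = s.head()    # no argument: the default of 5 rows
last_three = s.tail(3)   # pass n to change how many rows come back

print(first_five.tolist())
print(last_three.tolist())
```

Both calls return a new Series and leave s unchanged.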
Examples
Example 1: Checking Attributes of a DataFrame
import pandas as pd
# Sample DataFrame
data = {
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
}
df = pd.DataFrame(data)
print("Index:", df.index)
print("Columns:", df.columns)
print("Data Types:\n", df.dtypes)
print("Shape:", df.shape)
print("Total Elements:", df.size)
print("Data Values:\n", df.values)
Output:
Index: RangeIndex(start=0, stop=3, step=1)
Columns: Index(['Name', 'Age', 'City'], dtype='object')
Data Types:
Name    object
Age      int64
City    object
dtype: object
Shape: (3, 3)
Total Elements: 9
Data Values:
[['Anika' 25 'Delhi']
 ['Rahul' 30 'Mumbai']
 ['Sneha' 22 'Bangalore']]
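The same attributes behave slightly differently on a Series. The sketch below rebuilds the DataFrame from Example 1 and inspects its Age column; the variable name ages is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Anika', 'Rahul', 'Sneha'],
    'Age': [25, 30, 22],
    'City': ['Delhi', 'Mumbai', 'Bangalore']
})

# Selecting a single column returns a Series
ages = df['Age']

print("Series shape:", ages.shape)    # a one-element tuple
print("Series ndim:", ages.ndim)
print("Series size:", ages.size)
print("DataFrame ndim:", df.ndim)

# A Series has an index but no columns attribute
print("Series has columns:", hasattr(ages, 'columns'))
```

A Series is one-dimensional, so its shape holds only the number of rows, while the DataFrame reports both rows and columns.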
Commonly Used Methods
Methods for Data Exploration
Method | Description |
---|---|
info() | Provides a summary of the DataFrame. |
describe() | Generates summary statistics (by default, for numeric columns only). |
isnull() | Returns a Boolean mask that is True where a value is missing. |
notnull() | Returns a Boolean mask that is True where a value is present. |
count() | Returns the count of non-null elements. |
unique() | Returns unique values in a Series. |
nunique() | Returns the number of unique values. |
Methods for Data Manipulation
Method | Description |
---|---|
sort_values() | Sorts by values in a column or Series. |
sort_index() | Sorts by index labels. |
drop() | Removes specified rows or columns. |
fillna() | Replaces missing values with specified values. |
astype() | Converts data to a specified type. |
apply() | Applies a function to each element of a Series, or to each column or row of a DataFrame. |
groupby() | Groups data by a column or index for aggregation. |
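What apply() passes to the function depends on the object it is called on. The following sketch reuses the Math and Science marks from Problem 1; the names marks, math_curved, column_max, and row_total are illustrative only.

```python
import pandas as pd

marks = pd.DataFrame({
    'Math': [88, 92, 79],
    'Science': [85, 90, 84]
})

# Series.apply: the function receives one value at a time
math_curved = marks['Math'].apply(lambda x: x + 5)

# DataFrame.apply with the default axis=0: the function receives each column as a Series
column_max = marks.apply(lambda col: col.max())

# DataFrame.apply with axis=1: the function receives each row as a Series
row_total = marks.apply(lambda row: row.sum(), axis=1)

print(math_curved.tolist())
print(column_max.to_dict())
print(row_total.tolist())
```

For simple arithmetic like this, built-in vectorized operations such as marks.max() or marks.sum(axis=1) give the same results and are usually faster; apply() is most useful when no built-in operation fits.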
Examples
Example 2: Exploring Data
# Check info and summary statistics
df.info()  # info() prints its summary itself and returns None, so no print() is needed
print(df.describe())
# Check for missing values
print(df.isnull())
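The remaining exploration methods can be sketched in the same way. The DataFrame below is a hypothetical one with missing ages and repeated cities (the row for 'Ravi' is invented for illustration), so that the counts are not trivial.

```python
import pandas as pd

# Hypothetical data with missing ages and repeated cities
people = pd.DataFrame({
    'Name': ['Amit', 'Sonal', 'Neha', 'Ravi'],
    'Age': [None, 28, None, 28],
    'City': ['Pune', 'Chennai', 'Pune', 'Chennai']
})

# isnull() returns a Boolean mask; summing it counts missing values per column
missing_per_column = people.isnull().sum()

# count() reports non-null values per column
non_null_per_column = people.count()

# unique() and nunique() work on a single column (a Series)
cities = people['City'].unique()
n_cities = people['City'].nunique()
n_ages = people['Age'].nunique()   # missing values are ignored by default

print(missing_per_column)
print(non_null_per_column)
print(list(cities), n_cities, n_ages)
```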
Example 3: Manipulating Data
# Create a new DataFrame that contains missing values
new_data = {
    'Name': ['Amit', 'Sonal', 'Neha'],
    'Age': [None, 28, None],
    'City': ['Pune', 'Chennai', 'Hyderabad']
}
new_df = pd.DataFrame(new_data)
# Fill missing ages with 25
data_filled = new_df.fillna({'Age': 25})
print(data_filled)
# Sorting by Age
data_sorted = data_filled.sort_values(by='Age')
print(data_sorted)
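The other manipulation methods from the table can be applied to the same data. This is a minimal sketch that rebuilds data_filled from Example 3; the names without_city, names_per_age, and restored are assumptions made for illustration.

```python
import pandas as pd

new_df = pd.DataFrame({
    'Name': ['Amit', 'Sonal', 'Neha'],
    'Age': [None, 28, None],
    'City': ['Pune', 'Chennai', 'Hyderabad']
})
data_filled = new_df.fillna({'Age': 25})

# Age was stored as floating point because it held missing values;
# once they are filled, astype() can convert it to integers
data_filled['Age'] = data_filled['Age'].astype(int)

# drop() returns a new DataFrame without the named column
without_city = data_filled.drop(columns=['City'])

# groupby() followed by an aggregation: how many names share each age
names_per_age = data_filled.groupby('Age')['Name'].count()

# sort_index() restores the original row order after sorting by values
restored = data_filled.sort_values(by='Age').sort_index()

print(without_city)
print(names_per_age)
print(restored)
```

Each of these calls returns a new object, so data_filled itself changes only where a result is assigned back to it, as in the astype() line.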
Try It Yourself
Problem 1: Check Attributes
Create a DataFrame for student marks in Math, Science, and English. Use attributes like shape, dtypes, and values to explore the data.
import pandas as pd
data = {
    'Math': [88, 92, 79],
    'Science': [85, 90, 84],
    'English': [91, 89, 76]
}
df = pd.DataFrame(data)
print("Shape:", df.shape)
print("Data Types:\n", df.dtypes)
print("Values:\n", df.values)
Problem 2: Manipulate DataFrame
Using the DataFrame from Problem 1:
- Add a new column Total that sums the marks for each student.
- Sort the DataFrame by the Total column.
# Add a Total column
df['Total'] = df['Math'] + df['Science'] + df['English']
print("With Total Column:\n", df)
# Sort by Total
df_sorted = df.sort_values(by='Total', ascending=False)
print("Sorted by Total:\n", df_sorted)