Merging, Joining, and Concatenation in Pandas
Pandas provides multiple methods to combine datasets, each suited to specific use cases. This page covers the concepts of concatenation, merging, and joining.
1. Concatenation
Concatenation combines multiple DataFrames or Series along a specific axis (rows or columns).
Example: Concatenating DataFrames
import pandas as pd
# Sample DataFrames
df1 = pd.DataFrame({
'Name': ['Anika', 'Rahul'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Name': ['Sneha', 'Amit'],
'Age': [22, 28]
})
# Concatenate along rows (default axis=0)
result = pd.concat([df1, df2])
print(result)
Output:
Name Age
0 Anika 25
1 Rahul 30
0 Sneha 22
1 Amit 28
Concatenate Along Columns
# Concatenate along columns (axis=1)
result = pd.concat([df1, df2], axis=1)
print(result)
Output:
Name Age Name Age
0 Anika 25 Sneha 22
1 Rahul 30 Amit 28
2. Merging
The merge()
function combines DataFrames based on a common column or index. It is similar to SQL joins.
Example: Merging on a Common Column
# Sample DataFrames
df1 = pd.DataFrame({
'ID': [1, 2],
'Name': ['Anika', 'Rahul']
})
df2 = pd.DataFrame({
'ID': [1, 2],
'Age': [25, 30]
})
# Merge on 'ID'
result = pd.merge(df1, df2, on='ID')
print(result)
Output:
ID Name Age
0 1 Anika 25
1 2 Rahul 30
Types of Merges
- Inner Join: Default. Includes rows with matching keys in both DataFrames.
- Outer Join: Includes all rows, filling with NaN where necessary.
- Left Join: Includes all rows from the left DataFrame.
- Right Join: Includes all rows from the right DataFrame.
# Outer Join
result = pd.merge(df1, df2, on='ID', how='outer')
print(result)
Output:
ID Name Age
0 1 Anika 25.0
1 2 Rahul 30.0
3. Joining
The join()
method is used to combine DataFrames based on their index.
Example: Joining on Index
# Sample DataFrames
df1 = pd.DataFrame({
'Age': [25, 30]
}, index=['Anika', 'Rahul'])
df2 = pd.DataFrame({
'City': ['Delhi', 'Mumbai']
}, index=['Anika', 'Rahul'])
# Join on index
result = df1.join(df2)
print(result)
Output:
Age City
Anika 25 Delhi
Rahul 30 Mumbai
Try It Yourself
Problem 1: Concatenate DataFrames
Given the following DataFrames:
import pandas as pd
data1 = {
'Name': ['Sneha', 'Amit'],
'Score': [85, 90]
}
data2 = {
'Name': ['Anika', 'Rahul'],
'Score': [88, 92]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
- Concatenate
df1
anddf2
along rows. - Concatenate them along columns.
Show Code
# Concatenate along rows
result_rows = pd.concat([df1, df2])
print(result_rows)
# Concatenate along columns
result_columns = pd.concat([df1, df2], axis=1)
print(result_columns)
Problem 2: Merge DataFrames
Given the following DataFrames:
left = pd.DataFrame({
'ID': [1, 2, 3],
'Name': ['Anika', 'Rahul', 'Sneha']
})
right = pd.DataFrame({
'ID': [2, 3, 4],
'Score': [90, 85, 88]
})
- Perform an inner join.
- Perform an outer join.
Show Code
# Inner Join
result_inner = pd.merge(left, right, on='ID', how='inner')
print(result_inner)
# Outer Join
result_outer = pd.merge(left, right, on='ID', how='outer')
print(result_outer)
Problem 3: Join DataFrames on Index
Given the following DataFrames:
left = pd.DataFrame({
'Age': [25, 30]
}, index=['Anika', 'Rahul'])
right = pd.DataFrame({
'City': ['Delhi', 'Mumbai']
}, index=['Anika', 'Rahul'])
Join the two DataFrames and display the result.
Show Code
result = left.join(right)
print(result)