Python ModulesPandas TutorialMerging, Joining, and Concatenation

Merging, Joining, and Concatenation in Pandas

Pandas provides multiple methods to combine datasets, each suited to specific use cases. This page covers the concepts of concatenation, merging, and joining.


1. Concatenation

Concatenation combines multiple DataFrames or Series along a specific axis (rows or columns).

Example: Concatenating DataFrames

import pandas as pd
 
# Sample DataFrames
df1 = pd.DataFrame({
    'Name': ['Anika', 'Rahul'],
    'Age': [25, 30]
})
df2 = pd.DataFrame({
    'Name': ['Sneha', 'Amit'],
    'Age': [22, 28]
})
 
# Concatenate along rows (default axis=0)
result = pd.concat([df1, df2])
print(result)

Output:

    Name  Age
0  Anika   25
1  Rahul   30
0  Sneha   22
1   Amit   28

Concatenate Along Columns

# Concatenate along columns (axis=1)
result = pd.concat([df1, df2], axis=1)
print(result)

Output:

    Name  Age   Name  Age
0  Anika   25  Sneha   22
1  Rahul   30   Amit   28

2. Merging

The merge() function combines DataFrames based on a common column or index. It is similar to SQL joins.

Example: Merging on a Common Column

# Sample DataFrames
df1 = pd.DataFrame({
    'ID': [1, 2],
    'Name': ['Anika', 'Rahul']
})
df2 = pd.DataFrame({
    'ID': [1, 2],
    'Age': [25, 30]
})
 
# Merge on 'ID'
result = pd.merge(df1, df2, on='ID')
print(result)

Output:

   ID   Name  Age
0   1  Anika   25
1   2  Rahul   30

Types of Merges

  • Inner Join: Default. Includes rows with matching keys in both DataFrames.
  • Outer Join: Includes all rows, filling with NaN where necessary.
  • Left Join: Includes all rows from the left DataFrame.
  • Right Join: Includes all rows from the right DataFrame.
# Outer Join
result = pd.merge(df1, df2, on='ID', how='outer')
print(result)

Output:

   ID   Name   Age
0   1  Anika  25.0
1   2  Rahul  30.0

3. Joining

The join() method is used to combine DataFrames based on their index.

Example: Joining on Index

# Sample DataFrames
df1 = pd.DataFrame({
    'Age': [25, 30]
}, index=['Anika', 'Rahul'])
df2 = pd.DataFrame({
    'City': ['Delhi', 'Mumbai']
}, index=['Anika', 'Rahul'])
 
# Join on index
result = df1.join(df2)
print(result)

Output:

       Age    City
Anika    25   Delhi
Rahul    30  Mumbai

Try It Yourself

Problem 1: Concatenate DataFrames

Given the following DataFrames:

import pandas as pd
 
data1 = {
    'Name': ['Sneha', 'Amit'],
    'Score': [85, 90]
}
data2 = {
    'Name': ['Anika', 'Rahul'],
    'Score': [88, 92]
}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
  1. Concatenate df1 and df2 along rows.
  2. Concatenate them along columns.
Show Code
# Concatenate along rows
result_rows = pd.concat([df1, df2])
print(result_rows)
 
# Concatenate along columns
result_columns = pd.concat([df1, df2], axis=1)
print(result_columns)

Problem 2: Merge DataFrames

Given the following DataFrames:

left = pd.DataFrame({
    'ID': [1, 2, 3],
    'Name': ['Anika', 'Rahul', 'Sneha']
})
right = pd.DataFrame({
    'ID': [2, 3, 4],
    'Score': [90, 85, 88]
})
  1. Perform an inner join.
  2. Perform an outer join.
Show Code
# Inner Join
result_inner = pd.merge(left, right, on='ID', how='inner')
print(result_inner)
 
# Outer Join
result_outer = pd.merge(left, right, on='ID', how='outer')
print(result_outer)

Problem 3: Join DataFrames on Index

Given the following DataFrames:

left = pd.DataFrame({
    'Age': [25, 30]
}, index=['Anika', 'Rahul'])
right = pd.DataFrame({
    'City': ['Delhi', 'Mumbai']
}, index=['Anika', 'Rahul'])

Join the two DataFrames and display the result.

Show Code
result = left.join(right)
print(result)

Pyground

Play with Python!

Output: