Merge Two DataFrames by Column

Joining two dataframes on a key with pandas

By Chi Kit Yeung in Python Cookbook

August 2, 2024

Problem

I wanted to replicate the JOIN function in SQL with two dataframes containing different information. This can be done using pandas’ merge method as follows.

Solution

import pandas as pd

# create two sample dataframes
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})

# merge the two dataframes on the key column
merged_df = df1.merge(df2, how='inner', on='key')

Examples

data = [[1, 100, 2008, 10, 5000], [2, 100, 2009, 12, 5000], [7, 200, 2011, 15, 9000]]
sales = pd.DataFrame(data, columns=['sale_id', 'product_id', 'year', 'quantity', 'price']).astype({'sale_id':'Int64', 'product_id':'Int64', 'year':'Int64', 'quantity':'Int64', 'price':'Int64'})
data = [[100, 'Nokia'], [200, 'Apple'], [300, 'Samsung']]
product = pd.DataFrame(data, columns=['product_id', 'product_name']).astype({'product_id':'Int64', 'product_name':'object'})

>>> sales
   sale_id  product_id  year  quantity  price
0        1         100  2008        10   5000
1        2         100  2009        12   5000
2        7         200  2011        15   9000
>>> product
   product_id product_name
0         100        Nokia
1         200        Apple
2         300      Samsung
>>> merge_df = sales.merge(product, on='product_id')
>>> merge_df[['product_name', 'year', 'price']]
  product_name  year  price
0        Nokia  2008   5000
1        Nokia  2009   5000
2        Apple  2011   9000