Merge Two DataFrames by Column
Joining two dataframes on a key with pandas
By Chi Kit Yeung in Python Cookbook
August 2, 2024
Problem
I wanted to replicate the JOIN
function in SQL with two dataframes containing different information. This can be done using pandas’
merge method as follows.
Solution
import pandas as pd
# create two sample dataframes
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
# merge the two dataframes on the key column
merged_df = df1.merge(df2, how='inner', on='key')
Examples
data = [[1, 100, 2008, 10, 5000], [2, 100, 2009, 12, 5000], [7, 200, 2011, 15, 9000]]
sales = pd.DataFrame(data, columns=['sale_id', 'product_id', 'year', 'quantity', 'price']).astype({'sale_id':'Int64', 'product_id':'Int64', 'year':'Int64', 'quantity':'Int64', 'price':'Int64'})
data = [[100, 'Nokia'], [200, 'Apple'], [300, 'Samsung']]
product = pd.DataFrame(data, columns=['product_id', 'product_name']).astype({'product_id':'Int64', 'product_name':'object'})
>>> sales
sale_id product_id year quantity price
0 1 100 2008 10 5000
1 2 100 2009 12 5000
2 7 200 2011 15 9000
>>> product
product_id product_name
0 100 Nokia
1 200 Apple
2 300 Samsung
>>> merge_df = sales.merge(product, on='product_id')
>>> merge_df[['product_name', 'year', 'price']]
product_name year price
0 Nokia 2008 5000
1 Nokia 2009 5000
2 Apple 2011 9000