Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (2024)

Pandas DataFrameis a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns). A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. We can join, merge, and concat dataframe using different methods.

In Dataframedf.merge(),df.join(), anddf.concat()methods help in joining, merging and concating different dataframe.

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (1)

Concatenating DataFrame

In order to concat dataframe, we useconcat()function which helps in concatenating a dataframe. We can concat a dataframe in many different ways, they are:

  • Concatenating DataFrame using.concat()
  • Concatenating DataFrame by setting logic on axes
  • Concatenating DataFrame using.append()
  • Concatenating DataFrame by ignoring indexes
  • Concatenating DataFrame with group keys
  • Concatenating with mixed ndims

Concatenating DataFrame using.concat():

In order to concat a dataframe, we use.concat()function this function concat a dataframe and returns a new dataframe.

Python
# importing pandas moduleimport pandas as pd# Define a dictionary containing employee datadata1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age': [27, 24, 22, 32], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}# Define a dictionary containing employee datadata2 = {'Name': ['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'], 'Age': [17, 14, 12, 52], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}# Convert the dictionary into DataFramedf = pd.DataFrame(data1, index=[0, 1, 2, 3])# Convert the dictionary into DataFramedf1 = pd.DataFrame(data2, index=[4, 5, 6, 7])print(df, "\n\n", df1)

Now we apply.concatfunction in order to concat two dataframe.

Python
# using a .concat() methodframes = [df, df1]res1 = pd.concat(frames)res1

Output :
As shown in the output image, we have created two dataframe after concatenating we get one dataframe.

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (2)

Concatenating DataFrame by setting logic on axes :
In order to concat dataframe, we have to set different logic on axes. We can set axes in the following three ways:

  • Taking the union of them all,join='outer'. This is the default option as it results in zero information loss.
  • Taking the intersection,join='inner'.
  • Use a specific index, as passed to thejoin_axesargument.
Python
# importing pandas moduleimport pandas as pd# Define a dictionary containing employee datadata1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age': [27, 24, 22, 32], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Msc', 'MA', 'MCA', 'Phd'], 'Mobile No': [97, 91, 58, 76]}# Define a dictionary containing employee datadata2 = {'Name': ['Gaurav', 'Anuj', 'Dhiraj', 'Hitesh'], 'Age': [22, 32, 12, 52], 'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons'], 'Salary': [1000, 2000, 3000, 4000]}# Convert the dictionary into DataFramedf = pd.DataFrame(data1, index=[0, 1, 2, 3])# Convert the dictionary into DataFramedf1 = pd.DataFrame(data2, index=[2, 3, 6, 7])print(df, "\n\n", df1)

Now we set axesjoin = innerfor intersection of dataframe.

Python
# applying concat with axes# join = 'inner'res2 = pd.concat([df, df1], axis=1, join='inner')res2

Output :
As shown in the output image, we get the intersection of dataframe.

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (3)

Now we set axesjoin = outerfor union of dataframe.

Python
# using a .concat for# union of dataframeres2 = pd.concat([df, df1], axis=1, sort=False)res2

Output :
As shown in the output image, we get the union of dataframe

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (4)

Now we used a specific index, as passed to thejoin_axesargument

Python
# using join_axesres3 = pd.concat([df, df1], axis=1, join_axes=[df.index]) res3

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (5)

Concatenating DataFrame using.append()

In order to concat a dataframe, we use.append()function this function concatenate along axis=0, namely the index. This function exist before.concat.

Python
# importing pandas moduleimport pandas as pd# Define a dictionary containing employee datadata1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age': [27, 24, 22, 32], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}# Define a dictionary containing employee datadata2 = {'Name': ['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'], 'Age': [17, 14, 12, 52], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}# Convert the dictionary into DataFramedf = pd.DataFrame(data1, index=[0, 1, 2, 3])# Convert the dictionary into DataFramedf1 = pd.DataFrame(data2, index=[4, 5, 6, 7])print(df, "\n\n", df1)

Now we apply.append()function inorder to concat to dataframe

Python
# using append functionres = df.append(df1)res

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (6)

Concatenating DataFrame by ignoring indexes :

In order to concat a dataframe by ignoring indexes, we ignore index which don’t have a meaningful meaning, you may wish to append them and ignore the fact that they may have overlapping indexes. In order to do that we useignore_indexas an argument.

Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd'], 'Mobile No': [97, 91, 58, 76]} # Define a dictionary containing employee data data2 = {'Name':['Gaurav', 'Anuj', 'Dhiraj', 'Hitesh'], 'Age':[22, 32, 12, 52], 'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons'], 'Salary':[1000, 2000, 3000, 4000]} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=[0, 1, 2, 3]) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=[2, 3, 6, 7]) print(df, "\n\n", df1) 

Now we are going to applyignore_indexas an argument.

Python
# using ignore_indexres = pd.concat([df, df1], ignore_index=True) res

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (7)

Concatenating DataFrame with group keys :

In order to concat dataframe with group keys, we override the column names with the use of the keys argument. Keys argument is to override the column names when creating a new DataFrame based on existing Series.

Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd']} # Define a dictionary containing employee data data2 = {'Name':['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'], 'Age':[17, 14, 12, 52], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=[0, 1, 2, 3]) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=[4, 5, 6, 7]) print(df, "\n\n", df1) 

Now we use keys as an argument.

Python
# using keys frames = [df, df1 ] res = pd.concat(frames, keys=['x', 'y'])res

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (8)

Concatenating with mixed ndims :

User can concatenate a mix of Series and DataFrame. The Series will be transformed to DataFrame with the column name as the name of the Series.

Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=[0, 1, 2, 3]) # creating a seriess1 = pd.Series([1000, 2000, 3000, 4000], name='Salary') print(df, "\n\n", s1) 

Now we are going to mix Series and dataframe together.

Python
# combining series and dataframeres = pd.concat([df, s1], axis=1) res

Output :
Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (9)Merging DataFrame

Pandas have options for high-performance in-memory merging and joining. When we need to combine very large DataFrames, joins serve as a powerful way to perform these operations swiftly. Joins can only be done on two DataFrames at a time, denoted as left and right tables. The key is the common column that the two DataFrames will be joined on. It’s a good practice to use keys which have unique values throughout the column to avoid unintended duplication of row values. Pandas provide a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.

There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.
Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (10)Code #1 :Merging a dataframe with one unique key combination

Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'key': ['K0', 'K1', 'K2', 'K3'], 'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32],} # Define a dictionary containing employee data data2 = {'key': ['K0', 'K1', 'K2', 'K3'], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2) print(df, "\n\n", df1) 

Now we are using.merge()with one unique key combination.

Python
# using .merge() functionres = pd.merge(df, df1, on='key') res

Output :
Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (11)
Code #2:Merging dataframe using multiple join keys.

Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K1', 'K0', 'K1'], 'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32],} # Define a dictionary containing employee data data2 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K0', 'K0', 'K0'], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2) print(df, "\n\n", df1) 

Now we merge dataframe using multiple keys.

Python
# merging dataframe using multiple keysres1 = pd.merge(df, df1, on=['key', 'key1']) res1

Output :
Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (12)Merging dataframe usinghowin an argument:

We use how argument to merge specifies how to determine which keys are to be included in the resulting table. If a key combination does not appear in either the left or right tables, the values in the joined table will be NA. Here is a summary of the how options and their SQL equivalent names:

MERGE METHODJOIN NAMEDESCRIPTION
leftLEFT OUTER JOINUse keys from left frame only
rightRIGHT OUTER JOINUse keys from right frame only
outerFULL OUTER JOINUse union of keys from both frames
innerINNER JOINUse intersection of keys from both frames
Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K1', 'K0', 'K1'], 'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32],} # Define a dictionary containing employee data data2 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K0', 'K0', 'K0'], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2) print(df, "\n\n", df1) 


Now we sethow = 'left'in order to use keys from left frame only.

Python
# using keys from left frameres = pd.merge(df, df1, how='left', on=['key', 'key1']) res

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (13)

Now we sethow = 'right'in order to use keys from right frame only.

Python
# using keys from right frameres1 = pd.merge(df, df1, how='right', on=['key', 'key1']) res1

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (14)

Now we sethow = 'outer'in order to get union of keys from dataframes.

Python
# getting union of keysres2 = pd.merge(df, df1, how='outer', on=['key', 'key1']) res2

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (15)

Now we sethow = 'inner'in order to get intersection of keys from dataframes.

Python
# getting intersection of keysres3 = pd.merge(df, df1, how='inner', on=['key', 'key1']) res3

Output :
Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (16)

Joining DataFrame

In order to join dataframe, we use.join()function this function is used for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32]} # Define a dictionary containing employee data data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=['K0', 'K1', 'K2', 'K3']) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4']) print(df, "\n\n", df1) 

Now we are use.join()method in order to join dataframes

Python
# joining dataframeres = df.join(df1) res

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (17)

Now we usehow = 'outer'in order to get union

Python
# getting unionres1 = df.join(df1, how='outer') res1

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (18)

Joining dataframe usingonin an argument :

In order to join dataframes we useonin an argument.join()takes an optional on argument which may be a column or multiple column names, which specifies that the passed DataFrame is to be aligned on that column in the DataFrame.

Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Key':['K0', 'K1', 'K2', 'K3']} # Define a dictionary containing employee data data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4']) print(df, "\n\n", df1) 

Now we are using.joinwith “on” argument.

Python
# using on argument in joinres2 = df.join(df1, on='Key') res2

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (19)


Joining singly-indexed DataFrame with multi-indexed DataFrame :

In order to join singly indexed dataframe with multi-indexed dataframe, the level will match on the name of the index of the singly-indexed frame against a level name of the multi-indexed frame.

Python
# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav'], 'Age':[27, 24, 22]} # Define a dictionary containing employee data data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kanpur'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1, index=pd.Index(['K0', 'K1', 'K2'], name='key')) index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), ('K2', 'Y2'), ('K2', 'Y3')], names=['key', 'Y']) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index= index) print(df, "\n\n", df1)

Now we join singly indexed dataframe with multi-indexed dataframe.

Python
# joining singly indexed with# multi indexedresult = df.join(df1, how='inner') result

Output :

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (20)



T

tarunsarawgi_gfg

Improve

Previous Article

Python - Concatenation of two String Tuples

Next Article

Python | Pandas Working With Text Data

Please Login to comment...

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (2024)
Top Articles
Latest Posts
Article information

Author: Gov. Deandrea McKenzie

Last Updated:

Views: 6131

Rating: 4.6 / 5 (46 voted)

Reviews: 85% of readers found this page helpful

Author information

Name: Gov. Deandrea McKenzie

Birthday: 2001-01-17

Address: Suite 769 2454 Marsha Coves, Debbieton, MS 95002

Phone: +813077629322

Job: Real-Estate Executive

Hobby: Archery, Metal detecting, Kitesurfing, Genealogy, Kitesurfing, Calligraphy, Roller skating

Introduction: My name is Gov. Deandrea McKenzie, I am a spotless, clean, glamorous, sparkling, adventurous, nice, brainy person who loves writing and wants to share my knowledge and understanding with you.