Python | Pandas Merging, Joining, and Concatenating

Pandas DataFrameis a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns). A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. We can join, merge, and concat dataframe using different methods.

In Dataframedf.merge(),df.join(), anddf.concat()methods help in joining, merging and concating different dataframe.

Concatenating DataFrame

In order to concat dataframe, we useconcat()function which helps in concatenating a dataframe. We can concat a dataframe in many different ways, they are:

Concatenating DataFrame using.concat()
Concatenating DataFrame by setting logic on axes
Concatenating DataFrame using.append()
Concatenating DataFrame by ignoring indexes
Concatenating DataFrame with group keys
Concatenating with mixed ndims

Concatenating DataFrame using.concat():

In order to concat a dataframe, we use.concat()function this function concat a dataframe and returns a new dataframe.

Python

# importing pandas moduleimport pandas as pd# Define a dictionary containing employee datadata1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age': [27, 24, 22, 32], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}# Define a dictionary containing employee datadata2 = {'Name': ['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'], 'Age': [17, 14, 12, 52], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}# Convert the dictionary into DataFramedf = pd.DataFrame(data1, index=[0, 1, 2, 3])# Convert the dictionary into DataFramedf1 = pd.DataFrame(data2, index=[4, 5, 6, 7])print(df, "\n\n", df1)

Now we apply.concatfunction in order to concat two dataframe.

Python

# using a .concat() methodframes = [df, df1]res1 = pd.concat(frames)res1

Output :
As shown in the output image, we have created two dataframe after concatenating we get one dataframe.

Concatenating DataFrame by setting logic on axes :
In order to concat dataframe, we have to set different logic on axes. We can set axes in the following three ways:

Taking the union of them all,join='outer'. This is the default option as it results in zero information loss.
Taking the intersection,join='inner'.
Use a specific index, as passed to thejoin_axesargument.

Python

# importing pandas moduleimport pandas as pd# Define a dictionary containing employee datadata1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age': [27, 24, 22, 32], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Msc', 'MA', 'MCA', 'Phd'], 'Mobile No': [97, 91, 58, 76]}# Define a dictionary containing employee datadata2 = {'Name': ['Gaurav', 'Anuj', 'Dhiraj', 'Hitesh'], 'Age': [22, 32, 12, 52], 'Address': ['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification': ['MCA', 'Phd', 'Bcom', 'B.hons'], 'Salary': [1000, 2000, 3000, 4000]}# Convert the dictionary into DataFramedf = pd.DataFrame(data1, index=[0, 1, 2, 3])# Convert the dictionary into DataFramedf1 = pd.DataFrame(data2, index=[2, 3, 6, 7])print(df, "\n\n", df1)

Now we set axesjoin = innerfor intersection of dataframe.

Python

# applying concat with axes# join = 'inner'res2 = pd.concat([df, df1], axis=1, join='inner')res2

Output :
As shown in the output image, we get the intersection of dataframe.

Now we set axesjoin = outerfor union of dataframe.

Python

# using a .concat for# union of dataframeres2 = pd.concat([df, df1], axis=1, sort=False)res2

Output :
As shown in the output image, we get the union of dataframe

Now we used a specific index, as passed to thejoin_axesargument

Python

# using join_axesres3 = pd.concat([df, df1], axis=1, join_axes=[df.index]) res3

Output :

Concatenating DataFrame using.append()

In order to concat a dataframe, we use.append()function this function concatenate along axis=0, namely the index. This function exist before.concat.

Python

# importing pandas moduleimport pandas as pd# Define a dictionary containing employee datadata1 = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age': [27, 24, 22, 32], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Msc', 'MA', 'MCA', 'Phd']}# Define a dictionary containing employee datadata2 = {'Name': ['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'], 'Age': [17, 14, 12, 52], 'Address': ['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification': ['Btech', 'B.A', 'Bcom', 'B.hons']}# Convert the dictionary into DataFramedf = pd.DataFrame(data1, index=[0, 1, 2, 3])# Convert the dictionary into DataFramedf1 = pd.DataFrame(data2, index=[4, 5, 6, 7])print(df, "\n\n", df1)

Now we apply.append()function inorder to concat to dataframe

Python

# using append functionres = df.append(df1)res

Output :

Concatenating DataFrame by ignoring indexes :

In order to concat a dataframe by ignoring indexes, we ignore index which don’t have a meaningful meaning, you may wish to append them and ignore the fact that they may have overlapping indexes. In order to do that we useignore_indexas an argument.

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd'], 'Mobile No': [97, 91, 58, 76]} # Define a dictionary containing employee data data2 = {'Name':['Gaurav', 'Anuj', 'Dhiraj', 'Hitesh'], 'Age':[22, 32, 12, 52], 'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons'], 'Salary':[1000, 2000, 3000, 4000]} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=[0, 1, 2, 3]) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=[2, 3, 6, 7]) print(df, "\n\n", df1)

Now we are going to applyignore_indexas an argument.

Python

# using ignore_indexres = pd.concat([df, df1], ignore_index=True) res

Output :

Concatenating DataFrame with group keys :

In order to concat dataframe with group keys, we override the column names with the use of the keys argument. Keys argument is to override the column names when creating a new DataFrame based on existing Series.

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd']} # Define a dictionary containing employee data data2 = {'Name':['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'], 'Age':[17, 14, 12, 52], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=[0, 1, 2, 3]) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=[4, 5, 6, 7]) print(df, "\n\n", df1)

Now we use keys as an argument.

Python

# using keys frames = [df, df1 ] res = pd.concat(frames, keys=['x', 'y'])res

Output :

Concatenating with mixed ndims :

User can concatenate a mix of Series and DataFrame. The Series will be transformed to DataFrame with the column name as the name of the Series.

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=[0, 1, 2, 3]) # creating a seriess1 = pd.Series([1000, 2000, 3000, 4000], name='Salary') print(df, "\n\n", s1)

Now we are going to mix Series and dataframe together.

Python

# combining series and dataframeres = pd.concat([df, s1], axis=1) res

Output :
Merging DataFrame

Pandas have options for high-performance in-memory merging and joining. When we need to combine very large DataFrames, joins serve as a powerful way to perform these operations swiftly. Joins can only be done on two DataFrames at a time, denoted as left and right tables. The key is the common column that the two DataFrames will be joined on. It’s a good practice to use keys which have unique values throughout the column to avoid unintended duplication of row values. Pandas provide a single function, merge(), as the entry point for all standard database join operations between DataFrame objects.

There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.
Code #1 :Merging a dataframe with one unique key combination

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'key': ['K0', 'K1', 'K2', 'K3'], 'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32],} # Define a dictionary containing employee data data2 = {'key': ['K0', 'K1', 'K2', 'K3'], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2) print(df, "\n\n", df1)

Now we are using.merge()with one unique key combination.

Python

# using .merge() functionres = pd.merge(df, df1, on='key') res

Output :

Code #2:Merging dataframe using multiple join keys.

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K1', 'K0', 'K1'], 'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32],} # Define a dictionary containing employee data data2 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K0', 'K0', 'K0'], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2) print(df, "\n\n", df1)

Now we merge dataframe using multiple keys.

Python

# merging dataframe using multiple keysres1 = pd.merge(df, df1, on=['key', 'key1']) res1

Output :
Merging dataframe usinghowin an argument:

We use how argument to merge specifies how to determine which keys are to be included in the resulting table. If a key combination does not appear in either the left or right tables, the values in the joined table will be NA. Here is a summary of the how options and their SQL equivalent names:

MERGE METHOD	JOIN NAME	DESCRIPTION
left	LEFT OUTER JOIN	Use keys from left frame only
right	RIGHT OUTER JOIN	Use keys from right frame only
outer	FULL OUTER JOIN	Use union of keys from both frames
inner	INNER JOIN	Use intersection of keys from both frames

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K1', 'K0', 'K1'], 'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32],} # Define a dictionary containing employee data data2 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K0', 'K0', 'K0'], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2) print(df, "\n\n", df1)

Now we sethow = 'left'in order to use keys from left frame only.

Joining DataFrame

In order to join dataframe, we use.join()function this function is used for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32]} # Define a dictionary containing employee data data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=['K0', 'K1', 'K2', 'K3']) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4']) print(df, "\n\n", df1)

Now we are use.join()method in order to join dataframes

Python

# joining dataframeres = df.join(df1) res

Output :

Now we usehow = 'outer'in order to get union

Python

# getting unionres1 = df.join(df1, how='outer') res1

Output :

Joining dataframe usingonin an argument :

In order to join dataframes we useonin an argument.join()takes an optional on argument which may be a column or multiple column names, which specifies that the passed DataFrame is to be aligned on that column in the DataFrame.

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Key':['K0', 'K1', 'K2', 'K3']} # Define a dictionary containing employee data data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4']) print(df, "\n\n", df1)

Now we are using.joinwith “on” argument.

Python

# using on argument in joinres2 = df.join(df1, on='Key') res2

Output :

Joining singly-indexed DataFrame with multi-indexed DataFrame :

In order to join singly indexed dataframe with multi-indexed dataframe, the level will match on the name of the index of the singly-indexed frame against a level name of the multi-indexed frame.

Python

# importing pandas moduleimport pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav'], 'Age':[27, 24, 22]} # Define a dictionary containing employee data data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kanpur'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1, index=pd.Index(['K0', 'K1', 'K2'], name='key')) index = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'), ('K2', 'Y2'), ('K2', 'Y3')], names=['key', 'Y']) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index= index) print(df, "\n\n", df1)

Now we join singly indexed dataframe with multi-indexed dataframe.

Python

# joining singly indexed with# multi indexedresult = df.join(df1, how='inner') result

Output :

tarunsarawgi_gfg

Improve

Python - Concatenation of two String Tuples

Python | Pandas Working With Text Data

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (2024)

Concatenating DataFrame

Output :
Merging DataFrame

Joining DataFrame

Please Login to comment...

Python | Pandas Merging, Joining, and Concatenating - GeeksforGeeks (2024)

Concatenating DataFrame

Output :Merging DataFrame

Joining DataFrame

Please Login to comment...

Output :
Merging DataFrame