Concatenating data (Pandas and NumPy)
Syntax
numpy.concatenate((a1, a2, ...), axis=0, out=None)
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
Example
import numpy as np
import pandas as pd
# Stack two 2x2 arrays vertically (axis=0 appends rows)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
arr_concat = np.concatenate((arr1, arr2), axis=0)
print(arr_concat)
# Place two dataframes side by side (axis=1 appends columns)
df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'c': [7, 8]})
df_concat = pd.concat([df1, df2], axis=1)
print(df_concat)
Output
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
   a  b  a  c
0  1  3  5  7
1  2  4  6  8
Explanation
Concatenation is the process of combining two or more arrays or dataframes into a single array or dataframe. NumPy provides the numpy.concatenate() function to concatenate arrays, while Pandas provides the pd.concat() function to concatenate dataframes.
In the above example, two NumPy arrays, arr1 and arr2, are defined and concatenated along axis 0 using the numpy.concatenate() function. The resulting array arr_concat contains all the rows of arr1 followed by all the rows of arr2.
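As a complementary sketch, the same two arrays can also be joined along axis 1, which places them side by side instead of stacking them. The variable names rows and cols are chosen here for illustration only.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# axis=0 appends rows: shapes (2, 2) and (2, 2) give (4, 2)
rows = np.concatenate((arr1, arr2), axis=0)

# axis=1 appends columns: shapes (2, 2) and (2, 2) give (2, 4)
cols = np.concatenate((arr1, arr2), axis=1)

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)
print(cols)
```

The arrays must have matching sizes along every axis except the one being concatenated; otherwise numpy.concatenate() raises a ValueError.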
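Similarly, two Pandas dataframes, df1 and df2, are defined and concatenated along axis 1 using the pd.concat() function. The resulting dataframe df_concat contains all the columns of both df1 and df2, aligned on the row index. Because both dataframes have a column named a, the result contains two columns with that label.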
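For comparison, here is a minimal sketch of concatenating the same two dataframes along axis 0, which stacks rows instead of columns. It illustrates how the join argument decides what happens to columns that appear in only one dataframe. The names stacked and common are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'c': [7, 8]})

# Default join='outer': keep the union of columns; cells with no
# source value are filled with NaN. ignore_index=True renumbers rows 0..3.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# join='inner': keep only the columns present in every dataframe
common = pd.concat([df1, df2], axis=0, join='inner', ignore_index=True)

print(list(stacked.columns))  # ['a', 'b', 'c']
print(list(common.columns))   # ['a']
print(common['a'].tolist())   # [1, 2, 5, 6]
```

Without ignore_index=True, the result would keep the original row labels 0, 1, 0, 1, which can make later label-based selection ambiguous.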
Use
Concatenation is widely used in data science to combine data from different sources. With the NumPy and Pandas functions, arrays and dataframes can be joined into a single structure for further analysis.
Important Points
- Concatenation is the process of combining two or more arrays or dataframes together to create a single array or dataframe
- NumPy provides the numpy.concatenate() function to concatenate arrays
- Pandas provides the pd.concat() function to concatenate dataframes
- Both functions accept an optional axis argument; pd.concat() additionally accepts join, ignore_index, sort, etc.
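Two of the other optional pd.concat() arguments from the syntax line, keys and verify_integrity, can be sketched as follows. The labels 'left' and 'right' and the variable names are made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'c': [7, 8]})

# keys adds an outer level to the column labels, so the two 'a'
# columns from the example can be told apart
labeled = pd.concat([df1, df2], axis=1, keys=['left', 'right'])
print(labeled[('right', 'a')].tolist())  # [5, 6]

# verify_integrity=True raises ValueError if the concatenation axis
# would end up with duplicate labels (both frames use row labels 0 and 1)
try:
    pd.concat([df1, df2], axis=0, verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)  # True
```

Using keys is one way to keep track of which source each column or row came from; renaming columns before concatenating is another.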
Summary
In conclusion, concatenation joins two or more arrays or dataframes into one. NumPy and Pandas provide separate functions for this, each with its own optional arguments to customize the output. It is a common step when combining different data sources before analysis.