Concatenating data (Pandas and NumPy)

Syntax

numpy.concatenate((a1, a2, ...), axis=0, out=None, dtype=None, casting="same_kind")
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)

Example

import numpy as np
import pandas as pd

arr1 = np.array([[1,2],[3,4]])
arr2 = np.array([[5,6],[7,8]])

arr_concat = np.concatenate((arr1, arr2), axis=0)
print(arr_concat)

df1 = pd.DataFrame({'a': [1,2], 'b': [3,4]})
df2 = pd.DataFrame({'a': [5,6], 'c': [7,8]})

df_concat = pd.concat([df1, df2], axis=1)
print(df_concat)

Output

[[1 2]
 [3 4]
 [5 6]
 [7 8]]

   a  b  a  c
0  1  3  5  7
1  2  4  6  8

Explanation

Concatenation is the process of combining two or more arrays or dataframes together to create a single array or dataframe. NumPy provides the numpy.concatenate() function to concatenate arrays while Pandas provides the pd.concat() function to concatenate dataframes.

In the above example, two NumPy arrays arr1 and arr2 are defined and concatenated along axis 0 using the numpy.concatenate() function. The resulting array arr_concat contains the rows of arr1 followed by the rows of arr2.
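
To illustrate how the axis argument changes the result, here is a minimal sketch that reuses arr1 and arr2 from the example. The names rows and cols are introduced only for this illustration.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

# axis=0 stacks the arrays vertically: (2, 2) + (2, 2) -> (4, 2)
rows = np.concatenate((arr1, arr2), axis=0)

# axis=1 places the arrays side by side: (2, 2) + (2, 2) -> (2, 4)
cols = np.concatenate((arr1, arr2), axis=1)

print(rows.shape)  # (4, 2)
print(cols.shape)  # (2, 4)
print(cols)
```

The arrays must have matching sizes along every axis except the one being concatenated, otherwise numpy.concatenate() raises a ValueError.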

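Similarly, two Pandas dataframes df1 and df2 are defined and concatenated along axis 1 using the pd.concat() function. The resulting dataframe df_concat places the columns of df1 and df2 side by side, aligned on the row index. Because both dataframes have a column named a, that name appears twice in the result.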
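
For comparison, the same two dataframes can also be stacked along axis 0. The sketch below shows how the join argument decides which columns are kept in that case. The names outer and inner are chosen only for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'c': [7, 8]})

# join='outer' (the default) keeps the union of the columns;
# cells with no source value are filled with NaN
outer = pd.concat([df1, df2], axis=0)

# join='inner' keeps only the columns present in every dataframe
inner = pd.concat([df1, df2], axis=0, join='inner')

print(list(outer.columns))  # ['a', 'b', 'c']
print(list(inner.columns))  # ['a']
print(list(outer.index))    # [0, 1, 0, 1]
```

Note that the original row labels are preserved, so the index of the stacked result contains duplicates unless ignore_index=True is passed.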

Use

Concatenation is commonly used in data science to combine data from different sources. With numpy.concatenate() and pd.concat(), arrays and dataframes from separate sources can be joined into a single object for further analysis.

Important Points

  • Concatenation is the process of combining two or more arrays or dataframes together to create a single array or dataframe
  • NumPy provides the numpy.concatenate() function to concatenate arrays
  • Pandas provides the pd.concat() function to concatenate dataframes
  • Both functions accept an optional axis argument; only pd.concat() accepts join, ignore_index, keys, sort, etc.
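
A few of the optional pd.concat() arguments can be sketched as follows. The dataframes and the key labels 'first' and 'second' are made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df2 = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})

# ignore_index=True discards the original row labels and renumbers from 0
renumbered = pd.concat([df1, df2], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]

# keys adds an outer index level recording which input each row came from
labeled = pd.concat([df1, df2], keys=['first', 'second'])
print(labeled.loc['second'])

# verify_integrity=True raises ValueError when row labels overlap
try:
    pd.concat([df1, df2], verify_integrity=True)
    overlap_rejected = False
except ValueError:
    overlap_rejected = True
print(overlap_rejected)  # True
```

Using keys is one way to keep track of the source of each row when combining several data sources into one dataframe.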

Summary

In conclusion, concatenation joins two or more arrays or dataframes into a single object. NumPy and Pandas provide separate functions for this, each with its own optional arguments to customize the output. Concatenation is often a first step when combining different data sources for analysis.