pandas.cut() - Pandas DataFrame Basics
The cut() function in pandas separates data into discrete bins (intervals). It is a useful tool for data analysis and plotting. cut() is a top-level pandas function (pd.cut()), not a DataFrame method, but it accepts a DataFrame column and can divide its values into equal-width bins or into custom bins defined by explicit edges.
Syntax
The basic syntax for applying cut() to a DataFrame column is:
pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)
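Here, x is the one-dimensional input to bin (for example, a DataFrame column); bins is either an integer giving the number of equal-width bins or a sequence of bin edges; and labels sets the labels of the returned bins, with one label per bin. right (default True) makes each interval closed on the right, such as (a, b], and retbins=True additionally returns the computed bin edges.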
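To make the bins, labels, right and include_lowest parameters concrete, here is a small sketch. The scores are hypothetical values chosen only for illustration; the calls themselves use the documented pd.cut() parameters shown in the syntax above.

```python
import pandas as pd

# Hypothetical exam scores, used only for illustration
scores = pd.Series([35, 52, 68, 74, 89, 95], name="score")

# A sequence of edges defines custom bins (0, 50], (50, 75], (75, 100];
# labels needs exactly one entry per bin, i.e. len(bins) - 1 entries
grades = pd.cut(scores, bins=[0, 50, 75, 100], labels=["low", "medium", "high"])
print(grades.tolist())  # ['low', 'medium', 'medium', 'medium', 'high', 'high']

# With right=True (the default) a value equal to the lowest edge is outside every bin
edge_case = pd.cut(pd.Series([0, 50]), bins=[0, 50, 100])
print(edge_case.isna().tolist())  # [True, False]

# include_lowest=True closes the first interval on the left, so 0 is kept
fixed = pd.cut(pd.Series([0, 50]), bins=[0, 50, 100], include_lowest=True)
print(fixed.isna().tolist())  # [False, False]
```

Values that fall outside every bin become NaN rather than raising an error, so it is worth checking for missing values after binning with explicit edges.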
Example
Consider the following example, in which a DataFrame is created and cut() splits its values into three equal-width bins:
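import pandas as pd
import numpy as np
# Six random values in [0, 1) in a single column named 'Data'
df = pd.DataFrame(np.random.rand(6, 1), columns=['Data'])
# Split the column into three equal-width bins and show the result
df_cut = pd.cut(df['Data'], bins=3)
print(df_cut)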
In this example, the DataFrame is filled with random values from NumPy's rand() function, and cut() splits its Data column into three equal-width bins.
Output
Running the program creates a DataFrame of random values and bins the Data column into three equal-width bins. Because the values are random, the exact intervals change from run to run, but the printed result looks something like this:
0    (0.521, 0.618]
1    (0.618, 0.715]
2    (0.521, 0.618]
3    (0.521, 0.618]
4    (0.423, 0.521]
5    (0.618, 0.715]
Name: Data, dtype: category
Categories (3, interval[float64, right]): [(0.423, 0.521] < (0.521, 0.618] < (0.618, 0.715]]
This output shows that each value in the Data column has been assigned to one of three intervals. Each label has the form (a, b], an interval that excludes its left edge a and includes its right edge b.
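Because random input makes the output differ on every run, a reproducible variant can help when checking how values are assigned. This sketch replaces np.random.rand() with fixed, hypothetical values and stores the bins in a new column; the column name Bin is an arbitrary choice for illustration.

```python
import pandas as pd

# Fixed values instead of np.random.rand() so the result is reproducible
df = pd.DataFrame({"Data": [0.1, 0.2, 0.45, 0.5, 0.8, 0.9]})

# Three equal-width bins spanning the minimum (0.1) to the maximum (0.9)
df["Bin"] = pd.cut(df["Data"], bins=3)

# Integer code of the bin each row falls into (0 = lowest bin)
print(df["Bin"].cat.codes.tolist())  # [0, 0, 1, 1, 2, 2]
print(len(df["Bin"].cat.categories))  # 3
```

The edges here are roughly 0.1, 0.367, 0.633 and 0.9, so each pair of rows lands in a different bin.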
Explanation
The cut() function in pandas divides data into equal-width or custom bins. Setting the bins parameter to an integer splits the range of the data into that many bins of equal width. Alternatively, bins can be a sequence of bin edges, in which case each pair of consecutive edges defines one bin.
In this example, the Data column is split into three equal-width bins, and the output lists the interval each value falls into, followed by the ordered list of categories.
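The difference between an integer bins value and a sequence of edges can be seen by asking pandas for the edges it used. The sketch below uses hypothetical values and retbins=True; in this case the first edge comes out slightly below the minimum (pandas widens the range by 0.1% so that the minimum falls inside the first right-closed interval), while the remaining edges are evenly spaced.

```python
import numpy as np
import pandas as pd

values = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0])

# Integer bins: 4 equal-width bins between min (2.0) and max (10.0).
# retbins=True also returns the bin edges that pandas computed.
binned, edges = pd.cut(values, bins=4, retbins=True)
print(edges)                      # approx. [1.992, 4.0, 6.0, 8.0, 10.0]
print(binned.cat.codes.tolist())  # [0, 0, 1, 2, 3]

# Sequence of edges: custom-sized bins (0, 5] and (5, 10]
custom = pd.cut(values, bins=[0, 5, 10])
print(custom.cat.codes.tolist())  # [0, 0, 1, 1, 1]
```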
Use
The cut() function is useful for data analysis and plotting because it turns continuous values into a small number of discrete groups. It is commonly used to count how many values fall into each range, to compare statistics between groups, and to visualize how data is distributed across bins.
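A typical analysis step after binning is to count values per bin or to aggregate another quantity by bin. This is a minimal sketch with hypothetical ages and hypothetical bin labels; value_counts() and groupby() are standard pandas methods.

```python
import pandas as pd

# Hypothetical ages, used only to illustrate distribution analysis with cut()
ages = pd.Series([5, 17, 23, 31, 38, 45, 52, 67, 71], name="age")

# Custom edges with one label per bin: (0, 18], (18, 40], (40, 65], (65, 100]
age_groups = pd.cut(ages, bins=[0, 18, 40, 65, 100],
                    labels=["child", "young adult", "adult", "senior"])

# Number of values in each bin; sort=False keeps the bins in their natural order
counts = age_groups.value_counts(sort=False)
print(counts)

# Mean age within each bin; observed=False keeps every bin, even empty ones
mean_age = ages.groupby(age_groups, observed=False).mean()
print(mean_age)
```

The counts Series can then be passed to a plotting call such as counts.plot(kind="bar"), which requires matplotlib, to show the distribution visually.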
Important Points
- The cut() function separates data into equal-width or custom bins.
- The bins parameter can be an integer, giving the number of equal-width bins, or a sequence of bin edges.
- The result has the category dtype; for a Series input, the bins can be inspected through the .cat accessor.
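The category dtype of the result can be inspected directly. This short sketch, on hypothetical values, checks the dtype, the IntervalIndex that holds the bins, and the ordering flag controlled by the ordered parameter.

```python
import pandas as pd

binned = pd.cut(pd.Series([1, 5, 9]), bins=[0, 3, 6, 9])

print(binned.dtype)                 # category
cats = binned.cat.categories        # IntervalIndex holding the three bins
print(type(cats).__name__)          # IntervalIndex
print(cats[0].left, cats[0].right)  # 0 3
print(binned.cat.ordered)           # True (ordered=True is the default)
```

Because the categories are ordered, comparisons and sorting follow the bin order rather than alphabetical order of the labels.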
Summary
The cut() function in pandas separates data into equal-width or custom bins. It is a useful tool for data analysis and plotting because it partitions continuous data into discrete intervals, making it easy to analyze how the data is distributed across them.