pandas.cut() - Pandas DataFrame Basics

The cut() function sorts numeric values into discrete bins (intervals). It is a useful tool for data analysis and plotting. cut() is a top-level Pandas function rather than a DataFrame method: you pass it a DataFrame column, and it divides the values into a given number of equal-width bins or into bins with custom edges.

Syntax

The basic syntax to use the cut() function on a DataFrame column is as follows:

pandas.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates='raise', ordered=True)

Here, x is the one-dimensional data to bin (typically a single DataFrame column), bins is either an integer giving the number of equal-width bins or a sequence giving the bin edges, and labels sets the names of the returned categories. If labels is left as None, the categories are the bin intervals themselves.
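
As a minimal sketch of the bins and labels parameters described above, the snippet below passes explicit bin edges together with custom labels. The Age column, the edge values, and the label names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: the 'Age' column and its values are illustrative only.
df = pd.DataFrame({'Age': [5, 17, 30, 64, 80]})

# A sequence of edges defines the bins explicitly. With right=True (the default)
# each bin includes its right edge: (0, 18], (18, 65], (65, 100].
age_group = pd.cut(
    df['Age'],
    bins=[0, 18, 65, 100],
    labels=['child', 'adult', 'senior'],  # one label per bin
)

print(age_group)
```

Note that three labels are supplied for four edges, since four edges delimit three bins.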

Example

Consider the following example, where a DataFrame is created and the cut() function is used to separate the data into three equal-width bins:

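import pandas as pd
import numpy as np

# Six random values in [0, 1) in a single column named 'Data'
df = pd.DataFrame(np.random.rand(6, 1), columns=['Data'])

# Split the column into three equal-width bins
df_cut = pd.cut(df['Data'], bins=3)
print(df_cut)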

In this example, a DataFrame is created with random values using the NumPy rand() function. The cut() function then splits the range of the Data column into three equal-width bins and assigns each value to the bin it falls into.

Output

When the above program is executed, it creates a DataFrame with random values and applies the cut() function to separate the data into three equal-width bins. Because the input is random, the exact intervals change from run to run, but the output would look something like this:

0      (0.4, 0.6]
1      (0.4, 0.6]
2      (0.6, 0.8]
3    (0.199, 0.4]
4      (0.6, 0.8]
5    (0.199, 0.4]
Name: Data, dtype: category
Categories (3, interval[float64]): [(0.199, 0.4] < (0.4, 0.6] < (0.6, 0.8]]

This output shows that each value in the Data column has been replaced by the bin it falls into, with three bins in total. Each label is the range of its bin: an interval written as (a, b] excludes a and includes b.

Explanation

The cut() function in Pandas separates data into equal-width or custom bins. When the bins parameter is an integer, the range of the data is split into that many bins of equal width; the bins are equal in width, not in the number of values they contain. Alternatively, bins can be a sequence that specifies the bin edges directly.

In this example, the Data column is separated into three equal-width bins using the cut() function. The output shows the resulting categories with the range of each bin.
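
To make the integer form of bins concrete, here is a small sketch on fixed (non-random) data, so the result is reproducible. The values are hypothetical; retbins=True is used so that the computed edges can be inspected alongside the binned result.

```python
import numpy as np
import pandas as pd

# Fixed illustrative data, so the bin edges are the same on every run.
values = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# bins=3 splits the observed range into three equal-width bins.
# retbins=True additionally returns the array of bin edges.
binned, edges = pd.cut(values, bins=3, retbins=True)

# Widths of the second and third bins. The first edge is skipped because
# pandas lowers it slightly so that the minimum value falls inside the first bin.
widths = np.diff(edges)[1:]

print(edges)
print(widths)
print(binned.value_counts(sort=False))
```

With this data the bins have the same width but hold different numbers of values, which is the distinction between equal width and equal count.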

Use

The cut() function is useful for data analysis and plotting, as it provides a way to partition continuous data into bins. It is commonly used to count how many values fall into each bin and to visualize that distribution, for example as a bar chart or histogram.
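
One common way to look at the distribution across bins is to count the values in each one. The sketch below assumes a hypothetical series of scores and hypothetical bin edges, and uses value_counts() on the binned result.

```python
import pandas as pd

# Hypothetical scores; values and bin edges are illustrative only.
scores = pd.Series([1, 2, 2, 3, 4, 8, 9])

# Two custom bins: (0, 5] and (5, 10].
binned = pd.cut(scores, bins=[0, 5, 10])

# Count the values per bin and keep the bins in ascending order.
counts = binned.value_counts().sort_index()

print(counts)
```

The resulting counts can be passed to a plotting call such as counts.plot(kind='bar') if a plotting backend like Matplotlib is installed.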

Important Points

  • The cut() function is used to separate data into equal-width or custom bins.
  • The bins parameter can be set to an integer to specify the number of equal-width bins, or to a sequence of bin edges.
  • The result has the category data type, and its categories are the bin intervals (or the labels, if supplied).
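
The last point can be checked directly through the .cat accessor of the returned Series. This is a small sketch with hypothetical values and bin edges, not part of the example above.

```python
import pandas as pd

# Hypothetical values and bin edges, for illustration only.
s = pd.Series([1, 5, 9])
binned = pd.cut(s, bins=[0, 3, 6, 10])

print(binned.dtype)            # the categorical dtype of the result
print(binned.cat.categories)   # the bins, stored as an IntervalIndex
print(binned.cat.codes)        # integer position of each value's bin
```

The codes are convenient when a plain integer bin index is needed instead of interval objects.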

Summary

The cut() function in Pandas allows us to separate data into equal-width or custom bins. It is a useful tool for data analysis and plotting, as it provides a way to partition data and examine how the values are distributed across the bins.
