
DataFrame.where() (Pandas DataFrame Basics)


Syntax

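DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)

This is the signature in pandas 2.x. The cond argument is a boolean Series/DataFrame, array-like, or callable; other is the replacement (a scalar, Series/DataFrame, or callable) used wherever cond is False. Older 1.x releases also accepted errors and try_cast, but both were deprecated and then removed in pandas 2.0.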
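As a small illustration of the cond and other parameters, the sketch below (assuming pandas 2.x, with a made-up frame df) passes cond first as a boolean DataFrame and then as a callable, and other first as a scalar and then as a DataFrame of the same shape.

```python
import pandas as pd

# made-up numeric frame, used only for illustration
df = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5]})

# cond as a boolean DataFrame, other as a scalar:
# values that are not greater than 1 become 0
r1 = df.where(df > 1, 0)
print(r1)

# cond as a callable that receives df; other as a DataFrame:
# odd values are replaced by the matching entry of -df
r2 = df.where(lambda d: d % 2 == 0, -df)
print(r2)
```

Passing a DataFrame as other means each replaced cell takes the value from the same position in that frame, rather than a single constant.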

Example

import pandas as pd

# create a sample dataframe
data = {'fruit': ['apple', 'banana', 'cherry', 'kiwi'],
        'count': [3, 5, 1, 2]}
df = pd.DataFrame(data)

# replace count with 0 where count is less than 3
df['count'] = df['count'].where(df['count'] >= 3, 0)
print(df)

Output

    fruit  count
0   apple      3
1  banana      5
2  cherry      0
3    kiwi      0

Explanation

DataFrame.where() is a pandas method for conditional replacement. It takes a condition, cond: wherever cond is True, the original value is kept, and wherever it is False, the value is replaced with other, which defaults to NaN. Unless inplace=True is passed, the result is returned as a new object and the original is left unchanged.
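When other is left out, the replaced positions are filled with NaN. The sketch below reuses the counts from the example above; note that an integer column that ends up holding NaN is upcast to float64.

```python
import pandas as pd

df = pd.DataFrame({"count": [3, 5, 1, 2]})

# no other given: values failing the condition become NaN
masked = df.where(df >= 3)
print(masked)

# float64, because NaN cannot be stored in an int64 column
print(masked["count"].dtype)
```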

In the example above, we create a DataFrame with two columns, fruit and count, where each row holds one fruit and its count. Calling where on the count column (strictly speaking Series.where, which behaves the same way on a single column) keeps every count greater than or equal to 3 and replaces the rest with 0. Assigning the result back to df['count'] stores the change, so cherry (1) and kiwi (2) end up with 0.
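For comparison, here is a sketch of the same replacement done with DataFrame.where on the whole frame instead of on one column. The mask cond is built just for this illustration: it keeps every fruit name and only the counts of 3 or more.

```python
import pandas as pd

data = {'fruit': ['apple', 'banana', 'cherry', 'kiwi'],
        'count': [3, 5, 1, 2]}
df = pd.DataFrame(data)

# boolean mask aligned with df: the scalar True is broadcast over the
# fruit column, and count is True only where the count is >= 3
cond = pd.DataFrame({'fruit': True, 'count': df['count'] >= 3})

# positions where cond is False are replaced with 0; df itself is unchanged
result = df.where(cond, 0)
print(result)
```

Because where aligns cond with df by index and column labels, the mask only needs matching labels, not a particular column order.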

Use

DataFrame.where() is commonly used in Python data analysis to replace values conditionally, for example to blank out invalid readings, substitute outliers, or apply a default value. This makes it a frequent step in data cleaning, preprocessing, and data wrangling for machine learning pipelines.
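As one example of a cleaning step, the sketch below replaces an outlier in a hypothetical latency_ms series with the series median. The 100 ms cutoff is an arbitrary assumption for illustration, not a general rule.

```python
import pandas as pd

# hypothetical measurements with one obvious outlier
s = pd.Series([10.0, 12.0, 11.0, 500.0, 9.0], name="latency_ms")

# assumed cutoff for this sketch: anything above 100 ms counts as an outlier
limit = 100.0

# keep values within the cutoff, replace the rest with the median
cleaned = s.where(s <= limit, s.median())
print(cleaned.tolist())
```

The median is used here because, unlike the mean, it is barely affected by the outlier it replaces.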

Important Points

  • DataFrame.where() replaces values based on a condition
  • Where the condition is True, the value is kept; where it is False, the value is replaced with other (NaN by default)
  • By default a new object is returned; setting inplace=True modifies the DataFrame itself and returns None
  • Common uses include data cleaning, preprocessing, and data wrangling in machine learning pipelines
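The difference between the default behavior and inplace=True can be seen in a short sketch (assuming pandas 2.x, where inplace is keyword-only):

```python
import pandas as pd

df = pd.DataFrame({"count": [3, 5, 1, 2]})

# default: a new object is returned and df is left untouched
result = df.where(df >= 3, 0)
print(df["count"].tolist())      # [3, 5, 1, 2]
print(result["count"].tolist())  # [3, 5, 0, 0]

# inplace=True modifies df itself and returns None
ret = df.where(df >= 3, 0, inplace=True)
print(ret)                       # None
print(df["count"].tolist())      # [3, 5, 0, 0]
```

Since the in-place call returns None, writing df = df.where(..., inplace=True) would overwrite df with None.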

Summary

In conclusion, DataFrame.where() in pandas is a concise way to replace values based on a condition. Values that satisfy the condition are kept, and those that do not are replaced with an alternate value (NaN unless other is given). It is widely used in Python data analysis for cleaning, preprocessing, and data wrangling in machine learning pipelines.
