DataFrame.where() - ( Pandas DataFrame Basics )
Syntax
DataFrame.where(cond, other=nan, *, inplace=False, axis=None, level=None)
Note: older pandas versions also accepted errors and try_cast parameters; both were deprecated and then removed in pandas 2.0.
Example
import pandas as pd
# create a sample dataframe
data = {'fruit': ['apple', 'banana', 'cherry', 'kiwi'],
        'count': [3, 5, 1, 2]}
df = pd.DataFrame(data)
# replace count with 0 where count is less than 3
df['count'] = df['count'].where(df['count'] >= 3, 0)
print(df)
Output
    fruit  count
0   apple      3
1  banana      5
2  cherry      0
3    kiwi      0
Explanation
DataFrame.where() is a pandas method for replacing values conditionally. It takes a boolean condition: where the condition is True, the original value is kept, and where it is False, the value is replaced with an alternate value (the other argument, which defaults to NaN).
In the above example, we create a sample dataframe with two columns, fruit and count, and one row per fruit. Calling where on the count column (a Series, whose where method behaves the same way), we test each value for being greater than or equal to 3. Where the condition is True, the value is left unaltered, and where the condition is False, the value is replaced with 0.
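The example above applies where to a single column. As a minimal sketch of the same idea on a whole dataframe, the snippet below uses a hypothetical frame named scores (not part of the original example) and leaves other at its default, so every cell that fails the condition becomes NaN:

```python
import pandas as pd

# Hypothetical data, used only for illustration
scores = pd.DataFrame({"math": [90, 45, 70],
                       "art": [30, 85, 60]})

# Keep cells that are >= 50; cells that fail the condition
# are replaced with NaN because `other` is not given
passed = scores.where(scores >= 50)
print(passed)
```

Because NaN is a floating-point value, the integer columns in the result are converted to floats.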
Use
The DataFrame.where() method is commonly used in Python data analysis to replace values that fail a condition. It is useful for cleaning, preprocessing, and wrangling data before it enters a machine learning pipeline.
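A data-cleaning use might look like the following sketch. The readings frame, the -999.0 sentinel, and the -100 cutoff are all assumptions made up for illustration; the idea is to keep plausible values and replace the rest with the median of the valid ones:

```python
import pandas as pd

# Hypothetical sensor data where -999.0 marks a failed reading
readings = pd.DataFrame({"sensor": ["a", "b", "c", "d"],
                         "temp_c": [21.5, -999.0, 19.0, 22.5]})

# Boolean condition: True for readings we want to keep
valid = readings["temp_c"] > -100

# Median computed from the valid readings only
median_valid = readings.loc[valid, "temp_c"].median()

# Keep valid readings; replace the rest with the median
readings["temp_c"] = readings["temp_c"].where(valid, median_valid)
print(readings)
```

Computing the condition once and reusing it keeps the replacement value from being skewed by the bad readings.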
Important Points
- DataFrame.where() is a pandas method that replaces values based on a condition.
- Where the condition is True, it leaves the value unaltered, and where it is False, it replaces that value with an alternate value (other, which defaults to NaN).
- The inplace parameter can be set to True to modify the dataframe object itself instead of returning a new one.
- It can be used in cleaning, preprocessing, and data wrangling in machine learning pipelines.
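The effect of the inplace parameter can be sketched as follows, using a hypothetical frame named frame. With inplace=True the method modifies the object it is called on and returns None:

```python
import pandas as pd

# Hypothetical data, used only for illustration
frame = pd.DataFrame({"a": [1, 5, 3],
                      "b": [4, 2, 6]})

# Replace every value that is not greater than 2 with 0,
# modifying `frame` directly; the call itself returns None
result = frame.where(frame > 2, 0, inplace=True)

print(result)
print(frame)
```

Assigning the returned object back, as the first example does with df['count'], is usually the easier pattern to follow, since it makes the modification explicit.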
Summary
In conclusion, the DataFrame.where() method in the pandas library replaces values based on a condition: values that satisfy the condition are kept, and values that fail it are replaced with an alternate value. It is useful in Python data analysis for cleaning, preprocessing, and data wrangling in machine learning pipelines.