Python SimpleImputer Module

SimpleImputer is a class in the sklearn.impute module of the scikit-learn library. It is used to preprocess data before a model is trained by filling in missing values with one of four strategies: mean, median, most_frequent, or constant. The mean and median strategies work only on numeric data, while most_frequent and constant also work on string data.

Syntax

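import numpy as np
from sklearn.impute import SimpleImputer

# missing_values must be the actual marker (np.nan), not the string 'NaN'
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(data)                       # learn the fill value of each column
imputed_data = imputer.transform(data)  # replace the missing entries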

Example

Let's consider a dataset which contains some missing values.

import pandas as pd
from sklearn.impute import SimpleImputer

data = pd.read_csv('data.csv')
print(data)

Output:

   Age   Salary
0   25  25000.0
1   27  30000.0
2   30      NaN
3   33  35000.0
4   35      NaN
5   40  42000.0

Now, we can use SimpleImputer to replace each missing value with the mean of the non-missing values in the same column.

import numpy as np

# NaN cells are treated as missing and replaced by the column mean
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(data)
imputed_data = imputer.transform(data)
print(imputed_data)

Output:

[[2.5e+01 2.5e+04]
 [2.7e+01 3.0e+04]
 [3.0e+01 3.3e+04]
 [3.3e+01 3.5e+04]
 [3.5e+01 3.3e+04]
 [4.0e+01 4.2e+04]]

Explanation

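In the above example, we imported the SimpleImputer class from sklearn.impute and read a dataset with the pandas library. The missing values appear as NaN in the dataset. SimpleImputer replaced each of them with the mean of the non-missing values in its column: for Salary this is (25000 + 30000 + 35000 + 42000) / 4 = 33000. The Age column has no missing values, so it is unchanged. Note that transform returns a NumPy array rather than a DataFrame, which is why the output has no column names.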
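Since the contents of data.csv are not shown, the sketch below rebuilds the same six rows inline (an assumption based on the printed table) so the example can be run as-is. It also reads the learned fill values from the statistics_ attribute and wraps the result back into a DataFrame; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Inline stand-in for data.csv (assumed contents, copied from the printed table)
data = pd.DataFrame({
    "Age": [25, 27, 30, 33, 35, 40],
    "Salary": [25000, 30000, np.nan, 35000, np.nan, 42000],
})

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed = imputer.fit_transform(data)  # fit and transform in one call; returns a NumPy array

# statistics_ holds the per-column fill values learned during fit
print(imputer.statistics_)

# The two missing Salary cells (rows 2 and 4) now hold the column mean
print(imputed[2, 1], imputed[4, 1])

# Wrap the array in a DataFrame to restore the column names
imputed_df = pd.DataFrame(imputed, columns=data.columns)
print(imputed_df)
```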

Use

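SimpleImputer is used to preprocess data before training machine learning models. Many scikit-learn estimators raise an error when the input contains NaN, so missing values have to be filled in or dropped first. Imputing them keeps every row in the dataset instead of discarding the incomplete ones.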
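One common way to apply this before training is to place the imputer in a pipeline ahead of the model. The following is a minimal sketch under assumptions: the toy arrays are made up for illustration, and LinearRegression is just one possible estimator. It shows that the fill value is learned from the training data and then reused on new data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy data (made up for illustration): one feature with a gap, one target
X_train = np.array([[1.0], [2.0], [np.nan], [4.0]])
y_train = np.array([2.0, 4.0, 6.0, 8.0])
X_test = np.array([[np.nan], [3.0]])

# The imputer runs first, so the regression never sees a NaN
model = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
model.fit(X_train, y_train)

# The NaN in X_test is filled with the training mean (1 + 2 + 4) / 3,
# not with a statistic computed from the test data
predictions = model.predict(X_test)
train_mean = model.named_steps["simpleimputer"].statistics_[0]
print(train_mean)
print(predictions)
```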

Important Points

SimpleImputer is a part of the scikit-learn library.
It is used to handle missing values in a dataset.
The missing values are replaced according to the selected strategy: mean, median, most_frequent (the mode), or constant.
By default, NaN (np.nan) entries are treated as missing; the missing_values parameter changes this marker.
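The other strategies can be sketched as follows. The DataFrame and its column names are made up for illustration; the strategy and fill_value parameters are part of SimpleImputer.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up data: one numeric and one string column, each with a missing entry
df = pd.DataFrame({
    "Age": [25.0, 27.0, np.nan, 33.0, 90.0],
    "City": ["Pune", "Delhi", "Pune", np.nan, "Pune"],
})

# median: less sensitive to outliers such as 90 than the mean
median_age = SimpleImputer(strategy="median").fit_transform(df[["Age"]])

# most_frequent: fills with the mode, and also works on string columns
common_city = SimpleImputer(strategy="most_frequent").fit_transform(df[["City"]])

# constant: fills with the value passed as fill_value
constant_city = SimpleImputer(
    strategy="constant", fill_value="Unknown"
).fit_transform(df[["City"]])

print(median_age[2, 0])     # median of 25, 27, 33, 90
print(common_city[3, 0])    # most common city
print(constant_city[3, 0])  # the fill_value
```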

Summary

SimpleImputer is a scikit-learn class that preprocesses data by replacing missing values according to a selected strategy: mean, median, most_frequent, or constant. By default it treats NaN entries as missing. It is primarily used to fill in missing values before a machine learning model is trained.
