DataFrame.astype() - Pandas DataFrame Basics
DataFrame.astype() is a method used to cast all or selected columns of a Pandas DataFrame to another data type. It is useful for converting the data type of columns to facilitate data analysis, data cleaning, and modeling.
Syntax
The basic syntax for casting the data type of columns with astype() is as follows:
dataframe.astype(dtype, copy=True, errors='raise')
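- dtype: The data type to cast to. Pass a single type (for example 'int', 'float', or 'bool') to cast every column, or a dictionary mapping column names to types to cast only selected columns.
- copy: If True (the default), returns a new DataFrame with the cast columns. If False, the result may share data with the original DataFrame where no conversion is needed, so changes to one can propagate to the other.
- errors: If 'raise' (the default), raises an exception when the data cannot be cast to the specified data type. If 'ignore', suppresses the exception and returns the original data unchanged.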
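The default errors='raise' behavior can be illustrated with a small sketch. The DataFrame and its 'code' column are hypothetical and are not part of the example that follows; the sketch only assumes that a non-numeric string cannot be cast to an integer type.

```python
import pandas as pd

# Hypothetical data: the last value is not a valid integer.
df = pd.DataFrame({"code": ["1", "2", "x"]})

try:
    # errors='raise' is the default, so the failed cast raises an exception.
    df.astype({"code": "int64"})
    raised = False
except ValueError:
    raised = True

print("Cast failed:", raised)
```

Catching the exception like this is one way to detect bad values before deciding how to clean them.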
Example
Consider the following example, where we cast the data types of columns in a DataFrame using astype():
import pandas as pd

data = {'name': ['John', 'Jane', 'Mike', 'Emily'],
        'age': [25, 19, 37, 42],
        'salary': [50000, 35000, 75000, 90000],
        'employed': [True, True, False, True]}
df = pd.DataFrame(data)

# Cast only the columns named in the dictionary; the others keep their types.
casted_df = df.astype({'age': 'float', 'salary': 'int64'})
print(casted_df.dtypes)
In this example, we define a dictionary of data and create a DataFrame from it. We then use the astype() method to cast the 'age' column to float and the 'salary' column to int64 (the salary values are already integers, so that column's type stays the same). Finally, we print the resulting data types of the new DataFrame.
Output
name         object
age         float64
salary        int64
employed       bool
dtype: object
In this example, the name and employed columns are unchanged because their data types were not specified in the astype() call.
Explanation
The astype()
method in Pandas allows us to cast the data type of columns in a DataFrame. It takes a dictionary of column names to new data types. By default, it returns a new copy of the DataFrame with the casted columns, but this can be changed with the copy
parameter.
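As a minimal sketch of the single-type form, the snippet below casts every column at once and checks that the original DataFrame is left untouched. The 'scores' DataFrame and its column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical all-integer DataFrame.
scores = pd.DataFrame({"math": [90, 85], "art": [70, 95]})

# A single dtype (instead of a dictionary) is applied to every column.
as_float = scores.astype("float64")

print(as_float.dtypes)  # both columns are now float64
print(scores.dtypes)    # the original still holds integer columns
```

Because astype() returns a new object, the result has to be assigned to a variable (or back to the same name) for the cast to take effect.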
Use
The astype() method is useful for converting the data types of columns to facilitate data analysis, data cleaning, and modeling. For example, it can be used to convert string values to numeric values for mathematical operations, or to convert boolean columns to integer columns for machine learning models.
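Both conversions mentioned above can be sketched in a few lines. The 'price' and 'in_stock' columns are made-up names, and the sketch assumes every string in 'price' is a valid number.

```python
import pandas as pd

# Hypothetical data: prices stored as strings, availability as booleans.
df = pd.DataFrame({
    "price": ["1.5", "2.25", "3.0"],
    "in_stock": [True, False, True],
})

converted = df.astype({"price": "float64", "in_stock": "int64"})

total = converted["price"].sum()        # 6.75, now a numeric sum
flags = converted["in_stock"].tolist()  # [1, 0, 1]
print(total, flags)
```

If some strings might not be valid numbers, the cast raises an error under the default errors='raise' setting, so the data may need cleaning first.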
Important Points
- astype() is used to cast the data type of columns in a Pandas DataFrame.
- The method takes either a single data type or a dictionary of column names to new data types.
- By default, the method returns a new copy of the DataFrame with the cast columns.
Summary
The astype() method in Pandas is a powerful tool for converting the data types of columns in a DataFrame. It provides a flexible way to cast the data types of selected or all columns of a DataFrame, which is useful for data cleaning, analysis, and modeling tasks.