DataFrame.describe() - Pandas DataFrame Basics
The describe() method of a Pandas DataFrame provides a summary of the central tendency, dispersion, and shape of the distribution of a dataset. It computes a set of descriptive statistics and, by default, does so for the numerical columns in the DataFrame only.
Syntax
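The basic syntax for the describe() method is:
DataFrame.describe(percentiles=None, include=None, exclude=None)
Here, DataFrame refers to the Pandas DataFrame that we want to summarize. All three parameters are optional: percentiles sets which percentiles are reported (25%, 50%, and 75% by default), while include and exclude select columns by data type.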
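The describe() method also accepts an optional percentiles argument. The following is a minimal sketch of how it can be used; the scores DataFrame and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration.
scores = pd.DataFrame({'score': [10, 20, 30, 40, 50]})

# Ask for the 10th, 50th, and 90th percentiles instead of the
# default 25th, 50th, and 75th. Values must lie between 0 and 1.
summary = scores.describe(percentiles=[0.1, 0.5, 0.9])
print(summary)
```

The percentile rows of the result are labeled '10%', '50%', and '90%', in place of the default quartile labels.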
Example
Consider the following example:
import pandas as pd

# Build a small DataFrame with two numeric and three string columns
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Edith'],
                   'age': [25, 32, 18, 47, 33],
                   'gender': ['F', 'M', 'F', 'M', 'F'],
                   'income': [40000, 60000, 35000, 80000, 55000],
                   'education': ['Bachelors', 'Masters', 'High School', 'PhD', 'Masters']})

# Summarize the numeric columns
print(df.describe())
In this example, we first create a Pandas DataFrame with five rows and five columns. Then we use the describe() method to summarize the DataFrame.
Output
When we run the above code, we get the following output:
             age        income
count   5.000000      5.000000
mean   31.000000  54000.000000
std    10.793517  17818.529681
min    18.000000  35000.000000
25%    25.000000  40000.000000
50%    32.000000  55000.000000
75%    33.000000  60000.000000
max    47.000000  80000.000000
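This output displays the count, mean, sample standard deviation, minimum, maximum, and the 25th, 50th, and 75th percentiles of the two numeric columns, age and income. The name, gender, and education columns hold strings, so they are left out by default.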
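Each row of the summary corresponds to an ordinary column-wise method, which can help when checking a value by hand. The sketch below rebuilds only the numeric columns of the example and compares a few rows; the variable names summary, means, and stds are illustrative.

```python
import pandas as pd

# Numeric columns from the example above.
df = pd.DataFrame({'age': [25, 32, 18, 47, 33],
                   'income': [40000, 60000, 35000, 80000, 55000]})

summary = df.describe()

# The same statistics computed one at a time.
means = df.mean()              # matches summary.loc['mean']
stds = df.std()                # sample standard deviation (ddof=1), matches summary.loc['std']
medians = df.quantile(0.5)     # matches summary.loc['50%']

print(means)
print(stds)
print(medians)
```

Note that std is the sample standard deviation, which divides by n - 1 rather than n.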
Explanation
The describe() method provides a summary of the central tendency and the dispersion of the data in the DataFrame. It also reports the five-number summary (minimum, first quartile, median, third quartile, and maximum) of each column. The interquartile range (IQR) is not shown directly, but it can be computed as the difference between the 75% and 25% rows.
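The interquartile range is not a row in the describe() output, but it can be derived from the quartile rows. A minimal sketch, reusing the numeric columns of the example (the name iqr is illustrative):

```python
import pandas as pd

# Numeric columns from the example above.
df = pd.DataFrame({'age': [25, 32, 18, 47, 33],
                   'income': [40000, 60000, 35000, 80000, 55000]})

summary = df.describe()

# IQR = third quartile minus first quartile, computed per column.
iqr = summary.loc['75%'] - summary.loc['25%']
print(iqr)
```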
Use
The describe() method is used to quickly summarize the data stored in a Pandas DataFrame. It provides analysts and developers with a quick reference to the general properties of the DataFrame.
Important Points
- The describe() method computes various summary statistics, including the count, mean, standard deviation, minimum, maximum, and percentiles of the data in the DataFrame.
- By default, the describe() method summarizes only the numerical columns in the DataFrame. The include and exclude parameters change which columns are covered; for example, include='all' summarizes every column.
- The describe() method gives a quick reference to the central tendency, dispersion, and distribution of the data in the DataFrame.
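By default only numeric columns are summarized, but the include parameter can widen the selection. The sketch below uses a trimmed copy of the example data and passes include='all'; the name all_summary is illustrative.

```python
import pandas as pd

# A trimmed copy of the example data: two string columns, one numeric.
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Edith'],
                   'age': [25, 32, 18, 47, 33],
                   'gender': ['F', 'M', 'F', 'M', 'F']})

# include='all' summarizes every column. String columns get
# count, unique, top (most frequent value), and freq (its count);
# statistics that do not apply to a column are shown as NaN.
all_summary = df.describe(include='all')
print(all_summary)
```

For the gender column, unique, top, and freq report the number of distinct values, the most common value, and how often it occurs.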
Summary
The describe() method in Pandas provides a summary of the central tendency, dispersion, and shape of the distribution of a dataset. It is a convenient way for developers and analysts to quickly reference the central characteristics of the DataFrame in a tabular format.