DataFrame.describe() - Pandas DataFrame Basics

The describe() method of a Pandas DataFrame provides a summary of the central tendency, dispersion, and shape of the distribution of a dataset. By default, it computes its descriptive statistics for the numerical columns of the DataFrame only.

Syntax

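The syntax for the describe() method is:

DataFrame.describe(percentiles=None, include=None, exclude=None)

Here, DataFrame refers to the Pandas DataFrame that we want to summarize. All three parameters are optional: percentiles selects which percentiles to report, while include and exclude control which column types are summarized. Calling describe() with no arguments uses the defaults.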
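
As a minimal sketch of the optional percentiles argument, the example below asks for the 10th, 50th, and 90th percentiles instead of the default quartiles. The scores DataFrame and its values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
scores = pd.DataFrame({"score": [10, 20, 30, 40, 50]})

# Request the 10th, 50th, and 90th percentiles.
# Values must lie between 0 and 1.
summary = scores.describe(percentiles=[0.1, 0.5, 0.9])

print(summary)
```

The percentile rows in the result are labelled 10%, 50%, and 90%, and the count, mean, std, min, and max rows are unchanged.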

Example

Consider the following example:

import pandas as pd

# Build a small DataFrame with two numerical and three text columns
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie', 'Dave', 'Edith'],
                   'age': [25, 32, 18, 47, 33],
                   'gender': ['F', 'M', 'F', 'M', 'F'],
                   'income': [40000, 60000, 35000, 80000, 55000],
                   'education': ['Bachelors', 'Masters', 'High School', 'PhD', 'Masters']})

# Summarize the numerical columns
print(df.describe())

In this example, we first create a Pandas DataFrame with five rows and five columns. Then we use the describe() method to summarize the DataFrame. Because only age and income are numerical, they are the only columns that appear in the summary.

Output

When we run the above code, we get the following output:

             age        income
count   5.000000      5.000000
mean   31.000000  54000.000000
std    10.793517  17818.529681
min    18.000000  35000.000000
25%    25.000000  40000.000000
50%    32.000000  55000.000000
75%    33.000000  60000.000000
max    47.000000  80000.000000

For each numerical column, this output displays the count of non-null values, the mean, the sample standard deviation, the minimum, the 25th, 50th (median), and 75th percentiles, and the maximum.
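
To make the rows of the summary concrete, the following sketch compares a few of them with the equivalent standalone methods on a single column. It reuses the age values from the example above; the variable names are chosen here for illustration.

```python
import pandas as pd

# The age column from the example above.
df = pd.DataFrame({"age": [25, 32, 18, 47, 33]})
summary = df.describe()

# Each row of the summary matches a standalone method on the column.
mean_age = df["age"].mean()
std_age = df["age"].std()              # sample standard deviation (ddof=1)
median_age = df["age"].quantile(0.5)   # the "50%" row

print(summary.loc["mean", "age"] == mean_age)
print(summary.loc["std", "age"] == std_age)
print(summary.loc["50%", "age"] == median_age)
```

Because the result of describe() is itself a DataFrame, individual statistics can be read from it with .loc, as shown in the print statements.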

Explanation

The describe() method provides a summary of the central tendency and the dispersion of the data in the DataFrame. It also reports the five-number summary of each column (minimum, first quartile, median, third quartile, and maximum). The interquartile range (IQR) is not reported directly, but it can be derived as the difference between the 75% and 25% rows.
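
As a short sketch, the IQR can be computed from the describe() result by subtracting the 25% row from the 75% row. The data below repeats the numerical columns of the earlier example; the variable name iqr is introduced here for illustration.

```python
import pandas as pd

# Numerical columns from the example above.
df = pd.DataFrame({"age": [25, 32, 18, 47, 33],
                   "income": [40000, 60000, 35000, 80000, 55000]})

summary = df.describe()

# Interquartile range: third quartile minus first quartile, per column.
iqr = summary.loc["75%"] - summary.loc["25%"]

print(iqr)
```

The result is a Series with one IQR value per numerical column.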

Use

The describe() method is used to quickly summarize the data stored in a Pandas DataFrame. It provides analysts and developers with a quick reference to the general properties of the DataFrame.

Important Points

  • The describe() method computes various summary statistics, including the count, mean, standard deviation, minimum, maximum, and percentiles of the data in the DataFrame.
  • By default, the describe() method summarizes only the numerical columns of a mixed-type DataFrame. The include and exclude parameters change which columns are covered.
  • The describe() method gives a quick reference to the central tendency, dispersion, and distribution of the data in the DataFrame.
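
The default restriction to numerical columns can be lifted. The sketch below, using a reduced version of the example data, passes include="all" to summarize every column, and also calls describe() on a single text column. For text data the summary reports count, unique, top, and freq instead of the numerical statistics.

```python
import pandas as pd

# Reduced version of the example data: two text columns, one numerical.
df = pd.DataFrame({"name": ["Alice", "Bob", "Charlie", "Dave", "Edith"],
                   "gender": ["F", "M", "F", "M", "F"],
                   "age": [25, 32, 18, 47, 33]})

# Summarize every column, regardless of type.
# Statistics that do not apply to a column are filled with NaN.
full_summary = df.describe(include="all")

# Summarize a single text column.
gender_summary = df["gender"].describe()

print(full_summary)
print(gender_summary)
```

In the combined summary, rows such as mean and std are NaN for the text columns, and rows such as unique and top are NaN for the numerical column.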

Summary

The describe() method in Pandas provides a summary of the central tendency, dispersion, and shape of the distribution of a dataset. It is a convenient way for developers and analysts to quickly reference the central characteristics of the DataFrame in a tabular format.
