Pandas - Pandas Tutorial

Pandas is a data analysis toolkit for Python for manipulating and analyzing data. It is built on top of the NumPy library and provides tools for working with tabular data, time-series analysis, and data visualization.

Syntax

Pandas introduces two primary classes for working with data: Series and DataFrame. A Series is a one-dimensional labeled array that represents a single column of data, while a DataFrame is a collection of columns that share an index and store data in tabular form. The following is the basic syntax for creating a DataFrame:

import pandas as pd

df = pd.DataFrame(data, columns=[<list of column names>])

Here, data can be a dictionary that maps each column name to a list of values, or a list of dictionaries in which each dictionary is one row, with column names as keys. The columns argument is optional; it selects which keys to include and sets the column order.
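As a minimal sketch of the two input forms described above, the following builds the same small table from a dictionary of lists and from a list of dictionaries. The variable names (by_column, by_row, df_from_dict, df_from_rows) and the sample values are chosen here for illustration only.

```python
import pandas as pd

# Form 1: a dictionary mapping each column name to a list of values
by_column = {'Name': ['John', 'Emma'], 'Age': [30, 25]}

# Form 2: a list of dictionaries, one dictionary per row
by_row = [{'Name': 'John', 'Age': 30},
          {'Name': 'Emma', 'Age': 25}]

df_from_dict = pd.DataFrame(by_column, columns=['Name', 'Age'])
df_from_rows = pd.DataFrame(by_row, columns=['Name', 'Age'])

# Both forms describe the same table, so the comparison should hold
print(df_from_dict.equals(df_from_rows))
```

The dictionary-of-lists form is convenient when the data is already organized by column; the list-of-dictionaries form fits data that arrives one record at a time.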

Example

Consider the following example, where we create and print a simple DataFrame:

import pandas as pd

data = {'Name': ['John', 'Emma', 'Mark', 'Oliver'],
        'Age': [30, 25, 33, 28],
        'Gender': ['Male', 'Female', 'Male', 'Male']}
df = pd.DataFrame(data, columns=['Name', 'Age', 'Gender'])

print(df)

In this example, we create a DataFrame from the dictionary data, with columns 'Name', 'Age', and 'Gender'. The resulting DataFrame is then printed to the console.

Output

When the above program is executed, it displays the following output:

     Name  Age  Gender
0    John   30    Male
1    Emma   25  Female
2    Mark   33    Male
3  Oliver   28    Male
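
Once the DataFrame exists, it can be manipulated. The sketch below shows a few common operations on the same data: selecting a column, filtering rows, and adding a derived column. The names ages, older, and AgeNextYear are hypothetical and used only for illustration.

```python
import pandas as pd

data = {'Name': ['John', 'Emma', 'Mark', 'Oliver'],
        'Age': [30, 25, 33, 28],
        'Gender': ['Male', 'Female', 'Male', 'Male']}
df = pd.DataFrame(data, columns=['Name', 'Age', 'Gender'])

# Select a single column by name
ages = df['Age']

# Keep only the rows where Age is greater than 28
older = df[df['Age'] > 28]

# Add a derived column (hypothetical name, for illustration only)
df['AgeNextYear'] = df['Age'] + 1

print(ages.mean())             # 29.0
print(older['Name'].tolist())  # ['John', 'Mark']
```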

Explanation

The DataFrame class in Pandas stores and manipulates data in a tabular format. The pd.DataFrame constructor creates a DataFrame from a given dataset. The optional columns argument takes a list of column names. In this example, we used a dictionary to create the DataFrame, with column names as keys and the corresponding data as values.
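
To illustrate what the columns argument does when the data is a dictionary, here is a short sketch. It assumes the same sample data as the example; df_all and df_subset are names introduced here for illustration.

```python
import pandas as pd

data = {'Name': ['John', 'Emma', 'Mark', 'Oliver'],
        'Age': [30, 25, 33, 28],
        'Gender': ['Male', 'Female', 'Male', 'Male']}

# Without columns, the DataFrame follows the dictionary's key order
df_all = pd.DataFrame(data)

# With columns, only the listed keys are kept, in the listed order
df_subset = pd.DataFrame(data, columns=['Gender', 'Name'])

print(list(df_all.columns))     # ['Name', 'Age', 'Gender']
print(list(df_subset.columns))  # ['Gender', 'Name']
```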

Use

Pandas can be used for a wide variety of data analysis tasks, including data cleaning and manipulation, data aggregation, time-series analysis, and data visualization. It works on datasets that fit in memory and is widely used in the data science community.
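
As a rough sketch of two of these tasks, the following drops a row with a missing value (cleaning) and then computes a mean per group (aggregation). The data is the sample table from the example with one Age value replaced by a missing value purely for illustration; cleaned and mean_age are hypothetical names.

```python
import pandas as pd

# Sample data with one missing Age, introduced only for this illustration
df = pd.DataFrame({'Name': ['John', 'Emma', 'Mark', 'Oliver'],
                   'Age': [30, 25, None, 28],
                   'Gender': ['Male', 'Female', 'Male', 'Male']})

# Data cleaning: drop rows where Age is missing
cleaned = df.dropna(subset=['Age'])

# Data aggregation: mean age for each gender
mean_age = cleaned.groupby('Gender')['Age'].mean()

print(len(cleaned))      # 3
print(mean_age['Male'])  # 29.0
```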

Important Points

  • Pandas introduces two primary classes for working with data: Series and DataFrame.
  • A Series represents a single column of data, while a DataFrame represents a collection of columns that can be used to store and manipulate data in tabular form.
  • Columns can optionally be specified as a list of column names.
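
The relationship between the two classes can be sketched as follows: a Series is built on its own, then used as one column of a DataFrame. The variable names are illustrative.

```python
import pandas as pd

# A Series: one-dimensional, labeled values
ages = pd.Series([30, 25, 33, 28], name='Age')

# A DataFrame: several columns sharing one index
df = pd.DataFrame({'Name': ['John', 'Emma', 'Mark', 'Oliver'],
                   'Age': ages})

# Selecting one column from a DataFrame gives back a Series
print(type(df['Age']).__name__)  # Series
print(df.shape)                  # (4, 2)
```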

Summary

Pandas is a powerful data analysis toolkit for Python that provides a fast and easy way to manipulate and analyze data. The primary classes in Pandas are Series and DataFrame, which are used to work with tabular data. Pandas provides a variety of tools for data cleaning and manipulation, data aggregation, time-series analysis, and data visualization, making it a popular choice for data analysis tasks.
