DataFrame.sample() - ( Pandas DataFrame Basics )

Syntax

DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None, ignore_index=False)

Example

import pandas as pd

df = pd.read_csv('data.csv')

# Sample 5 random rows from the DataFrame
sample_df1 = df.sample(n=5)

# Sample a random 20% of the rows from the DataFrame
sample_df2 = df.sample(frac=0.2)

print(sample_df1)
print(sample_df2)

Output

The rows are chosen at random, so the output differs from run to run. One possible result:

         name  age  gender
2       Alice   21  Female
7         Bob   28    Male
11      Sarah   25  Female
5        John   35    Male
9   Elizabeth   31  Female

     name  age  gender
7     Bob   28    Male
0   Alice   24  Female
10  Sarah   28  Female
4    John   27    Male
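Since the sampled rows change on every run, a fixed seed can make a result repeatable. The sketch below passes the random_state parameter listed in the syntax; the small DataFrame is a hypothetical stand-in for data.csv, and the seed value 42 is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for data.csv
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Sarah", "John", "Elizabeth", "Mark"],
    "age": [21, 28, 25, 35, 31, 40],
})

# The same seed produces the same sample on every call
first = df.sample(n=3, random_state=42)
second = df.sample(n=3, random_state=42)

print(first.equals(second))  # True
```

Without random_state, two calls like these would usually return different rows.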

Explanation

The sample() method in Pandas returns a random sample of rows from a DataFrame. The size of the sample is given either as a number of rows (n) or as a fraction of the total number of rows (frac).

The n parameter sets the number of rows to return, while the frac parameter sets the fraction of rows to return (for example, frac=0.2 returns 20% of the rows). The two cannot be used together. If neither is given, a single row is returned.
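A minimal sketch of the two sizing options, using a hypothetical 20-row DataFrame so that the row counts are easy to check. It also shows that passing both n and frac raises a ValueError.

```python
import pandas as pd

# Hypothetical 20-row DataFrame
df = pd.DataFrame({"value": range(20)})

by_count = df.sample(n=5)          # exactly 5 rows
by_fraction = df.sample(frac=0.2)  # 20% of 20 rows
default_one = df.sample()          # neither n nor frac: one row

print(len(by_count), len(by_fraction), len(default_one))  # 5 4 1

# n and frac are mutually exclusive
raised_error = False
try:
    df.sample(n=5, frac=0.2)
except ValueError as err:
    raised_error = True
    print("ValueError:", err)
```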

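The replace parameter controls whether sampling is done with replacement. With the default, replace=False, each row can be selected at most once. With replace=True, the same row can appear more than once in the sample.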
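The difference is easiest to see on a tiny DataFrame. This sketch assumes a hypothetical three-row DataFrame: without replacement the sample cannot exceed three rows, while with replacement it can, at the cost of repeated rows.

```python
import pandas as pd

# Hypothetical three-row DataFrame
df = pd.DataFrame({"value": [10, 20, 30]})

# Without replacement, every sampled row is distinct
no_repl = df.sample(n=3)
print(no_repl.index.is_unique)  # True

# With replacement, more rows than the DataFrame holds can be drawn,
# so some rows necessarily repeat
with_repl = df.sample(n=6, replace=True)
print(len(with_repl))             # 6
print(with_repl.index.is_unique)  # False

# Asking for more rows than exist without replacement fails
too_many_failed = False
try:
    df.sample(n=6)
except ValueError:
    too_many_failed = True
print(too_many_failed)  # True
```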

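The weights parameter assigns a weight to each row, which determines how likely that row is to be selected. It accepts a list, array, or Series of values, or the name of a DataFrame column that holds the weights. Weights that do not sum to 1 are normalized automatically. When weights is not given, every row is equally likely to be selected.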
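As an illustration of weighted sampling, the sketch below uses a hypothetical DataFrame with a column named "weight" (the column name and the values are assumptions made for the example). A row with weight 0 is never selected, and a list of weights does not have to sum to 1.

```python
import pandas as pd

# Hypothetical data; the "weight" column exists only for this example
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Sarah"],
    "weight": [0, 0, 5],
})

# Weights taken from a column: only Sarah has a non-zero weight
picked = df.sample(n=1, weights="weight")
print(picked["name"].iloc[0])  # Sarah

# Weights given as a list: Sarah is drawn about 80% of the time
draws = df.sample(n=1000, replace=True, weights=[1, 1, 8], random_state=0)
shares = draws["name"].value_counts(normalize=True)
print(shares)
```

The exact shares in the second part depend on the seed, but Sarah should account for roughly 0.8 of the draws.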

Use

The sample() method is useful whenever a random subset of a large DataFrame is needed, for example to inspect a manageable slice of the data during exploration, or to split a dataset into training and test sets for a machine learning model.
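One common way to build training and test sets with sample() is to draw the training rows and then drop them from the original DataFrame. The sketch below assumes a hypothetical 10-row dataset, an 80/20 split, and an arbitrary seed; it is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical dataset with one feature and one label column
df = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})

# Draw 80% of the rows for training (seed fixed for repeatability)
train = df.sample(frac=0.8, random_state=1)

# The remaining rows form the test set
test = df.drop(train.index)

print(len(train), len(test))  # 8 2
```

This relies on the index labels being unique, which holds for the default integer index.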

Important Points

sample() returns a random sample of rows from a DataFrame and leaves the original DataFrame unchanged.
Use n for a fixed number of rows or frac for a fraction of the rows, but not both.
Set replace=True to allow the same row to be selected more than once.
Use weights to make some rows more likely to be selected than others.

Summary

The sample() method is a convenient way to draw a random subset of rows from a DataFrame. The size of the subset is given either as a row count (n) or as a fraction of the rows (frac), while the replace and weights parameters control whether rows can repeat and how likely each row is to be selected. This makes it well suited to data exploration and to preparing training and test sets for machine learning models.
