Perceptron: Create Dataset (Perceptron in PyTorch)
Syntax
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        data_sample = self.data[index]
        target_sample = self.targets[index]
        return data_sample, target_sample
Example
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, targets):
        # Convert the Python lists to tensors so each sample is returned as a tensor
        self.data = torch.tensor(data, dtype=torch.float32)
        self.targets = torch.tensor(targets)

    def __len__(self):
        # Number of samples in the dataset
        return len(self.targets)

    def __getitem__(self, index):
        # Return one (data, target) pair at the given index
        data_sample = self.data[index]
        target_sample = self.targets[index]
        return data_sample, target_sample

data = [[0.2, 0.3], [0.4, 0.5], [0.6, 0.1], [0.7, 0.2]]
targets = [0, 0, 1, 1]

dataset = CustomDataset(data, targets)
# shuffle=True randomizes the batch order each epoch; num_workers=2 loads batches in
# worker processes (on some platforms this requires an `if __name__ == "__main__":` guard)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=2)

for batch_idx, (data, target) in enumerate(dataloader):
    print("Batch index {}, data {}, target {}".format(batch_idx, data, target))
Output
Batch index 0, data tensor([[0.2000, 0.3000],
[0.4000, 0.5000]]), target tensor([0, 0])
Batch index 1, data tensor([[0.6000, 0.1000],
[0.7000, 0.2000]]), target tensor([1, 1])
Explanation
Creating a dataset is the first step in training a machine learning model. In this example, we create a custom dataset class by inheriting from PyTorch's Dataset class. The __init__ method takes the data and target sequences as input arguments, converts them to tensors, and stores them as object attributes. The __len__ method returns the number of samples (the length of the target tensor), and the __getitem__ method returns the data sample and target sample at a given index.
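Because __len__ and __getitem__ back Python's built-in len() and indexing protocols, the dataset can be inspected directly before it is handed to a DataLoader. A minimal sketch, reusing the dataset object defined in the example above:

# len() calls CustomDataset.__len__, indexing calls CustomDataset.__getitem__
print(len(dataset))         # 4
sample, label = dataset[0]  # first (data, target) pair
print(sample, label)        # tensor([0.2000, 0.3000]) tensor(0)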
We then create a DataLoader object by passing in the custom dataset object, the batch size, and the number of worker processes. The DataLoader object is used to load the data in batches during training. In this example, we loop over the DataLoader object and print the batch index, data, and target values. Because shuffle=True, the composition and order of the batches can vary between runs, so the exact output may differ from the listing above.
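Beyond batch_size, shuffle, and num_workers, DataLoader accepts other options such as drop_last, which discards a final batch containing fewer than batch_size samples. A minimal sketch, reusing the dataset object from the example above (batch_size=3 and drop_last=True are illustrative choices, not part of the example):

# With 4 samples and batch_size=3, drop_last=True keeps only the one full batch
loader = DataLoader(dataset, batch_size=3, shuffle=False, drop_last=True)
for data, target in loader:
    print(data.shape, target.shape)  # torch.Size([3, 2]) torch.Size([3])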
Use
Custom datasets in PyTorch can be created for various machine learning tasks, such as image classification, object detection, and natural language processing. By customizing the __getitem__ method, specific data augmentations can also be applied to the input data, as sketched below.
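A minimal sketch of that idea, assuming a hypothetical transform callable passed in at construction time; the class name, transform, and noise level below are illustrative and not part of the example above:

class AugmentedDataset(Dataset):
    def __init__(self, data, targets, transform=None):
        self.data = torch.tensor(data, dtype=torch.float32)
        self.targets = torch.tensor(targets)
        self.transform = transform  # optional callable applied to each sample

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        sample = self.data[index]
        if self.transform is not None:
            sample = self.transform(sample)  # augmentation happens here, per access
        return sample, self.targets[index]

# Illustrative augmentation: add a small amount of Gaussian noise to each sample
noisy = AugmentedDataset(data, targets, transform=lambda x: x + 0.01 * torch.randn_like(x))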
Important Points
- Creating a dataset is the first step in training a machine learning model using PyTorch
- Custom datasets can be created by inheriting from PyTorch's Dataset class
- The __init__, __len__, and __getitem__ methods must be defined in the custom dataset class
- A DataLoader object is used to load the data in batches during training
Summary
To summarize, creating a custom dataset is an essential step in training a machine learning model with PyTorch. By creating a custom dataset class and using a DataLoader object, we can efficiently load the data in batches during training. Custom datasets can be created for various machine learning tasks and can be customized to apply specific data augmentations.