Perceptron: Create Dataset (Perceptron in PyTorch)

Syntax

import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets
    
    def __len__(self):
        return len(self.targets)
    
    def __getitem__(self, index):
        data_sample = self.data[index]
        target_sample = self.targets[index]
        return data_sample, target_sample

Example

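import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, targets):
        # Store the features and labels; both must contain the same number of samples
        self.data = data
        self.targets = targets

    def __len__(self):
        # Number of samples in the dataset
        return len(self.targets)

    def __getitem__(self, index):
        # Return one (features, label) pair
        data_sample = self.data[index]
        target_sample = self.targets[index]
        return data_sample, target_sample

# Store the data as tensors so the DataLoader stacks each batch into a single tensor
data = torch.tensor([[0.2, 0.3], [0.4, 0.5], [0.6, 0.1], [0.7, 0.2]])
targets = torch.tensor([0, 0, 1, 1])

# The main guard is required when num_workers > 0 on platforms that start
# worker processes with "spawn" (Windows and macOS)
if __name__ == "__main__":
    dataset = CustomDataset(data, targets)
    dataloader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=2)

    for batch_idx, (batch_data, batch_target) in enumerate(dataloader):
        print("Batch index {}, data {}, target {}".format(batch_idx, batch_data, batch_target))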

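Output

Because shuffle=True, the samples are reordered on every run, so the way rows are grouped into batches can differ from what is shown here. One possible output: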

Batch index 0, data tensor([[0.2000, 0.3000],
        [0.4000, 0.5000]]), target tensor([0, 0])
Batch index 1, data tensor([[0.6000, 0.1000],
        [0.7000, 0.2000]]), target tensor([1, 1])

Explanation

Creating a dataset is the first step in training a machine learning model. In this example, we create a custom dataset class by inheriting from PyTorch's Dataset class. The __init__ method takes the data and targets as arguments and stores them as attributes. The __len__ method returns the number of samples (the length of targets), and the __getitem__ method returns the (data, target) pair at a given index. The DataLoader relies on these two methods to find out how many samples exist and to fetch each one.
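To see what __len__ and __getitem__ do on their own, here is a minimal sketch that uses the dataset directly, without a DataLoader. It assumes the data and targets are stored as tensors; len(dataset) calls __len__, and dataset[i] calls __getitem__(i).

```python
import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        return self.data[index], self.targets[index]

data = torch.tensor([[0.2, 0.3], [0.4, 0.5], [0.6, 0.1], [0.7, 0.2]])
targets = torch.tensor([0, 0, 1, 1])
dataset = CustomDataset(data, targets)

print(len(dataset))  # calls __len__, prints 4
x, y = dataset[2]    # calls __getitem__(2)
print(x, y)          # tensor([0.6000, 0.1000]) tensor(1)
```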

We then create a DataLoader by passing it the dataset together with batch_size=2 (samples per batch), shuffle=True (reshuffle the samples at every epoch), and num_workers=2 (the number of subprocesses used to load data). During training, the DataLoader yields the data in batches; here we simply loop over it and print each batch index with its data and target tensors.
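The following sketch makes the batching deterministic so it is easier to inspect. It assumes shuffle=False and num_workers=0 (loading in the main process) instead of the values used above. It shows that the DataLoader's default collate function stacks the per-sample tensors along a new first dimension, and that four samples with batch_size=2 give two batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        return self.data[index], self.targets[index]

data = torch.tensor([[0.2, 0.3], [0.4, 0.5], [0.6, 0.1], [0.7, 0.2]])
targets = torch.tensor([0, 0, 1, 1])
dataset = CustomDataset(data, targets)

# shuffle=False keeps the original order; num_workers=0 loads in the main process
loader = DataLoader(dataset, batch_size=2, shuffle=False, num_workers=0)

print(len(loader))  # 2 batches: 4 samples / batch_size 2

first_x, first_y = next(iter(loader))
# The first batch is samples 0 and 1 stacked into one tensor
assert torch.equal(first_x, torch.stack([data[0], data[1]]))
print(first_x.shape)  # torch.Size([2, 2])
print(first_y)        # tensor([0, 0])
```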

Use

Custom datasets in PyTorch can be created for various machine learning tasks, such as image classification, object detection, and natural language processing. By customizing the __getitem__ method, specific data augmentations can also be applied to the input data.
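As a sketch of applying an augmentation inside __getitem__, the version below takes an optional transform callable and applies it each time a sample is fetched. The class name AugmentedDataset, the transform parameter, and the add_noise function are hypothetical illustrations, not part of PyTorch; add_noise simply adds small Gaussian noise to the features.

```python
import torch
from torch.utils.data import Dataset

class AugmentedDataset(Dataset):
    """Hypothetical variant of CustomDataset with an optional per-sample transform."""

    def __init__(self, data, targets, transform=None):
        self.data = data
        self.targets = targets
        self.transform = transform

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        x = self.data[index]
        if self.transform is not None:
            # Applied on every fetch, so each epoch sees a different noisy copy
            x = self.transform(x)
        return x, self.targets[index]

def add_noise(x, std=0.01):
    """Hypothetical augmentation: returns a new tensor with small Gaussian noise added."""
    return x + std * torch.randn_like(x)

data = torch.tensor([[0.2, 0.3], [0.4, 0.5], [0.6, 0.1], [0.7, 0.2]])
targets = torch.tensor([0, 0, 1, 1])
dataset = AugmentedDataset(data, targets, transform=add_noise)

x, y = dataset[0]
print(x.shape, y)  # torch.Size([2]) tensor(0)
```

Because add_noise returns a new tensor rather than modifying its input, the stored data stays unchanged and only the returned sample is augmented.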

Important Points

  • Creating a dataset is the first step in training a machine learning model using PyTorch
  • Custom datasets can be created by inheriting from PyTorch's Dataset class
  • A map-style dataset must implement __getitem__ and should implement __len__, which the DataLoader's default samplers rely on; __init__ is where the data is usually loaded or stored
  • A DataLoader object is used to load the data in batches during training
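In a training loop, the DataLoader is usually iterated once per epoch. The sketch below assumes two epochs, the default num_workers=0, and a fixed seed set with torch.manual_seed for reproducibility. It shows that with shuffle=True the order can change between epochs, while each epoch still visits every sample exactly once. The forward and backward pass of a real training loop is left out.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data, targets):
        self.data = data
        self.targets = targets

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, index):
        return self.data[index], self.targets[index]

data = torch.tensor([[0.2, 0.3], [0.4, 0.5], [0.6, 0.1], [0.7, 0.2]])
targets = torch.tensor([0, 0, 1, 1])

torch.manual_seed(0)  # make the shuffled order reproducible
loader = DataLoader(CustomDataset(data, targets), batch_size=2, shuffle=True)

seen_per_epoch = []
for epoch in range(2):
    seen = []
    for batch_data, batch_target in loader:
        # A real training loop would run the forward and backward pass here
        seen.extend(batch_target.tolist())
    seen_per_epoch.append(seen)
    print(f"epoch {epoch}: targets in order {seen}")
```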

Summary

To summarize, a custom Dataset class paired with a DataLoader is the standard way to feed data to a PyTorch model during training. The Dataset defines how to access individual samples, and the DataLoader groups them into batches, optionally shuffling them and loading them in parallel worker processes. Custom datasets can be created for various machine learning tasks and can also be customized to apply specific data augmentations.
