Smart Parking Inspection System

Using AI to Revolutionize Parking Management

Developed by Nyabiosi Sydiney Nyabiosi - 2024 Capstone Project


Training the model - Fine-tuning Mask R-CNN

We will fine-tune a pre-trained Mask R-CNN model on a dataset of vehicles with visible number plates, for number plate detection and segmentation. The dataset contains 150 images with 200 number plate instances, and we will use it to illustrate how to use the new features in torchvision to train an object detection and instance segmentation model on a custom dataset.

Run the download_functions.py script to download helper functions for the training process.

Imports

  1. os: Operating system operations
  2. torch: PyTorch library
  3. read_image: Function to read an image from a file
  4. masks_to_boxes: Function to convert masks to bounding boxes
  5. tv_tensors: Tensor subclasses from torchvision for images, bounding boxes, and masks
  6. F: Functional transform utilities from torchvision transforms
  7. torchvision: Main torchvision library
  8. FastRCNNPredictor, FasterRCNN, AnchorGenerator, MaskRCNNPredictor: Components for building Faster R-CNN and Mask R-CNN models (box prediction head, detector class, anchor generator, and mask prediction head)
  9. T: Transform classes from torchvision, used to compose augmentations
  10. train_one_epoch, evaluate: Functions for training and evaluation from a custom module called `engine`
  11. utils: Helper functions used during training and evaluation

VehiclesDataset Class

  1. __init__: Initializes the dataset with root directory and transforms. It stores paths to image and mask files, enabling easy access during data loading.
  2. __getitem__: Loads an image and its corresponding mask, preprocesses them, and returns them along with bounding boxes, masks, labels, etc. It handles data loading and preprocessing for each sample in the dataset.
  3. __len__: Returns the total number of images in the dataset. It provides the length of the dataset, facilitating iteration over the dataset.

LicenseDetectionModel Class

  1. __init__: Initializes the model with the number of classes and device (GPU or CPU). Sets up the model architecture, allowing customization of the number of classes and device selection.
  2. _initialize_model: Initializes the Faster R-CNN model with a ResNet-50 backbone and modifies the classification head according to the number of classes. This function sets up the model architecture by selecting a backbone and modifying the final layer for the specific task.
  3. get_transform: Returns a composition of transforms for data augmentation. It provides a set of transformations to be applied to input images, facilitating data augmentation during training.
  4. train: Trains the model using the provided training and test datasets for a specified number of epochs. It also handles optimization and learning rate scheduling, encapsulating the training loop and related operations.

Execute script

  1. Creates an instance of LicenseDetectionModel.
  2. Loads the training and test datasets.
  3. Splits the training dataset into training and validation subsets, so that evaluation uses images the model has not been trained on.
  4. Calls the train method to train the model on the training subset.
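The splitting and loading steps can be sketched as follows. The helper names `split_dataset` and `collate_fn`, the 20% validation fraction, and the batch size are assumptions for illustration; a list of dummy samples stands in for the real dataset.

```python
import torch
from torch.utils.data import DataLoader, Subset


def split_dataset(dataset, val_fraction=0.2, seed=0):
    # Shuffle indices reproducibly, then hold out the last part for validation.
    generator = torch.Generator().manual_seed(seed)
    indices = torch.randperm(len(dataset), generator=generator).tolist()
    n_val = max(1, int(len(dataset) * val_fraction))
    return Subset(dataset, indices[:-n_val]), Subset(dataset, indices[-n_val:])


def collate_fn(batch):
    # Detection samples hold a variable number of boxes per image, so keep
    # images and targets as tuples instead of stacking them into one tensor.
    return tuple(zip(*batch))


# Stand-in dataset: 10 dummy (image, target) pairs.
dummy_dataset = [(torch.zeros(3, 4, 4), {"image_id": i}) for i in range(10)]

train_subset, val_subset = split_dataset(dummy_dataset, val_fraction=0.2)
train_loader = DataLoader(
    train_subset, batch_size=4, shuffle=True, collate_fn=collate_fn
)
val_loader = DataLoader(
    val_subset, batch_size=1, shuffle=False, collate_fn=collate_fn
)

images, targets = next(iter(train_loader))
print(len(train_subset), len(val_subset), len(images))  # 8 2 4
```

In the real script, the dummy list would be replaced by the VehiclesDataset instance, and the two loaders would be passed to the train method.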

Overall, this script demonstrates a complete pipeline for training a number plate detection model using Faster R-CNN with PyTorch and torchvision. It includes dataset handling, model definition, the training loop, and model saving.