Smart Parking Inspection System

Using AI to Revolutionize Parking Management

Developed by Nyabiosi Sydiney Nyabiosi - 2024 Capstone Project

Training the model - Fine-tuning Mask R-CNN

We fine-tune a pre-trained Mask R-CNN model for number plate detection and segmentation, using a dataset of vehicles with visible number plates. The dataset contains 150 images with 200 number plate instances, and we use it to illustrate how the new features in torchvision can be used to train an object detection and instance segmentation model on a custom dataset.

Run the download_functions.py script to download helper functions for the training process.
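
The contents of download_functions.py are not shown here. As a minimal sketch, assuming it fetches the detection reference helpers that the torchvision fine-tuning tutorial relies on, it could look like the following. The function names helper_urls and download_helpers are hypothetical, and the file list is an assumption based on the torchvision repository.

```python
import urllib.request
from pathlib import Path

# Assumption: the helpers are the detection reference scripts from the torchvision repository.
BASE_URL = "https://raw.githubusercontent.com/pytorch/vision/main/references/detection/"
HELPER_FILES = ["engine.py", "utils.py", "coco_utils.py", "coco_eval.py", "transforms.py"]


def helper_urls():
    """Return (filename, url) pairs for every helper file."""
    return [(name, BASE_URL + name) for name in HELPER_FILES]


def download_helpers(dest="."):
    """Download each helper file into dest, skipping files that already exist."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for name, url in helper_urls():
        target = dest_dir / name
        if not target.exists():
            urllib.request.urlretrieve(url, str(target))
```

Calling download_helpers() once before training places engine.py and utils.py next to the training script, so that they can be imported as modules.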

Imports

os: Operating system operations
torch: PyTorch library
read_image: Function to read an image from a file
masks_to_boxes: Function to convert masks to bounding boxes
tv_tensors: Tensor subclasses from torchvision (Image, BoundingBoxes, Mask) that tell the transforms how to treat each input
F: Functional utilities from torchvision transforms
torchvision: Main torchvision library
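FastRCNNPredictor, FasterRCNN, AnchorGenerator, MaskRCNNPredictor: Components for building Faster R-CNN and Mask R-CNN models (box prediction head, base detector class, anchor generator, mask prediction head)
T: Transform classes from torchvision, used to compose the preprocessing and augmentation pipeline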
train_one_epoch, evaluate: Functions for training and evaluation from a custom module called `engine`
utils: Utility functions, presumably for data preprocessing

VehiclesDataset Class

__init__: Initializes the dataset with a root directory and transforms, and stores the paths to the image and mask files for use during data loading.
__getitem__: Loads one image and its corresponding mask, preprocesses them, and returns the image together with a target containing bounding boxes, masks, labels, etc.
__len__: Returns the total number of images in the dataset, which is what allows iteration over it.

LicenseDetectionModel Class

__init__: Stores the number of classes and selects the device (GPU or CPU), then sets up the model architecture so that both can be customized.
_initialize_model: Builds the Faster R-CNN model with a ResNet-50 backbone and replaces its classification head so that the output matches the number of classes.
get_transform: Returns the composition of transforms applied to input images, including the data augmentation used during training.
train: Trains the model on the provided training and test datasets for a specified number of epochs, handling optimization and learning rate scheduling inside the training loop.

Execute script

Creates a LicenseDetectionModel instance, which initializes the model for training.
Loads the training and test datasets.
Splits the training dataset into training and validation subsets, so that evaluation uses images the model was not trained on.
Calls the train method to train the model on the training subset.
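
The split and data loading steps above can be sketched as follows. How the original script splits the data is not stated, so the 80/20 ratio, the fixed seed, and the names split_dataset and collate_fn are assumptions for illustration. The collate function mirrors the one in the downloaded utils module: detection batches hold images of different sizes, so samples are kept as tuples instead of being stacked.

```python
import torch
from torch.utils.data import DataLoader, Subset


def collate_fn(batch):
    """Turn [(img, target), ...] into ((img, ...), (target, ...)) without stacking."""
    return tuple(zip(*batch))


def split_dataset(dataset, val_fraction=0.2, seed=0):
    """Shuffle indices reproducibly and hold out val_fraction of them for validation."""
    generator = torch.Generator().manual_seed(seed)
    indices = torch.randperm(len(dataset), generator=generator).tolist()
    split = len(dataset) - int(len(dataset) * val_fraction)
    return Subset(dataset, indices[:split]), Subset(dataset, indices[split:])


# Stand-in samples; in the real script this would be the VehiclesDataset.
dataset = [
    (torch.zeros(3, 4, 4), {"labels": torch.tensor([1])}) for _ in range(10)
]
train_set, val_set = split_dataset(dataset)

train_loader = DataLoader(train_set, batch_size=2, shuffle=True, collate_fn=collate_fn)
val_loader = DataLoader(val_set, batch_size=1, shuffle=False, collate_fn=collate_fn)
```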

Overall, this script demonstrates a complete pipeline for training a vehicle detection model using Faster R-CNN with PyTorch and torchvision. It includes dataset handling, model definition, the training loop, and model saving.