FastAI Course Lecture 6 Part 2 Notes

Computer Vision

Kanav Sharma


May 3, 2024

In Lecture 6 Part 1 we built our model using the resnet26d architecture, reaching a best accuracy of 73%. In this notebook we will use a ConvNeXt model and aim for higher accuracy.

Install FastAI

!pip install -Uqq fastbook
!pip install timm

import fastbook
import timm

from fastbook import *
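# The module names on the next import were lost in the source; fastai.vision.all is an assumption
from fastai.vision.all import *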
Let’s download food data from FastAI

path = Path('/content')
untar_data(URLs.FOOD, data=path)

# actual path to train image folder
train_path = '/content/food-101/images'
test_path = '/content/food-101/test'

# Create Test folder

import os
import random
import shutil

def move_images_to_test(source_folder, test_folder, percentage=0.1):
    # Create the test folder if it doesn't exist
    os.makedirs(test_folder, exist_ok=True)

    # Iterate through each subfolder in the source folder
    for subfolder in os.listdir(source_folder):
        subfolder_path = os.path.join(source_folder, subfolder)

        # Check if it's a directory
        if os.path.isdir(subfolder_path):
            # Get a list of all image files in the subfolder
            image_files = [f for f in os.listdir(subfolder_path) if f.endswith('.jpg')]

            # Calculate the number of images to move
            num_images_to_move = int(len(image_files) * percentage)

            # Randomly select images to move
            images_to_move = random.sample(image_files, num_images_to_move)

            # Move selected images to the test folder
            for image in images_to_move:
                source_path = os.path.join(subfolder_path, image)
                dest_path = os.path.join(test_folder, image)
                shutil.move(source_path, dest_path)

if __name__ == "__main__":
    move_images_to_test(train_path, test_path, percentage=0.15)
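The selection step above is random, so every run produces a different test set. As a minimal sketch (not part of the original notebook), the selection can be made reproducible with a seeded random generator. The helper name `select_test_files` and its `seed` parameter are hypothetical, and the helper only picks file names without moving anything.

```python
import random

def select_test_files(filenames, percentage=0.15, seed=42):
    """Return a reproducible random subset of filenames (hypothetical helper)."""
    # Sort first so the result does not depend on directory listing order
    ordered = sorted(filenames)
    num_to_pick = int(len(ordered) * percentage)
    # A private Random instance leaves the global random state untouched
    rng = random.Random(seed)
    return rng.sample(ordered, num_to_pick)

# Made-up file names standing in for the contents of one class folder
fake_class = [f"{i}.jpg" for i in range(1000)]
picked = select_test_files(fake_class)
print(len(picked), picked == select_test_files(fake_class))
```

Calling the helper twice with the same seed returns the same files, which makes it easier to compare experiments across notebook restarts.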

Let’s make it faster

Last time, training took too long even though we were using the fastest architecture. For a change, let’s resize the images to 256 pixels and let the batch transforms shrink them further to 128. Smaller images contain fewer pixels, so each batch is cheaper to process and the model trains faster.

dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=42,
    item_tfms=Resize(256, method='squish'),
    batch_tfms=aug_transforms(size=128, min_scale=0.75))
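To get a feel for why smaller images train faster, the following sketch compares pixel counts for square images. It assumes the 512-pixel size mentioned later in these notes as the starting point; the helper name `pixel_count` is made up for illustration.

```python
def pixel_count(side):
    """Number of pixels in a square image with the given side length."""
    return side * side

full = pixel_count(512)     # size mentioned later in these notes
resized = pixel_count(256)  # after Resize(256)
final = pixel_count(128)    # after aug_transforms(size=128)

print(full // resized)  # 4
print(full // final)    # 16
```

Halving the side length cuts the number of pixels by a factor of four, so the model processes far less data per image.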


Make it into a function

In this notebook, we will experiment with several models, image augmentation methods, and other techniques. Instead of repeating the same code every time, let’s create a function that can be called whenever needed.

def train(arch, item, batch, epochs=4, learning_rate=0.0002):
    dls = ImageDataLoaders.from_folder(path, seed=42, valid_pct=0.2, item_tfms=item, batch_tfms=batch)
    learn = vision_learner(dls, arch, metrics=error_rate)
    learn.fine_tune(epochs, learning_rate)
    return learn

To keep runs comparable, the function uses fixed defaults for the number of epochs (4) and the learning rate (0.0002).

Call the function

learn = train('resnet26d', item=Resize(256),batch=aug_transforms(size=128, min_scale=0.75))
epoch train_loss valid_loss error_rate time
0 3.727076 2.929649 0.664554 04:58
epoch train_loss valid_loss error_rate time
0 3.250653 2.632962 0.618911 05:03
1 2.837729 2.394110 0.584109 05:09
2 2.680722 2.286321 0.563812 05:10
3 2.653321 2.266845 0.560149 05:08

Although training time went down, the error rate went up: the model finishes at an error rate of about 0.56, compared with roughly 0.27 (73% accuracy) in Part 1.
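Since fastai's error_rate metric is one minus accuracy, the logged values can be converted to accuracy directly. A small sketch, using the final error rate from the run above (the function name is made up for illustration):

```python
def accuracy_from_error_rate(error_rate):
    """Convert an error rate in [0, 1] to accuracy in [0, 1]."""
    return 1.0 - error_rate

final_error = 0.560149  # last epoch of the resnet26d run above
print(f"{accuracy_from_error_rate(final_error):.1%}")  # 44.0%
```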

ConvNeXt model

In our previous notebook, we discussed why convnext_tiny_in22k is a good go-to model. Here we reduce the image size to 192 (a multiple of 32) and use the squish resize method.
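If you want to try other sizes while keeping them at a multiple of 32, a helper along these lines may be handy. This is only a sketch; `nearest_multiple` is a hypothetical name and not part of fastai.

```python
def nearest_multiple(size, base=32):
    """Round size to the nearest positive multiple of base (hypothetical helper)."""
    return max(base, base * round(size / base))

print(nearest_multiple(192))  # 192
print(nearest_multiple(200))  # 192
print(nearest_multiple(250))  # 256
```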

arch = 'convnext_tiny_in22k'

learn_squish = train(arch, item=Resize(192, method='squish'),batch=aug_transforms(size=128, min_scale=0.75))
epoch train_loss valid_loss error_rate time
0 2.464850 1.882828 0.412178 17:27
epoch train_loss valid_loss error_rate time
0 2.082745 1.638774 0.375693 21:36
1 1.828319 1.418455 0.350842 21:36
2 1.719477 1.355143 0.342327 21:34
3 1.633262 1.351079 0.340000 21:38

Data Augmentation


Squish was already used in the previous run, so let’s try the crop method, which is the default in fastai.

learn_crop = train(arch, item=Resize(192),batch=aug_transforms(size=128, min_scale=0.75))
epoch train_loss valid_loss error_rate time
0 2.491307 1.843047 0.401188 17:40
epoch train_loss valid_loss error_rate time
0 2.034851 1.604407 0.364554 21:32
1 1.804160 1.384419 0.342921 21:39
2 1.620802 1.324999 0.334158 21:28
3 1.583603 1.313850 0.331733 21:29


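Padding keeps the whole original image without distorting or cutting it, unlike squish, which changes the aspect ratio, and crop, which discards part of the image. The leftover area is filled with zeros (black pixels).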
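To make the difference between the three resize methods concrete, the sketch below works out the geometry for a single image. It illustrates the general idea only and is not fastai's implementation; the function name and the 512x384 example image are made up.

```python
def resize_geometry(width, height, target, method):
    """Describe how a width x height image maps onto a target x target square.

    Returns (scaled_width, scaled_height, lost_or_padded_w, lost_or_padded_h).
    """
    if method == "squish":
        # Each axis is scaled independently, so the aspect ratio changes
        return target, target, 0, 0
    if method == "crop":
        # Scale so the shorter side fits, then cut off the overflow
        scale = target / min(width, height)
        w, h = round(width * scale), round(height * scale)
        return w, h, w - target, h - target
    if method == "pad":
        # Scale so the longer side fits, then fill the gap with zeros
        scale = target / max(width, height)
        w, h = round(width * scale), round(height * scale)
        return w, h, target - w, target - h
    raise ValueError(f"unknown method: {method}")

for method in ("squish", "crop", "pad"):
    print(method, resize_geometry(512, 384, 192, method))
```

For the example image, crop scales to 256x192 and throws away 64 pixels of width, while pad scales to 192x144 and adds 48 rows of zeros.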

learn_padding = train(arch, item=Resize((192), method=ResizeMethod.Pad, pad_mode=PadMode.Zeros),
      batch=aug_transforms(size=(128), min_scale=0.75))
epoch train_loss valid_loss error_rate time
0 2.619984 1.938507 0.432228 17:25
epoch train_loss valid_loss error_rate time
0 2.151396 1.702708 0.392723 21:27
1 1.879488 1.477281 0.369158 21:37
2 1.783989 1.412986 0.354554 21:28
3 1.738043 1.404341 0.354356 21:25
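The best of these three is the crop method, with a final error rate of 0.3317 (squish: 0.3400, padding: 0.3544). Let’s run the learning rate finder on it and see whether the learning rate needs to be changed.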
learn_crop.lr_find(suggest_funcs=(valley, slide))
SuggestedLRs(valley=4.365158383734524e-05, slide=0.019054606556892395)

Our learning rate of 0.0002 lies between the two suggestions, so as far as the learning rate is concerned we are good to go.
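The comparison and the learning rate check can be written down in a few lines. The numbers are copied from the logs above; the variable names are only illustrative.

```python
# Final error rate of each resize method, taken from the training logs above
final_error = {"squish": 0.340000, "crop": 0.331733, "pad": 0.354356}
best_method = min(final_error, key=final_error.get)
print(best_method)  # crop

# Suggestions returned by lr_find, and the default used in train()
valley, slide = 4.365158383734524e-05, 0.019054606556892395
learning_rate = 0.0002
print(valley <= learning_rate <= slide)  # True
```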

Test time augmentation

Instead of predicting once on each original validation image, the model predicts on several augmented versions of that image and combines these predictions.

For more, refer to: Test Time Augmentation.

tta_preds,targs = learn_crop.tta(dl=learn_crop.dls.valid)
error_rate(tta_preds, targs)
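The idea behind test time augmentation can be sketched without any deep learning library. The example below is a toy illustration under stated assumptions: `toy_model`, `flip` and `tta_predict` are made-up names, the "image" is a short list of numbers, and the combination is a plain average. fastai's own tta method does more than this, for example it reuses the training augmentations.

```python
def toy_model(image):
    """Stand-in classifier: two made-up class probabilities from a 1-D 'image'."""
    half = len(image) // 2
    left, right = sum(image[:half]), sum(image[half:])
    total = left + right
    return [left / total, right / total]

def identity(image):
    return image

def flip(image):
    """Horizontal flip of the 1-D 'image'."""
    return image[::-1]

def tta_predict(model, image, augmentations):
    # Predict on every augmented copy, then average class by class
    all_preds = [model(aug(image)) for aug in augmentations]
    n_classes = len(all_preds[0])
    return [sum(p[c] for p in all_preds) / len(all_preds) for c in range(n_classes)]

image = [6, 2, 1, 1]
single = toy_model(image)
averaged = tta_predict(toy_model, image, [identity, flip])
print(single, averaged)
```

The single prediction depends strongly on the orientation of the input, while the averaged prediction is the same whichever way the image is flipped.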

Scaling Up

Now that we have identified the best model and resize method, let’s scale up by increasing the image size back to 512 and the number of epochs to 10.

With more than 5 epochs, we are in danger of overfitting: the model sees every training image once per epoch, so after 10 epochs it has seen each image at least 10 times. The validation loss is the number to watch.

learn = train(arch,item=Resize(512),batch=aug_transforms(size=(256), min_scale=0.75), epochs=10)
epoch train_loss valid_loss error_rate time
0 2.119423 1.545519 0.324604 31:25
epoch train_loss valid_loss error_rate time
0 1.857518 1.425361 0.301139 35:32
1 1.573771 1.231033 0.280644 35:34
2 1.429572 1.093103 0.265149 35:35
3 1.246169 1.031862 0.256931 35:35
4 1.202549 1.000573 0.249356 35:35
5 1.172318 0.974019 0.246089 35:34
6 1.095220 0.961513 0.242673 35:36
7 1.071022 0.952826 0.242079 35:40
8 1.101088 0.950712 0.241139 35:38
9 1.096281 0.950866 0.240545 35:35

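The final error rate of 0.2405 corresponds to roughly 76% accuracy, up from 73% in Part 1 and far better than the smaller runs earlier in this notebook.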
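One simple way to check for overfitting is to look at where the validation loss stops improving. The sketch below uses the valid_loss column of the 10-epoch run above; the variable names are illustrative only.

```python
# valid_loss for epochs 0-9 of the scaled-up run, copied from the log above
valid_loss = [1.425361, 1.231033, 1.093103, 1.031862, 1.000573,
              0.974019, 0.961513, 0.952826, 0.950712, 0.950866]

# Epoch with the lowest validation loss
best_epoch = min(range(len(valid_loss)), key=valid_loss.__getitem__)
print(best_epoch)  # 8

# Improvement from each epoch to the next (positive means the loss went down)
gains = [prev - cur for prev, cur in zip(valid_loss, valid_loss[1:])]
print([f"{g:.4f}" for g in gains])
```

The validation loss bottoms out at epoch 8 and ticks up only slightly in epoch 9, so the gains have flattened rather than reversed sharply.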


We achieved higher accuracy than in Part 1, but training put heavy strain on our GPU, which even crashed at times while running the ‘Scaling Up’ section. In the next notebook, we will learn how to optimize GPU usage for better performance.