In Lecture6-Part1 we built our model using the resnet26d architecture, reaching a best accuracy of 73%. In this notebook we will use a ConvNeXt model and aim for higher accuracy.
Install FastAI
```python
#hide
!pip install -Uqq fastbook
!pip install timm

import fastbook
fastbook.setup_book()
import timm
```

```python
#hide
from fastbook import *
from fastai.vision.widgets import *
from fastai.vision.all import *
```
Let’s download the Food-101 data from fastai
```python
path = Path('/content')
untar_data(URLs.FOOD, data=path)

# actual paths to the train and test image folders
train_path = '/content/food-101/images'
test_path = '/content/food-101/test'
```
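Before we carve out a test set, it can help to sanity-check what was downloaded. A quick optional check, assuming the paths above (Food-101 ships 101 classes with 1,000 images each):

```python
from pathlib import Path

# Count classes (subfolders) and images in the train folder.
train_dir = Path(train_path)
classes = [d for d in train_dir.iterdir() if d.is_dir()]
n_images = sum(1 for _ in train_dir.rglob('*.jpg'))
print(f'{len(classes)} classes, {n_images} images')  # expect 101 classes, 101,000 images
```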
```python
# Create test folder
import os
import random
import shutil

def move_images_to_test(source_folder, test_folder, percentage=0.1):
    # Create the test folder if it doesn't exist
    os.makedirs(test_folder, exist_ok=True)

    # Iterate through each subfolder in the source folder
    for subfolder in os.listdir(source_folder):
        subfolder_path = os.path.join(source_folder, subfolder)

        # Check if it's a directory
        if os.path.isdir(subfolder_path):
            # Get a list of all image files in the subfolder
            image_files = [f for f in os.listdir(subfolder_path) if f.endswith('.jpg')]

            # Calculate the number of images to move
            num_images_to_move = int(len(image_files) * percentage)

            # Randomly select images to move
            images_to_move = random.sample(image_files, num_images_to_move)

            # Move selected images to the test folder
            for image in images_to_move:
                source_path = os.path.join(subfolder_path, image)
                dest_path = os.path.join(test_folder, image)
                shutil.move(source_path, dest_path)

if __name__ == "__main__":
    move_images_to_test(train_path, test_path, percentage=0.15)
```
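One caveat with this helper: as written, every selected image lands directly in `test_folder`, so the class subfolders, and with them the labels, are flattened away. If you later need a labeled test set, a hypothetical tweak is to build `dest_path` as `os.path.join(test_folder, subfolder, image)` and create each class subfolder with `os.makedirs` first.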
Let’s make it faster
Last time, we ran into a problem: even though we were using a fast architecture, training still took too long. For a change, let’s resize the images to 256 pixels at the item level and then augment them down to 128 pixels at the batch level. Fewer pixels per image means less computation, so the model trains faster.
```python
dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=42,
                                   item_tfms=Resize(256, method='squish'),
                                   batch_tfms=aug_transforms(size=128, min_scale=0.75))

dls.show_batch(max_n=4)
```
Make it into a function
In this notebook, we will be experimenting with lots of models, data (image) augmentations, and other techniques. So instead of repeating the same code every time, let’s create a function that can be called whenever needed.
```python
def train(arch, item, batch, epochs=4, learning_rate=0.0002):
    dls = ImageDataLoaders.from_folder(path, seed=42, valid_pct=0.2,
                                       item_tfms=item, batch_tfms=batch)
    learn = vision_learner(dls, arch, metrics=error_rate)
    learn.fine_tune(epochs, learning_rate)
    return learn
```
To keep experiments comparable, the number of epochs and the learning rate are given fixed default values (they can still be overridden per call). Note that fine_tune first trains only the new head for one epoch with the pretrained body frozen, then unfreezes and trains the whole model for the given number of epochs; that is why each run below prints a one-row table followed by the main table.
Call the function
```python
learn = train('resnet26d', item=Resize(256),
              batch=aug_transforms(size=128, min_scale=0.75))
```
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 3.727076 | 2.929649 | 0.664554 | 04:58 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 3.250653 | 2.632962 | 0.618911 | 05:03 |
1 | 2.837729 | 2.394110 | 0.584109 | 05:09 |
2 | 2.680722 | 2.286321 | 0.563812 | 05:10 |
3 | 2.653321 | 2.266845 | 0.560149 | 05:08 |
Though we were able to reduce the training time, we got a higher error rate: 56%, compared with the 27% error (73% accuracy) of our Part 1 model.
ConvNeXt model
In our previous notebook, we discussed why convnext_tiny_in22k is a good go-to model. We will reduce the resize size to 192 (a multiple of 32, which suits the architecture) and use the squish method for resizing.
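If you are unsure which ConvNeXt variants exist, timm can list them; the exact names depend on the timm version installed:

```python
import timm

# List the ConvNeXt model names registered in this timm version.
timm.list_models('convnext*')
```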
```python
arch = 'convnext_tiny_in22k'

learn_squish = train(arch, item=Resize(192, method='squish'),
                     batch=aug_transforms(size=128, min_scale=0.75))
```
/opt/conda/lib/python3.10/site-packages/timm/models/_factory.py:117: UserWarning: Mapping deprecated model name convnext_tiny_in22k to current convnext_tiny.fb_in22k.
model = create_fn(
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 2.464850 | 1.882828 | 0.412178 | 17:27 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 2.082745 | 1.638774 | 0.375693 | 21:36 |
1 | 1.828319 | 1.418455 | 0.350842 | 21:36 |
2 | 1.719477 | 1.355143 | 0.342327 | 21:34 |
3 | 1.633262 | 1.351079 | 0.340000 | 21:38 |
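The error rate drops from 0.560 with resnet26d to 0.340, although each epoch now takes roughly four times as long.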
Data Augmentation
Crop
Squish was used in the last run, so let’s try the crop method, which is the default in fastai.
```python
learn_crop = train(arch, item=Resize(192),
                   batch=aug_transforms(size=128, min_scale=0.75))
```
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 2.491307 | 1.843047 | 0.401188 | 17:40 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 2.034851 | 1.604407 | 0.364554 | 21:32 |
1 | 1.804160 | 1.384419 | 0.342921 | 21:39 |
2 | 1.620802 | 1.324999 | 0.334158 | 21:28 |
3 | 1.583603 | 1.313850 | 0.331733 | 21:29 |
```python
learn_crop.export('Lecture6_Part2_Food_Convnext_Tiny_Crop.pkl')
```
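The exported file can be reloaded later for inference. A minimal sketch, assuming the .pkl sits in the working directory (export writes relative to learn.path) and using one of the test images from earlier:

```python
# Reload the exported learner and predict on a single test image.
learn_inf = load_learner('Lecture6_Part2_Food_Convnext_Tiny_Crop.pkl')
pred_class, pred_idx, probs = learn_inf.predict(get_image_files(test_path)[0])
print(pred_class, probs[pred_idx])
```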
Padding
Padding preserves the entire original image: the picture is scaled to fit and the leftover space is filled with black (zero) borders, unlike squish, which distorts the aspect ratio, and crop, which throws pixels away.
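To see the difference between the three methods, here is a minimal sketch that applies each transform to one (arbitrarily chosen) training image, following the pattern used in fastai’s documentation; split_idx=0 applies the transform in training mode:

```python
import matplotlib.pyplot as plt

# Compare squish, crop, and zero-padding on the same image.
img = PILImage.create(get_image_files(train_path)[0])
_, axs = plt.subplots(1, 3, figsize=(12, 4))
for ax, method in zip(axs, [ResizeMethod.Squish, ResizeMethod.Crop, ResizeMethod.Pad]):
    rsz = Resize(192, method=method, pad_mode=PadMode.Zeros)
    show_image(rsz(img, split_idx=0), ctx=ax, title=method)
```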
```python
learn_padding = train(arch, item=Resize(192, method=ResizeMethod.Pad, pad_mode=PadMode.Zeros),
                      batch=aug_transforms(size=128, min_scale=0.75))
```
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 2.619984 | 1.938507 | 0.432228 | 17:25 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 2.151396 | 1.702708 | 0.392723 | 21:27 |
1 | 1.879488 | 1.477281 | 0.369158 | 21:37 |
2 | 1.783989 | 1.412986 | 0.354554 | 21:28 |
3 | 1.738043 | 1.404341 | 0.354356 | 21:25 |
```python
learn_padding.export('Lecture6_Part2_Food_Convnext_Tiny_Padding.pkl')
```
The best of these three is actually the crop method (error rate 0.332, versus 0.340 for squish and 0.354 for padding). Let’s obtain its learning rate and see if it needs to be changed.
```python
learn_crop.lr_find(suggest_funcs=(valley, slide))
```
SuggestedLRs(valley=4.365158383734524e-05, slide=0.019054606556892395)
Our default learning rate of 2e-4 sits comfortably between the suggested valley (about 4e-5) and slide (about 2e-2) values, so as far as the learning rate is concerned we are good to go.
Test time augmentation
Instead of predicting once on each original validation image, the model makes predictions on multiple augmented versions of the image and combines them.
For more, refer to the fastai documentation on Test Time Augmentation.
```python
tta_preds, targs = learn_crop.tta(dl=learn_crop.dls.valid)
error_rate(tta_preds, targs)
```
TensorBase(0.3120)
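With no extra training, only a few extra inference passes, TTA brings the error rate down from 0.332 to 0.312.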
Scaling Up
Now that we have identified the best model and data augmentation settings, let’s scale up: increase the resize size back to 512, the augmented size to 256, and the number of epochs to 10. With this many epochs we are in danger of overfitting, since by the end the model will have seen every image ten times, so we should keep an eye on the validation loss.
```python
learn = train(arch, item=Resize(512),
              batch=aug_transforms(size=256, min_scale=0.75), epochs=10)
```
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 2.119423 | 1.545519 | 0.324604 | 31:25 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.857518 | 1.425361 | 0.301139 | 35:32 |
1 | 1.573771 | 1.231033 | 0.280644 | 35:34 |
2 | 1.429572 | 1.093103 | 0.265149 | 35:35 |
3 | 1.246169 | 1.031862 | 0.256931 | 35:35 |
4 | 1.202549 | 1.000573 | 0.249356 | 35:35 |
5 | 1.172318 | 0.974019 | 0.246089 | 35:34 |
6 | 1.095220 | 0.961513 | 0.242673 | 35:36 |
7 | 1.071022 | 0.952826 | 0.242079 | 35:40 |
8 | 1.101088 | 0.950712 | 0.241139 | 35:38 |
9 | 1.096281 | 0.950866 | 0.240545 | 35:35 |
With a final error rate of about 0.24 (roughly 76% accuracy), this is far more accurate than our previous model.
Conclusion
We achieved higher accuracy compared to our Part 1 version, but it put a lot of strain on our GPU; the notebook even crashed a few times while running the ‘Scaling Up’ section. In the next notebook, we will learn how to optimize GPU usage for better performance.
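Until then, if the GPU runs out of memory between experiments, a common workaround (not a substitute for the optimizations we will cover) is to drop the learner you no longer need and clear the CUDA cache:

```python
import gc
import torch

del learn_squish          # or whichever learner is no longer needed
gc.collect()              # let Python reclaim the learner and its model
torch.cuda.empty_cache()  # release cached GPU memory back to the driver
```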