Automating Classification Tasks My Way

3 min read · Jan 5, 2024

I’m working on a project that requires training a machine learning model, and I have 50,000 image files to categorize and annotate within two weeks. Doing that by hand would mean sacrificing my sleep and other things I have to do, so I went searching for existing machine learning models that already handle this kind of classification. I found one implementation with the final model saved as an .h5 file. It doesn’t have all the classes I want, but it goes a long way toward cutting the manual work in half.

Steps that I took

Step One: Reading through code implementation

I read through the implementation to understand how the model works. Everything about the model, including its training, was in the repository. I checked for a requirements.txt file, but there was none, as it is a relatively simple model.

Step Two: Importing the files

The files were in a zipped folder, so I opened Google Colab and uploaded the zip file from my computer to the notebook.

Step Three: Unzipping, loading and preprocessing files

The next thing I did was to unzip the file.

import zipfile
import os
import tensorflow as tf

data_zip = "the-zipped-file"
extract_path = "extraction-path"

# Function to extract zip file
def unzip_file(zip_path, extract_path):
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(extract_path)

# Unzip the file
unzip_file(data_zip, extract_path)
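
Before moving on, it can be worth confirming that the archive extracted what you expect. The sketch below counts the files under a folder by extension. `count_by_extension` is a hypothetical helper name, not something from the repository, and it assumes the images may sit in nested subfolders.

```python
import os
from collections import Counter


def count_by_extension(folder):
    """Walk `folder` recursively and count files per lowercase extension."""
    counts = Counter()
    for _root, _dirs, files in os.walk(folder):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
    return counts
```

Calling `count_by_extension(extract_path)` gives a quick picture of how many .jpg, .jpeg and .png files are waiting to be classified, and whether anything unexpected came along in the zip.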

I then wrote a script to preprocess the images, using the code snippet from the repository itself.

IMG_SIZE = 224

def process_image(image_path, img_size=IMG_SIZE):
    """
    Take an image file path and turn the image into a Tensor.
    Returns None if the file cannot be read or decoded.
    """
    try:
        image = tf.io.read_file(image_path)  # Read the raw image file
        image = tf.image.decode_jpeg(image, channels=3)  # Decode into 3 RGB channels
        image = tf.image.convert_image_dtype(image, tf.float32)  # Scale values from 0-255 to 0-1
        image = tf.image.resize(image, size=[img_size, img_size])  # Resize to img_size x img_size (224x224 by default)
        return image
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        return None

I added exception handling in case any of the images were empty or corrupted.
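
Empty files can also be caught before TensorFlow ever sees them. Here is a minimal sketch, assuming the images sit directly in one folder; `find_empty_images` and `IMAGE_EXTENSIONS` are names introduced here for illustration. It only detects zero-byte files, so truncated or otherwise corrupted images still rely on the try/except in `process_image`.

```python
import os

IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg')


def find_empty_images(folder):
    """Return paths of image files in `folder` whose size is zero bytes."""
    empty = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if (os.path.isfile(path)
                and name.lower().endswith(IMAGE_EXTENSIONS)
                and os.path.getsize(path) == 0):
            empty.append(path)
    return empty
```

Running this once on the extraction folder tells you up front how many files will be skipped for being empty.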

Step Four: Loading the pretrained model

Next, I loaded their pretrained model using TensorFlow (I had to download the model to my local machine first).

from tensorflow import keras
import tensorflow_hub as hub

model_path = "model.h5"

# The saved model contains a TensorFlow Hub layer, so hub.KerasLayer is
# passed in custom_objects to let Keras rebuild that layer when loading.
my_model = keras.models.load_model(
    model_path,
    custom_objects={'KerasLayer': hub.KerasLayer}
)

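I had to pass hub.KerasLayer through custom_objects because the model contains a TensorFlow Hub layer, which is a custom object that Keras cannot load unless it is registered.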

Step Five: Classification

Here is where I performed my classification tasks.

import numpy as np

# List of labels, in the same order as the model's outputs
object_list = ['Label1', 'Label2', 'Label3', 'Label4']
threshold = 0.75

# Folder holding the images to classify (assumed here to be the extraction folder)
data = extract_path

# Function to classify images using a pre-trained model
def classify_images(my_model, output_folder, object_list, threshold=0.75):
    # Low-confidence predictions end up in 'misc'
    misc_folder = os.path.join(output_folder, 'misc')
    os.makedirs(misc_folder, exist_ok=True)

    # One folder per label
    for label in object_list:
        label_folder = os.path.join(output_folder, label)
        os.makedirs(label_folder, exist_ok=True)

    for filename in os.listdir(data):
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            img_path = os.path.join(data, filename)
            img = process_image(img_path)
            if img is None:
                print(f"Skipping {img_path} due to loading error.")
                continue
            img = np.expand_dims(img, axis=0)  # Add batch dimension
            prediction = my_model.predict(img)

            # Pick the label with the highest predicted probability
            predicted_label_index = np.argmax(prediction)
            predicted_label = object_list[predicted_label_index]

            # Check if the prediction meets the threshold
            if prediction[0][predicted_label_index] >= threshold:
                category_folder = os.path.join(output_folder, predicted_label)
            else:
                category_folder = misc_folder

            # Move the image into its category folder
            destination_path = os.path.join(category_folder, filename)
            os.rename(img_path, destination_path)

I listed the labels associated with the model and set my threshold at 75%. The model predicts a label for each input image, and if the certainty is 75% or higher, the image is moved to that label’s folder. Images with a certainty below 75% go to a folder called ‘misc’, which is created if it does not already exist. The function loops through the image folder and only sorts files ending in .jpeg, .jpg, or .png. If it gets to a corrupted image file, it skips it and continues until every file has been processed.
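
The sorting rule can be looked at separately from the file handling. The sketch below reproduces the highest-probability-plus-threshold decision in plain Python on made-up probability values. `choose_folder` is a hypothetical helper, and the length check reflects an assumption that the labels are listed in the same order as the model’s outputs.

```python
def choose_folder(probabilities, labels, threshold=0.75, fallback='misc'):
    """Return the most probable label, or `fallback` if it is below `threshold`."""
    if len(probabilities) != len(labels):
        raise ValueError("The model must output one probability per label")
    # Index of the highest probability (the same role np.argmax plays)
    best_index = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best_index] >= threshold:
        return labels[best_index]
    return fallback


labels = ['Label1', 'Label2', 'Label3', 'Label4']
print(choose_folder([0.05, 0.85, 0.05, 0.05], labels))  # Label2
print(choose_folder([0.40, 0.35, 0.15, 0.10], labels))  # misc
```

Raising the threshold sends more images to ‘misc’ for manual sorting, while lowering it trusts the model more and leaves more mistakes inside the label folders.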

# Classify images and move to appropriate folders
classify_images(my_model, "output_folder", object_list, threshold=threshold)

Then I called the function ‘classify_images’ with the arguments defined above.

Final Step: Zipping and downloading

After ‘classify_images’ finished running, I zipped the output folder using this command:

# prompt: zip output folder

!zip -r output_folder.zip output_folder

After zipping, I clicked on the three dots beside the file in Colab and selected the download option.
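
The `!zip` line only works inside a notebook cell. A rough pure-Python equivalent, sketched here with the standard library’s `shutil.make_archive`, could look like this; `zip_folder` is a hypothetical helper name and not part of my original notebook.

```python
import os
import shutil


def zip_folder(folder):
    """Create `<folder>.zip` next to `folder` and return the archive path."""
    folder = os.path.abspath(folder)
    parent, name = os.path.split(folder)
    # base_dir keeps the top-level folder name inside the archive,
    # similar to what `zip -r output_folder.zip output_folder` produces.
    return shutil.make_archive(folder, 'zip', root_dir=parent, base_dir=name)
```

Calling `zip_folder("output_folder")` writes output_folder.zip beside the folder. In Colab, `files.download` from `google.colab` should then be able to fetch the archive without going through the file browser.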

Caveat: You have to manually go through the categories to make sure that the data is right.
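
One way to make that manual check less daunting is to look at the size of each category and spot-check a few files from each before going through everything. This is only a sketch: `sample_for_review`, `per_folder` and `seed` are hypothetical names, and it assumes the output folder contains one subfolder per label plus ‘misc’.

```python
import os
import random


def sample_for_review(output_folder, per_folder=5, seed=0):
    """Map each category folder to (file count, small random sample of file names)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be revisited
    report = {}
    for label in sorted(os.listdir(output_folder)):
        label_dir = os.path.join(output_folder, label)
        if not os.path.isdir(label_dir):
            continue
        files = sorted(os.listdir(label_dir))
        sample = rng.sample(files, min(per_folder, len(files)))
        report[label] = (len(files), sample)
    return report
```

A category whose sample is full of wrong images is a sign that the whole folder needs a closer look, while a clean sample suggests a lighter pass may be enough.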

This was far less time-consuming for me than building everything from scratch. It saved me time!

Written by Nwosu Rosemary
