COIL20
COIL20 (Columbia Object Image Library - 20) is a dataset of 20 objects, each photographed at 72 different angles, giving 1,440 grayscale images in total. The dataset is available from Columbia University's website.
We will build a QCML model that will train on COIL20 and then classify the images.
Uploading the Dataset
First we need to get our hands on the data and upload it to the qognitive servers. We'll download COIL20 as a zip file, unzip it and construct a dataframe. We'll define an operator for each pixel in the image, along with an operator for each of the 20 categories: a label operator takes the value 1 when the image belongs to that category and 0 otherwise (a one-hot encoding).
We'll be using some extra packages here such as PIL, requests, scikit-learn and pytorch. You can install these with the following command:
(venv)$ pip install pillow requests torch scikit-learn
Let’s download the data and format it into a dataframe suitable for training and inference.
import os
import re
import tempfile
import zipfile

import numpy as np
import pandas as pd
import requests
import torch
from PIL import Image
from sklearn.model_selection import train_test_split

test_fraction = 0.2

# download and extract the dataset into a temporary directory
with tempfile.TemporaryDirectory() as temp_dir:
    zip_file = os.path.join(temp_dir, 'coil20.zip')
    results = requests.get(
        'http://www.cs.columbia.edu/CAVE/databases/SLAM_coil-20_coil-100/coil-20/coil-20-proc.zip'
    )
    with open(zip_file, "wb") as f:
        f.write(results.content)

    # unzip image files
    with zipfile.ZipFile(zip_file, 'r') as zip_ref:
        filelist = list(filter(re.compile(r".*\.png$").match, zip_ref.namelist()))
        filelist = [os.path.join(temp_dir, f) for f in filelist]
        zip_ref.extractall(temp_dir)

    # the object number in each filename (obj1..obj20) becomes a 0-based label
    labels = pd.Series(filelist).str.extract("obj([0-9]+)", expand=False).values.astype('int') - 1
    images = []
    for file in filelist:
        im = Image.open(file).convert('L')  # 128x128 grayscale
        images.append(np.array(im).flatten())

data = np.array(images) / 255  # scale grayscale values from 0-255 to 0-1

# one-hot encoding of labels
targets = torch.nn.functional.one_hot(
    torch.tensor(labels), num_classes=20
).numpy()

# split data into train/test, stratified so every object appears in both splits
train_data, test_data, train_target, test_target = train_test_split(
    data, targets, test_size=test_fraction, stratify=labels
)

# convert to DataFrames: one column per pixel operator plus one per label operator
pixel_operators = [f"pixel_{x}" for x in range(128 * 128)]
label_operators = [f"label_{i+1}" for i in range(20)]
df_train = pd.DataFrame(
    np.concatenate([train_data, train_target], axis=1),
    columns=pixel_operators + label_operators,
)
df_test = pd.DataFrame(test_data, columns=pixel_operators)
df_target = pd.DataFrame(test_target, columns=label_operators)
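As a quick sanity check (a sketch, assuming the standard COIL-20 layout of 20 objects, 72 views each, at 128x128 pixels), the expected dataframe shapes work out as follows:

```python
# COIL-20 (processed): 20 objects, 72 views each, 128x128 grayscale images
n_images = 20 * 72            # 1440 images in total
n_pixels = 128 * 128          # 16384 pixel operators
test_fraction = 0.2

n_train = int(n_images * (1 - test_fraction))  # rows in df_train
n_cols = n_pixels + 20                         # pixel operators + label operators

print(n_train, n_cols)  # 1152 16404
```

So `df_train` should come out as 1152 rows by 16404 columns, and `df_test` as 288 rows by 16384 columns.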
Let's instantiate a client object and set the dataset to COIL20. We're only going to upload the df_train dataframe, as the test data is only used for evaluation.
from qcog_python_client import QcogClient
qcml = QcogClient.create(token=API_TOKEN)
qcml.data(df_train)
Parameterizing our Model
Let’s pick a Pauli model to run.
qcml = qcml.pauli(
operators=df_train.columns.tolist(),
qbits=5,
pauli_weight=2
)
Note that the operators passed here must match the columns of the dataset we uploaded.
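For some intuition on the two model parameters (this is standard qubit arithmetic, not part of the client API, and it assumes pauli_weight means the maximum number of non-identity factors in a Pauli string):

```python
from math import comb

qbits = 5
pauli_weight = 2

# Hilbert space dimension grows exponentially with the qubit count.
dim = 2 ** qbits
print(dim)  # 32

# Number of Pauli strings on `qbits` qubits with weight at most `pauli_weight`:
# choose which qubits carry a non-identity factor, then pick X, Y or Z for each.
n_strings = sum(comb(qbits, w) * 3 ** w for w in range(pauli_weight + 1))
print(n_strings)  # 106
```

Raising either parameter enlarges the model, which is the trade-off explored in the results table at the end of this page.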
Training the Model
Now set some training specific parameters and execute the training.
from qcog_python_client.schema.parameters import AnalyticOptimizationParameters, LOBPCGFastStateParameters
qcml = qcml.train(
batch_size=len(df_train),
num_passes=10,
weight_optimization=AnalyticOptimizationParameters(),
get_states_extra=LOBPCGFastStateParameters(
iterations=20
)
)
qcml.wait_for_training()
print(qcml.trained_model["guid"])
Here we are using the analytic solver, which is available for the Pauli model. As per the documentation for the analytic optimization method, we set the batch size to the number of samples in our dataset so that all data is processed in a single batch.
Note
The training process may take a while to complete. Here we call wait_for_training, which blocks until training is complete.
Note
We print out the trained model guid so we can use it in a different interpreter session if needed.
Executing Inference
If you are running in the same session you can skip the next step, but if you are running in a different session you can load the model using the guid we printed out.
qcml = qcml.preloaded_model(MODEL_GUID)
With our trained model loaded into the client, we can now run inference on the dataset.
result_df = qcml.inference(
data=df_test,
parameters=LOBPCGFastStateParameters(
iterations=20,
tol=1e-6
)
)
num_correct = (
result_df.idxmax(axis=1) == df_target.idxmax(axis=1)
).sum()
print(f"Correct: {num_correct * 100 / len(df_test):.2f}% out of {len(df_test)}")
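The scoring step above relies on pandas idxmax, which returns the column name holding the largest value in each row: applied to the one-hot targets it recovers the true label operator, and applied to the inference output it picks the highest-scoring class. A minimal sketch with hypothetical values and only three label operators:

```python
import pandas as pd

# two hypothetical rows of inference output over three label operators
result = pd.DataFrame(
    {"label_1": [0.9, 0.2], "label_2": [0.05, 0.7], "label_3": [0.05, 0.1]}
)
# the corresponding one-hot targets
target = pd.DataFrame(
    {"label_1": [1, 0], "label_2": [0, 1], "label_3": [0, 0]}
)

predicted = result.idxmax(axis=1)  # column with the largest value per row
actual = target.idxmax(axis=1)
print((predicted == actual).sum())  # 2 correct out of 2
```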
Results
Some example results for various qubit counts and Pauli weights are shown below.
| Qubits | Pauli Weight | Accuracy |
|---|---|---|
| 5 | 2 | 1.098 |
| 6 | 2 | 0.983 |
| 6 | 3 | 0.903 |