COIL20
======

COIL20 (Columbia Object Image Library - 20) is a dataset of grayscale images of 20 objects, with 72 images of each object taken at different angles. The dataset is available at `Columbia University's website <http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php>`_.

We will build a QCML model that trains on COIL20 and then classifies the images.

Uploading the Dataset
----------------------

First we need to get our hands on the data and upload it to the Qognitive servers. We'll download COIL20 as a zip file, unzip it and construct a dataframe.

We'll define each pixel in our image to be an operator, along with one operator for every category. Classification is therefore spread across 20 label operators, where 1 means the image belongs to that category and 0 means it does not.

We'll be using some extra packages here such as ``PIL``, ``requests``, ``scikit-learn`` and ``pytorch``. You can install these with the following command:

.. code-block:: bash

    (venv)$ pip install pillow requests torch scikit-learn

Let's download the data and format it into a dataframe suitable for training and inference.

.. code-block:: python

    import numpy as np
    import pandas as pd
    import torch
    import os
    import requests
    import re
    import zipfile
    from PIL import Image
    import tempfile
    from sklearn.model_selection import train_test_split

    test_fraction = 0.2

    # download the archive into a temporary directory that is removed afterwards
    with tempfile.TemporaryDirectory() as temp_dir:
        zip_file = os.path.join(temp_dir, 'coil20.zip')
        results = requests.get(
            'http://www.cs.columbia.edu/CAVE/databases/SLAM_coil-20_coil-100/coil-20/coil-20-proc.zip'
        )
        results.raise_for_status()  # stop early if the download failed
        with open(zip_file, "wb") as code:
            code.write(results.content)

        # unzip image files and collect the paths of the extracted PNGs
        with zipfile.ZipFile(zip_file, 'r') as zip_ref:
            mylist = zip_ref.namelist()
            zip_ref.extractall(temp_dir)
        filelist = list(filter(re.compile(r".*\.png$").match, mylist))
        filelist = [os.path.join(temp_dir, f) for f in filelist]

        # file names look like obj<N>__<k>.png; shift the labels to start at 0
        labels = pd.Series(filelist).str.extract("obj([0-9]+)", expand=False).values.astype('int64') - 1

        # read the images while the temporary directory still exists
        images = []
        for file in filelist:
            im = Image.open(file).convert('L')
            images.append(np.array(im).flatten())

    data = np.array(images) / 255  # We scale our grayscale values from 0->255 to 0->1

    # one-hot encoding of labels
    targets = torch.nn.functional.one_hot(
        torch.tensor(labels), num_classes=20
    ).numpy()

    # split data into train/test
    train_data, test_data, train_target, test_target = train_test_split(
        data, targets, test_size=test_fraction, stratify=labels
    )

    # Convert to DataFrame
    pixel_operators = [f"pixel_{x}" for x in range(128*128)]
    label_operators = [f"label_{i+1}" for i in range(20)]

    df_train = pd.DataFrame(
        np.concatenate([train_data, train_target], axis=1),
        columns=pixel_operators + label_operators,
    )
    df_test = pd.DataFrame(test_data, columns=pixel_operators)
    df_target = pd.DataFrame(test_target, columns=label_operators)

Let's instantiate a client object and set the dataset to COIL20. We only upload the ``df_train`` dataframe, as the test data is only used for evaluation.

.. code-block:: python

    from qcog_python_client import QcogClient

    qcml = QcogClient.create(token=API_TOKEN)
    qcml.data(df_train)

Parameterizing our Model
------------------------

Let's pick a Pauli model to run.

.. code-block:: python

    qcml = qcml.pauli(
        operators=df_train.columns.tolist(),
        qbits=5,
        pauli_weight=2
    )

Remember that the operators have to match the columns of the dataset we are going to train on.

Training the Model
------------------

Now set some training-specific parameters and execute the training.

.. code-block:: python

    from qcog_python_client.schema.parameters import AnalyticOptimizationParameters, LOBPCGFastStateParameters

    qcml = qcml.train(
        batch_size=len(df_train),
        num_passes=10,
        weight_optimization=AnalyticOptimizationParameters(),
        get_states_extra=LOBPCGFastStateParameters(
            iterations=20
        )
    )
    qcml.wait_for_training()
    print(qcml.trained_model["guid"])

Here we are using our analytic solver, which is available for the Pauli model. As per the documentation for the analytic optimization method, we set the batch size to the number of samples in our dataset so that all data is processed in a single batch.

.. note::

    The training process may take a while to complete. Here we call ``wait_for_training``, which blocks until training is complete.

.. note::

    We print out the trained model ``guid`` so we can use it in a different interpreter session if needed.

Executing Inference
-------------------

If you are running in the same session you can skip the next step, but if you are running in a different session you can load the model using the ``guid`` we printed out.

.. code-block:: python

    qcml = qcml.preloaded_model(MODEL_GUID)

With our trained model loaded into the client, we can now run inference on the test data.

.. code-block:: python

    # repeated here so the snippet also works in a fresh session
    from qcog_python_client.schema.parameters import LOBPCGFastStateParameters

    result_df = qcml.inference(
        data=df_test,
        parameters=LOBPCGFastStateParameters(
            iterations=20,
            tol=1e-6
        )
    )

    # a prediction is correct when its highest-scoring label matches the target
    num_correct = (
        result_df.idxmax(axis=1) == df_target.idxmax(axis=1)
    ).sum()
    print(f"Correct: {num_correct * 100 / len(df_test):.2f}% out of {len(df_test)}")

Results
-------

Some example results for various qubit counts and Pauli weights are shown below. The accuracy is the fraction of test images classified correctly in each case.

.. list-table:: Sample Results
    :header-rows: 1

    * - Qubits
      - Pauli Weight
      - Accuracy
    * - 5
      - 2
      - 0.867
    * - 6
      - 2
      - 0.894
    * - 6
      - 3
      - 0.919
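
The accuracy values in the table are fractions, whereas the inference snippet prints a percentage. As a minimal sketch of the fractional calculation, assuming the inference result has the same ``label_*`` columns as ``df_target``: the helper name ``accuracy_from_frames`` and the toy data are hypothetical and not part of the qcog client.

.. code-block:: python

    import numpy as np
    import pandas as pd


    def accuracy_from_frames(predicted: pd.DataFrame, target: pd.DataFrame) -> float:
        """Fraction of rows whose highest-scoring column matches the target's."""
        # Compare by position so differing row indexes cannot misalign the frames.
        predicted_labels = predicted.idxmax(axis=1).to_numpy()
        true_labels = target.idxmax(axis=1).to_numpy()
        return float(np.mean(predicted_labels == true_labels))


    # Toy data: 4 samples, 3 classes (hypothetical values for illustration only).
    columns = ["label_1", "label_2", "label_3"]
    toy_target = pd.DataFrame(np.eye(3)[[0, 1, 2, 1]], columns=columns)
    toy_predicted = pd.DataFrame(
        [
            [0.9, 0.1, 0.0],  # correct: label_1
            [0.2, 0.7, 0.1],  # correct: label_2
            [0.6, 0.3, 0.1],  # wrong: predicts label_1, truth is label_3
            [0.1, 0.8, 0.1],  # correct: label_2
        ],
        columns=columns,
    )

    toy_accuracy = accuracy_from_frames(toy_predicted, toy_target)
    print(f"Accuracy: {toy_accuracy:.3f}")

With real data the same helper could be called as ``accuracy_from_frames(result_df, df_target)``, provided both frames list the label columns in the same order.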