Wisconsin Breast Cancer

The Wisconsin Breast Cancer dataset is a standard training dataset that is used to classify if a breast cancer tumor is benign or malignant. The dataset contains 569 samples with 30 features each. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. You can read some more about the dataset here.

Uploading the Dataset

Let’s pull the dataset from scikit-learn and upload it to the Qcog platform. We’ll split the dataset into training and testing sets, and scale the data using a standard scaler.

First let’s make sure we install some extra dependencies

(venv)$ pip install scikit-learn torch

import numpy as np

import pandas as pd

from sklearn import datasets as sk_datasets
from sklearn.preprocessing import StandardScaler

import torch

test_fraction = 0.2

data = sk_datasets.load_breast_cancer()
n_data = data.data.shape[0]
train_size = int(n_data * (1 - test_fraction))
test_size = n_data - train_size

# Randomly sample data
train_idx = np.random.choice(n_data, train_size, replace=False)
test_idx = np.random.choice(
    np.setdiff1d(np.arange(n_data), train_idx, assume_unique=True),
    test_size,
    replace=False,
)

targets = torch.nn.functional.one_hot(
    torch.tensor(data.target), num_classes=2
).numpy()

train_data = data.data[train_idx]
train_target = targets[train_idx]
test_data = data.data[test_idx]
test_target = targets[test_idx]

# Scale data
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
test_data = scaler.transform(test_data)

# Convert to DataFrame
df_train = pd.DataFrame(
    np.concatenate([train_data, train_target], axis=1),
    columns=data.feature_names.tolist() + data.target_names.tolist(),
)

df_test = pd.DataFrame(test_data, columns=data.feature_names)
df_target = pd.DataFrame(test_target, columns=data.target_names.tolist())

Let’s instantiate a client object and set the dataset to the train dataframe we built. We’re only going to upload the df_train dataframe as the test data is only used for evaluation.

from qcog_python_client import QcogClient
qcml = QcogClient.create(token=API_TOKEN)
qcml.data(df_train)

Parameterizing our Model

Let’s pick an Ensemble model to run.

qcml = qcml.ensemble(
    operators=df_train.columns.tolist(),
    dim=64,
    num_axes=64
)

Here we remember our operators have to match the dataset that we are going to run.

Training the Model

Now set some training specific parameters and execute the training.

from qcog_python_client.schema.parameters import GradOptimizationParameters, LOBPCGFastStateParameters

qcml = qcml.train(
    batch_size=64,
    num_passes=10,
    weight_optimization=GradOptimizationParameters(
        iterations=5,
        learning_rate=1e-3,
    ),
    get_states_extra=LOBPCGFastStateParameters(
        iterations=10,
        learning_rate_axes=1e-3
    )
)


qcml.wait_for_training()
print(qcml.trained_model["guid"])

Note

The training process may take a while to complete, here we call wait_for_training which will block until training is complete. It should take about 4 minutes to train the model from a cold start.

Note

We print out the trained model guid so we can use it in a different interpreter session if needed.

Executing Inference

If you are running in the same session you can skip the next step, but if you are running in a different session you can load the model using the guid we printed out.

qcml = qcml.preloaded_model(MODEL_GUID)

With our trained model loaded into the client, we can now run inference on the dataset.

lobpcg_fast_state_params = LOBPCGFastStateParameters(
    iterations=10,
    learning_rate_axes=1e-3
)

result_df = qcml.inference(
    data=df_test,
    parameters={
        "state_parameters": lobpcg_fast_state_params
    }
)

num_correct = (
    result_df.idxmax(axis=1) == df_target.idxmax(axis=1)
).sum()

print(f"Correct: {num_correct * 100 / len(df_test):.2f}% out of {len(df_test)}")

Results

Some example results for various dimensionalities and axes numbers are shown below.

Sample Results
Dimensionality	Num of Axes	Accuracy
64	64	87.72 %
64	256	88.60 %
256	512	88.60 %