Wisconsin Breast Cancer
The Wisconsin Breast Cancer dataset is a standard classification dataset used to predict whether a breast cancer tumor is benign or malignant. It contains 569 samples with 30 features each. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe characteristics of the cell nuclei present in the image. You can read more about the dataset in the scikit-learn documentation.
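As a quick look before we build anything, the short sketch below uses scikit-learn's load_breast_cancer (the same loader we use in the next section) to confirm the sample count, feature count, and class names:
from sklearn import datasets as sk_datasets

data = sk_datasets.load_breast_cancer()
print(data.data.shape)         # (569, 30): samples x features
print(data.target_names)       # ['malignant' 'benign']
print(data.feature_names[:5])  # a few of the cell-nuclei measurements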
Uploading the Dataset
Let’s pull the dataset from scikit-learn and upload it to the Qcog platform. We’ll split the dataset into training and testing sets, and scale the data using a standard scaler.
First, let's make sure we install the extra dependencies:
(venv)$ pip install scikit-learn torch
import numpy as np
import pandas as pd
from sklearn import datasets as sk_datasets
from sklearn.preprocessing import StandardScaler
import torch
test_fraction = 0.2
data = sk_datasets.load_breast_cancer()
n_data = data.data.shape[0]
train_size = int(n_data * (1 - test_fraction))
test_size = n_data - train_size
# Randomly sample data
train_idx = np.random.choice(n_data, train_size, replace=False)
test_idx = np.random.choice(
np.setdiff1d(np.arange(n_data), train_idx, assume_unique=True),
test_size,
replace=False,
)
targets = torch.nn.functional.one_hot(
torch.tensor(data.target), num_classes=2
).numpy()
train_data = data.data[train_idx]
train_target = targets[train_idx]
test_data = data.data[test_idx]
test_target = targets[test_idx]
# Scale data
scaler = StandardScaler()
train_data = scaler.fit_transform(train_data)
test_data = scaler.transform(test_data)
# Convert to DataFrame
df_train = pd.DataFrame(
np.concatenate([train_data, train_target], axis=1),
columns=data.feature_names.tolist() + data.target_names.tolist(),
)
df_test = pd.DataFrame(test_data, columns=data.feature_names)
df_target = pd.DataFrame(test_target, columns=data.target_names.tolist())
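As an optional sanity check (a small sketch, not part of the original walkthrough), you can confirm the shapes of the resulting dataframes before uploading anything:
print(df_train.shape)   # expected (455, 32): 30 feature columns + 2 one-hot target columns
print(df_test.shape)    # expected (114, 30)
print(df_target.shape)  # expected (114, 2)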
Let's instantiate a client object and set the dataset to the training dataframe we built. We only upload df_train, since the test data is used solely for evaluation.
from qcog_python_client import QcogClient
qcml = QcogClient.create(token=API_TOKEN)
qcml.data(df_train)
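Here API_TOKEN is a placeholder for your Qcog API token. One common pattern (the environment variable name below is just an example, not something the platform mandates) is to read it from the environment:
import os

# Hypothetical environment variable name -- substitute whatever you use locally.
API_TOKEN = os.environ.get("QCOG_API_TOKEN")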
Parameterizing our Model
Let’s pick an Ensemble model to run.
qcml = qcml.ensemble(
operators=df_train.columns.tolist(),
dim=64,
num_axes=64
)
Keep in mind that the operators have to match the columns of the dataset we are going to train on.
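For example, with the setup above the operator list is exactly the feature columns plus the one-hot target columns; a quick, optional check (not part of the client API) is:
# Optional sanity check: operators should cover the feature and target columns.
expected = data.feature_names.tolist() + data.target_names.tolist()
assert df_train.columns.tolist() == expected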
Training the Model
Now set some training-specific parameters and execute the training.
from qcog_python_client.schema.parameters import GradOptimizationParameters, LOBPCGFastStateParameters
qcml = qcml.train(
batch_size=64,
num_passes=10,
weight_optimization=GradOptimizationParameters(
iterations=5,
learning_rate=1e-3,
),
get_states_extra=LOBPCGFastStateParameters(
iterations=10,
learning_rate_axes=1e-3
)
)
qcml.wait_for_training()
print(qcml.trained_model["guid"])
Note
The training process may take a while to complete. Here we call wait_for_training, which blocks until training is complete. From a cold start it should take about 4 minutes to train the model.
Note
We print out the trained model guid
so we can use it in a different interpreter session if needed.
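One simple way to keep the guid around for a later session (the filename below is just an illustration) is to write it to a file:
# Save the trained model guid so it can be reused in a later session.
with open("trained_model_guid.txt", "w") as f:
    f.write(qcml.trained_model["guid"])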
Executing Inference
If you are running in the same session, you can skip the next step. If you are in a different session, load the model using the guid we printed out.
qcml = qcml.preloaded_model(MODEL_GUID)
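Putting it together for a fresh interpreter session, here is a sketch that reads the guid back from the file used above (the filename is illustrative) and recreates the client with the same calls shown earlier in this guide:
from qcog_python_client import QcogClient

# Read back the guid saved after training.
with open("trained_model_guid.txt") as f:
    MODEL_GUID = f.read().strip()

qcml = QcogClient.create(token=API_TOKEN)
qcml = qcml.preloaded_model(MODEL_GUID)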
With our trained model loaded into the client, we can now run inference on the dataset.
lobpcg_fast_state_params = LOBPCGFastStateParameters(
iterations=10,
learning_rate_axes=1e-3
)
result_df = qcml.inference(
data=df_test,
parameters={
"state_parameters": lobpcg_fast_state_params
}
)
num_correct = (
result_df.idxmax(axis=1) == df_target.idxmax(axis=1)
).sum()
print(f"Correct: {num_correct * 100 / len(df_test):.2f}% out of {len(df_test)}")
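If you want more detail than a single accuracy number, a confusion matrix is straightforward to compute with scikit-learn; this sketch assumes result_df has one column per class name, matching df_target:
from sklearn.metrics import confusion_matrix

# Predicted and true class labels, taken as the column with the highest value per row.
y_true = df_target.idxmax(axis=1)
y_pred = result_df.idxmax(axis=1)
print(confusion_matrix(y_true, y_pred, labels=data.target_names.tolist()))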
Results
Some example results for various dimensionalities and numbers of axes are shown below.
| Dimensionality | Num of Axes | Accuracy |
|---|---|---|
| 64 | 64 | 87.72 % |
| 64 | 256 | 88.60 % |
| 256 | 512 | 88.60 % |