Introduction to Datasets

A dataset is a collection of data used to train a model.

A model is defined by three things: the data it was trained on (the dataset), its parameterization (the specific values that define the model's behavior), and the hyperparameters that control training (such as the batch size and the number of passes the model makes over the dataset).

We’ll need to upload some data in order to train a model on it.

Binary XY = Z

As a simple example, we create a small synthetic dataset which we call XY=Z. It has two input features, X and Y, and one output feature, Z, which is just X times Y.

To keep the dataset as small as possible, we restrict X and Y to the values 1 and -1. Our complete dataset then looks like this:

import numpy as np
import pandas as pd

# All four sign combinations of the two input features
xs = np.array([1, 1, -1, -1])
ys = np.array([1, -1, -1, 1])
# The output feature is the elementwise product of the inputs
zs = xs * ys
dataset = pd.DataFrame(
    data=(np.vstack([xs, ys, zs])).T,
    index=range(4),
    columns=["X", "Y", "Z"]
)
print(dataset)
#    X  Y  Z
# 0  1  1  1
# 1  1 -1 -1
# 2 -1 -1  1
# 3 -1  1 -1
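
As a quick sanity check, the same table can be rebuilt by enumerating every (X, Y) pair instead of typing the arrays by hand. This is only a sketch: the name `check` is introduced here for illustration, and the row order differs from the table above.

```python
from itertools import product

import pandas as pd

# Enumerate every (X, Y) pair over the two allowed values
pairs = list(product([1, -1], repeat=2))
check = pd.DataFrame(pairs, columns=["X", "Y"])
check["Z"] = check["X"] * check["Y"]

# All four sign combinations appear exactly once
assert len(check.drop_duplicates(["X", "Y"])) == 4
# Z only ever takes the values 1 and -1
assert set(check["Z"]) == {1, -1}
print(check)
```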

We now want to send this dataset to Qognitive for storage so we can train on it later.

Uploading to the API

First we’ll instantiate our client object.

from qcog_python_client import QcogClient

# API_TOKEN must already hold your API token as a string
qcml = QcogClient.create(token=API_TOKEN)
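
The snippet above assumes `API_TOKEN` is already defined. One common way to supply it without hard-coding the secret is to read it from an environment variable. The sketch below is an illustration only: the helper `load_api_token` and the variable name `QCOG_API_TOKEN` are hypothetical and not part of the client library.

```python
import os


def load_api_token(var_name: str = "QCOG_API_TOKEN") -> str:
    """Return the API token stored in the given environment variable.

    The default variable name is a hypothetical choice for this example.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your API token"
        )
    return token
```

With the variable set in your shell, `API_TOKEN = load_api_token()` can be run before the client is created.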

Now we’ll upload our data to the API.

# Call data() without rebinding `dataset`, so the name keeps referring to the DataFrame
qcml.data(dataset)

The data is now uploaded. All that remains is to get our parameters, and then we can train a model on this dataset.

Using Async

The same operations are available through our async client, for those whose infrastructure is built around async.

from qcog_python_client import AsyncQcogClient
qcml = await AsyncQcogClient.create(token=API_TOKEN)
await qcml.data(dataset)

Asynchronous programming is very powerful when your application becomes I/O bound, whether by the network (APIs, databases, etc.) or by file operations. It lets the CPU keep doing useful work while waiting for the I/O to complete. For more detail about asynchronous programming in Python, the Python asyncio docs are a good place to start.
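
To illustrate the point about I/O-bound work, here is a minimal sketch that uses only the standard library. `fake_upload` is a hypothetical stand-in for a network call (it just sleeps) and does not use the Qognitive client. It also shows `asyncio.run`, which is how a coroutine is started from a regular script, where a bare top-level `await` is not allowed.

```python
import asyncio
import time


async def fake_upload(name: str, delay: float) -> str:
    # asyncio.sleep stands in for waiting on a network response
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # Both coroutines wait at the same time, so the total wall time is
    # roughly the longest single delay rather than the sum of the delays
    return await asyncio.gather(
        fake_upload("first", 0.2),
        fake_upload("second", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

`asyncio.gather` returns the results in the order the coroutines were passed in, regardless of which one finishes first.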