Tutorial 2: Loading Massive Data

Quickstart Guide

In this tutorial, we provide examples of loading files that contain multiple users, items, or interactions; the three cases are very similar. As stated in Tutorial 1, users and items can be added to the recommender system one by one. However, it is often useful to load many entities with a single command, for instance when migrating an existing database.

In these examples, we assume that the structure has already been created (see Tutorial 1 for step-by-step instructions). To keep the examples simple, we use samples of three files (Projects.csv, Donors.csv and Donations.csv) from the dataset Data Science for Good: DonorsChoose.org.

Note that, for the sake of clarity, we have omitted the headers required to authenticate the API user. Please refer to the documentation for the corresponding instructions.

File Formats

Currently, the Sherpa.ai Custom Content Recommendation API is capable of interpreting two file formats: JSON and CSV. You can choose the one that best fits your needs, as long as the contents are correctly structured:

| Type of data | Mandatory fields | Optional fields | Observations |
|--------------|------------------|-----------------|--------------|
| Items | itemId | Any item attribute | Item attributes have to be previously defined. Items have to be new; updates are not allowed. |
| Users | userId | Any user attribute | User attributes have to be previously defined. Users have to be new; updates are not allowed. |
| Interactions | itemId, userId, interactionId, timestamp | value | The interaction types have to be previously defined. User-item interactions have to be new; updates are not allowed. |
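
Before registering an upload, it can be worth checking locally that a file carries the mandatory fields listed above. The following is a minimal sketch, not part of the API: it assumes the CSV file has a header row whose column names match the field names exactly, which the tutorial does not state explicitly. The function name `missing_mandatory_fields` is hypothetical.

```python
import csv
import io

# Mandatory columns per data type, copied from the file-format description above.
MANDATORY_FIELDS = {
    "items": {"itemId"},
    "users": {"userId"},
    "interactions": {"itemId", "userId", "interactionId", "timestamp"},
}


def missing_mandatory_fields(csv_text: str, data_type: str) -> set:
    """Return the mandatory columns that are absent from the CSV header row.

    Assumption: the first row of the file is a header with the field names.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return MANDATORY_FIELDS[data_type] - present


# An interactions sample that lacks the timestamp column.
sample = "userId,itemId,interactionId,value\nu1,i1,donate,25.0\n"
print(missing_mandatory_fields(sample, "interactions"))  # prints {'timestamp'}
```

An empty result means the header contains every mandatory field; it says nothing about whether the attributes or interaction types have been defined in the recommender.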

Register an Upload Order

First, a batch upload order is required for each of the files. The files Projects.csv, Donors.csv and Donations.csv, which have already been uploaded to https://recommender-tutorial-data-set.s3-eu-west-1.amazonaws.com/, contain the data we want to load.

POST /v2/recomm/projects/items/batch HTTP/1.1

{
    "url": "https://recommender-tutorial-data-set.s3-eu-west-1.amazonaws.com/Projects.csv",
    "format": "csv"
}

POST /v2/recomm/users/batch HTTP/1.1

{
    "url": "https://recommender-tutorial-data-set.s3-eu-west-1.amazonaws.com/Donors.csv",
    "format": "csv"
}

POST /v2/recomm/projects/interactions/batch HTTP/1.1

{
    "url": "https://recommender-tutorial-data-set.s3-eu-west-1.amazonaws.com/Donations.csv",
    "format": "csv"
}

The JSON documents returned in response to these commands include two fields: requestId and status. The former is needed to check the order status, which is indicated by the latter:

HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "requestId": "d9edf72a-3b0c-4ee1-a38a-eb8ecdbcc05e",
    "status": "queued"
}

HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "requestId": "6ee4e1dd-f988-492f-8224-4575d95c7fb9",
    "status": "queued"
}

HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "requestId": "a2ff670a-bfa9-4eb0-b9e7-420fd0e02bf9",
    "status": "queued"
}
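
The exchange above can be sketched in Python with the standard library only. This is an illustration under stated assumptions rather than an official client: `BASE_URL` is a placeholder host (the tutorial does not give one), the helper names are hypothetical, and the authentication headers are omitted just as in the listings. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder host: the tutorial does not state the API base URL.
BASE_URL = "https://api.example.com"


def build_batch_order(path: str, file_url: str, file_format: str = "csv") -> urllib.request.Request:
    """Build (but do not send) a POST request that registers a batch upload order."""
    body = json.dumps({"url": file_url, "format": file_format}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},  # auth headers omitted, as in the tutorial
        method="POST",
    )


def read_order_response(raw_body: bytes) -> tuple:
    """Extract (requestId, status) from the JSON body of a 202 response."""
    payload = json.loads(raw_body)
    return payload["requestId"], payload["status"]


order = build_batch_order(
    "/v2/recomm/users/batch",
    "https://recommender-tutorial-data-set.s3-eu-west-1.amazonaws.com/Donors.csv",
)
print(order.get_method(), order.full_url)

# With a real host and credentials, the order would be sent with
# urllib.request.urlopen(order). Here we parse the response body shown above.
request_id, status = read_order_response(
    b'{"requestId": "6ee4e1dd-f988-492f-8224-4575d95c7fb9", "status": "queued"}'
)
print(request_id, status)
```

Keeping each requestId next to the file it belongs to matters, because it is the only handle for checking that order later.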

Check Upload Status

The work orders are processed on demand, but if the status is checked immediately after posting the orders, the response will be the same as above:

GET /v2/recomm/projects/items/batch/d9edf72a-3b0c-4ee1-a38a-eb8ecdbcc05e HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "requestId": "d9edf72a-3b0c-4ee1-a38a-eb8ecdbcc05e",
    "status": "queued"
}

When the system starts to upload the files, the status will change to processing:

GET /v2/recomm/users/batch/6ee4e1dd-f988-492f-8224-4575d95c7fb9 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "requestId": "6ee4e1dd-f988-492f-8224-4575d95c7fb9",
    "status": "processing"
}

Depending on the file sizes, the process can last anywhere from minutes to hours.
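
Since an upload can take a long time, a client will typically poll the status endpoint at intervals. The sketch below shows one way to do that. It is an assumption-laden illustration: `wait_for_order` and its parameters are hypothetical names, the polling interval is arbitrary, and the function that performs the actual GET request is passed in by the caller so the loop can be exercised without network access.

```python
import time
from typing import Callable

# Statuses shown in this tutorial that mean the order is still in progress.
PENDING_STATUSES = {"queued", "processing"}


def wait_for_order(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 60.0,
    max_checks: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is no longer pending, or raise after max_checks polls."""
    for _ in range(max_checks):
        response = fetch_status()
        if response["status"] not in PENDING_STATUSES:
            return response
        sleep(interval_seconds)
    raise TimeoutError("upload order still pending after max_checks polls")


# Stand-in for the real GET call: replays a fixed sequence of responses.
REQUEST_ID = "6ee4e1dd-f988-492f-8224-4575d95c7fb9"
canned = iter([
    {"requestId": REQUEST_ID, "status": "queued"},
    {"requestId": REQUEST_ID, "status": "processing"},
    {"requestId": REQUEST_ID, "status": "completed_with_errors"},
])

final = wait_for_order(lambda: next(canned), sleep=lambda seconds: None)
print(final["status"])  # prints completed_with_errors
```

In real use, `fetch_status` would issue the GET request shown above and return the decoded JSON body, and the default `time.sleep` would be left in place.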

Verify Completed Uploads

After the upload finishes, if no fatal error has occurred, there are two possible statuses: completed and completed_with_errors. The former means that all the elements in the file were uploaded correctly, whereas the latter is returned when some of them could not be saved. The latter is the case for all three files:

  • The projects file includes the item saved in Add a New Item, so that duplicate is detected.
GET /v2/recomm/projects/items/batch/d9edf72a-3b0c-4ee1-a38a-eb8ecdbcc05e HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "requestId": "d9edf72a-3b0c-4ee1-a38a-eb8ecdbcc05e",
    "status": "completed_with_errors",
    "entitiesNotAdded": [
        {
            "errorId": "69da11b15b82cf59c389ba81c444731e",
            "code": "EntityAlreadyExistsException",
            "message": "Item [69da11b15b82cf59c389ba81c444731e] could not be created because it already exists."
        }
    ]
}
  • Similarly, the donors file includes the user saved in Add a New User, which is also detected as a duplicate.
GET /v2/recomm/users/batch/6ee4e1dd-f988-492f-8224-4575d95c7fb9 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "requestId": "6ee4e1dd-f988-492f-8224-4575d95c7fb9",
    "status": "completed_with_errors",
    "entitiesNotAdded": [
        {
            "errorId": "5f24f7ece308e11c9e31a6b9ad53cf68",
            "code": "EntityAlreadyExistsException",
            "message": "User [5f24f7ece308e11c9e31a6b9ad53cf68] could not be created because it already exists."
        }
    ]
}
  • The interactions file contains more errors. There are many duplicates (including the donation that had already been saved in Register a New Interaction), because there can only be one interaction per timestamp (with millisecond precision):
GET /v2/recomm/projects/interactions/batch/a2ff670a-bfa9-4eb0-b9e7-420fd0e02bf9 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "requestId": "a2ff670a-bfa9-4eb0-b9e7-420fd0e02bf9",
    "status": "completed_with_errors",
    "entitiesNotAdded": [
        {
            "id": "146f776eee7976f925afd40acad560e2_ee2bebfe4f2d2f080614026b18b63d5a_donate_1503506976000",
            "errorId": "EntityAlreadyExistsException",
            "message": "Interaction [donate on Item [ee2bebfe4f2d2f080614026b18b63d5a] of User [146f776eee7976f925afd40acad560e2] could not be created because it already exists."
        },
        {
            "id": "1b24af21c9b9968d393bd943bf92eb32_012dd04d073c46ccff32c793e09cbdac_donate_1515358963000",
            "errorId": "EntityAlreadyExistsException",
            "message": "Interaction [donate on Item [012dd04d073c46ccff32c793e09cbdac] of User [1b24af21c9b9968d393bd943bf92eb32] could not be created because it already exists."
        },
.................
...[continues]...
.................
        {
            "id": "f071b0ae20413ac9c8c28ca529899ac6_720a8c9c3b2d90921757debc2402e0f1_donate_1508052224000",
            "errorId": "EntityAlreadyExistsException",
            "message": "Interaction [donate on Item [720a8c9c3b2d90921757debc2402e0f1] of User [f071b0ae20413ac9c8c28ca529899ac6] could not be created because it already exists."
        }
    ]
}
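
When the entitiesNotAdded list is long, a short summary per error type is easier to read than the raw response. The following sketch counts the rejected entities by exception name. Note that, in the listings above, the item and user responses carry the exception name in "code", whereas the interaction response carries it in "errorId"; the hypothetical helper `error_code` accepts both shapes as shown and makes no claim about which one the API guarantees.

```python
from collections import Counter


def error_code(entry: dict) -> str:
    """Return the exception name from one entitiesNotAdded entry.

    Item and user entries above use "code"; interaction entries use "errorId".
    """
    return entry.get("code") or entry.get("errorId", "unknown")


def summarize_errors(response: dict) -> Counter:
    """Count rejected entities by exception name; empty if nothing was rejected."""
    return Counter(error_code(entry) for entry in response.get("entitiesNotAdded", []))


# One item-style entry and one interaction-style entry, copied from the listings above.
response = {
    "status": "completed_with_errors",
    "entitiesNotAdded": [
        {
            "errorId": "69da11b15b82cf59c389ba81c444731e",
            "code": "EntityAlreadyExistsException",
        },
        {
            "id": "146f776eee7976f925afd40acad560e2_ee2bebfe4f2d2f080614026b18b63d5a_donate_1503506976000",
            "errorId": "EntityAlreadyExistsException",
        },
    ],
}

summary = summarize_errors(response)
print(summary["EntityAlreadyExistsException"])  # prints 2
```

A summary made up only of EntityAlreadyExistsException entries, as in this tutorial, indicates that the rejected rows were duplicates rather than malformed data.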