Artificial Intelligence for Prediction and Recommendation

Tutorial 4: Making Recommendations

Quickstart Guide

The Sherpa.ai Custom Content Recommendation API is meant to provide you with relevant recommendations based on the characteristics, tastes, and interactions of both users and items. The goal of this tutorial is to provide some insight into the options that the API offers.

The examples provided are based on two datasets. The first is Data Science for Good: DonorsChoose.org, which covers school projects and the people who donate to fund them (in particular, the files Projects.csv, Donors.csv and Donations.csv presented in Tutorial 1). The second is MovieLens 100K and, more specifically, the files u.item (for movies), u.user (for users), and u.data (for ratings given to movies by users). In the following examples, we will assume that both datasets have been imported into the API in full. Please refer to Tutorial 1 and Tutorial 2 for step-by-step instructions.

Note that, for the sake of clarity, we have omitted the headers required to authenticate the API user. Please refer to the documentation for the corresponding instructions.

Choose a Recommender Engine

Building useful recommendations is the main objective of the API. To achieve this, it is very important to choose the recommender engine that best fits the characteristics of the dataset.

This selection can be made when creating the table (see Tutorial 1), and can also be changed at any time:

PATCH /v2/recomm/tables/projects HTTP/1.1

{
    "engine": "content_based"
}

If the update is successful, the server response should be:

HTTP/1.1 204 No Content
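
As a rough illustration, the request above could be prepared from Python with the standard library. This is only a sketch: the host https://api.example.com is a placeholder, the function name build_engine_update is ours, the Content-Type request header is an assumption, and the authentication headers are omitted as in the rest of this tutorial.

```python
import json
import urllib.request

# Placeholder host: replace with the real API endpoint (assumption).
BASE_URL = "https://api.example.com"


def build_engine_update(table: str, engine: str) -> urllib.request.Request:
    """Prepare (but do not send) the PATCH request that changes a table's engine."""
    body = json.dumps({"engine": engine}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/recomm/tables/{table}",
        data=body,
        method="PATCH",
        # Assumed header; the tutorial does not list request headers.
        headers={"Content-Type": "application/json"},
    )


request = build_engine_update("projects", "content_based")
print(request.get_method(), request.full_url)
# Sending it would be urllib.request.urlopen(request); per the text above,
# a successful update answers with status 204.
```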

The examples included in this tutorial cover the use of both engines. Since the DonorsChoose.org dataset has some long text fields (such as description and needStatement), the most appropriate choice seems to be the Content-based engine. In contrast, the many numerical and categorical fields of the MovieLens 100K dataset make it suitable for the Hybrid engine.

Note: The API is meant to work with a single set of users, regardless of the number of item catalogs. Therefore, we recommend deleting any existing datasets before trying a new one.

Content-based Recommender Engine

At this point, we have already created a catalog of projects from DonorsChoose.org and saved information and donations made by many users. Thus, the system is ready to recommend new projects to donors. Let us recall that we have chosen the Content-based recommender engine to build the recommendations.

There are two ways of recommending new items to users: general recommendations and filtered recommendations.

General Recommendations

Let us consider user 5f24f7ece308e11c9e31a6b9ad53cf68 again. After the batch import done in Tutorial 2, we have 74 donations made by this user to 45 different projects:

GET /v2/recomm/projects/users/5f24f7ece308e11c9e31a6b9ad53cf68/donate HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
[
    {
        "itemId": "69da11b15b82cf59c389ba81c444731e",
        "timestamp": 1523797561000,
        "value": 25.0
    },
    {
        "itemId": "1cd153715d60fa4bb3757990b1af585b",
        "timestamp": 1415211879000,
        "value": 10.0
    },
.................
...[continues]...
.................
    {
        "itemId": "34b78a29a4727f3a2d04e4a05709f97a",
        "timestamp": 1521895854000,
        "value": 25.0
    }
]
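
The counts quoted above (number of donations, number of distinct projects) can be derived from the interaction list itself. The following sketch, with a helper name of our own choosing, groups interactions by itemId; it uses only the three entries visible in the listing, so its totals are not those of the full response.

```python
from collections import defaultdict

# The three entries visible in the listing above; the real response for
# this user holds 74 entries.
donations = [
    {"itemId": "69da11b15b82cf59c389ba81c444731e", "timestamp": 1523797561000, "value": 25.0},
    {"itemId": "1cd153715d60fa4bb3757990b1af585b", "timestamp": 1415211879000, "value": 10.0},
    {"itemId": "34b78a29a4727f3a2d04e4a05709f97a", "timestamp": 1521895854000, "value": 25.0},
]


def summarize(interactions):
    """Return {itemId: (number of interactions, total value)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for entry in interactions:
        stats = totals[entry["itemId"]]
        stats[0] += 1
        stats[1] += entry["value"]
    return {item: (count, total) for item, (count, total) in totals.items()}


summary = summarize(donations)
print(len(donations), "donations to", len(summary), "projects")
```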

The first and most general way of recommending simply consists of returning the most suitable projects, without any further restrictions:

GET /v2/recomm/projects/users/5f24f7ece308e11c9e31a6b9ad53cf68/items?limit=5 HTTP/1.1

The response contains five projects which might be of interest to the donor, taking into account the contents of the previous interactions.

HTTP/1.1 200 OK
Content-Type: application/json
[
    "283de7735f2be62e93b248d2ce2af3ce",
    "879a7d80e9e1d87357f6fa64b1ba739e",
    "4fc733a388d45e22b59f058ec1a91d7d",
    "085ac5165bf69edc4802309e2b0c80c4",
    "8ae4fe65d01070305c4e7783b3b299cb"
]

Note: The limit parameter is optional, but preferable. If it is not used, the API can return up to 500 items. For pagination, use the afterId parameter.
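
A minimal sketch of how the limit and afterId parameters might be combined when building request URLs. The host is a placeholder and the function name is hypothetical; we also assume that afterId takes the ID of the last item of the previous page, which the documentation should confirm.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # placeholder host (assumption)


def recommendations_url(catalog, user_id, limit=None, after_id=None):
    """Build the URL of a general-recommendation request."""
    params = {}
    if limit is not None:
        params["limit"] = limit
    if after_id is not None:
        # Assumed meaning: ID of the last item of the previous page.
        params["afterId"] = after_id
    url = f"{BASE_URL}/v2/recomm/{catalog}/users/{user_id}/items"
    return f"{url}?{urlencode(params)}" if params else url


user = "5f24f7ece308e11c9e31a6b9ad53cf68"
first_page = recommendations_url("projects", user, limit=5)
next_page = recommendations_url(
    "projects", user, limit=5, after_id="8ae4fe65d01070305c4e7783b3b299cb"
)
print(first_page)
print(next_page)
```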

For instance, note that six of the donations (a total of $190) made by the user were made to the project 879a7d80e9e1d87357f6fa64b1ba739e:

GET /v2/recomm/projects/items/879a7d80e9e1d87357f6fa64b1ba739e HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "name": "Modern Technology",
    "description": "A typical day in my classroom is starting with ELA, walking past the unused projector repeatedly, trying to get the computers working (crossing my fingers by this point), teaching math, science and S.S., walking past the unused projector again, and then have the children pack up to go home. My students want to learn. They come to school to see what else is out there for them to know. Many of them come from low-income families and technology is limited in their households. Unfortunately, technology is also very limited in our school. It is a Title 1 school so we have limited supplies and resources. Our computers do not consistently work. Our projectors work, but without laptops, they are useless. Half of us don't have Promethean boards and the ones that do, the boards don't work. We are very low on the technology scale. If we are fortunate enough to get this laptop, my students will be able to connect to the world. We will use it to show pictures of places that we are reading about in Social Studies. It will help our classroom engage more in math because the math website will be available to us. I will be able to show educational videos to them about our solar system, the American symbols and many other things. The students would be able to use sites such as BrainPopJr, ABCmouse and Learning A-Z. We are extremely hindered by the lack of a laptop in our classroom because without it, our projector is useless. Every one is using some form of modern technology in this world. Unfortunately, my classroom does not have that privilege. We have a projector that cannot be used and computers that sometimes work. This laptop will be able to help us use the Internet for learning in a positive and engaging way. My students deserve to be part of the modern world. ",
    "needStatement": "My students need a laptop during their lessons. We have a class projector but no laptop. This laptop will be easy for the students to carry and use.",
    "category": [
        "History & Civics",
        " Literacy & Language"
    ],
    "level": "Grades PreK-2",
    "resource": "Technology",
    "cost": 1229.4,
    "status": "Fully Funded"
}

It seems that the goal of the project is to acquire a laptop to put the class projector to good use. That way, the teacher will be able to engage and motivate students by means of modern technological tools. It is therefore reasonable that the donor is also offered similar projects, such as 4fc733a388d45e22b59f058ec1a91d7d:

GET /v2/recomm/projects/items/4fc733a388d45e22b59f058ec1a91d7d HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "name": "Empowered To Think Through The Power Of Technology",
    "description": "In our ever changing education system driven by Common Core the goal that I embrace was best stated by Margaret Mead, 'Children must be taught how to think, not what to think.' The challenge Common Core poses is to empower my students to think! The majority of my students are self-motivated, curious scholars. But I have a few that I have to push along to have the confidence to become intrinsic learners. My class is a general education class at a Title 1 school. I have empowered these 5th grade upper classmen to be the scholarly role models for our school. Some volunteer to help out the primary students during their recess and lunch, some work the recycling program, and some tutor other students in our class who need extra help. Our class works as a team, and we will triumph together and learn from one another. One of the items we need in our classroom is a laptop. An Apple MacBook Pro laptop is portable. It can be used at the back kidney table for guided reading, at a table group of students, and as a whole class. It will address the needs of my students to learn concepts through videos, images, diagrams, and how to do research and find accurate sources using Microsoft Office. These are a few examples and there are hundreds of lessons and ideas that come to mind with the power of technology at our fingertips. The other item that we need in conjunction with the laptop is a projector. I can attach the laptop to the projector and I can show videos, images, demos, power points, and persuade them to use Google Docs to share presentations. One of the standards that the students have to attain is to include multi-media components and visual displays in presentations. In preparation for middle school, it is imperative that my students can navigate and have the ability to access the web knowledgeably, with etiquette and with good intentions. My job as a teacher is to empower my students to become self-motivated learners. In a world that can be accessed from the palm of our hand through technology, one of my jobs is teach my students the etiquette of using and having this power available to us. The laptop, word program, and projector will bring magic, inspiration, amazement, strife, challenges, wonder, and a sense of 'awe' in our classroom. I want them to be inspired and motivated to think! ",
    "needStatement": "My students need need and Apple Mac book Pro Computer, Microsoft Office and class projector.",
    "category": [
        "Literacy & Language",
        " Math & Science"
    ],
    "level": "Grades 3-5",
    "resource": "Technology",
    "cost": 2904.34,
    "status": "Expired"
}

Filtered Recommendations

When making recommendations, we can also impose restrictions on the set of recommendable elements. Using RSQL, we can define conditions to be fulfilled by the items to be recommended. It is important for the attributes used here to be indexed, in order to obtain recommendations in an acceptable amount of time (refer to the documentation for further details).

In our Create a Catalog example, we defined three indexed attributes: level, resource and status. Let us consider, for instance, projects of "Grades 3-5" level.

GET /v2/recomm/projects/users/5f24f7ece308e11c9e31a6b9ad53cf68/items?limit=5&filter=level=="Grades 3-5" HTTP/1.1

As expected, the response contains project 4fc733a388d45e22b59f058ec1a91d7d, shown above: a project that is a good match without restrictions remains one when it also satisfies them. But the response includes other recommendations:

HTTP/1.1 200 OK
Content-Type: application/json
[
    "4fc733a388d45e22b59f058ec1a91d7d",
    "3818a2b2a4603269a1dd48740050de7a",
    "544533ba75f043705223fe34864798e5",
    "1e074ee869079dcd6d0f71aec4bc8437",
    "671a1382c5071cc9261773ea390d7188"
]
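
The filter in the request above contains characters (=, double quotes, spaces) that most HTTP clients need to percent-encode. A minimal sketch, assuming the API accepts a standard percent-encoded query string; the helper rsql_equals is our own name, not part of the API.

```python
from urllib.parse import quote, urlencode


def rsql_equals(attribute, value):
    """Build an RSQL equality condition with a double-quoted string value."""
    if '"' in value:
        # Escaping rules are not covered here, so such values are rejected.
        raise ValueError("values containing double quotes are not supported")
    return f'{attribute}=="{value}"'


condition = rsql_equals("level", "Grades 3-5")
# quote_via=quote encodes spaces as %20 rather than '+'.
query = urlencode({"limit": 5, "filter": condition}, quote_via=quote)
print(condition)
print(query)
```

The resulting query string would be appended after the ? of the recommendation URL.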

For instance, the user made three donations (a total of $50) to the project 40b0395c9fa8c0caa3f7378bd58b9cf2:

GET /v2/recomm/projects/items/40b0395c9fa8c0caa3f7378bd58b9cf2 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "name": "Science & Social Studies",
    "description": "The joy of non-fiction reading is real and I want my students to experience that. They have enjoyed learning about habitats around the world. I want them to discover science and social studies in new and exciting ways. My students are the best little learners in the world. They have a great love for school and they are always asking questions. It is gratifying to see how they are able to relate experiences they learn about in books to their real life experiences. I want them to continue their exploration of books, especially non-fiction ones. My students will use these books to enhance their knowledge about the different groups of animals. They will learn about mammals, birds, etc. The books will be used during read-a-louds and centers. The holiday and festival books will be used to show my students how we celebrate various festivities and special days in our country. These books will enable the students to get a better understanding of the different cultures and customs that have contributed to our American society. These donations will help improve our classroom because my students love to read. They are sponges that constantly soak up every bit of knowledge that they can. I want them to be able to read non-fiction and understand the information that is found in non-fiction books.",
    "needStatement": "My students need non-fiction books. We chose books that will help them learn the animal groups and we chose books about the holidays and festivals.",
    "category": [
        "Literacy & Language"
    ],
    "level": "Grades PreK-2",
    "resource": "Books",
    "cost": 202.21,
    "status": "Fully Funded"
}

The funds are to cover the costs of new non-fiction books. Thus, project 3818a2b2a4603269a1dd48740050de7a could also be of interest to them:

GET /v2/recomm/projects/items/3818a2b2a4603269a1dd48740050de7a HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "name": "Read Any Good Non-Fiction Books Lately?",
    "description": " When you walk into my classroom, visualize students who are thoughtful and reflective learners browsing books of their choice in the classroom library . They love to sit down and read a good book, whether it is on a tablet accessing MYon, or flipping the pages at their seat. I see them as our leaders of the future with a thirst for reading good literature. Our school, CE, is a Leader in Me Model school; and part of the Detroit Public Community School District. Our school's focus is preparing students to be the leaders of tomorrow by inspiring all to be student leaders using the Leader in Me Model. My 3rd grade students love to read. The books will assist in developing and enhancing their reading and comprehension skills. As a Leader in Me Academy, my students will apply their leadership skills to accomplish their personal reading goals. Also as honor academy students, having access to books they love will help them continue their growth in literacy which is a key factor in being selected for the class. Children also need to have multiple opportunities and a variety of literature to read. An area in reading comprehension my students need to improve their reading skills in is informational text. They really love reading animal books. but our library lacks enough informational books that the class likes, and having a variety of resources that their interested in will improve their skills and increase the amount of informational text reading. Can you visualize kids that like to read about animals? I see it every school day. Additionally, most students have access to a tablet whereby they can easily access an e-book. Not all of my students have this type of technology at home. Increasing the number of informational books in the class library, allows them to have access to this type of independent reading books in and outside of the classroom. ",
    "needStatement": "My students need non-fiction books about animals to read for information.",
    "category": [
        "Literacy & Language"
    ],
    "level": "Grades 3-5",
    "resource": "Books",
    "cost": 586.72,
    "status": "Expired"
}

Hybrid Recommender Engine

For the examples below, we will assume that the catalog of movies from MovieLens 100K has already been created and that the ratings sent by users have already been saved. Thus, the system is ready to recommend new movies. Let us recall that we have chosen the Hybrid Recommender Engine to build the recommendations.

Let us consider user 147, a 40-year-old female librarian who lives in ZIP code 02143:

GET /v2/recomm/users/147 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "age": 40,
    "job": "librarian",
    "gender": "F",
    "zipCode": "02143"
}

This user has rated 20 movies:

GET /v2/recomm/movies/users/147/rating HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
[
    {
        "itemId": "690",
        "timestamp": 885593965000,
        "value": 4.0
    },
    {
        "itemId": "750",
        "timestamp": 885593812000,
        "value": 5.0
    },
.................
...[continues]...
.................
    {
        "itemId": "937",
        "timestamp": 885593997000,
        "value": 3.0
    }
]

There are two ways of recommending new items to users: general recommendations and filtered recommendations. Both of them include a percent match score, a number between 0 and 100 that indicates the degree of affinity between the recommended entity and the recipient.

General Recommendations

The first and most general way of recommending simply consists of returning the most suitable movies, without any further restrictions:

GET /v2/recomm/movies/users/147/items?limit=5 HTTP/1.1

The response contains five movies which might be of interest to the user, taking into account the previous interactions.

HTTP/1.1 200 OK
[
    {
        "itemId": "64",
        "match": 92.94
    },
    {
        "itemId": "603",
        "match": 92.85
    },
    {
        "itemId": "483",
        "match": 92.51
    },
    {
        "itemId": "513",
        "match": 92.37
    },
    {
        "itemId": "98",
        "match": 92.34
    }
]

Note: The limit parameter is optional, but preferable. If it is not used, the API can return up to 500 items. For pagination, use the afterId parameter.
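
Since each recommendation carries a match score, a client can post-process the list, for example by keeping only items above a threshold. The sketch below works on the response body shown above; the function name and the threshold value are illustrative choices of ours.

```python
import json

# Response body copied from the example above.
body = """
[
    {"itemId": "64", "match": 92.94},
    {"itemId": "603", "match": 92.85},
    {"itemId": "483", "match": 92.51},
    {"itemId": "513", "match": 92.37},
    {"itemId": "98", "match": 92.34}
]
"""


def items_above(recommendations, threshold):
    """Return the IDs whose match score is at least `threshold` (0-100)."""
    return [r["itemId"] for r in recommendations if r["match"] >= threshold]


recommendations = json.loads(body)
print(items_above(recommendations, 92.5))
```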

For instance, note that the genre of 14 of the rated movies is drama, with an average rating of 4.286. Therefore, it is reasonable that the engine would take this fact into account to build a suitable recommendation, and it does: three out of five movies are dramas. For example, take a look at movie 64:

GET /v2/recomm/movies/items/64 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "year": 1994,
    "title": "Shawshank Redemption, The (1994)",
    "link": "http://us.imdb.com/M/title-exact?Shawshank%20Redemption,%20The%20(1994)",
    "genre_unknown": false,
    "genre_action": false,
    "genre_adventure": false,
    "genre_animation": false,
    "genre_children": false,
    "genre_comedy": false,
    "genre_crime": false,
    "genre_documentary": false,
    "genre_drama": true,
    "genre_fantasy": false,
    "genre_filmnoir": false,
    "genre_horror": false,
    "genre_musical": false,
    "genre_mystery": false,
    "genre_romance": false,
    "genre_scifi": false,
    "genre_thriller": false,
    "genre_war": false,
    "genre_western": false
}
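
Because genres are stored as one boolean attribute per genre, it can be handy to turn a record into a list of genre names. A small sketch with a helper name of our own; the record is an abridged copy of the one shown above.

```python
def genres_of(item):
    """List the genre names whose `genre_*` flag is true."""
    prefix = "genre_"
    return [
        key[len(prefix):]
        for key, value in item.items()
        if key.startswith(prefix) and value is True
    ]


# Abridged copy of the record for movie 64 shown above.
movie = {
    "year": 1994,
    "title": "Shawshank Redemption, The (1994)",
    "genre_comedy": False,
    "genre_crime": False,
    "genre_drama": True,
    "genre_war": False,
}
print(genres_of(movie))
```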

The movie "The Shawshank Redemption" is a drama that has been rated by 283 users:

GET /v2/recomm/movies/items/64/rating HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
[
    {
        "userId": "1",
        "timestamp": 875072404000,
        "value": 5.0
    },
    {
        "userId": "6",
        "timestamp": 883600597000,
        "value": 4.0
    },
.................
...[continues]...
.................
    {
        "userId": "498",
        "timestamp": 881956575000,
        "value": 4.0
    }
]

The average rating of those interactions is 4.45, out of 5, which is quite high. So it really seems to be a movie that fits the taste of user 147.
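
The average can be computed directly from the value field of the interactions. The sketch below uses only the three ratings visible in the listing above, so its result differs from the 4.45 obtained over all 283 ratings; the function name is ours.

```python
from statistics import mean

# Only the three ratings shown in the listing above; the full response
# holds 283 entries.
ratings = [
    {"userId": "1", "timestamp": 875072404000, "value": 5.0},
    {"userId": "6", "timestamp": 883600597000, "value": 4.0},
    {"userId": "498", "timestamp": 881956575000, "value": 4.0},
]


def average_rating(interactions):
    """Mean of the `value` field, rounded to three decimals."""
    return round(mean(entry["value"] for entry in interactions), 3)


print(len(ratings), average_rating(ratings))
```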

Filtered Recommendations

When making recommendations, we can also impose restrictions on the set of recommendable elements. Using RSQL, we can define conditions to be fulfilled by the items to be recommended. It is important for the attributes used here to be indexed, in order to obtain recommendations in an acceptable amount of time (refer to the documentation for further details).

In our MovieLens 100K example, we have defined the year attribute to be indexed. So, let's imagine that user 147 would like to watch an old movie (from 1970 or earlier, for instance). These are the five movies recommended by the API:

GET /v2/recomm/movies/users/147/items?limit=5&filter=year=le=1970 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
[
    {
        "itemId": "603",
        "match": 92.85
    },
    {
        "itemId": "483",
        "match": 92.51
    },
    {
        "itemId": "513",
        "match": 92.37
    },
    {
        "itemId": "479",
        "match": 92.13
    },
    {
        "itemId": "427",
        "match": 92.05
    }
]

Note that a movie that is a good match without restrictions remains one when it also satisfies them. This is the case for movies 603, 483, and 513, which also appeared in the unfiltered list. But the response includes other recommendations, such as the last movie, 427:

GET /v2/recomm/movies/items/427 HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
    "year": 1962,
    "title": "To Kill a Mockingbird (1962)",
    "link": "http://us.imdb.com/M/title-exact?To%20Kill%20a%20Mockingbird%20(1962)",
    "genre_unknown": false,
    "genre_action": false,
    "genre_adventure": false,
    "genre_animation": false,
    "genre_children": false,
    "genre_comedy": false,
    "genre_crime": false,
    "genre_documentary": false,
    "genre_drama": true,
    "genre_fantasy": false,
    "genre_filmnoir": false,
    "genre_horror": false,
    "genre_musical": false,
    "genre_mystery": false,
    "genre_romance": false,
    "genre_scifi": false,
    "genre_thriller": false,
    "genre_war": false,
    "genre_western": false
}

The movie "To Kill a Mockingbird (1962)" has 219 ratings by users, with an average rating of 4.292:

GET /v2/recomm/movies/items/427/rating HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
[
    {
        "userId": "503",
        "timestamp": 880472216000,
        "value": 5.0
    },
    {
        "userId": "5",
        "timestamp": 875721167000,
        "value": 3.0
    },
.................
...[continues]...
.................
    {
        "userId": "499",
        "timestamp": 885599474000,
        "value": 5.0
    }
]

Of course, the movie was released in 1962, so it also fulfills the filtering requirement that we imposed at the beginning.
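
To make the effect of the filter concrete, the following sketch evaluates a single RSQL-style comparison locally against item records. It is only an illustration of the operator semantics: the API evaluates filters on the server, and whether it accepts every operator listed here is an assumption (this tutorial only shows == and =le=).

```python
import operator

# Common RSQL comparison operators mapped to Python equivalents.
# Support for all of them by the API is an assumption.
RSQL_OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    "=lt=": operator.lt,
    "=le=": operator.le,
    "=gt=": operator.gt,
    "=ge=": operator.ge,
}


def matches(item, attribute, rsql_operator, value):
    """Check locally whether an item record satisfies one condition."""
    return RSQL_OPERATORS[rsql_operator](item[attribute], value)


mockingbird = {"year": 1962, "title": "To Kill a Mockingbird (1962)"}
shawshank = {"year": 1994, "title": "Shawshank Redemption, The (1994)"}
print(matches(mockingbird, "year", "=le=", 1970))
print(matches(shawshank, "year", "=le=", 1970))
```

This is why movie 64, the top general recommendation, no longer appears in the filtered list, while movie 427 does.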