rec-sys
We will explore building a recommender system.
The first step is to load some sample data. The data was collected by survey from cs1000 students in Fall 2013 and is available here.
dat1 <- read.csv("~/Desktop/TechHacks-12-6-13/data/cs1000_f13_data.csv")
Note: adjust the path above so that it points to where the file is located on your computer, or place the file in your working directory and load it from there.
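As a rough illustration of the layout the rest of this walkthrough relies on, the sketch below reads a tiny inline CSV in place of the survey file. The three show columns and their ratings are made up for the example. The assumption is that each row is one student, each column is one show, and unrated shows are left blank so that they load as NA.

```r
# Hypothetical stand-in for the survey file: one row per student,
# one column per show, blank cells for shows a student did not rate.
csv_text <- "Show.A,Show.B,Show.C
5,,1
4,2,
,3,5"

toy <- read.csv(text = csv_text)

# Blank cells in numeric columns are read as NA
print(toy)
print(sum(is.na(toy)))
```

If the real file uses a different marker for unrated shows, the `na.strings` argument of `read.csv` can be set to match.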
We will make use of some available R functions.
If you haven't already installed the recommenderlab package, do so:
install.packages("recommenderlab", dependencies = TRUE)
Then load the package:
library(recommenderlab)
Next, put the data into the format that the recommenderlab functions expect.
m1 <- as.matrix(dat1)
r1 <- as(m1, "realRatingMatrix")
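To see what this conversion does, here is a small sketch on a made-up 3 x 3 matrix (the user and show names are hypothetical, not from the survey). It is wrapped in a `requireNamespace()` check so that it does nothing if recommenderlab is not installed.

```r
# Toy data (hypothetical): 3 users x 3 shows, NA = not rated
m <- matrix(c( 5, NA,  1,
               4,  2, NA,
              NA,  3,  5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("u", 1:3),
                            c("ShowA", "ShowB", "ShowC")))

if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)

  # Missing ratings are stored sparsely instead of as explicit NAs
  r <- as(m, "realRatingMatrix")

  print(dim(r))       # users x items
  print(nratings(r))  # number of ratings actually supplied

  # Converting back to a plain matrix restores the NAs
  back <- as(r, "matrix")
  print(back)
}
```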
To get a view of the distribution of ratings, we can plot the ratings as heat maps.
rn1 <- normalize(r1)
image(r1, main = "Ratings")
image(rn1, main = "Normalized Ratings")
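For intuition about the normalized plot: as far as I know, `normalize()` with its default settings centers each user's ratings by subtracting that user's mean rating. The base R sketch below applies the same idea to a made-up matrix. It is an illustration of row centering, not a reproduction of the package's internals.

```r
# Toy data (hypothetical): 3 users x 3 shows, NA = not rated
m <- matrix(c( 5, NA,  1,
               4,  2, NA,
              NA,  3,  5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("u", 1:3),
                            c("ShowA", "ShowB", "ShowC")))

# Mean rating per user, ignoring unrated shows
user_means <- rowMeans(m, na.rm = TRUE)

# Subtract each user's mean from that user's ratings (row centering)
centered <- sweep(m, 1, user_means, FUN = "-")

print(centered)
print(rowMeans(centered, na.rm = TRUE))  # each user's mean is now 0
```

Centering removes the difference between generous and harsh raters, so the heat map shows which shows each student liked relative to their own average.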
Next, we can look at the mean rating for each student and plot this as a histogram.
rmn <- rowMeans(r1)
rmn
## [1] 3.300 2.800 3.000 2.667 3.182 2.700 3.857 2.000 5.000 3.000 3.333
## [12] 2.556 4.000 3.333 3.000 1.667 3.556 2.700 3.000 3.231 2.750 2.900
## [23] 2.333 2.429 3.000 2.286 5.000 3.000 3.333 2.800 3.000 1.600 3.375
## [34] 5.000 2.417 4.000 2.214 3.500 3.467 4.500 3.200 2.818 3.273 3.600
## [45] 1.667 3.462 2.600 2.786 2.636 3.800 3.750 3.600 1.615 3.600 2.769
## [56] 3.500 3.182 1.462 2.667 3.500 2.231 2.444 3.000 1.875 2.583 1.182
## [67] 2.800 3.000 4.667 3.714 2.667 3.091 3.000 3.125 3.455 4.500 3.000
## [78] 2.667 2.500 3.667 3.429 2.000 4.333 2.889 1.778
hist(rmn, breaks = 10, main = "Histogram of Mean User Ratings", xlab = "Ratings")
We can also look at the mean rating for each show.
cmn <- colMeans(r1)
cmn
##                 American.Idol               Big.Bang.Theory
##                         2.058                         3.556
##                  Breaking.Bad               Game.of.Thrones
##                         4.064                         3.732
##                Grey.s.Anatomy                      Homeland
##                         1.878                         2.316
##         How.I.Met.Your.Mother                     Justified
##                         3.491                         2.545
##                       Mad.Men                 Modern.Family
##                         2.750                         2.659
##                          NCIS          Parks.and.Recreation
##                         3.551                         2.792
## Real.Housewives.of.New.Jersey                     The.Voice
##                         1.300                         2.062
##              The.Walking.Dead
##                         3.750
hist(cmn, breaks = 10, main = "Histogram of Mean TV Rating", xlab = "Ratings")
Sorting the shows by mean rating (in ascending order, so the highest-rated shows come last) shows that Breaking Bad, The Walking Dead, and Game of Thrones are rated highest:
nd <- order(cmn)
colnames(dat1)[nd]
## [1] "Real.Housewives.of.New.Jersey" "Grey.s.Anatomy"
## [3] "American.Idol" "The.Voice"
## [5] "Homeland" "Justified"
## [7] "Modern.Family" "Mad.Men"
## [9] "Parks.and.Recreation" "How.I.Met.Your.Mother"
## [11] "NCIS" "Big.Bang.Theory"
## [13] "Game.of.Thrones" "The.Walking.Dead"
## [15] "Breaking.Bad"
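The listing above runs from the lowest to the highest mean, because `order()` sorts in ascending order by default. A small sketch of the descending version follows. The `cmn` vector is retyped here from the printed output so that the block runs on its own, and `top3` is a name introduced only for this example.

```r
# Mean ratings per show, copied from the printed output above
cmn <- c(American.Idol = 2.058, Big.Bang.Theory = 3.556,
         Breaking.Bad = 4.064, Game.of.Thrones = 3.732,
         Grey.s.Anatomy = 1.878, Homeland = 2.316,
         How.I.Met.Your.Mother = 3.491, Justified = 2.545,
         Mad.Men = 2.750, Modern.Family = 2.659,
         NCIS = 3.551, Parks.and.Recreation = 2.792,
         Real.Housewives.of.New.Jersey = 1.300, The.Voice = 2.062,
         The.Walking.Dead = 3.750)

# Sort from highest to lowest and keep the names attached to the values
top3 <- head(sort(cmn, decreasing = TRUE), 3)
print(top3)
```

Using `sort()` on the named vector keeps each mean next to its show name, which avoids indexing `colnames(dat1)` separately.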
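Following the examples in the recommenderlab documentation, we fit a user-based collaborative filtering (UBCF) recommender and predict ratings for the shows each student left unrated: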
ubr <- Recommender(r1, method = "UBCF")
pred <- predict(ubr, r1, type = "ratings")
# as(pred, 'matrix')
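To make the idea behind user-based collaborative filtering concrete, here is a deliberately simplified base R sketch: find the users most similar to a target user, then average their ratings for the shows the target has not rated. The toy matrix and the helper names `cosine_sim` and `predict_user` are hypothetical. recommenderlab's UBCF has its own defaults for normalization, neighborhood size, and weighting, which this sketch does not attempt to reproduce.

```r
# Toy ratings (hypothetical): 4 users x 4 shows, NA = not rated
m <- matrix(c(5,  4, NA,  1,
              4,  5,  2, NA,
              1, NA,  5,  4,
              5,  5, NA,  2),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4),
                            paste0("Show", LETTERS[1:4])))

# Cosine similarity between two users over the shows both have rated
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

# Predict a user's missing ratings as the plain mean of the ratings
# given by the k most similar users
predict_user <- function(m, user, k = 2) {
  others <- setdiff(rownames(m), user)
  sims <- sapply(others, function(o) cosine_sim(m[user, ], m[o, ]))
  ranked <- names(sort(sims, decreasing = TRUE))  # drops NA similarities
  neighbours <- head(ranked, k)
  missing <- colnames(m)[is.na(m[user, ])]
  preds <- colMeans(m[neighbours, missing, drop = FALSE], na.rm = TRUE)
  list(neighbours = neighbours, predictions = preds)
}

res <- predict_user(m, "u1", k = 2)
print(res)
```

In this toy example u1 rates ShowA and ShowB much like u4 and u2 do, so those two become the neighbours, and the prediction for ShowC comes from the only neighbour who rated it.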
The supplied ratings for User 3 are:
dat1[3, ]
##   American.Idol Big.Bang.Theory Breaking.Bad Game.of.Thrones
## 3             3               1            5              NA
##   Grey.s.Anatomy Homeland How.I.Met.Your.Mother Justified Mad.Men
## 3             NA       NA                     5        NA      NA
##   Modern.Family NCIS Parks.and.Recreation Real.Housewives.of.New.Jersey
## 3            NA   NA                   NA                            NA
##   The.Voice The.Walking.Dead
## 3         2                2
The predicted ratings for the missing shows are:
colnames(dat1)[is.na(dat1[3, ])]
## [1] "Game.of.Thrones" "Grey.s.Anatomy"
## [3] "Homeland" "Justified"
## [5] "Mad.Men" "Modern.Family"
## [7] "NCIS" "Parks.and.Recreation"
## [9] "Real.Housewives.of.New.Jersey"
getRatings(pred[3, ])
## [1] 3.414 2.627 2.872 3.034 3.021 2.932 3.099 3.152 2.657
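The two outputs above are easier to read side by side. The sketch below pairs each missing show with its predicted rating and sorts the result. Both vectors are retyped from the printed output, and the pairing assumes the predicted values are listed in the same order as the missing shows. The names `missing_shows`, `predicted`, and `user3` are introduced only for this example.

```r
# Shows User 3 did not rate, copied from the output above
missing_shows <- c("Game.of.Thrones", "Grey.s.Anatomy", "Homeland",
                   "Justified", "Mad.Men", "Modern.Family", "NCIS",
                   "Parks.and.Recreation",
                   "Real.Housewives.of.New.Jersey")

# Predicted ratings, copied from the getRatings() output above
# (assumed to be in the same order as missing_shows)
predicted <- c(3.414, 2.627, 2.872, 3.034, 3.021,
               2.932, 3.099, 3.152, 2.657)

# Attach the show names and sort from highest to lowest prediction
user3 <- sort(setNames(predicted, missing_shows), decreasing = TRUE)
print(user3)
```

Under that ordering assumption, the three highest predictions are for Game of Thrones, Parks and Recreation, and NCIS.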
Present the top 2 recommended shows for each user (output suppressed for space):
pred <- predict(ubr, r1, n = 2)
recs <- bestN(pred, n = 2)
as(recs, "list")
Present the top 3 recommended shows for User 3:
pred <- predict(ubr, r1[3], n = 3)
as(pred, "list")
## [[1]]
## [1] "Game.of.Thrones" "Parks.and.Recreation" "NCIS"