Simple R Packages Recommendation App

Aden Guo
2014/9/16

Introduction

This app recommends R packages for you!

  • The data is from https://www.kaggle.com/
  • The algorithm is essentially the one from the book “Machine Learning for Hackers”.
  • The app finds the R packages most similar to your input using the k-nearest neighbors (KNN) algorithm.

Data Transform

First, transform the raw data into a user-by-package matrix. There are 52 users and 2488 packages in total.

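library('reshape')
# Read the raw installation records.
installations <- read.csv(file = "training_data.csv")
# Pivot to wide format: one row per user, one column per package.
user_package_matrix <- cast(installations, User ~ Package, value = 'Installed')
dim(user_package_matrix)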
[1]   52 2488
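
To illustrate the reshaping step, here is a minimal sketch on made-up data. It uses base R's xtabs() as a stand-in for cast() so that it runs without the reshape package; the data frame `toy` and its values are hypothetical, and only the column names User, Package and Installed come from the code above.

```r
# Toy long-format data: one row per (User, Package) pair; values are made up.
toy <- data.frame(
  User      = c(1, 1, 2, 2, 3, 3),
  Package   = c("abind", "boot", "abind", "boot", "abind", "boot"),
  Installed = c(1, 0, 1, 1, 0, 1)
)

# Base-R stand-in for cast(): rows are users, columns are packages.
# Each (User, Package) pair appears once, so the summed value is the flag itself.
toy_matrix <- xtabs(Installed ~ User + Package, data = toy)
print(toy_matrix)
dim(toy_matrix)
```

Each cell holds the Installed flag for one user and one package, which is the shape the correlation step expects.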

Calculation of Data

Second, calculate the pairwise correlations between R packages and transform them into distances.

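# Pairwise correlation between the columns of the matrix.
similarities <- cor(user_package_matrix)
# Map a correlation in [-1, 1] to a distance: 1 becomes 0, -1 becomes Inf.
distances <- -log((similarities / 2) + 0.5)
package_name <- colnames(distances)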
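
As a quick check of the transform, the sketch below applies the same formula to a few hypothetical correlation values. The vectors `r` and `d` are illustration names only and are not part of the app.

```r
# Hypothetical correlations, from perfectly correlated to perfectly anti-correlated.
r <- c(1, 0.5, 0, -0.5, -1)

# Same transform as above: rescale [-1, 1] to [0, 1], then take the negative log.
d <- -log((r / 2) + 0.5)

data.frame(correlation = r, distance = d)
```

The distance grows as the correlation falls: a correlation of 1 gives a distance of 0, a correlation of 0 gives log(2), and a correlation of -1 gives Inf.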

Saving and Using the Data

Then the distance matrix and the package names are saved to disk.

save(package_name, file = "package_list")
save(distances, file = "distance_matrix")
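
For reference, objects written with save() come back under their original names when read with load(). The round trip below is a small sketch using a temporary file and a hypothetical object `toy_names`; it is not taken from the app's source.

```r
# Write a toy object to a temporary file, remove it, then restore it.
tmp <- tempfile()
toy_names <- c("pkgA", "pkgB")
save(toy_names, file = tmp)
rm(toy_names)

# load() recreates the object in the current environment and
# returns the names of the objects it restored.
loaded <- load(tmp)
loaded
toy_names
```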

The app uses these saved distances to determine which packages are nearest to the ones you enter. The source files server.r and ui.r can be found at https://github.com/AdenGuo/DevelopingDataProducts
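
A nearest-neighbor lookup over a distance matrix could look roughly like the sketch below. The helper `k_nearest`, the matrix `toy_distances` and the package names are all hypothetical; the actual logic lives in server.r and may differ.

```r
# Toy symmetric distance matrix for four hypothetical packages.
pkgs <- c("pkgA", "pkgB", "pkgC", "pkgD")
toy_distances <- matrix(
  c(0.0, 0.2, 0.9, 0.5,
    0.2, 0.0, 0.7, 0.4,
    0.9, 0.7, 0.0, 0.1,
    0.5, 0.4, 0.1, 0.0),
  nrow = 4, byrow = TRUE, dimnames = list(pkgs, pkgs)
)

# Hypothetical helper: names of the k packages closest to `pkg`,
# excluding `pkg` itself.
k_nearest <- function(distances, pkg, k = 2) {
  d <- distances[pkg, ]
  d <- d[names(d) != pkg]
  names(sort(d))[seq_len(min(k, length(d)))]
}

k_nearest(toy_distances, "pkgA", k = 2)
```

The lookup takes one row of the matrix, drops the package itself, sorts the rest by distance and keeps the first k names.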