Aden Guo
2014/9/16
This app recommends R packages for you!
First, transform the raw data into a user-by-package matrix. The data cover 52 users, and the cast result has 2488 columns: the User ID column plus one column per package.
library('reshape')
installations = read.csv(file = "training_data.csv")
user_package_matrix <- cast(installations, User ~ Package, value = 'Installed')
dim(user_package_matrix)
[1] 52 2488
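To make the reshaping step concrete, here is a minimal sketch on made-up data. The data frame `toy_installations` and its values are hypothetical, and it assumes the input has the same three columns used above (Package, User, Installed) and that the reshape package is installed.

```r
library('reshape')

# Hypothetical long-format data: one row per (user, package) pair
toy_installations <- data.frame(
  Package   = c("a", "b", "a", "b"),
  User      = c(1, 1, 2, 2),
  Installed = c(1, 0, 0, 1)
)

# One row per user, one column per package; the User ID stays as the first column
toy_wide <- cast(toy_installations, User ~ Package, value = 'Installed')
names(toy_wide)
dim(toy_wide)
```

With two users and two packages, the result has two rows and three columns, because the User ID is kept alongside the package columns.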
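Second, calculate the pairwise correlations between packages and transform them into distances. A correlation of 1 maps to a distance of 0, and lower correlations map to larger distances.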
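# Drop the User ID column (the first column of the cast result) so that only package columns are correlated
similarities <- cor(user_package_matrix[, -1])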
distances <- -log((similarities / 2) + 0.5)
package_name <- colnames(distances)
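The transformation above can be checked on a few hand-picked correlation values. This is only an illustration: the vector `toy_similarities` is made up, and the formula is the same one applied to `similarities`.

```r
# Hypothetical correlations: perfectly correlated, uncorrelated, perfectly anti-correlated
toy_similarities <- c(1, 0, -1)

# Rescale from [-1, 1] to [0, 1], then take the negative log
toy_distances <- -log((toy_similarities / 2) + 0.5)

# Distances are 0, log(2) (about 0.693), and Inf respectively
toy_distances
```

Packages that are always installed together end up at distance 0, while packages that are never installed together are infinitely far apart.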
Then, the distance matrix and the package names are saved to disk.
save(package_name,file = "package_list")
save(distances, file = "distance_matrix")
My app uses this trained model to determine which packages are nearest to the packages you have entered. The source files server.r and ui.r can be found at https://github.com/AdenGuo/DevelopingDataProducts
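As a rough sketch of how such a lookup could work, the code below picks the k packages with the smallest distance to a given package. The helper `nearest_packages`, the toy matrix, and its values are hypothetical and are not taken from server.r; in the app, the saved matrix would first be restored with `load("distance_matrix")`.

```r
# Hypothetical helper: return the k packages closest to `package`
nearest_packages <- function(distances, package, k = 3) {
  d <- distances[package, ]
  d <- d[names(d) != package]                  # exclude the package itself
  names(sort(d))[seq_len(min(k, length(d)))]   # smallest distances first
}

# Made-up symmetric distance matrix for four packages
pkgs <- c("ggplot2", "plyr", "reshape", "zoo")
toy <- matrix(c(0.0, 0.2, 0.3, 0.9,
                0.2, 0.0, 0.1, 0.8,
                0.3, 0.1, 0.0, 0.7,
                0.9, 0.8, 0.7, 0.0),
              nrow = 4, dimnames = list(pkgs, pkgs))

# The two packages nearest to ggplot2 in the toy matrix: "plyr" and "reshape"
nearest_packages(toy, "ggplot2", k = 2)
```

For several input packages, one option would be to average their rows of the distance matrix before sorting, though the app may combine them differently.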