DATA 612 Final Project | Donor matching recommender systems
- Introduction
- Objective
- Data Description
- Preparing the Non-Text features
- Cosine Similarity between two donors
- Recommendations using Text Features
- Prune the vocabulary to eliminate the unnecessary words.
- Create the Document Term Matrix (DTM)
- TF-IDF
- Cosine similarity between project 1 and Project 1 using Text Features only
- Cosine similarity between project 1 and Project 2
- Cosine similarity between project 1 and Project 50
- Cosine similarity between project 1 and Project 100
- Cosine similarity of projects of the same donor
- Combining the Text and Non Text Features
- Recommendations using Non Text Features only
- Cosine similarity between project 1 and Project 2
- Cosine similarity between project 1 and Project 50
- Cosine similarity between project 1 and Project 100
- Cosine Similarity of projects of same donor Sample 1
- Cosine Similarity of projects of same donor Sample 2
- Cosine Similarity of projects of same donor Sample 3
- Top 10 recommended Projects
- Donor 1 Projects
- Recommendations using Text and Non Text Features
- Cosine similarity between project 1 and Project 2
- Cosine similarity between project 1 and Project 50
- Cosine similarity between project 1 and Project 100
- Cosine Similarity of projects of same donor Sample 1
- Cosine Similarity of projects of same donor Sample 2
- Cosine Similarity of projects of same donor Sample 3
- Top 10 recommended Projects
- Donor 1 Projects
- Conclusion
- Projects with more than 1 Donor
- Donor ID Project ID matrix
- Restructuring the matrix
- Using SVD
- Recommendations for Donor 3
- Summary
- References
Introduction
In 2000, Charles Best, a teacher at a Bronx public high school, wanted his students to read Little House on the Prairie. As he was making photocopies of the one book he could procure, Charles thought about all the money he and his colleagues were spending on books, art supplies, and other materials. And he figured there were people out there who’d want to help, if they could see where their money was going. Charles sketched out a website where teachers could post classroom project requests, and donors could choose the ones they wanted to support. His colleagues posted the first 11 requests. Then it spread. Today, DonorsChoose.org is open to every public school in America.
Objective
The objective of the donor recommender system is to recommend relevant projects to donors, based on their preferences. Preference and relevance are subjective, and they are generally inferred from the projects donors have supported previously.
DonorsChoose.org has funded over 1.1 million classroom requests through the support of 3 million donors, the majority of whom were making their first-ever donation to a public school. If DonorsChoose.org can motivate even a fraction of those donors to make another donation, that could have a huge impact on the number of classroom requests fulfilled.
A good solution will enable DonorsChoose.org to build targeted email campaigns recommending specific classroom requests to prior donors.
Data Description
Connect with a local spark instance
The returned Spark connection (sc) provides a remote dplyr data source to the Spark cluster.
Load the required libraries
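The chunk that loads the libraries is not shown in the rendered report. Judging only by the functions called later on, it would plausibly look like the sketch below. The package list is an assumption inferred from those calls, not the author's actual chunk.

```r
# Assumed package list, inferred from the functions used later in this report.
library(sparklyr)    # Spark connection (sc)
library(data.table)  # fread()
library(readr)       # read_csv(), cols()
library(tibble)      # as_tibble()
library(dplyr)       # %>%, rename(), inner_join(), group_by(), glimpse()
library(tidyr)       # replace_na()
library(caret)       # dummyVars()
library(text2vec)    # itoken(), create_vocabulary(), prune_vocabulary()
library(tokenizers)  # tokenize_words()
library(stopwords)   # stopwords(language = "en")
library(stringr)     # str_replace_all()
library(Matrix)      # sparse.model.matrix()
library(knitr)       # kable()
```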
Load the donors dataset
The data comes from the DonorsChoose dataset.
A combined dataset will be prepared with the projects, donations, donors, schools and teachers datasets.
rm(list = ls())
fillColor = "#FFA07A"
fillColor2 = "#F1C40F"
fillColorLightCoral = "#F08080"
donations <- as_tibble(fread("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Donations.csv"))
donors <- as_tibble(fread("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Donors.csv"))
projects <- read_csv("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Projects.csv",
col_types = cols(X1 = col_integer(), `Project ID` = col_character(), `School ID` = col_character(),
`Teacher ID` = col_character(), `Teacher Project Posted Sequence` = col_integer(),
`Project Type` = col_character(), `Project Title` = col_character(),
`Project Essay` = col_character(), `Project Subject Category Tree` = col_character(),
`Project Subject Subcategory Tree` = col_character(), `Project Grade Level Category` = col_character(),
`Project Resource Category` = col_character(), `Project Cost` = col_character(),
`Project Posted Date` = col_date(format = ""), `Project Current Status` = col_character(),
`Project Fully Funded Date` = col_date(format = "")))
schools <- read_csv("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Schools.csv")
teachers <- read_csv("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Teachers.csv")
projects <- projects %>%
    rename(ProjectType = `Project Type`,
        Category = `Project Subject Category Tree`,
        SubCategory = `Project Subject Subcategory Tree`,
        Grade = `Project Grade Level Category`,
        ResourceCategory = `Project Resource Category`,
        Cost = `Project Cost`,
        PostedDate = `Project Posted Date`,
        CurrentStatus = `Project Current Status`,
        FullyFundedDate = `Project Fully Funded Date`,
        ProjectTitle = `Project Title`)
donations <- donations %>% rename(DonationAmount = `Donation Amount`)
donors <- donors %>% rename(DonorState = `Donor State`)
Copying the datasets to spark instance
Glimpse of data
Donations
Observations: 4,687,884
Variables: 7
$ `Project ID` <chr> "000009891526c0ade7180f842…
$ `Donation ID` <chr> "688729120858666221208529e…
$ `Donor ID` <chr> "1f4b5b6e68445c6c4a0509b3a…
$ `Donation Included Optional Donation` <chr> "No", "Yes", "Yes", "Yes",…
$ DonationAmount <dbl> 178.37, 25.00, 20.00, 25.0…
$ `Donor Cart Sequence` <int> 11, 2, 3, 1, 2, 1, 1, 2, 2…
$ `Donation Received Date` <chr> "2016-08-23 13:15:57", "20…
Donors
Observations: 2,122,640
Variables: 5
$ `Donor ID` <chr> "00000ce845c00cbf0686c992fc369df4", "00002783…
$ `Donor City` <chr> "Evanston", "Appomattox", "Winton", "Indianap…
$ DonorState <chr> "Illinois", "other", "California", "Indiana",…
$ `Donor Is Teacher` <chr> "No", "No", "Yes", "No", "No", "No", "No", "N…
$ `Donor Zip` <chr> "602", "245", "953", "462", "075", "", "069",…
Project
Observations: 62,806
Variables: 66
$ `Project ID` <chr> "7685f0265a19d7b52a470ee4bac88…
$ `School ID` <chr> "e180c7424cb9c68cb49f141b092a9…
$ `Teacher ID` <chr> "4ee5200e89d9e2998ec8baad8a3c5…
$ `Teacher Project Posted Sequence` <int> 25, 3, 1, 2, NA, NA, 3, 57, 14…
$ ProjectType <chr> "Teacher-Led", "Teacher-Led", …
$ ProjectTitle <chr> "Stand Up to Bullying: Togethe…
$ `Project Essay` <chr> "Did you know that 1-7 student…
$ `Project Short Description` <chr> "\"Stand Up For Yourself and Y…
$ `Project Need Statement` <chr> "\"A Smart Kid's Guide to Onli…
$ Category <chr> "Applied Learning", "Applied L…
$ SubCategory <chr> "Character Education, Early De…
$ Grade <chr> "Grades PreK-2", "Grades PreK-…
$ ResourceCategory <chr> "Technology", "Technology", "a…
$ Cost <chr> "361.8", "512.85", "where they…
$ PostedDate <date> 2013-01-01, 2013-01-01, NA, N…
$ `Project Expiration Date` <chr> "2013-05-30", "2013-05-31", "T…
$ CurrentStatus <chr> "Fully Funded", "Expired", "ar…
$ FullyFundedDate <date> 2013-01-11, NA, NA, NA, NA, N…
$ X19 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X20 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X21 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X22 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X23 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X24 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X25 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X26 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X27 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X28 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X29 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X30 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X31 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X32 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X33 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X34 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X35 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X36 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X37 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X38 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X39 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X40 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X41 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X42 <dbl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X43 <date> NA, NA, NA, NA, NA, NA, NA, N…
$ X44 <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X45 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X46 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X47 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X48 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X49 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X50 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X51 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X52 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X53 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X54 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X55 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X56 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X57 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X58 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X59 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X60 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X61 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X62 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X63 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X64 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X65 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X66 <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
project_char <- data.frame(lapply(projects, as.character), stringsAsFactors = FALSE)
glimpse(project_char)
Observations: 62,806
Variables: 66
$ Project.ID <chr> "7685f0265a19d7b52a470ee4bac883b…
$ School.ID <chr> "e180c7424cb9c68cb49f141b092a988…
$ Teacher.ID <chr> "4ee5200e89d9e2998ec8baad8a3c596…
$ Teacher.Project.Posted.Sequence <chr> "25", "3", "1", "2", NA, NA, "3"…
$ ProjectType <chr> "Teacher-Led", "Teacher-Led", "T…
$ ProjectTitle <chr> "Stand Up to Bullying: Together …
$ Project.Essay <chr> "Did you know that 1-7 students …
$ Project.Short.Description <chr> "\"Stand Up For Yourself and You…
$ Project.Need.Statement <chr> "\"A Smart Kid's Guide to Online…
$ Category <chr> "Applied Learning", "Applied Lea…
$ SubCategory <chr> "Character Education, Early Deve…
$ Grade <chr> "Grades PreK-2", "Grades PreK-2"…
$ ResourceCategory <chr> "Technology", "Technology", "and…
$ Cost <chr> "361.8", "512.85", "where they w…
$ PostedDate <chr> "2013-01-01", "2013-01-01", NA, …
$ Project.Expiration.Date <chr> "2013-05-30", "2013-05-31", "The…
$ CurrentStatus <chr> "Fully Funded", "Expired", "are …
$ FullyFundedDate <chr> "2013-01-11", NA, NA, NA, NA, NA…
$ X19 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X20 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X21 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X22 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X23 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X24 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X25 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X26 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X27 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X28 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X29 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X30 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X31 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X32 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X33 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X34 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X35 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X36 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X37 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X38 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X39 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X40 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X41 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X42 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X43 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X44 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X45 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X46 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X47 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X48 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X49 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X50 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X51 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X52 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X53 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X54 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X55 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X56 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X57 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X58 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X59 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X60 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X61 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X62 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X63 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X64 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X65 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X66 <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
Schools
Observations: 72,993
Variables: 9
$ `School ID` <chr> "00003e0fdd601b8ea0a6eb44057b9c5e…
$ `School Name` <chr> "Capon Bridge Middle School", "Th…
$ `School Metro Type` <chr> "rural", "urban", "suburban", "un…
$ `School Percentage Free Lunch` <dbl> 56, 41, 2, 76, 50, 63, 17, 15, 46…
$ `School State` <chr> "West Virginia", "Texas", "Washin…
$ `School Zip` <chr> "26711", "77384", "98074", "48370…
$ `School City` <chr> "Capon Bridge", "The Woodlands", …
$ `School County` <chr> "Hampshire", "Montgomery", "King"…
$ `School District` <chr> "Hampshire Co School District", "…
Teachers
Observations: 402,900
Variables: 3
$ `Teacher ID` <chr> "00000f7264c27ba6fea0c837ed6…
$ `Teacher Prefix` <chr> "Mrs.", "Mrs.", "Mr.", "Ms."…
$ `Teacher First Project Posted Date` <date> 2013-08-21, 2016-10-23, 201…
Combining 5 datasets
A combined dataset is prepared with the projects, donations, donors, schools and teachers datasets.
projects_sample <- head(projects, 5000)
# Join on explicit keys so the matching columns are visible
projects_sample_donations <- inner_join(projects_sample, donations, by = "Project ID")
projects_sample_donations_donors <- inner_join(projects_sample_donations, donors, by = "Donor ID")
projects_sample_donations_donors_teachers <- inner_join(projects_sample_donations_donors,
    teachers, by = "Teacher ID")
projects_sample_donations_donors_teachers_schools <- inner_join(projects_sample_donations_donors_teachers,
    schools, by = "School ID")
combined_sample <- projects_sample_donations_donors_teachers_schools
glimpse(combined_sample)
Observations: 17,917
Variables: 86
$ `Project ID` <chr> "7685f0265a19d7b52a470ee4b…
$ `School ID` <chr> "e180c7424cb9c68cb49f141b0…
$ `Teacher ID` <chr> "4ee5200e89d9e2998ec8baad8…
$ `Teacher Project Posted Sequence` <int> 25, 25, 25, 25, 25, 25, 25…
$ ProjectType <chr> "Teacher-Led", "Teacher-Le…
$ ProjectTitle <chr> "Stand Up to Bullying: Tog…
$ `Project Essay` <chr> "Did you know that 1-7 stu…
$ `Project Short Description` <chr> "\"Stand Up For Yourself a…
$ `Project Need Statement` <chr> "\"A Smart Kid's Guide to …
$ Category <chr> "Applied Learning", "Appli…
$ SubCategory <chr> "Character Education, Earl…
$ Grade <chr> "Grades PreK-2", "Grades P…
$ ResourceCategory <chr> "Technology", "Technology"…
$ Cost <chr> "361.8", "361.8", "361.8",…
$ PostedDate <date> 2013-01-01, 2013-01-01, 2…
$ `Project Expiration Date` <chr> "2013-05-30", "2013-05-30"…
$ CurrentStatus <chr> "Fully Funded", "Fully Fun…
$ FullyFundedDate <date> 2013-01-11, 2013-01-11, 2…
$ X19 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X20 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X21 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X22 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X23 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X24 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X25 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X26 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X27 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X28 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X29 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X30 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X31 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X32 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X33 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X34 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X35 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X36 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X37 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X38 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X39 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X40 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X41 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X42 <dbl> NA, NA, NA, NA, NA, NA, NA…
$ X43 <date> NA, NA, NA, NA, NA, NA, N…
$ X44 <chr> NA, NA, NA, NA, NA, NA, NA…
$ X45 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X46 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X47 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X48 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X49 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X50 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X51 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X52 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X53 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X54 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X55 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X56 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X57 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X58 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X59 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X60 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X61 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X62 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X63 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X64 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X65 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X66 <lgl> NA, NA, NA, NA, NA, NA, NA…
$ `Donation ID` <chr> "fd43856fcca94c699bc4ed0dc…
$ `Donor ID` <chr> "ec3ebabc792625d381bcd3d91…
$ `Donation Included Optional Donation` <chr> "Yes", "Yes", "Yes", "Yes"…
$ DonationAmount <dbl> 25.0, 10.0, 1.0, 1.0, 1.0,…
$ `Donor Cart Sequence` <int> 2, 26, 2, 1, 1, 1234, 11, …
$ `Donation Received Date` <chr> "2013-01-10 18:41:11", "20…
$ `Donor City` <chr> "", "Los Angeles", "", "",…
$ DonorState <chr> "other", "California", "ot…
$ `Donor Is Teacher` <chr> "No", "Yes", "No", "No", "…
$ `Donor Zip` <chr> "", "900", "", "", "", "",…
$ `Teacher Prefix` <chr> "Mrs.", "Mrs.", "Mrs.", "M…
$ `Teacher First Project Posted Date` <date> 2011-12-11, 2011-12-11, 2…
$ `School Name` <chr> "Stanford Primary Center",…
$ `School Metro Type` <chr> "suburban", "suburban", "s…
$ `School Percentage Free Lunch` <dbl> 95, 95, 95, 95, 95, 95, 95…
$ `School State` <chr> "California", "California"…
$ `School Zip` <chr> "90280", "90280", "90280",…
$ `School City` <chr> "South Gate", "South Gate"…
$ `School County` <chr> "Los Angeles", "Los Angele…
$ `School District` <chr> "Los Angeles Unif Sch Dist…
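An inner join keeps only rows whose key appears in both tables, and it repeats a project row once for every matching donation, so each row of the combined dataset is effectively one donation. The following is a minimal base R sketch of that behaviour. The toy tables, IDs and amounts are made up for illustration, and merge() stands in for dplyr's inner_join().

```r
# Hypothetical toy tables; IDs and amounts are invented for illustration.
toy_projects <- data.frame(
  ProjectID = c("p1", "p2", "p3"),
  Title     = c("Books", "Tablets", "Paint"),
  stringsAsFactors = FALSE
)
toy_donations <- data.frame(
  ProjectID = c("p1", "p1", "p2", "p9"),
  DonorID   = c("d1", "d2", "d1", "d3"),
  Amount    = c(25, 10, 50, 5),
  stringsAsFactors = FALSE
)

# merge() with its default all = FALSE behaves as an inner join on the key.
toy_joined <- merge(toy_projects, toy_donations, by = "ProjectID")

# p1 appears twice (two donations); p3 (no donation) and p9 (no project) drop out.
print(toy_joined)
```

Because projects are repeated per donation, two different row numbers in the combined dataset can refer to the same project.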
Preparing the Non-Text features
In the combined dataset, the non-text features are Category, SubCategory, Grade, ResourceCategory, SchoolState, and TeacherPrefix.
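These categorical features are one-hot encoded, meaning each distinct level becomes its own 0/1 column. The report does this with caret's dummyVars(); the sketch below shows the same idea in base R on a made-up two-column table. The helper name one_hot and the toy values are hypothetical.

```r
# Hypothetical toy data with two categorical features.
toy <- data.frame(
  Grade            = c("Grades PreK-2", "Grades 3-5", "Grades PreK-2"),
  ResourceCategory = c("Technology", "Supplies", "Technology"),
  stringsAsFactors = FALSE
)

# One 0/1 indicator column per distinct level of every column.
one_hot <- function(df) {
  blocks <- lapply(names(df), function(col) {
    values <- as.character(df[[col]])
    lv <- sort(unique(values))
    m <- outer(values, lv, "==") + 0L   # logical comparison matrix to integers
    colnames(m) <- paste(col, lv, sep = ".")
    m
  })
  do.call(cbind, blocks)
}

toy_ohe <- one_hot(toy)
print(toy_ohe)
```

Each row has exactly one 1 per original feature, so two projects that share a level share a 1 in the same column, which is what the cosine similarity later picks up.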
combined_sample <- combined_sample %>% rename(TeacherPrefix = "Teacher Prefix",
SchoolName = "School Name", SchoolMetroType = "School Metro Type", SchoolState = "School State",
SchoolCity = "School City", SchoolDistrict = "School District")
combined_sample2 <- combined_sample %>% select("Project ID", "ProjectType",
"ProjectTitle", "Project Essay", "Category", "SubCategory", "Grade", "ResourceCategory",
"Donor ID", "TeacherPrefix", "SchoolState")
dummies <- dummyVars(~Category + SubCategory + Grade + ResourceCategory + SchoolState +
TeacherPrefix, data = combined_sample)
projects_ohe <- as.data.frame(predict(dummies, newdata = combined_sample))
projects_ohe[is.na(projects_ohe)] <- 0
names(projects_ohe) <- make.names(names(projects_ohe), unique = TRUE)
category_features <- c("Category", "SubCategory", "Grade", "ResourceCategory",
"SchoolState", "TeacherPrefix")
combined_sample2 <- cbind(combined_sample2[, -c(which(colnames(combined_sample2) %in%
category_features))], projects_ohe)
combined_sample3 <- combined_sample2 %>% select(-`Donor ID`, -`Project Essay`,
-ProjectTitle, -`Project ID`)
features <- colnames(combined_sample3)
for (f in features) {
    # Encode any remaining character or factor column as integer codes
    if (is.factor(combined_sample3[[f]]) || is.character(combined_sample3[[f]])) {
        levels <- unique(combined_sample3[[f]])
        combined_sample3[[f]] <- as.numeric(factor(combined_sample3[[f]], levels = levels))
    }
}
Cosine Similarity between two donors
Cosine similarity helps determine whether the items requested by two projects are similar.
It is a judgment of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
Here we want to measure how similar the projects are to each other. If two projects are similar, a donor who supported one of them is recommended the other.
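The three reference cases can be checked on small vectors. The sketch below uses the usual definition, the dot product divided by the product of the Euclidean norms; the function name cosine_similarity and the vectors are illustrative only.

```r
# Cosine similarity: dot product divided by the product of the vector norms.
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x * x)) * sqrt(sum(y * y)))
}

a <- c(1, 2, 0)

same_direction <- cosine_similarity(a, 3 * a)        # scaling does not matter: close to 1
orthogonal     <- cosine_similarity(a, c(-2, 1, 5))  # dot product is 0, so similarity is 0
opposite       <- cosine_similarity(a, -a)           # close to -1

print(c(same_direction, orthogonal, opposite))
```

One-hot and TF-IDF vectors have no negative entries, so for the feature vectors in this report the similarity falls between 0 and 1.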
Recommendations using Text Features
Let us use the project titles and project essays to find similarities between projects using TF-IDF and cosine similarity.
Create a vocabulary-based DTM. Here we collect unique terms from all documents and mark each of them with a unique ID using the create_vocabulary() function. An iterator is used to create the vocabulary. The vocabulary is also pruned to reduce the number of terms in the matrix.
The greater the cosine, the greater the similarity between the projects. Similar projects are the candidates for recommendation.
combined_train_text = paste(combined_sample2$ProjectTitle, combined_sample2$`Project Essay`)
prep_fun = function(x) {
stringr::str_replace_all(tolower(x), "[^[:graph:]]", " ")
}
myTokeniser <- function(str) str %>% stringr::str_replace_all("[']|[â\u0080\u0099]",
" ") %>% tokenizers::tokenize_words(., stopwords = stopwords(language = "en"))
tok_fun = myTokeniser
it_train = itoken(combined_train_text, preprocessor = prep_fun, tokenizer = tok_fun,
ids = combined_sample2$`Project ID`, progressbar = FALSE)
vocab = create_vocabulary(it_train, ngram = c(1L, 3L), stopwords = stopwords("en"))
Prune the vocabulary to eliminate the unnecessary words.
vocab = vocab %>% prune_vocabulary(term_count_min = 10, doc_proportion_max = 0.5,
    doc_proportion_min = 0.01, vocab_term_max = 5000)
vocab
Number of docs: 17917
175 stopwords: i, me, my, myself, we, our ...
ngram_min = 1; ngram_max = 3
Vocabulary:
term term_count doc_count
1: need 11118 8245
2: can 9647 6469
3: school 8884 5976
4: help 8300 5776
5: learning 7427 5429
---
970: technology_technology 182 182
971: students_don 181 181
972: hav 181 181
973: hispanic 181 181
974: grades_6 180 180
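Pruning by document proportion drops terms that occur in too many documents (they do not discriminate) or in too few (they are noise). The base R sketch below mimics only the doc_proportion_min and doc_proportion_max filters on a made-up corpus. The thresholds and documents are hypothetical, and it is not how prune_vocabulary() is implemented.

```r
# Hypothetical tokenised documents.
toy_docs <- list(
  c("students", "need", "books"),
  c("students", "need", "tablets"),
  c("students", "love", "books"),
  c("students", "need", "paint")
)

# Document frequency: in how many documents does each term occur?
doc_count <- table(unlist(lapply(toy_docs, unique)))
doc_prop  <- doc_count / length(toy_docs)

# Keep terms that appear in at least 50% and at most 75% of the documents.
keep <- as.vector(doc_prop >= 0.5 & doc_prop <= 0.75)
pruned_terms <- names(doc_prop)[keep]

print(pruned_terms)
```

Here "students" is dropped because it occurs in every document, and "love", "paint" and "tablets" are dropped because each occurs in only one.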
Create the Document Term Matrix (DTM)
Create the Document Term Matrix from the pruned vocabulary.
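The chunk that builds the DTM is not shown in the rendered report; in a text2vec workflow it is typically done with vocab_vectorizer() and create_dtm(). To illustrate what the resulting matrix contains, here is a small base R sketch on made-up documents: one row per document, one column per vocabulary term, and each cell a term count. All names and data are hypothetical.

```r
# Hypothetical tokenised documents, named by project.
toy_docs <- list(
  p1 = c("students", "need", "books", "books"),
  p2 = c("students", "need", "tablets"),
  p3 = c("books", "for", "students")
)

vocab_terms <- sort(unique(unlist(toy_docs)))

# Count each vocabulary term in each document (documents in rows after t()).
toy_dtm <- t(vapply(toy_docs, function(tokens) {
  tabulate(match(tokens, vocab_terms), nbins = length(vocab_terms))
}, integer(length(vocab_terms))))
colnames(toy_dtm) <- vocab_terms

print(toy_dtm)
```

The real DTM has the same layout but is stored as a sparse matrix, since most projects use only a small fraction of the vocabulary.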
TF-IDF
We apply TF-IDF weighting to increase the weight of terms that are specific to a single document or a handful of documents, and to decrease the weight of terms used in most documents.
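As a worked illustration of this weighting, the sketch below applies the textbook formula (term frequency times log of the inverse document frequency) to a tiny made-up count matrix. Library implementations, text2vec included, may use smoothed or normalised variants, so the exact numbers here are illustrative rather than what the report computes.

```r
# Hypothetical document-term counts: 3 projects, 4 terms.
toy_dtm <- matrix(
  c(2, 1, 1, 0,
    0, 1, 1, 1,
    0, 0, 1, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("books", "need", "students", "tablets"))
)

tf       <- toy_dtm / rowSums(toy_dtm)     # term frequency within each document
doc_freq <- colSums(toy_dtm > 0)           # number of documents containing each term
idf      <- log(nrow(toy_dtm) / doc_freq)  # rarer terms get a larger weight

toy_tfidf <- sweep(tf, 2, idf, "*")        # multiply each column by its idf
print(round(toy_tfidf, 3))
```

"students" occurs in every document, so its weight becomes 0, while "books", which occurs only in p1, receives the largest weight there.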
Cosine similarity between project 1 and Project 1 using Text Features only
As a sanity check, I compare project 1 with itself using Text Features only; the similarity of any vector with itself is 1.
getCosine <- function(x, y) {
    # Dot product divided by the product of the Euclidean norms
    this.cosine <- sum(x * y)/(sqrt(sum(x * x)) * sqrt(sum(y * y)))
    return(this.cosine)
}
getCosine(dtm_train_tfidf[1, ], dtm_train_tfidf[1, ])
[1] 1
Cosine similarity between project 1 and Project 2
Here I compare projects 1 and 2 using Text Features only. Rows 1 and 2 of the combined dataset appear to be two donations to the same project (the glimpse output shows the same cost and teacher posting sequence in the leading rows), which would explain the similarity of 1.
[1] 1
Cosine similarity between project 1 and Project 50
Here I compare projects 1 and 50 using Text Features only.
[1] 0.008048809
Cosine similarity between project 1 and Project 100
Here I compare projects 1 and 100 using Text Features only.
[1] 0.03398923
Cosine similarity of projects of the same donor
Let’s find the projects associated with the same donor. In the following code, we pick three donors who each have more than one row in the combined dataset (Samples 1, 2 and 3).
donations_project_group <- combined_sample %>% group_by(`Donor ID`) %>% summarise(Count = n())
donations_project_group <- donations_project_group %>% filter(Count > 1)
same_donor <- which(combined_sample$`Donor ID` == donations_project_group[1, ]$`Donor ID`)
same_donor2 <- which(combined_sample$`Donor ID` == donations_project_group[2, ]$`Donor ID`)
same_donor3 <- which(combined_sample$`Donor ID` == donations_project_group[3, ]$`Donor ID`)
Same Donors Sample 1
The first two rows belonging to the Sample 1 donor:
combined_sample[same_donor[1], ] %>% select(`Donor ID`, `Project ID`, ProjectTitle,
    Category, SubCategory, Grade, ResourceCategory, SchoolState, TeacherPrefix) %>%
    kable()
| Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| 00589b11dd54d68b0950555262ed8475 | 1abc7b1f997d4299c97f5567b62dab58 | Leaping Lizards! Learning Literacy Is Child’s Play | NA | NA | NA | NA | Oklahoma | Mrs. |
combined_sample[same_donor[2], ] %>% select(`Donor ID`, `Project ID`, ProjectTitle,
    Category, SubCategory, Grade, ResourceCategory, SchoolState, TeacherPrefix) %>%
    kable()
| Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| 00589b11dd54d68b0950555262ed8475 | cb7a036b7e6c3f1faa0c90b667dfb897 | Improving Agriculture Literacy via a School Farm | Math & Science, Applied Learning | Applied Sciences, Community Service | Grades 9-12 | Supplies | Massachusetts | Mrs. |
Cosine Similarity Same Donors Sample 1
Cosine similarity of the two projects belonging to the Sample 1 donor:
[1] 0.02384247
Same Donors Sample 2
The first two rows belonging to the Sample 2 donor:
combined_sample[same_donor2[1], ] %>% select(`Donor ID`, `Project ID`, ProjectTitle,
    Category, SubCategory, Grade, ResourceCategory, SchoolState, TeacherPrefix) %>%
    kable()
| Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| 00906dbd8da115318175861ccfe1dcd8 | b81056d684753054e332b3f185576c64 | “Center”ing in Math | Mathematics | Grades PreK-2 | Other | 388.14 | Texas | Ms. |
combined_sample[same_donor2[2], ] %>% select(`Donor ID`, `Project ID`, ProjectTitle,
    Category, SubCategory, Grade, ResourceCategory, SchoolState, TeacherPrefix) %>%
    kable()
| Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| 00906dbd8da115318175861ccfe1dcd8 | 2d7f1f43ad03503f85c47f024774ec33 | Interactive Spanish Centers For Our Rising Stars | some of my students are the only English speakers | my students will finally be able to reap the full | NA | NA | Texas | Ms. |
Cosine Similarity Same Donors Sample 2
Cosine similarity of the two projects belonging to the Sample 2 donor:
[1] 0.005716124
Same Donors Sample 3
The first two rows belonging to the Sample 3 donor:
combined_sample[same_donor3[1], ] %>% select(`Donor ID`, `Project ID`, ProjectTitle,
    Category, SubCategory, Grade, ResourceCategory, SchoolState, TeacherPrefix) %>%
    kable()
| Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| 009a9a98cb675dc9eed258d5ad2bfa77 | 471b790b9d9e266d76de3d364197e191 | Stop Sharing Sniffles And Sneezes | much like an overhead projector. A new projector | base ten blocks | or science experiment | they need to be able to see the details and proce | New York | Mrs. |
combined_sample[same_donor3[2], ] %>% select(`Donor ID`, `Project ID`, ProjectTitle,
    Category, SubCategory, Grade, ResourceCategory, SchoolState, TeacherPrefix) %>%
    kable()
| Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| 009a9a98cb675dc9eed258d5ad2bfa77 | 471b790b9d9e266d76de3d364197e191 | Stop Sharing Sniffles And Sneezes | much like an overhead projector. A new projector | base ten blocks | or science experiment | they need to be able to see the details and proce | New York | Mrs. |
Cosine Similarity Same Donors Sample 3
Cosine similarity of the two rows belonging to the Sample 3 donor. Both rows carry the same Project ID, that is, the donor gave twice to the same project, so the similarity is 1:
[1] 1
Combining the Text and Non Text Features
combined_non_text_sample <- combined_sample3 %>% replace_na(list(SchoolPercentageFreeLunch = -1,
ItemName = -1, SchoolID = -1, TeacherID = -1, ProjectType = -1, Category = -1,
SubCategory = -1, Grade = -1, ResourceCategory = -1, Cost = -1, TeacherPrefix = -1,
SchoolName = -1, SchoolMetroType = -1, SchoolState = -1, SchoolCity = -1,
SchoolDistrict = -1, Qty = -1, UnitPrice = -1, Amount = -1)) %>% sparse.model.matrix(~. -
1, .)
combined_sample4 <- combined_non_text_sample %>% cbind(dtm_train_tfidf)
Recommendations using Non Text Features only
The Non Text features include Category, SubCategory, Grade, ResourceCategory, SchoolState, and TeacherPrefix.
Cosine similarity between project 1 and Project 2
Here I compare projects 1 and 2 using Non Text Features.
[1] 1
Cosine similarity between project 1 and Project 50
Here I compare projects 1 and 50 using Non Text Features.
[1] 0.3380617
Cosine similarity between project 1 and Project 100
Here I compare projects 1 and 100 using Non Text Features.
[1] 0.2857143
Cosine Similarity of projects of same donor Sample 1
[1] 0.4364358
Cosine Similarity of projects of same donor Sample 2
[1] 0.5070926
Cosine Similarity of projects of same donor Sample 3
[1] 1
Top 10 recommended Projects
The recommendations using Non Text Features
Donor 1 Projects
combined_sample %>% head(1) %>% select(`Donor ID`, `Project ID`, ProjectTitle,
    Category, SubCategory, Grade, ResourceCategory, SchoolState, TeacherPrefix) %>%
    unique() %>% kable()
| Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| ec3ebabc792625d381bcd3d911c72383 | 7685f0265a19d7b52a470ee4bac883ba | Stand Up to Bullying: Together We Can! | Applied Learning | Character Education, Early Development | Grades PreK-2 | Technology | California | Mrs. |
This lists down the Top 10 Recommended Projects for Donor 1
counter_vector <- c()
cosine_vector <- c()
ProjectID <- c()
Category <- c()
SubCategory <- c()
Grade <- c()
ResourceCategory <- c()
SchoolState <- c()
TeacherPrefix <- c()
for (counter in 2:102) {
cosine_sim <- getCosine(combined_non_text_sample[1, ], combined_non_text_sample[counter,
])
counter_vector <- c(counter_vector, counter)
cosine_vector <- c(cosine_vector, cosine_sim)
ProjectID <- c(ProjectID, combined_sample[counter, ]$`Project ID`)
Category <- c(Category, combined_sample[counter, ]$Category)
SubCategory <- c(SubCategory, combined_sample[counter, ]$SubCategory)
Grade <- c(Grade, combined_sample[counter, ]$Grade)
ResourceCategory <- c(ResourceCategory, combined_sample[counter, ]$ResourceCategory)
SchoolState <- c(SchoolState, combined_sample[counter, ]$SchoolState)
TeacherPrefix <- c(TeacherPrefix, combined_sample[counter, ]$TeacherPrefix)
}
df <- data.frame(cbind(counter_vector, cosine_vector, ProjectID, Category, SubCategory,
Grade, ResourceCategory, SchoolState, TeacherPrefix))
df <- df %>% arrange(desc(cosine_vector))
donor_project_id <- combined_sample %>% head(1) %>% select(`Project ID`)
df %>% filter(ProjectID != donor_project_id$`Project ID`) %>% select(cosine_vector,
ProjectID, Category, SubCategory, Grade, ResourceCategory, SchoolState,
TeacherPrefix) %>% unique() %>% head(10) %>% kable()| cosine_vector | ProjectID | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix | |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.654653670707977 | c265d5c4b7ef2059e4e84ddcef18bdd9 | NA | NA | NA | NA | California | Mrs. |
| 4 | 0.571428571428571 | f9f4af7099061fb4bf44642a03e5c331 | Applied Learning, Literacy & Language | Early Development, Literacy | Grades PreK-2 | Technology | Georgia | Mrs. |
| 6 | 0.338061701891407 | ec82a697fab916c0db0cdad746338df9 | My students need items such as Velcro, two pounds | |||||
| 56395 | 8074d7b12b48b939279e | b59e6ca,b79a19772090efccde93b3a5934 | d829f,5ef1793ff657860ca7856d475715ec2a,4,Teacher-Led,It’s about Time… Time for Kids!,We know that success in school is directly relate more of their education is tied to textbooks and NA NA | Colorad | o Mrs. | |||
| 12 | 0.308606699924184 | 717c7a01215d532d68f6fe9e666c88c3 | Applied Learning | College & Career Prep, Community Service | Grades | NA | New Jersey | Ms. |
| 34 | 0.285714285714286 | afd99a01739ad5557b51b1ba0174e832 | and suspense. Stude | the students at our school are very resilient and | but they still need support from donors like you. | and studies show that reading at home is an impor | New York | Mrs. |
| 46 | 0.285714285714286 | 49825532f85d0cdb569797df3ab8ec46 | Supplies | 339.2 | 2013-01-01 | 2013-05-29 | New Mexico | Mrs. |
| 51 | 0.285714285714286 | 3dfcaad759c25fcce140814eb8a47592 | many do not have educationally enriching experien | hands-on activities to help fill these gaps. My f | addition | and basic number relationships. The white boards | California | Ms. |
| 52 | 0.285714285714286 | 49409b4858006bbfba35c36338e10ee7 | History & Civics, Math & Science | Economics, Environmental Science | Grades 3-5 | Supplies | Texas | Mrs. |
| 53 | 0.285714285714286 | 14c19c45cbb73319b6b506791c659d9d | with the majority of our students receiving free | which makes it a challenge to differentiate and m | so it is necessary to accommodate their different | several children will record themselves reading a | Massachusetts | Mrs. |
| 54 | 0.285714285714286 | 70d2edb28d3f75b283cc513a9a24eb8b | at a desk | for 7 hours a day? | ) Thailand | Nepal | Colorado | Mrs. |
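If the non-text features are encoded as 0/1 indicator columns (an assumption here, since the encoding step is not shown in this section), cosine similarity has a simple reading: the number of features two projects share, divided by the geometric mean of how many features each project has. The sketch below checks this on toy vectors. `cosine_sketch` is a hypothetical helper, not the `getCosine` function used above.

```r
# Hypothetical helper (not the document's getCosine): plain cosine similarity.
cosine_sketch <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy 0/1 indicator vectors over eight made-up feature columns.
project_a <- c(1, 1, 1, 1, 0, 0, 1, 0)  # 5 active features
project_b <- c(1, 0, 1, 0, 1, 0, 1, 0)  # 4 active features

# Number of features that are active in both projects.
shared <- sum(project_a * project_b)

# For 0/1 vectors: shared features over the geometric mean of the counts.
by_counts <- shared / sqrt(sum(project_a) * sum(project_b))

sim <- cosine_sketch(project_a, project_b)
print(sim)
print(all.equal(sim, by_counts))
```

Under this reading, two projects that differ only in SchoolState still score close to 1, while projects that share nothing score 0.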
Recommendations using Text and Non Text Features
Cosine similarity between Project 1 and Project 2
Here I compare Projects 1 and 2 using both the text and the non-text features.
[1] 1
Cosine similarity between Project 1 and Project 50
Here I compare Projects 1 and 50 using both the text and the non-text features.
[1] 0.1426086
Cosine similarity between Project 1 and Project 100
Here I compare Projects 1 and 100 using both the text and the non-text features.
[1] 0.1374252
Cosine similarity of projects of the same donor, Sample 1
[1] 0.1638579
Cosine similarity of projects of the same donor, Sample 2
[1] 0.2173009
Cosine similarity of projects of the same donor, Sample 3
[1] 1
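The scores above compare vectors that hold both text and non-text features. One simple way to build such a vector, assumed here purely for illustration, is to column-bind a TF-IDF block and an indicator block and compute the cosine on the joined rows. The names `tfidf_block`, `indicator_block` and `cosine_sketch` are hypothetical and the numbers are made up.

```r
# Hypothetical helper: plain cosine similarity between two numeric vectors.
cosine_sketch <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy TF-IDF weights: 3 projects x 4 terms.
tfidf_block <- matrix(c(0.5, 0.0, 0.2, 0.0,
                        0.4, 0.1, 0.0, 0.0,
                        0.0, 0.0, 0.0, 0.9),
                      nrow = 3, byrow = TRUE)

# Toy 0/1 indicators: 3 projects x 3 categorical levels.
indicator_block <- matrix(c(1, 0, 1,
                            1, 0, 1,
                            0, 1, 0),
                          nrow = 3, byrow = TRUE)

# One row per project, text columns followed by non-text columns.
combined <- cbind(tfidf_block, indicator_block)

sim_12 <- cosine_sketch(combined[1, ], combined[2, ])
sim_13 <- cosine_sketch(combined[1, ], combined[3, ])
print(c(sim_12, sim_13))
```

Because the two blocks sit on different scales, the block with the larger values tends to dominate the score. Rescaling each block before binding is one option if that is not wanted.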
Top 10 recommended Projects
These recommendations use both the text and the non-text features.
Donor 1 Projects
combined_sample %>% head(1) %>% select(`Donor ID`, `Project ID`, ProjectTitle,
Category, SubCategory, Grade, ResourceCategory, SchoolState, TeacherPrefix) %>%
unique() %>% kable()
| Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| ec3ebabc792625d381bcd3d911c72383 | 7685f0265a19d7b52a470ee4bac883ba | Stand Up to Bullying: Together We Can! | Applied Learning | Character Education, Early Development | Grades PreK-2 | Technology | California | Mrs. |
The code below lists the top 10 recommended projects for Donor 1, ranked by cosine similarity to the project shown above.
counter_vector <- c()
cosine_vector <- c()
ProjectID <- c()
Category <- c()
SubCategory <- c()
Grade <- c()
ResourceCategory <- c()
SchoolState <- c()
TeacherPrefix <- c()
for (counter in 2:102) {
cosine_sim <- getCosine(combined_sample4[1, ], combined_sample4[counter,
])
counter_vector <- c(counter_vector, counter)
cosine_vector <- c(cosine_vector, cosine_sim)
ProjectID <- c(ProjectID, combined_sample[counter, ]$`Project ID`)
Category <- c(Category, combined_sample[counter, ]$Category)
SubCategory <- c(SubCategory, combined_sample[counter, ]$SubCategory)
Grade <- c(Grade, combined_sample[counter, ]$Grade)
ResourceCategory <- c(ResourceCategory, combined_sample[counter, ]$ResourceCategory)
SchoolState <- c(SchoolState, combined_sample[counter, ]$SchoolState)
TeacherPrefix <- c(TeacherPrefix, combined_sample[counter, ]$TeacherPrefix)
}
df <- data.frame(cbind(counter_vector, cosine_vector, ProjectID, Category, SubCategory,
Grade, ResourceCategory, SchoolState, TeacherPrefix))
df <- df %>% arrange(desc(cosine_vector))
donor_project_id <- combined_sample %>% head(1) %>% select(`Project ID`)
df %>% filter(ProjectID != donor_project_id$`Project ID`) %>% select(cosine_vector,
ProjectID, Category, SubCategory, Grade, ResourceCategory, SchoolState,
TeacherPrefix) %>% unique() %>% head(10) %>% kable()
| | cosine_vector | ProjectID | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.252287883049293 | f9f4af7099061fb4bf44642a03e5c331 | Applied Learning, Literacy & Language | Early Development, Literacy | Grades PreK-2 | Technology | Georgia | Mrs. |
| 3 | 0.214650050348922 | c265d5c4b7ef2059e4e84ddcef18bdd9 | NA | NA | NA | NA | California | Mrs. |
| 6 | 0.163704722465574 | afd99a01739ad5557b51b1ba0174e832 | and suspense. Stude | the students at our school are very resilient and | but they still need support from donors like you. | and studies show that reading at home is an impor | New York | Mrs. |
| 18 | 0.161534585915996 | 14c19c45cbb73319b6b506791c659d9d | with the majority of our students receiving free | which makes it a challenge to differentiate and m | so it is necessary to accommodate their different | several children will record themselves reading a | Massachusetts | Mrs. |
| 19 | 0.159438753518438 | 49825532f85d0cdb569797df3ab8ec46 | Supplies | 339.2 | 2013-01-01 | 2013-05-29 | New Mexico | Mrs. |
| 24 | 0.152412074549055 | 70d2edb28d3f75b283cc513a9a24eb8b | at a desk | for 7 hours a day? | ) Thailand | Nepal | Colorado | Mrs. |
| 30 | 0.142608553663939 | ec82a697fab916c0db0cdad746338df9 | My students need items such as Velcro, two pounds 56395 | 8074d7b12b48b939279e | b59e6ca,b79a19772090efccde93b3a5934 | d829f,5ef1793ff657860ca7856d475715ec2a,4,Teacher-Led,It’s about Time… Time for Kids!,We know that success in school is directly relate more of their education is tied to textbooks and NA NA | Colorado | Mrs. |
| 36 | 0.139151094789132 | 717c7a01215d532d68f6fe9e666c88c3 | Applied Learning | College & Career Prep, Community Service | Grades | NA | New Jersey | Ms. |
| 58 | 0.137878789522907 | 3dfcaad759c25fcce140814eb8a47592 | many do not have educationally enriching experien | hands-on activities to help fill these gaps. My f | addition | and basic number relationships. The white boards | California | Ms. |
| 59 | 0.137425214539727 | d3fc101ea24e26443dbfe9bc7560d08c | Math & Science, Literacy & Language | Environmental Science, Literacy | Grades PreK-2 | Supplies | North Carolina | Ms. |
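The loop used for this table grows its vectors one element at a time. As a rough alternative, the same ranking can be sketched with matrix operations on a toy feature matrix. `top_n_similar` and `feature_mat` are hypothetical names, and the sketch assumes no row is all zeros (a zero row would give NaN).

```r
# Hypothetical helper: rank all other rows by cosine similarity to one row.
top_n_similar <- function(feature_mat, target_row = 1, n = 10) {
  norms <- sqrt(rowSums(feature_mat^2))
  # Dot product of every row with the target row, scaled by both norms.
  sims <- as.vector(feature_mat %*% feature_mat[target_row, ]) /
    (norms * norms[target_row])
  out <- data.frame(row = seq_len(nrow(feature_mat)), cosine = sims)
  out <- out[-target_row, ]                        # drop the target itself
  out <- out[order(out$cosine, decreasing = TRUE), ]
  head(out, n)
}

# Toy feature matrix: 4 projects x 4 made-up indicator columns.
feature_mat <- rbind(c(1, 1, 0, 0),
                     c(1, 1, 0, 0),
                     c(1, 0, 1, 0),
                     c(0, 0, 1, 1))

ranked <- top_n_similar(feature_mat, target_row = 1, n = 2)
print(ranked)
```

Building the data frame directly also keeps the scores numeric. Calling `cbind()` on a mix of numeric and character vectors returns a character matrix, so the scores would then be sorted as text.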
Conclusion
I built item-item recommendations using:
- Text features only
- Text and non-text features
- Non-text features only
The non-text features are Category, SubCategory, Grade, ResourceCategory, SchoolState, and TeacherPrefix.
The accuracy is highest for the recommendations using the non-text features only.
Projects with more than 1 Donor
I keep only the projects with more than one donor and take the first 10,000 donation records as a sample.
donations = donations %>% rename(DonorID = `Donor ID`, ProjectID = `Project ID`)
donations_project_group <- donations %>% group_by(ProjectID) %>% summarise(Count = n())
donations_project_group <- donations_project_group %>% filter(Count > 1)
donations_sample <- donations %>% filter(ProjectID %in% donations_project_group$ProjectID) %>%
head(10000)
Donor ID Project ID matrix
Let’s create a data frame with the Donor ID, the Project ID and the donation amount. The total amount each donor gave to a project is log-transformed as log(amount + 1) and used as the rating.
donations_sample_group <- donations_sample %>% group_by(DonorID, ProjectID) %>%
summarise(Rating = sum(DonationAmount)) %>% ungroup()
donations_sample_group$Rating <- log(donations_sample_group$Rating + 1)
glimpse(donations_sample_group)
Observations: 8,585
Variables: 3
$ DonorID <chr> "0003aba06ccf49f8c44fc2dd3b582411", "000f7306e8ddb3629…
$ ProjectID <chr> "0081553d51ed5d2529e2e38b0827133a", "007e2a1a47ce50ded…
$ Rating <dbl> 3.931826, 3.258097, 3.258097, 2.140066, 3.931826, 2.82…
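As a small worked example of the transformation, the sketch below applies log(amount + 1) to three illustrative totals. The amounts are chosen so that the results line up with the first distinct ratings printed above (3.931826, 3.258097, 2.140066). They come from inverting the formula, not from the raw data.

```r
# Illustrative donation totals in dollars.
amounts <- c(50, 25, 7.5)

# The transformation used for the Rating column.
ratings <- log(amounts + 1)
print(round(ratings, 6))

# log1p() computes log(1 + x) directly and expm1() inverts it,
# so a rating can be mapped back to a dollar amount.
same_as_log1p <- isTRUE(all.equal(ratings, log1p(amounts)))
recovered <- expm1(ratings)
print(recovered)
```

The transformation compresses large donations, so a single very large gift does not dominate the ratings, and the added 1 keeps a zero amount at a rating of 0.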
Restructuring the matrix
The matrix is created with Donor IDs along the rows and Project IDs along the columns.
dimension_names <- list(DonorID = sort(unique(donations_sample_group$DonorID)),
ProjectID = sort(unique(donations_sample_group$ProjectID)))
ratingmat <- spread(select(donations_sample_group, ProjectID, DonorID, Rating),
ProjectID, Rating) %>% select(-DonorID)
ratingmat <- as.matrix(ratingmat)
dimnames(ratingmat) <- dimension_names
ratingmat[1:5, 1:5]
ProjectID
DonorID 000009891526c0ade7180f8423792063
0003aba06ccf49f8c44fc2dd3b582411 NA
000f7306e8ddb36296f0d97a34d67d76 NA
00125f251b05d9e447a5448bef981028 NA
0013dfb2a873420fe6e7d750ef24ce98 NA
0016b23800f7ea46424b3254f016007a NA
ProjectID
DonorID 00000ce845c00cbf0686c992fc369df4
0003aba06ccf49f8c44fc2dd3b582411 NA
000f7306e8ddb36296f0d97a34d67d76 NA
00125f251b05d9e447a5448bef981028 NA
0013dfb2a873420fe6e7d750ef24ce98 NA
0016b23800f7ea46424b3254f016007a NA
ProjectID
DonorID 00002d44003ed46b066607c5455a999a
0003aba06ccf49f8c44fc2dd3b582411 NA
000f7306e8ddb36296f0d97a34d67d76 NA
00125f251b05d9e447a5448bef981028 NA
0013dfb2a873420fe6e7d750ef24ce98 NA
0016b23800f7ea46424b3254f016007a NA
ProjectID
DonorID 00002eb25d60a09c318efbd0797bffb5
0003aba06ccf49f8c44fc2dd3b582411 NA
000f7306e8ddb36296f0d97a34d67d76 NA
00125f251b05d9e447a5448bef981028 NA
0013dfb2a873420fe6e7d750ef24ce98 NA
0016b23800f7ea46424b3254f016007a NA
ProjectID
DonorID 00008f7aaca8ab932c1bc1d0bc449186
0003aba06ccf49f8c44fc2dd3b582411 NA
000f7306e8ddb36296f0d97a34d67d76 NA
00125f251b05d9e447a5448bef981028 NA
0013dfb2a873420fe6e7d750ef24ce98 NA
0016b23800f7ea46424b3254f016007a NA
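The reshaping step can be tried on a toy table to see why most cells are NA. The document uses `spread()` from tidyr. The sketch below swaps in base R `tapply()` so that it runs without extra packages, and all IDs and ratings are made up.

```r
# Toy long-format donations (made-up IDs and ratings).
long <- data.frame(
  DonorID   = c("d1", "d1", "d2", "d3"),
  ProjectID = c("p1", "p2", "p2", "p3"),
  Rating    = c(3.9, 3.3, 2.1, 2.8),
  stringsAsFactors = FALSE
)

# tapply() over two grouping vectors returns a matrix. Donor/project pairs
# with no donation become NA, and the dimnames are the sorted IDs.
wide <- tapply(long$Rating,
               list(DonorID = long$DonorID, ProjectID = long$ProjectID),
               sum)
print(wide)

# Share of donor/project pairs without a donation.
missing_share <- mean(is.na(wide))
print(missing_share)
```

Each donor supports only a few of the projects, so a real donor-by-project matrix is mostly NA, as in the 5 x 5 corner printed above.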
Using SVD
Donor Project Matrix dimensions
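The NA values are replaced by zero because svd() cannot be applied to a matrix that contains NA values. The zero-filled matrix has 7,926 rows (donors) and 1,468 columns (projects).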
[1] 7926 1468
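The code for this step is not shown. A minimal sketch of one way to do it, on a toy matrix with hypothetical names (`ratingmat_toy`, `ratingmat0_toy`), could look like this:

```r
# Toy donor x project matrix with missing entries (made-up values).
ratingmat_toy <- matrix(c(3.9, NA, NA,
                          NA, 2.1, NA,
                          NA, NA, 2.8),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(DonorID = c("d1", "d2", "d3"),
                                        ProjectID = c("p1", "p2", "p3")))

# Copy the matrix and treat "no donation" as a rating of 0.
ratingmat0_toy <- ratingmat_toy
ratingmat0_toy[is.na(ratingmat0_toy)] <- 0

print(dim(ratingmat0_toy))
print(anyNA(ratingmat0_toy))
```

Zero-filling is a simplification: it treats a missing donation as a rating of 0 rather than as an unknown value, which pulls the reconstructed scores toward zero.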
Dimensionality reduction using SVD
Only the first 20 singular values and their singular vectors are kept, which gives a rank-20 approximation of the donor-project matrix.
svd1 <- svd(ratingmat0)
approx20 <- svd1$u[, 1:20] %*% diag(svd1$d[1:20]) %*% t(svd1$v[, 1:20])
dimnames(approx20) <- dimension_names
dim(approx20)
[1] 7926 1468
ProjectID
DonorID 000009891526c0ade7180f8423792063
0003aba06ccf49f8c44fc2dd3b582411 0
000f7306e8ddb36296f0d97a34d67d76 0
00125f251b05d9e447a5448bef981028 0
0013dfb2a873420fe6e7d750ef24ce98 0
0016b23800f7ea46424b3254f016007a 0
00309a47b765e12714d817ee3215de1e 0
0036448e416b71ab040182c428958b6f 0
ProjectID
DonorID 00000ce845c00cbf0686c992fc369df4
0003aba06ccf49f8c44fc2dd3b582411 0
000f7306e8ddb36296f0d97a34d67d76 0
00125f251b05d9e447a5448bef981028 0
0013dfb2a873420fe6e7d750ef24ce98 0
0016b23800f7ea46424b3254f016007a 0
00309a47b765e12714d817ee3215de1e 0
0036448e416b71ab040182c428958b6f 0
ProjectID
DonorID 00002d44003ed46b066607c5455a999a
0003aba06ccf49f8c44fc2dd3b582411 0
000f7306e8ddb36296f0d97a34d67d76 0
00125f251b05d9e447a5448bef981028 0
0013dfb2a873420fe6e7d750ef24ce98 0
0016b23800f7ea46424b3254f016007a 0
00309a47b765e12714d817ee3215de1e 0
0036448e416b71ab040182c428958b6f 0
ProjectID
DonorID 00002eb25d60a09c318efbd0797bffb5
0003aba06ccf49f8c44fc2dd3b582411 0
000f7306e8ddb36296f0d97a34d67d76 0
00125f251b05d9e447a5448bef981028 0
0013dfb2a873420fe6e7d750ef24ce98 0
0016b23800f7ea46424b3254f016007a 0
00309a47b765e12714d817ee3215de1e 0
0036448e416b71ab040182c428958b6f 0
ProjectID
DonorID 00008f7aaca8ab932c1bc1d0bc449186
0003aba06ccf49f8c44fc2dd3b582411 0
000f7306e8ddb36296f0d97a34d67d76 0
00125f251b05d9e447a5448bef981028 0
0013dfb2a873420fe6e7d750ef24ce98 0
0016b23800f7ea46424b3254f016007a 0
00309a47b765e12714d817ee3215de1e 0
0036448e416b71ab040182c428958b6f 0
ProjectID
DonorID 0000bbd74feb563a324fe441eae19feb
0003aba06ccf49f8c44fc2dd3b582411 0
000f7306e8ddb36296f0d97a34d67d76 0
00125f251b05d9e447a5448bef981028 0
0013dfb2a873420fe6e7d750ef24ce98 0
0016b23800f7ea46424b3254f016007a 0
00309a47b765e12714d817ee3215de1e 0
0036448e416b71ab040182c428958b6f 0
ProjectID
DonorID 0000be4b3c81e1cef858d536bb740052
0003aba06ccf49f8c44fc2dd3b582411 0
000f7306e8ddb36296f0d97a34d67d76 0
00125f251b05d9e447a5448bef981028 0
0013dfb2a873420fe6e7d750ef24ce98 0
0016b23800f7ea46424b3254f016007a 0
00309a47b765e12714d817ee3215de1e 0
0036448e416b71ab040182c428958b6f 0
ProjectID
DonorID 0000c0bdc0f15bd239cfffa884791a10
0003aba06ccf49f8c44fc2dd3b582411 -7.816866e-22
000f7306e8ddb36296f0d97a34d67d76 -1.780863e-22
00125f251b05d9e447a5448bef981028 3.182379e-22
0013dfb2a873420fe6e7d750ef24ce98 9.003258e-22
0016b23800f7ea46424b3254f016007a 2.249276e-20
00309a47b765e12714d817ee3215de1e 9.335889e-22
0036448e416b71ab040182c428958b6f 1.606395e-04
ProjectID
DonorID 0000c0ea0aecb2ad60e8d234eab6ed28
0003aba06ccf49f8c44fc2dd3b582411 3.024556e-36
000f7306e8ddb36296f0d97a34d67d76 -1.143914e-36
00125f251b05d9e447a5448bef981028 4.303249e-36
0013dfb2a873420fe6e7d750ef24ce98 -1.955623e-35
0016b23800f7ea46424b3254f016007a -3.122233e-34
00309a47b765e12714d817ee3215de1e 5.490349e-31
0036448e416b71ab040182c428958b6f -1.288485e-18
ProjectID
DonorID 0000cce04fec25bf7f21b0e2f1dcf4b6
0003aba06ccf49f8c44fc2dd3b582411 7.809999e-35
000f7306e8ddb36296f0d97a34d67d76 1.727252e-35
00125f251b05d9e447a5448bef981028 -3.005634e-35
0013dfb2a873420fe6e7d750ef24ce98 -1.170701e-34
0016b23800f7ea46424b3254f016007a -3.980843e-33
00309a47b765e12714d817ee3215de1e 3.005358e-31
0036448e416b71ab040182c428958b6f -1.624449e-17
[ reached getOption("max.print") -- omitted 3 rows ]
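To make the truncation step concrete, the sketch below rebuilds a small random matrix from its first k singular triplets and measures what is lost. `rank_k_approx` is a hypothetical helper, and the 6 x 4 matrix only stands in for the zero-filled donor-project matrix.

```r
set.seed(1)
# Toy 6 x 4 matrix standing in for the zero-filled donor x project matrix.
m <- matrix(round(runif(24, 0, 4), 1), nrow = 6)

s <- svd(m)

# Hypothetical helper: rebuild the matrix from the first k factors.
rank_k_approx <- function(s, k) {
  # drop = FALSE and nrow = k keep the shapes right even when k = 1.
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], nrow = k) %*%
    t(s$v[, 1:k, drop = FALSE])
}

full <- rank_k_approx(s, 4)   # all factors: reproduces m
low  <- rank_k_approx(s, 2)   # two factors: a smoothed approximation

err_full <- max(abs(full - m))
err_low  <- sqrt(sum((low - m)^2))   # Frobenius norm of the residual

# Share of the squared singular values kept by the first two factors.
kept <- sum(s$d[1:2]^2) / sum(s$d^2)
print(c(err_full = err_full, err_low = err_low, kept = kept))
```

The residual of a rank-k truncation equals the square root of the sum of the discarded squared singular values, so inspecting `s$d` is one way to judge whether a cutoff such as 20 keeps enough of the matrix.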
Recommendations for Donor 3
Donor 3 Projects
Based on the projects Donor 3 has already supported, the recommender system suggests new projects for this donor.
Top Recommended projects for Donor 3
# Predicted scores for the third donor, one per project.
df <- approx20[3, ]
df <- as.data.frame(df)
df$ProjectID <- row.names(df)
df <- df %>% dplyr::rename(Rating = df)
# Keep the 10 projects with the highest predicted score.
df <- df %>% arrange(-Rating) %>% head(10)
"%!in%" <- function(x, y) !(x %in% y)
# Exclude the projects listed in ProjectIDs (defined earlier, presumably Donor 3's own projects).
projects %>% filter(`Project ID` %in% df$ProjectID) %>% filter(`Project ID` %!in%
ProjectIDs$ProjectID) %>% select(`Project ID`, Category, SubCategory, Grade,
ResourceCategory) %>% kable()
| Project ID | Category | SubCategory | Grade | ResourceCategory |
|---|---|---|---|---|
| 0015703508d8a6703bc0d7f71027fdb4 | Literature & Writing, Social Sciences | Grades 3-5 | Books | 493.35 |
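The code for this table takes the top 10 scores first and removes already-supported projects afterwards, which can leave fewer than 10 rows. A sketch of the reverse order, on made-up scores with a hypothetical helper `recommend_new`, is shown here:

```r
# Toy predicted scores for one donor (made-up values), named by project.
predicted <- c(p1 = 0.90, p2 = 0.10, p3 = 0.75, p4 = 0.40, p5 = 0.05)
already_supported <- c("p1")   # projects this donor has donated to

# Hypothetical helper: highest-scoring projects the donor has not funded yet.
recommend_new <- function(predicted, already_supported, n = 10) {
  # Remove supported projects first, then rank what is left.
  candidates <- predicted[!(names(predicted) %in% already_supported)]
  head(sort(candidates, decreasing = TRUE), n)
}

recs <- recommend_new(predicted, already_supported, n = 2)
print(recs)
```

Filtering before ranking means the list is not shortened by projects the donor already funded.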
Summary
We built item-item recommendations using:
- Text features only, based on the Project Title and the Project Essays
- Text and non-text features
- Non-text features only
The non-text features are Category, SubCategory, Grade, ResourceCategory, SchoolState, and TeacherPrefix.
The accuracy is highest for the recommendations using the non-text features only.
We also built user-user recommendations from the donors and their donation amounts.