DATA 612 Final Project | Donor matching recommender systems

Introduction

In 2000, Charles Best, a teacher at a Bronx public high school, wanted his students to read Little House on the Prairie. As he was making photocopies of the one copy he could procure, Charles thought about all the money he and his colleagues were spending on books, art supplies, and other materials. He figured there were people out there who would want to help, if only they could see where their money was going. Charles sketched out a website where teachers could post classroom project requests and donors could choose the ones they wanted to support. His colleagues posted the first 11 requests, and from there the idea spread. Today, the site, DonorsChoose.org, is open to every public school in America.

Objective

The objective of the donor recommender system is to recommend relevant projects to donors based on their preferences. Preference and relevance are subjective, so they are generally inferred from the items a user has consumed before; here, from the projects a donor has already funded.

DonorsChoose.org has funded over 1.1 million classroom requests through the support of 3 million donors, the majority of whom were making their first-ever donation to a public school. If DonorsChoose.org can motivate even a fraction of those donors to make another donation, that could have a huge impact on the number of classroom requests fulfilled.

A good solution will enable DonorsChoose.org to build targeted email campaigns recommending specific classroom requests to prior donors.

Data Description

Connect to a local Spark instance

The returned Spark connection (sc) provides a remote dplyr data source to the Spark cluster.

Load the DonorsChoose datasets

The data are taken from the public DonorsChoose.org dataset.

A combined dataset will be prepared from the projects, donations, donors, schools and teachers datasets.

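rm(list = ls())

library(data.table)  # fread()
library(readr)       # read_csv(), cols()
library(dplyr)       # %>%, rename()
library(tibble)      # as_tibble()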

fillColor = "#FFA07A"
fillColor2 = "#F1C40F"
fillColorLightCoral = "#F08080"


donations <- as_tibble(fread("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Donations.csv"))

donors <- as_tibble(fread("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Donors.csv"))

projects <- read_csv("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Projects.csv", 
    col_types = cols(X1 = col_integer(), `Project ID` = col_character(), `School ID` = col_character(), 
        `Teacher ID` = col_character(), `Teacher Project Posted Sequence` = col_integer(), 
        `Project Type` = col_character(), `Project Title` = col_character(), 
        `Project Essay` = col_character(), `Project Subject Category Tree` = col_character(), 
        `Project Subject Subcategory Tree` = col_character(), `Project Grade Level Category` = col_character(), 
        `Project Resource Category` = col_character(), `Project Cost` = col_character(), 
        `Project Posted Date` = col_date(format = ""), `Project Current Status` = col_character(), 
        `Project Fully Funded Date` = col_date(format = "")))

schools <- read_csv("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Schools.csv")

teachers <- read_csv("/Users/priyashaji/Documents/cuny msds/Summer'19/data 612/Final Project/Datasets/Teachers.csv")

projects <- projects %>%
    rename(ProjectType = `Project Type`,
           ProjectTitle = `Project Title`,
           Category = `Project Subject Category Tree`,
           SubCategory = `Project Subject Subcategory Tree`,
           Grade = `Project Grade Level Category`,
           ResourceCategory = `Project Resource Category`,
           Cost = `Project Cost`,
           PostedDate = `Project Posted Date`,
           CurrentStatus = `Project Current Status`,
           FullyFundedDate = `Project Fully Funded Date`)

donations <- donations %>% rename(DonationAmount = `Donation Amount`)

donors <- donors %>% rename(DonorState = `Donor State`)

Copying the datasets to the Spark instance

Glimpse of data

Donations

Observations: 4,687,884
Variables: 7
$ `Project ID`                          <chr> "000009891526c0ade7180f842…
$ `Donation ID`                         <chr> "688729120858666221208529e…
$ `Donor ID`                            <chr> "1f4b5b6e68445c6c4a0509b3a…
$ `Donation Included Optional Donation` <chr> "No", "Yes", "Yes", "Yes",…
$ DonationAmount                        <dbl> 178.37, 25.00, 20.00, 25.0…
$ `Donor Cart Sequence`                 <int> 11, 2, 3, 1, 2, 1, 1, 2, 2…
$ `Donation Received Date`              <chr> "2016-08-23 13:15:57", "20…

Donors

Observations: 2,122,640
Variables: 5
$ `Donor ID`         <chr> "00000ce845c00cbf0686c992fc369df4", "00002783…
$ `Donor City`       <chr> "Evanston", "Appomattox", "Winton", "Indianap…
$ DonorState         <chr> "Illinois", "other", "California", "Indiana",…
$ `Donor Is Teacher` <chr> "No", "No", "Yes", "No", "No", "No", "No", "N…
$ `Donor Zip`        <chr> "602", "245", "953", "462", "075", "", "069",…

Projects

Observations: 62,806
Variables: 66
$ `Project ID`                      <chr> "7685f0265a19d7b52a470ee4bac88…
$ `School ID`                       <chr> "e180c7424cb9c68cb49f141b092a9…
$ `Teacher ID`                      <chr> "4ee5200e89d9e2998ec8baad8a3c5…
$ `Teacher Project Posted Sequence` <int> 25, 3, 1, 2, NA, NA, 3, 57, 14…
$ ProjectType                       <chr> "Teacher-Led", "Teacher-Led", …
$ ProjectTitle                      <chr> "Stand Up to Bullying: Togethe…
$ `Project Essay`                   <chr> "Did you know that 1-7 student…
$ `Project Short Description`       <chr> "\"Stand Up For Yourself and Y…
$ `Project Need Statement`          <chr> "\"A Smart Kid's Guide to Onli…
$ Category                          <chr> "Applied Learning", "Applied L…
$ SubCategory                       <chr> "Character Education, Early De…
$ Grade                             <chr> "Grades PreK-2", "Grades PreK-…
$ ResourceCategory                  <chr> "Technology", "Technology", "a…
$ Cost                              <chr> "361.8", "512.85", "where they…
$ PostedDate                        <date> 2013-01-01, 2013-01-01, NA, N…
$ `Project Expiration Date`         <chr> "2013-05-30", "2013-05-31", "T…
$ CurrentStatus                     <chr> "Fully Funded", "Expired", "ar…
$ FullyFundedDate                   <date> 2013-01-11, NA, NA, NA, NA, N…
$ X19                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X20                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X21                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X22                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X23                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X24                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X25                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X26                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X27                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X28                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X29                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X30                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X31                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X32                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X33                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X34                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X35                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X36                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X37                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X38                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X39                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X40                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X41                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X42                               <dbl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X43                               <date> NA, NA, NA, NA, NA, NA, NA, N…
$ X44                               <chr> NA, NA, NA, NA, NA, NA, NA, NA…
$ X45                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X46                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X47                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X48                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X49                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X50                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X51                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X52                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X53                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X54                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X55                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X56                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X57                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X58                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X59                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X60                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X61                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X62                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X63                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X64                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X65                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
$ X66                               <lgl> NA, NA, NA, NA, NA, NA, NA, NA…
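A second view of the projects table, with dots in place of spaces in the column names and every column stored as character:

Observations: 62,806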
Variables: 66
$ Project.ID                      <chr> "7685f0265a19d7b52a470ee4bac883b…
$ School.ID                       <chr> "e180c7424cb9c68cb49f141b092a988…
$ Teacher.ID                      <chr> "4ee5200e89d9e2998ec8baad8a3c596…
$ Teacher.Project.Posted.Sequence <chr> "25", "3", "1", "2", NA, NA, "3"…
$ ProjectType                     <chr> "Teacher-Led", "Teacher-Led", "T…
$ ProjectTitle                    <chr> "Stand Up to Bullying: Together …
$ Project.Essay                   <chr> "Did you know that 1-7 students …
$ Project.Short.Description       <chr> "\"Stand Up For Yourself and You…
$ Project.Need.Statement          <chr> "\"A Smart Kid's Guide to Online…
$ Category                        <chr> "Applied Learning", "Applied Lea…
$ SubCategory                     <chr> "Character Education, Early Deve…
$ Grade                           <chr> "Grades PreK-2", "Grades PreK-2"…
$ ResourceCategory                <chr> "Technology", "Technology", "and…
$ Cost                            <chr> "361.8", "512.85", "where they w…
$ PostedDate                      <chr> "2013-01-01", "2013-01-01", NA, …
$ Project.Expiration.Date         <chr> "2013-05-30", "2013-05-31", "The…
$ CurrentStatus                   <chr> "Fully Funded", "Expired", "are …
$ FullyFundedDate                 <chr> "2013-01-11", NA, NA, NA, NA, NA…
$ X19                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X20                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X21                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X22                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X23                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X24                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X25                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X26                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X27                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X28                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X29                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X30                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X31                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X32                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X33                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X34                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X35                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X36                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X37                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X38                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X39                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X40                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X41                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X42                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X43                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X44                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X45                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X46                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X47                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X48                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X49                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X50                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X51                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X52                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X53                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X54                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X55                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X56                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X57                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X58                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X59                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X60                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X61                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X62                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X63                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X64                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X65                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …
$ X66                             <chr> NA, NA, NA, NA, NA, NA, NA, NA, …

In the rows shown, Cost and ResourceCategory contain fragments of essay text (for example "where they…" and "and…"), and columns X19 to X66 hold only NA. This suggests that some long text fields with embedded commas or line breaks were split across columns when Projects.csv was parsed, which would also explain the garbled feature values in some of the tables below.

Schools

Observations: 72,993
Variables: 9
$ `School ID`                    <chr> "00003e0fdd601b8ea0a6eb44057b9c5e…
$ `School Name`                  <chr> "Capon Bridge Middle School", "Th…
$ `School Metro Type`            <chr> "rural", "urban", "suburban", "un…
$ `School Percentage Free Lunch` <dbl> 56, 41, 2, 76, 50, 63, 17, 15, 46…
$ `School State`                 <chr> "West Virginia", "Texas", "Washin…
$ `School Zip`                   <chr> "26711", "77384", "98074", "48370…
$ `School City`                  <chr> "Capon Bridge", "The Woodlands", …
$ `School County`                <chr> "Hampshire", "Montgomery", "King"…
$ `School District`              <chr> "Hampshire Co School District", "…

Teachers

Observations: 402,900
Variables: 3
$ `Teacher ID`                        <chr> "00000f7264c27ba6fea0c837ed6…
$ `Teacher Prefix`                    <chr> "Mrs.", "Mrs.", "Mr.", "Ms."…
$ `Teacher First Project Posted Date` <date> 2013-08-21, 2016-10-23, 201…

Combining the five datasets

A combined dataset is prepared with the projects, donations, donors, schools and teachers datasets.
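
The join code itself is not shown in the report. As a minimal sketch in base R, using hypothetical toy tables with simplified column names (ProjectID, DonorID, TeacherID, SchoolID), the five tables can be combined with inner joins via merge(). The join types and any filtering the author applied to reach 17,917 rows are not shown, so this is an illustration rather than a reproduction.

```r
# Hypothetical toy versions of the five tables, with simplified column names
toy_projects <- data.frame(ProjectID = c("p1", "p2"), SchoolID = c("s1", "s2"),
                           TeacherID = c("t1", "t2"), stringsAsFactors = FALSE)
toy_donations <- data.frame(ProjectID = c("p1", "p1", "p2"),
                            DonorID = c("d1", "d2", "d1"),
                            DonationAmount = c(25, 10, 5), stringsAsFactors = FALSE)
toy_donors <- data.frame(DonorID = c("d1", "d2"),
                         DonorState = c("California", "other"), stringsAsFactors = FALSE)
toy_teachers <- data.frame(TeacherID = c("t1", "t2"),
                           TeacherPrefix = c("Mrs.", "Ms."), stringsAsFactors = FALSE)
toy_schools <- data.frame(SchoolID = c("s1", "s2"),
                          SchoolState = c("California", "Texas"), stringsAsFactors = FALSE)

# Inner joins on the shared ID columns; the result has one row per donation
toy_combined <- merge(toy_projects, toy_donations, by = "ProjectID")
toy_combined <- merge(toy_combined, toy_donors, by = "DonorID")
toy_combined <- merge(toy_combined, toy_teachers, by = "TeacherID")
toy_combined <- merge(toy_combined, toy_schools, by = "SchoolID")

nrow(toy_combined)  # 3: project p1 appears twice because it received two donations
```

Because the joins are at the donation level, a project funded by several donors appears on several rows; this is consistent with the first rows of the combined glimpse repeating the same project.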

Observations: 17,917
Variables: 86
$ `Project ID`                          <chr> "7685f0265a19d7b52a470ee4b…
$ `School ID`                           <chr> "e180c7424cb9c68cb49f141b0…
$ `Teacher ID`                          <chr> "4ee5200e89d9e2998ec8baad8…
$ `Teacher Project Posted Sequence`     <int> 25, 25, 25, 25, 25, 25, 25…
$ ProjectType                           <chr> "Teacher-Led", "Teacher-Le…
$ ProjectTitle                          <chr> "Stand Up to Bullying: Tog…
$ `Project Essay`                       <chr> "Did you know that 1-7 stu…
$ `Project Short Description`           <chr> "\"Stand Up For Yourself a…
$ `Project Need Statement`              <chr> "\"A Smart Kid's Guide to …
$ Category                              <chr> "Applied Learning", "Appli…
$ SubCategory                           <chr> "Character Education, Earl…
$ Grade                                 <chr> "Grades PreK-2", "Grades P…
$ ResourceCategory                      <chr> "Technology", "Technology"…
$ Cost                                  <chr> "361.8", "361.8", "361.8",…
$ PostedDate                            <date> 2013-01-01, 2013-01-01, 2…
$ `Project Expiration Date`             <chr> "2013-05-30", "2013-05-30"…
$ CurrentStatus                         <chr> "Fully Funded", "Fully Fun…
$ FullyFundedDate                       <date> 2013-01-11, 2013-01-11, 2…
$ X19                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X20                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X21                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X22                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X23                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X24                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X25                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X26                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X27                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X28                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X29                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X30                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X31                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X32                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X33                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X34                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X35                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X36                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X37                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X38                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X39                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X40                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X41                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X42                                   <dbl> NA, NA, NA, NA, NA, NA, NA…
$ X43                                   <date> NA, NA, NA, NA, NA, NA, N…
$ X44                                   <chr> NA, NA, NA, NA, NA, NA, NA…
$ X45                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X46                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X47                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X48                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X49                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X50                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X51                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X52                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X53                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X54                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X55                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X56                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X57                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X58                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X59                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X60                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X61                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X62                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X63                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X64                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X65                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ X66                                   <lgl> NA, NA, NA, NA, NA, NA, NA…
$ `Donation ID`                         <chr> "fd43856fcca94c699bc4ed0dc…
$ `Donor ID`                            <chr> "ec3ebabc792625d381bcd3d91…
$ `Donation Included Optional Donation` <chr> "Yes", "Yes", "Yes", "Yes"…
$ DonationAmount                        <dbl> 25.0, 10.0, 1.0, 1.0, 1.0,…
$ `Donor Cart Sequence`                 <int> 2, 26, 2, 1, 1, 1234, 11, …
$ `Donation Received Date`              <chr> "2013-01-10 18:41:11", "20…
$ `Donor City`                          <chr> "", "Los Angeles", "", "",…
$ DonorState                            <chr> "other", "California", "ot…
$ `Donor Is Teacher`                    <chr> "No", "Yes", "No", "No", "…
$ `Donor Zip`                           <chr> "", "900", "", "", "", "",…
$ `Teacher Prefix`                      <chr> "Mrs.", "Mrs.", "Mrs.", "M…
$ `Teacher First Project Posted Date`   <date> 2011-12-11, 2011-12-11, 2…
$ `School Name`                         <chr> "Stanford Primary Center",…
$ `School Metro Type`                   <chr> "suburban", "suburban", "s…
$ `School Percentage Free Lunch`        <dbl> 95, 95, 95, 95, 95, 95, 95…
$ `School State`                        <chr> "California", "California"…
$ `School Zip`                          <chr> "90280", "90280", "90280",…
$ `School City`                         <chr> "South Gate", "South Gate"…
$ `School County`                       <chr> "Los Angeles", "Los Angele…
$ `School District`                     <chr> "Los Angeles Unif Sch Dist…

Preparing the non-text features

In the combined dataset, the non-text features are Category, SubCategory, Grade, ResourceCategory, SchoolState and TeacherPrefix.
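
The report does not show how these categorical features are encoded. One plausible approach, sketched here in base R on hypothetical toy rows, is multi-hot encoding: multi-valued fields such as Category are split on commas, and each (column, value) pair becomes a binary column. The comma-splitting rule and the column-name prefixes are assumptions made for illustration.

```r
# Hypothetical toy rows; the real data also has SubCategory and ResourceCategory
toy_features <- data.frame(
  Category      = c("Applied Learning", "Applied Learning, Literacy & Language"),
  Grade         = c("Grades PreK-2", "Grades PreK-2"),
  SchoolState   = c("California", "Georgia"),
  TeacherPrefix = c("Mrs.", "Mrs."),
  stringsAsFactors = FALSE)

# Split each field on commas (assumed separator for multi-valued fields) and
# prefix every value with its column name, e.g. "Category=Applied Learning"
tokens_per_row <- lapply(seq_len(nrow(toy_features)), function(i) {
  unlist(lapply(names(toy_features), function(col) {
    values <- trimws(strsplit(toy_features[[col]][i], ",")[[1]])
    paste(col, values, sep = "=")
  }))
})
feature_vocab <- sort(unique(unlist(tokens_per_row)))

# Binary matrix: one row per project, one column per (column, value) pair
feature_matrix <- t(vapply(tokens_per_row,
                           function(tok) as.numeric(feature_vocab %in% tok),
                           numeric(length(feature_vocab))))
colnames(feature_matrix) <- feature_vocab
rowSums(feature_matrix)  # 4 5
```

The two toy rows share three indicators, so their cosine similarity would be 3 / sqrt(4 × 5), about 0.67.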

Cosine Similarity between two projects

Cosine similarity measures how alike two projects are from the angle between their feature vectors A and B: cos(θ) = (A · B) / (‖A‖ ‖B‖).

It is a judgment of orientation, not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two diametrically opposed vectors have a similarity of -1, independent of their magnitudes. Because TF-IDF weights and category indicators are never negative, the similarities in this report lie between 0 and 1.

Here we compare projects with one another: if a project is similar to one that a donor has already funded, it is recommended to that donor.
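
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal base R sketch follows; the helper name cosine_similarity() is hypothetical, and it assumes neither vector is all zeros (that case divides by zero and returns NaN).

```r
# Cosine similarity: dot product divided by the product of the vector lengths.
# Assumes neither vector is all zeros (that would give NaN).
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

cosine_similarity(c(1, 1, 0), c(1, 0, 1))  # 0.5
cosine_similarity(c(1, 2), c(2, 4))        # 1: same direction, different length
cosine_similarity(c(1, 0), c(0, 1))        # 0: orthogonal vectors
```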

Recommendations using Text Features

We use the project titles and project essays to find similarities between projects, combining TF-IDF weighting with cosine similarity. The greater the cosine value, the more similar two projects are; the most similar projects are the candidates for recommendation.

First, a vocabulary-based document-term matrix (DTM) is created. The create_vocabulary() function from the text2vec package iterates over the tokenized documents, collects the unique terms from all of them and assigns each term a unique ID.

The vocabulary is then pruned to remove uninformative terms and reduce the size of the matrix.
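
The vocabulary and pruning steps can be illustrated without text2vec. The base R sketch below uses a hypothetical three-document corpus, a two-word stopword list and a minimum document count of 2, all chosen for illustration; the pruning thresholds used in the report are not shown. In text2vec, the corresponding steps are create_vocabulary() and prune_vocabulary().

```r
# Hypothetical toy corpus standing in for the project titles and essays
toy_docs <- c("Students need books for reading",
              "Students need technology for learning",
              "Math centers help students learn math")
toy_stopwords <- c("for", "help")  # tiny stand-in for a real stopword list

# Lower-case, split on anything that is not a letter, drop stopwords and blanks
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words[words != "" & !(words %in% toy_stopwords)]
}
toy_tokens <- lapply(toy_docs, tokenize)

# Vocabulary: total occurrences and number of documents containing each term
term_count <- table(unlist(toy_tokens))
doc_count  <- table(unlist(lapply(toy_tokens, unique)))
toy_vocab <- data.frame(term = names(term_count),
                        term_count = as.integer(term_count),
                        doc_count = as.integer(doc_count[names(term_count)]),
                        stringsAsFactors = FALSE)

# Prune: keep terms found in at least 2 documents (threshold chosen for illustration)
toy_vocab <- toy_vocab[toy_vocab$doc_count >= 2, ]
toy_vocab$term  # "need" "students"
```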

Number of docs: 17917 
175 stopwords: i, me, my, myself, we, our ... 
ngram_min = 1; ngram_max = 3 
Vocabulary: 
                      term term_count doc_count
  1:                  need      11118      8245
  2:                   can       9647      6469
  3:                school       8884      5976
  4:                  help       8300      5776
  5:              learning       7427      5429
 ---                                           
970: technology_technology        182       182
971:          students_don        181       181
972:                   hav        181       181
973:              hispanic        181       181
974:              grades_6        180       180

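Create the Document-Term Matrix (DTM)

Each row of the DTM corresponds to a document (here, one row of the combined dataset) and each column to a vocabulary term; each entry counts how often the term occurs in that document.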
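
A minimal base R sketch of how a vocabulary turns tokenized documents into a count matrix. The tokens and vocabulary are hypothetical; text2vec's create_dtm() builds the same kind of matrix but stores it in sparse form, which matters at the scale of 17,917 documents.

```r
# Hypothetical tokenized documents and an already pruned vocabulary
toy_tokens <- list(c("students", "need", "books", "students"),
                   c("students", "need", "technology"),
                   c("math", "centers"))
toy_vocab <- c("books", "need", "students", "technology")

# Count each vocabulary term in one document; tokens outside the vocabulary
# become NA in factor() and are dropped by table()
count_terms <- function(tokens, vocab) {
  as.numeric(table(factor(tokens, levels = vocab)))
}

# One row per document, one column per term
dtm <- t(vapply(toy_tokens, count_terms, numeric(length(toy_vocab)), vocab = toy_vocab))
colnames(dtm) <- toy_vocab
dtm
```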

TF-IDF

TF-IDF (term frequency-inverse document frequency) weighting increases the weight of terms that are specific to a single document or a handful of documents and decreases the weight of terms used in most documents.
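
The sketch below applies the textbook TF-IDF weighting (term count divided by document length, times log(N / document frequency)) to a hypothetical count matrix. Library implementations such as text2vec's TfIdf may use slightly different variants, for example a smoothed IDF term, so the exact weights in the report may differ.

```r
# Hypothetical document-term count matrix (3 documents x 4 terms)
dtm <- matrix(c(1, 1, 2, 0,
                0, 1, 1, 1,
                0, 0, 1, 0),
              nrow = 3, byrow = TRUE,
              dimnames = list(NULL, c("books", "need", "students", "technology")))

# Term frequency: counts divided by document length
# (assumes every document contains at least one vocabulary term)
tf <- dtm / rowSums(dtm)

# Inverse document frequency: log(number of documents / documents containing the term)
idf <- log(nrow(dtm) / colSums(dtm > 0))

# Multiply each term column by its IDF weight
tfidf <- sweep(tf, 2, idf, `*`)
round(tfidf, 3)
```

Because "students" occurs in every document, its IDF is log(3/3) = 0 and it receives no weight, while rarer terms such as "books" and "technology" get the largest weights.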

Cosine similarity between project 1 and project 1

As a sanity check, project 1 is compared with itself using text features only; the similarity should be exactly 1.

[1] 1
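
Computing every pairwise similarity at once is simpler than comparing projects one pair at a time: once each row is scaled to unit length, a single matrix product gives all cosines. The TF-IDF values below are hypothetical; for large sparse matrices, text2vec also offers sim2() with method = "cosine".

```r
# Hypothetical TF-IDF matrix: 3 projects (rows) x 3 terms (columns)
tfidf <- matrix(c(0.5, 0.0, 0.2,
                  0.5, 0.0, 0.2,
                  0.0, 0.7, 0.1),
                nrow = 3, byrow = TRUE)

# Scale each row to unit length; dividing by a vector of length nrow
# recycles down the columns, so row i is divided by row_norms[i]
row_norms  <- sqrt(rowSums(tfidf^2))
normalized <- tfidf / row_norms

# sim[i, j] is the cosine similarity between projects i and j
sim <- tcrossprod(normalized)
diag(sim)  # 1 1 1: every project is identical to itself
sim[1, 2]  # 1: rows 1 and 2 have identical vectors
```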

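Cosine similarity between project 1 and project 2

Projects 1 and 2 are compared using text features only. Rows 1 and 2 of the combined dataset appear to be two donations to the same project ("Stand Up to Bullying: Together We Can!"), so their text is identical and the similarity is 1.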

[1] 1

Cosine similarity between project 1 and project 50

Projects 1 and 50 are compared using text features only.

[1] 0.008048809

Cosine similarity between project 1 and project 100

Projects 1 and 100 are compared using text features only.

[1] 0.03398923

Cosine similarity of projects of the same donor

We now look at projects funded by the same donor. Three samples (Samples 1, 2 and 3) are drawn, each consisting of two donations made by one donor.
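
Finding donors who funded more than one project is a grouping step. A minimal base R sketch on hypothetical donation rows follows; it keeps only distinct projects per donor.

```r
# Hypothetical donation-level rows: one row per donation
toy_donations <- data.frame(
  DonorID   = c("d1", "d1", "d2", "d3", "d3", "d3"),
  ProjectID = c("p1", "p2", "p1", "p4", "p4", "p5"),
  stringsAsFactors = FALSE)

# Distinct projects funded by each donor
projects_by_donor <- lapply(split(toy_donations$ProjectID, toy_donations$DonorID), unique)

# Donors who funded at least two different projects
repeat_donors <- names(Filter(function(p) length(p) >= 2, projects_by_donor))
repeat_donors  # "d1" "d3"
```

Deduplicating matters here: a donor who gave twice to the same project would otherwise yield a pair of identical projects and a trivial similarity of 1.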

Same Donors Sample 1

The two donations made by the Sample 1 donor:

Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix
00589b11dd54d68b0950555262ed8475 | 1abc7b1f997d4299c97f5567b62dab58 | Leaping Lizards! Learning Literacy Is Child’s Play | NA | NA | NA | NA | Oklahoma | Mrs.
00589b11dd54d68b0950555262ed8475 | cb7a036b7e6c3f1faa0c90b667dfb897 | Improving Agriculture Literacy via a School Farm | Math & Science, Applied Learning | Applied Sciences, Community Service | Grades 9-12 | Supplies | Massachusetts | Mrs.

Cosine Similarity Same Donors Sample 1

Cosine similarity between the two projects of the Sample 1 donor, using text features only:

[1] 0.02384247

Same Donors Sample 2

The two donations made by the Sample 2 donor:

Donor ID Project ID ProjectTitle Category SubCategory Grade ResourceCategory SchoolState TeacherPrefix
00906dbd8da115318175861ccfe1dcd8 b81056d684753054e332b3f185576c64 “Center”ing in Math Mathematics Grades PreK-2 Other 388.14 Texas Ms.
Donor ID Project ID ProjectTitle Category SubCategory Grade ResourceCategory SchoolState TeacherPrefix
00906dbd8da115318175861ccfe1dcd8 2d7f1f43ad03503f85c47f024774ec33 Interactive Spanish Centers For Our Rising Stars some of my students are the only English speakers my students will finally be able to reap the full NA NA Texas Ms.

Cosine Similarity Same Donors Sample 2

Cosine similarity between the two projects of the Sample 2 donor, using text features only:

[1] 0.005716124

Same Donors Sample 3

The two donations made by the Sample 3 donor:

Donor ID Project ID ProjectTitle Category SubCategory Grade ResourceCategory SchoolState TeacherPrefix
009a9a98cb675dc9eed258d5ad2bfa77 471b790b9d9e266d76de3d364197e191 Stop Sharing Sniffles And Sneezes much like an overhead projector. A new projector base ten blocks or science experiment they need to be able to see the details and proce New York Mrs.
Donor ID Project ID ProjectTitle Category SubCategory Grade ResourceCategory SchoolState TeacherPrefix
009a9a98cb675dc9eed258d5ad2bfa77 471b790b9d9e266d76de3d364197e191 Stop Sharing Sniffles And Sneezes much like an overhead projector. A new projector base ten blocks or science experiment they need to be able to see the details and proce New York Mrs.

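Cosine Similarity Same Donors Sample 3

Cosine similarity between the two projects of the Sample 3 donor, using text features only. Both rows refer to the same project (471b790b9d9e266d76de3d364197e191), which this donor funded twice, so the similarity is trivially 1: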

[1] 1

Recommendations using Non-Text Features only

The non-text features are Category, SubCategory, Grade, ResourceCategory, SchoolState and TeacherPrefix.

Cosine similarity between project 1 and project 2

Projects 1 and 2 are compared using non-text features only.

[1] 1

Cosine similarity between project 1 and project 50

Projects 1 and 50 are compared using non-text features only.

[1] 0.3380617

Cosine similarity between project 1 and project 100

Projects 1 and 100 are compared using non-text features only.

[1] 0.2857143

Cosine Similarity of projects of the same donor, Sample 1

[1] 0.4364358

Cosine Similarity of projects of the same donor, Sample 2

[1] 0.5070926

Donor 1 Projects

Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix
ec3ebabc792625d381bcd3d911c72383 | 7685f0265a19d7b52a470ee4bac883ba | Stand Up to Bullying: Together We Can! | Applied Learning | Character Education, Early Development | Grades PreK-2 | Technology | California | Mrs.

The top 10 recommended projects for Donor 1, ranked by cosine similarity (cosine_vector), are:

cosine_vector ProjectID Category SubCategory Grade ResourceCategory SchoolState TeacherPrefix
1 0.654653670707977 c265d5c4b7ef2059e4e84ddcef18bdd9 NA NA NA NA California Mrs.
4 0.571428571428571 f9f4af7099061fb4bf44642a03e5c331 Applied Learning, Literacy & Language Early Development, Literacy Grades PreK-2 Technology Georgia Mrs.
6 0.338061701891407 ec82a697fab916c0db0cdad746338df9 My students need items such as Velcro, two pounds 563958074d7b12b48b939279eb59e6ca,b79a19772090efccde93b3a5934d829f,5ef1793ff657860ca7856d475715ec2a,4,Teacher-Led,It’s about Time… Time for Kids!,We know that success in school is directly relate more of their education is tied to textbooks and NA NA Colorado Mrs.
12 0.308606699924184 717c7a01215d532d68f6fe9e666c88c3 Applied Learning College & Career Prep, Community Service Grades NA New Jersey Ms.
34 0.285714285714286 afd99a01739ad5557b51b1ba0174e832 and suspense. Stude the students at our school are very resilient and but they still need support from donors like you. and studies show that reading at home is an impor New York Mrs.
46 0.285714285714286 49825532f85d0cdb569797df3ab8ec46 Supplies 339.2 2013-01-01 2013-05-29 New Mexico Mrs.
51 0.285714285714286 3dfcaad759c25fcce140814eb8a47592 many do not have educationally enriching experien hands-on activities to help fill these gaps. My f addition and basic number relationships. The white boards California Ms.
52 0.285714285714286 49409b4858006bbfba35c36338e10ee7 History & Civics, Math & Science Economics, Environmental Science Grades 3-5 Supplies Texas Mrs.
53 0.285714285714286 14c19c45cbb73319b6b506791c659d9d with the majority of our students receiving free which makes it a challenge to differentiate and m so it is necessary to accommodate their different several children will record themselves reading a Massachusetts Mrs.
54 0.285714285714286 70d2edb28d3f75b283cc513a9a24eb8b at a desk for 7 hours a day? ) Thailand Nepal Colorado Mrs.
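
Producing a top-N list amounts to sorting candidate projects by their similarity score. The sketch below assumes the scores have already been computed and uses hypothetical project IDs; excluding projects the donor has already funded is a design choice assumed here, since the report does not show whether it does this.

```r
# Hypothetical similarity of each candidate project to the donor's funded project
toy_sim <- c(p1 = 1.00, p2 = 0.65, p3 = 0.57, p4 = 0.34, p5 = 0.29, p6 = 0.29)
toy_funded <- "p1"  # projects the donor has already funded

# Drop already funded projects, then keep the n highest-scoring candidates
top_n_recommendations <- function(sim, exclude, n = 10) {
  candidates <- sim[!(names(sim) %in% exclude)]
  head(sort(candidates, decreasing = TRUE), n)
}

top_n_recommendations(toy_sim, toy_funded, n = 3)  # p2, p3, p4
```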

Recommendations using Text and Non-Text Features
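
The report does not show how the text and non-text features are combined. One simple option, sketched below under that assumption, is to scale each block to unit length and place the blocks side by side; the cosine of the combined vectors is then the average of the two block cosines. The report's figures do not follow this equal weighting (for projects 1 and 50, the average of 0.008048809 and 0.3380617 would be about 0.173, against the reported 0.1426086), so the author's combination appears to weight the blocks differently.

```r
# Hypothetical feature blocks for three projects
text_part <- matrix(c(0.4, 0.0, 0.3,
                      0.0, 0.5, 0.1,
                      0.4, 0.0, 0.2), nrow = 3, byrow = TRUE)  # e.g. TF-IDF weights
nontext_part <- matrix(c(1, 0, 1, 1,
                         1, 1, 0, 1,
                         0, 1, 1, 0), nrow = 3, byrow = TRUE)  # e.g. multi-hot indicators

# Scale every row of a matrix to unit length
l2_rows <- function(m) m / sqrt(rowSums(m^2))

# Normalize each block first so neither dominates because of its scale,
# then place the blocks side by side
combined_features <- cbind(l2_rows(text_part), l2_rows(nontext_part))

# All pairwise cosine similarities on the combined vectors
sim <- tcrossprod(l2_rows(combined_features))
```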

Cosine similarity between project 1 and project 2

Projects 1 and 2 are compared using both text and non-text features.

[1] 1

Cosine similarity between project 1 and project 50

Projects 1 and 50 are compared using both text and non-text features.

[1] 0.1426086

Cosine similarity between project 1 and project 100

Projects 1 and 100 are compared using both text and non-text features.

[1] 0.1374252

Cosine Similarity of projects of the same donor, Sample 1

[1] 0.1638579

Cosine Similarity of projects of the same donor, Sample 2

[1] 0.2173009

Cosine Similarity of projects of the same donor, Sample 3

[1] 1

Donor 1 Projects

Donor ID | Project ID | ProjectTitle | Category | SubCategory | Grade | ResourceCategory | SchoolState | TeacherPrefix
ec3ebabc792625d381bcd3d911c72383 | 7685f0265a19d7b52a470ee4bac883ba | Stand Up to Bullying: Together We Can! | Applied Learning | Character Education, Early Development | Grades PreK-2 | Technology | California | Mrs.

The top 10 recommended projects for Donor 1, ranked by cosine similarity (cosine_vector):

cosine_vector ProjectID Category SubCategory Grade ResourceCategory SchoolState TeacherPrefix
1 0.252287883049293 f9f4af7099061fb4bf44642a03e5c331 Applied Learning, Literacy & Language Early Development, Literacy Grades PreK-2 Technology Georgia Mrs.
3 0.214650050348922 c265d5c4b7ef2059e4e84ddcef18bdd9 NA NA NA NA California Mrs.
6 0.163704722465574 afd99a01739ad5557b51b1ba0174e832 and suspense. Stude the students at our school are very resilient and but they still need support from donors like you. and studies show that reading at home is an impor New York Mrs.
18 0.161534585915996 14c19c45cbb73319b6b506791c659d9d with the majority of our students receiving free which makes it a challenge to differentiate and m so it is necessary to accommodate their different several children will record themselves reading a Massachusetts Mrs.
19 0.159438753518438 49825532f85d0cdb569797df3ab8ec46 Supplies 339.2 2013-01-01 2013-05-29 New Mexico Mrs.
24 0.152412074549055 70d2edb28d3f75b283cc513a9a24eb8b at a desk for 7 hours a day? ) Thailand Nepal Colorado Mrs.
30 0.142608553663939 ec82a697fab916c0db0cdad746338df9 My students need items such as Velcro, two pounds
56395 8074d7b12b48b939279e b59e6ca,b79a19772090efccde93b3a5934d829f,5ef1793ff657860ca7856d475715ec2a,4,Teacher-Led,It’s about Time… Time for Kids!,We know that success in school is directly relate more of their education is tied to textbooks and NA NA Colorado Mrs.
36 0.139151094789132 717c7a01215d532d68f6fe9e666c88c3 Applied Learning College & Career Prep, Community Service Grades NA New Jersey Ms.
58 0.137878789522907 3dfcaad759c25fcce140814eb8a47592 many do not have educationally enriching experien hands-on activities to help fill these gaps. My f addition and basic number relationships. The white boards California Ms.
59 0.137425214539727 d3fc101ea24e26443dbfe9bc7560d08c Math & Science, Literacy & Language Environmental Science, Literacy Grades PreK-2 Supplies North Carolina Ms.
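A top-N list like this can be produced by scoring every candidate project against the donor's funded project, dropping projects the donor has already funded, and sorting. A sketch under stated assumptions: projects are rows of a numeric feature matrix, and for donors with several projects the profile is the mean of their feature rows (one choice among several; Donor 1 has a single project, so the profile is just that project). `recommend_similar()` and the toy matrix are hypothetical.

```r
# Rank unfunded projects by cosine similarity to the donor's profile
recommend_similar <- function(X, funded, n = 10) {
  # Assumption: profile = mean feature vector of the donor's funded projects
  profile <- colMeans(X[funded, , drop = FALSE])
  sims <- as.vector(X %*% profile) /
    (sqrt(rowSums(X^2)) * sqrt(sum(profile^2)))
  names(sims) <- rownames(X)
  sims <- sims[!(rownames(X) %in% funded)]   # never recommend an already funded project
  head(sort(sims, decreasing = TRUE), n)     # sort() drops NaN from all-zero rows
}

# Hypothetical feature matrix: one row per project
X <- rbind(p1 = c(1, 1, 0),
           p2 = c(1, 0, 0),
           p3 = c(0, 0, 1),
           p4 = c(1, 1, 1))

recommend_similar(X, funded = "p1", n = 2)   # p4 first, then p2
```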

Conclusion

I built item-item recommendations using three feature sets:

Text features only

Text and non-text features

Non-text features only

The non-text features are Category, SubCategory, Grade, ResourceCategory, SchoolState and TeacherPrefix.

Of the three, the recommendations based on non-text features only gave the highest accuracy.
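The non-text features listed above are categorical, so they must be turned into numbers before a cosine similarity can be computed. A minimal sketch using one-hot (dummy) encoding with base R's `model.matrix()`, keeping every level of every factor. The three-project data frame is made up; Category and SubCategory are left out because in the data they can hold several comma-separated values, which would need splitting first (an assumption about preprocessing the report does not describe).

```r
# Hypothetical projects with four of the non-text columns
projects <- data.frame(
  ProjectID        = c("p1", "p2", "p3"),
  Grade            = c("Grades PreK-2", "Grades 3-5", "Grades PreK-2"),
  ResourceCategory = c("Technology", "Supplies", "Technology"),
  SchoolState      = c("California", "Texas", "Georgia"),
  TeacherPrefix    = c("Mrs.", "Mrs.", "Ms."),
  stringsAsFactors = TRUE
)
cat_cols <- c("Grade", "ResourceCategory", "SchoolState", "TeacherPrefix")
nt <- projects[cat_cols]

# contrasts = FALSE keeps one indicator column per level for every factor
onehot <- model.matrix(~ . - 1, data = nt,
                       contrasts.arg = lapply(nt, contrasts, contrasts = FALSE))
rownames(onehot) <- as.character(projects$ProjectID)

# p1 and p3 share Grade and ResourceCategory: 2 matches out of 4 features each
cos13 <- sum(onehot["p1", ] * onehot["p3", ]) /
  sqrt(sum(onehot["p1", ]^2) * sum(onehot["p3", ]^2))
cos13   # 0.5
```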

Donor ID Project ID matrix

Create a data frame with Donor ID, Project ID and Donation Amount, then log-transform the donation amount and use it as the rating.

Observations: 8,585
Variables: 3
$ DonorID   <chr> "0003aba06ccf49f8c44fc2dd3b582411", "000f7306e8ddb3629…
$ ProjectID <chr> "0081553d51ed5d2529e2e38b0827133a", "007e2a1a47ce50ded…
$ Rating    <dbl> 3.931826, 3.258097, 3.258097, 2.140066, 3.931826, 2.82…
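A minimal base-R sketch of this step, assuming a donations table with `DonorID`, `ProjectID` and `DonationAmount` columns (hypothetical names and toy values). Summing repeat donations to the same project is also an assumption. The printed ratings are consistent with log(amount + 1): log(51) = 3.931826, log(26) = 3.258097 and log(8.5) = 2.140066, which would correspond to donations of $50, $25 and $7.50, so the sketch uses `log1p()`.

```r
# Hypothetical raw donations (d1 gave twice to the same project)
donations <- data.frame(
  DonorID        = c("d1", "d1", "d2", "d3"),
  ProjectID      = c("pA", "pA", "pB", "pA"),
  DonationAmount = c(25, 25, 25, 7.5)
)

# Assumption: repeat donations to one project are summed before transforming
ratings <- aggregate(DonationAmount ~ DonorID + ProjectID,
                     data = donations, FUN = sum)

# log(amount + 1) compresses large gifts and stays finite at 0
ratings$Rating <- log1p(ratings$DonationAmount)
ratings <- ratings[, c("DonorID", "ProjectID", "Rating")]
ratings
```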

Restructuring the matrix

The ratings are reshaped into a matrix with Donor IDs along the rows and Project IDs along the columns.

                                  ProjectID
DonorID                            000009891526c0ade7180f8423792063
  0003aba06ccf49f8c44fc2dd3b582411                               NA
  000f7306e8ddb36296f0d97a34d67d76                               NA
  00125f251b05d9e447a5448bef981028                               NA
  0013dfb2a873420fe6e7d750ef24ce98                               NA
  0016b23800f7ea46424b3254f016007a                               NA
                                  ProjectID
DonorID                            00000ce845c00cbf0686c992fc369df4
  0003aba06ccf49f8c44fc2dd3b582411                               NA
  000f7306e8ddb36296f0d97a34d67d76                               NA
  00125f251b05d9e447a5448bef981028                               NA
  0013dfb2a873420fe6e7d750ef24ce98                               NA
  0016b23800f7ea46424b3254f016007a                               NA
                                  ProjectID
DonorID                            00002d44003ed46b066607c5455a999a
  0003aba06ccf49f8c44fc2dd3b582411                               NA
  000f7306e8ddb36296f0d97a34d67d76                               NA
  00125f251b05d9e447a5448bef981028                               NA
  0013dfb2a873420fe6e7d750ef24ce98                               NA
  0016b23800f7ea46424b3254f016007a                               NA
                                  ProjectID
DonorID                            00002eb25d60a09c318efbd0797bffb5
  0003aba06ccf49f8c44fc2dd3b582411                               NA
  000f7306e8ddb36296f0d97a34d67d76                               NA
  00125f251b05d9e447a5448bef981028                               NA
  0013dfb2a873420fe6e7d750ef24ce98                               NA
  0016b23800f7ea46424b3254f016007a                               NA
                                  ProjectID
DonorID                            00008f7aaca8ab932c1bc1d0bc449186
  0003aba06ccf49f8c44fc2dd3b582411                               NA
  000f7306e8ddb36296f0d97a34d67d76                               NA
  00125f251b05d9e447a5448bef981028                               NA
  0013dfb2a873420fe6e7d750ef24ce98                               NA
  0016b23800f7ea46424b3254f016007a                               NA
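One way to do this reshaping in base R is `tapply()`, which leaves NA wherever a donor has not funded a project, as in the output above. Naming the grouping list gives the `DonorID` / `ProjectID` dimension labels. A sketch on hypothetical toy ratings; the report's own reshaping code is not shown.

```r
# Hypothetical long-format ratings
ratings <- data.frame(
  DonorID   = c("d1", "d2", "d2", "d3"),
  ProjectID = c("pA", "pA", "pB", "pC"),
  Rating    = c(3.93, 3.26, 2.14, 2.83)
)

# Rows = donors, columns = projects; donor-project pairs with no donation are NA
rating_mat <- tapply(ratings$Rating,
                     list(DonorID = ratings$DonorID, ProjectID = ratings$ProjectID),
                     sum)
rating_mat
```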

Using SVD

Donor Project Matrix dimensions

NA values are replaced with zero, because SVD needs a complete numeric matrix with no missing values.

[1] 7926 1468
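A small check of why the zero fill is needed: base R's `svd()` stops with an error when the matrix contains NA. Filling with zero treats "no donation" as a rating of 0, which biases reconstructed scores toward zero; with 8,585 ratings in a 7,926 × 1,468 matrix, fewer than 0.08% of the cells hold an observed value. The 2 × 2 matrix is hypothetical.

```r
# Hypothetical ratings with two missing donor-project pairs
m <- matrix(c(3.9, NA, NA, 2.1), nrow = 2)

# svd() refuses matrices with missing values; capture the error message
svd_error <- tryCatch(svd(m), error = function(e) conditionMessage(e))

# Replace missing ratings with 0 so the decomposition can run
m[is.na(m)] <- 0
s <- svd(m)
s$d   # singular values of the filled matrix
```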

Dimensionality reduction using SVD

Only the first 20 factors (singular values and their singular vectors) are kept, giving a rank-20 approximation of the donor-project matrix.

[1] 7926 1468
                                  ProjectID
DonorID                            000009891526c0ade7180f8423792063
  0003aba06ccf49f8c44fc2dd3b582411                                0
  000f7306e8ddb36296f0d97a34d67d76                                0
  00125f251b05d9e447a5448bef981028                                0
  0013dfb2a873420fe6e7d750ef24ce98                                0
  0016b23800f7ea46424b3254f016007a                                0
  00309a47b765e12714d817ee3215de1e                                0
  0036448e416b71ab040182c428958b6f                                0
                                  ProjectID
DonorID                            00000ce845c00cbf0686c992fc369df4
  0003aba06ccf49f8c44fc2dd3b582411                                0
  000f7306e8ddb36296f0d97a34d67d76                                0
  00125f251b05d9e447a5448bef981028                                0
  0013dfb2a873420fe6e7d750ef24ce98                                0
  0016b23800f7ea46424b3254f016007a                                0
  00309a47b765e12714d817ee3215de1e                                0
  0036448e416b71ab040182c428958b6f                                0
                                  ProjectID
DonorID                            00002d44003ed46b066607c5455a999a
  0003aba06ccf49f8c44fc2dd3b582411                                0
  000f7306e8ddb36296f0d97a34d67d76                                0
  00125f251b05d9e447a5448bef981028                                0
  0013dfb2a873420fe6e7d750ef24ce98                                0
  0016b23800f7ea46424b3254f016007a                                0
  00309a47b765e12714d817ee3215de1e                                0
  0036448e416b71ab040182c428958b6f                                0
                                  ProjectID
DonorID                            00002eb25d60a09c318efbd0797bffb5
  0003aba06ccf49f8c44fc2dd3b582411                                0
  000f7306e8ddb36296f0d97a34d67d76                                0
  00125f251b05d9e447a5448bef981028                                0
  0013dfb2a873420fe6e7d750ef24ce98                                0
  0016b23800f7ea46424b3254f016007a                                0
  00309a47b765e12714d817ee3215de1e                                0
  0036448e416b71ab040182c428958b6f                                0
                                  ProjectID
DonorID                            00008f7aaca8ab932c1bc1d0bc449186
  0003aba06ccf49f8c44fc2dd3b582411                                0
  000f7306e8ddb36296f0d97a34d67d76                                0
  00125f251b05d9e447a5448bef981028                                0
  0013dfb2a873420fe6e7d750ef24ce98                                0
  0016b23800f7ea46424b3254f016007a                                0
  00309a47b765e12714d817ee3215de1e                                0
  0036448e416b71ab040182c428958b6f                                0
                                  ProjectID
DonorID                            0000bbd74feb563a324fe441eae19feb
  0003aba06ccf49f8c44fc2dd3b582411                                0
  000f7306e8ddb36296f0d97a34d67d76                                0
  00125f251b05d9e447a5448bef981028                                0
  0013dfb2a873420fe6e7d750ef24ce98                                0
  0016b23800f7ea46424b3254f016007a                                0
  00309a47b765e12714d817ee3215de1e                                0
  0036448e416b71ab040182c428958b6f                                0
                                  ProjectID
DonorID                            0000be4b3c81e1cef858d536bb740052
  0003aba06ccf49f8c44fc2dd3b582411                                0
  000f7306e8ddb36296f0d97a34d67d76                                0
  00125f251b05d9e447a5448bef981028                                0
  0013dfb2a873420fe6e7d750ef24ce98                                0
  0016b23800f7ea46424b3254f016007a                                0
  00309a47b765e12714d817ee3215de1e                                0
  0036448e416b71ab040182c428958b6f                                0
                                  ProjectID
DonorID                            0000c0bdc0f15bd239cfffa884791a10
  0003aba06ccf49f8c44fc2dd3b582411                    -7.816866e-22
  000f7306e8ddb36296f0d97a34d67d76                    -1.780863e-22
  00125f251b05d9e447a5448bef981028                     3.182379e-22
  0013dfb2a873420fe6e7d750ef24ce98                     9.003258e-22
  0016b23800f7ea46424b3254f016007a                     2.249276e-20
  00309a47b765e12714d817ee3215de1e                     9.335889e-22
  0036448e416b71ab040182c428958b6f                     1.606395e-04
                                  ProjectID
DonorID                            0000c0ea0aecb2ad60e8d234eab6ed28
  0003aba06ccf49f8c44fc2dd3b582411                     3.024556e-36
  000f7306e8ddb36296f0d97a34d67d76                    -1.143914e-36
  00125f251b05d9e447a5448bef981028                     4.303249e-36
  0013dfb2a873420fe6e7d750ef24ce98                    -1.955623e-35
  0016b23800f7ea46424b3254f016007a                    -3.122233e-34
  00309a47b765e12714d817ee3215de1e                     5.490349e-31
  0036448e416b71ab040182c428958b6f                    -1.288485e-18
                                  ProjectID
DonorID                            0000cce04fec25bf7f21b0e2f1dcf4b6
  0003aba06ccf49f8c44fc2dd3b582411                     7.809999e-35
  000f7306e8ddb36296f0d97a34d67d76                     1.727252e-35
  00125f251b05d9e447a5448bef981028                    -3.005634e-35
  0013dfb2a873420fe6e7d750ef24ce98                    -1.170701e-34
  0016b23800f7ea46424b3254f016007a                    -3.980843e-33
  00309a47b765e12714d817ee3215de1e                     3.005358e-31
  0036448e416b71ab040182c428958b6f                    -1.624449e-17
 [ reached getOption("max.print") -- omitted 3 rows ]
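Keeping the first k factors approximates the matrix as U_k Σ_k V_kᵀ, which is the best rank-k approximation in the least-squares sense. A base-R sketch on a small random matrix (k = 2 instead of 20); `truncated_svd()` is a hypothetical helper, not the report's code. Entries of the reconstruction printed above with magnitudes around 1e-20 or smaller are numerically indistinguishable from zero.

```r
# Rank-k reconstruction from the leading k singular triplets
truncated_svd <- function(R, k) {
  s <- svd(R, nu = k, nv = k)
  # diag(x, nrow = k) avoids diag()'s special case when k == 1
  s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
}

set.seed(1)
R <- matrix(rpois(6 * 5, 1), nrow = 6)   # hypothetical 6 x 5 "ratings"
R_hat <- truncated_svd(R, k = 2)

# Frobenius error of the rank-k approximation equals the norm of the dropped singular values
err <- sqrt(sum((R - R_hat)^2))
d <- svd(R)$d
c(err = err, dropped = sqrt(sum(d[-(1:2)]^2)))
```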

Recommendations for Donor 3

Donor 3 Projects

Based on the projects Donor 3 has funded before, the recommender system suggests a new project for Donor 3.
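A sketch of this last step, assuming the rank-k reconstruction is used as a matrix of predicted scores: take the donor's row, drop projects the donor already funded (non-zero entries in the original matrix), and return the highest-scoring projects. The toy matrix, the rank of 2 and `recommend_for_donor()` are hypothetical.

```r
# Top-n unfunded projects for one donor, ranked by reconstructed score
recommend_for_donor <- function(R, R_hat, donor, n = 1) {
  scores <- R_hat[donor, ]
  scores <- scores[R[donor, ] == 0]   # keep only projects the donor has not funded
  names(head(sort(scores, decreasing = TRUE), n))
}

# Hypothetical zero-filled donor x project ratings
R <- matrix(c(4, 0, 3, 0,
              4, 2, 0, 1,
              0, 2, 3, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"), c("pA", "pB", "pC", "pD")))

# Rank-2 reconstruction used as predicted scores
s <- svd(R, nu = 2, nv = 2)
R_hat <- s$u %*% diag(s$d[1:2], nrow = 2) %*% t(s$v)
dimnames(R_hat) <- dimnames(R)

recommend_for_donor(R, R_hat, "d1", n = 2)
```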

Summary

We built item-item recommendations using:

Text features only, from the Project Title and Project Essays

Text and non-text features

Non-text features only, where the non-text features are Category, SubCategory, Grade, ResourceCategory, SchoolState and TeacherPrefix

The recommendations based on non-text features only gave the highest accuracy.

We also built user-user recommendations from the donors and their log-transformed donation amounts.

2019-07-19