Abstract

Programming Assignment: Frequent Itemset Mining Using Apriori

You must earn 70/100 points to pass. Deadline: pass this assignment by September 4, 11:59 PM PDT.

1. Instructions

Section 1.1 Input
Section 1.2 Output
Section 1.3 Important Tips

Description

In this programming assignment, you are required to implement the Apriori algorithm and apply it to mine frequent itemsets from a real-life data set.

1.1 Input

The provided input file (“categories.txt”) consists of the category lists of 77,185 places in the US. Each line corresponds to the category list of one place, where the list consists of a number of category instances (e.g., hotels, restaurants, etc.) that are separated by semicolons.

An example line is provided below:

Local Services; IT Services & Computer Repair

In the example above, the corresponding place has two category instances: “Local Services” and “IT Services & Computer Repair”.
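
As a minimal sketch of how one line could be turned into its category instances, the base R helper below splits on semicolons and trims surrounding spaces. The function name `parse_categories` is hypothetical, and the sketch assumes that a semicolon never appears inside a category name.

```r
# Split one line of categories.txt into its category instances.
# Assumption: semicolons only ever act as separators.
parse_categories <- function(line) {
  parts <- strsplit(line, ";", fixed = TRUE)[[1]]
  parts <- trimws(parts)          # tolerate "a; b" as well as "a;b"
  unique(parts[nzchar(parts)])    # drop empty pieces and repeated categories
}

example <- "Local Services; IT Services & Computer Repair"
cats <- parse_categories(example)
print(cats)
```

Calling `unique()` here reflects the usual convention that a category counts at most once per place when supports are computed.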


1.2 Output

You need to implement the Apriori algorithm and use it to mine category sets that are frequent in the input data. When implementing the Apriori algorithm, you may use any programming language you like. We only need your result pattern file, not your source code file.
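
For orientation, here is a toy-scale sketch of the level-wise Apriori idea in base R: count candidates, keep the frequent ones, join frequent k-itemsets that share their first k-1 items, and prune any candidate that has an infrequent subset. The function name `apriori_sketch` and the toy transactions are made up for illustration; this is not the implementation used later in this report, and it is far too slow for the full data set.

```r
# Toy-scale Apriori sketch; `transactions` is a list of character vectors.
apriori_sketch <- function(transactions, min_count) {
  key <- function(s) paste(s, collapse = ";")
  support <- function(s) {
    sum(vapply(transactions, function(t) all(s %in% t), logical(1)))
  }
  result <- integer(0)                                  # named: "a;b" -> count
  level <- as.list(sort(unique(unlist(transactions))))  # candidate 1-itemsets
  k <- 1
  while (length(level) > 0) {
    counts <- vapply(level, support, integer(1))
    keep <- counts >= min_count
    level <- level[keep]                  # frequent k-itemsets only
    counts <- counts[keep]
    if (length(level) == 0) break
    keys <- vapply(level, key, character(1))
    result[keys] <- counts
    candidates <- list()
    n <- length(level)
    if (n >= 2) {
      for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
          a <- level[[i]]
          b <- level[[j]]
          # Join step: both itemsets must agree on their first k-1 items.
          if (!identical(a[seq_len(k - 1)], b[seq_len(k - 1)])) next
          cand <- sort(union(a, b))
          # Prune step: every k-subset of the candidate must be frequent.
          ok <- all(vapply(seq_along(cand),
                           function(d) key(cand[-d]) %in% keys,
                           logical(1)))
          if (ok) candidates[[length(candidates) + 1]] <- cand
        }
      }
    }
    level <- candidates
    k <- k + 1
  }
  result
}

# Made-up toy transactions, not taken from categories.txt.
toy <- list(
  c("Fast Food", "Restaurants"),
  c("Pizza", "Restaurants"),
  c("Fast Food", "Restaurants", "Burgers"),
  c("Coffee & Tea", "Food")
)
freq <- apriori_sketch(toy, min_count = 2)
print(freq)
```

The result is a named integer vector whose names are semicolon-joined itemsets, which keeps the later formatting step simple.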

After implementing the Apriori algorithm, please set the relative minimum support to 0.01 and run it on the 77,185 category lists. In other words, since 0.01 × 77,185 = 771.85, you need to extract all the category sets that have an absolute support larger than 771.
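
The conversion from relative to absolute support is plain arithmetic: 0.01 × 77,185 = 771.85, and because counts are integers, a category set reaches the relative threshold only from a count of 772 upward (equivalently, a count larger than 771). A small sketch with illustrative variable names:

```r
# Relative-to-absolute support conversion for this assignment's numbers.
n_transactions <- 77185
min_rel_support <- 0.01
threshold <- min_rel_support * n_transactions  # 771.85
min_count <- ceiling(threshold)                # smallest integer count >= threshold
cat("threshold:", threshold, "min_count:", min_count, "\n")
```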

Part 1

Please output all the length-1 frequent categories with their absolute supports into a text file named “patterns.txt”. Every line corresponds to exactly one frequent category and should be in the following format:

support:category

For example, if a category (Fast Food) has an absolute support of 3000, then the line corresponding to this frequent category in “patterns.txt” should be:

3000:Fast Food
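
A minimal base R sketch of Part 1 on made-up toy lines: count how many lines contain each category, keep those that meet the threshold, and format them as `support:category`. The vector `lines` stands in for the result of reading the input file, and the toy threshold of 2 is an assumption for illustration only.

```r
# Toy stand-in for readLines("categories.txt"); the data are made up.
lines <- c("Fast Food;Restaurants",
           "Pizza;Restaurants",
           "Fast Food;Restaurants;Burgers")
min_count <- 2   # toy threshold, far smaller than the assignment's

# One vector of distinct categories per line, then a count per category.
per_line <- lapply(strsplit(lines, ";", fixed = TRUE),
                   function(x) unique(trimws(x)))
counts <- table(unlist(per_line))
frequent <- counts[counts >= min_count]

out <- paste0(as.integer(frequent), ":", names(frequent))
print(out)
# writeLines(out, "patterns.txt")  # uncomment to write the result file
```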

Part 2

Please write all the frequent category sets along with their absolute supports into a text file named “patterns.txt”. Every line corresponds to exactly one frequent category set and should be in the following format:

support:category_1;category_2;category_3;...

For example, if a category set (Fast Food; Restaurants) has an absolute support of 2851, then the line corresponding to this frequent category set in “patterns.txt” should be:

2851:Fast Food;Restaurants
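
One way to produce such a line in base R is sketched below; `format_pattern` is a hypothetical helper name. The support is converted to an integer first so that large counts are never printed in scientific notation.

```r
# Build one output line "support:cat_1;cat_2;..." for a frequent category set.
format_pattern <- function(support, categories) {
  paste0(as.integer(support), ":", paste(categories, collapse = ";"))
}

line <- format_pattern(2851, c("Fast Food", "Restaurants"))
print(line)
```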

1.3 Important Tips

Make sure that you format each line correctly in the output file. For instance, use a semicolon instead of another character to separate the categories for each frequent category set.

In the result pattern file, the order of the categories does not matter. For example, the following two cases will be considered equivalent by the grader:

Case 1:

2851:Fast Food;Restaurants

Case 2:

2851:Restaurants;Fast Food
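
When checking my own output against a reference, the same order-insensitivity can be emulated by sorting the categories of each line before comparing. The sketch below is only an assumption about how such a comparison could be done, not a description of the grader; `canonical_pattern` is a hypothetical helper.

```r
# Rewrite "support:cat_a;cat_b" with its categories in sorted order.
canonical_pattern <- function(line) {
  support <- sub(":.*$", "", line)         # text before the first colon
  rest <- sub("^[^:]*:", "", line)         # text after the first colon
  cats <- trimws(strsplit(rest, ";", fixed = TRUE)[[1]])
  paste0(support, ":", paste(sort(cats), collapse = ";"))
}

case1 <- "2851:Fast Food;Restaurants"
case2 <- "2851:Restaurants;Fast Food"
same <- canonical_pattern(case1) == canonical_pattern(case2)
print(same)
```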

2. My submission

Section 2.1 Frequent Single Item Mining
Section 2.2 Frequent Itemset Mining using Apriori


Frequent Single Item Mining (30 points)
Frequent Itemset Mining using Apriori (70 points)

2.1 Frequent Single Item Mining

graph 2.1.1a : explore the top 20 items in the dataset.

graph 2.1.1b : explore the top 20 items in the dataset.

inspect(transDat[1:10]) # view the observations

##    items                                    transactionID
## 1  {Fashion}                          Accessories        
## 2  {Professional Services}            Accountants        
## 3  {Active Life,                                         
##     Amateur Sports Teams,                                
##     American (New),                                      
##     American (Traditional),                              
##     Amusement Parks,                                     
##     Aquariums,                                           
##     Arcades,                                             
##     Archery,                                             
##     Arts & Entertainment,                                
##     ATV Rentals/Tours,                                   
##     Automotive,                                          
##     Barre Classes,                                       
##     Bars,                                                
##     Batting Cages,                                       
##     Beaches,                                             
##     Beauty & Spas,                                       
##     Beer, Wine & Spirits,                                
##     Bike Rentals,                                        
##     Boating,                                             
##     Boot Camps,                                          
##     Bowling,                                             
##     Boxing,                                              
##     Cafes,                                               
##     Car Wash,                                            
##     Challenge Courses,                                   
##     Churches,                                            
##     Climbing,                                            
##     Colleges & Universities,                             
##     Counseling & Mental Health,                          
##     Country Clubs,                                       
##     Cycling Classes,                                     
##     Dance Studios,                                       
##     Day Camps,                                           
##     Day Spas,                                            
##     Department Stores,                                   
##     Disc Golf,                                           
##     Diving,                                              
##     Dog Parks,                                           
##     Education,                                           
##     Escape Games,                                        
##     Event Planning & Services,                           
##     Fast Food,                                           
##     Fire Departments,                                    
##     Fitness & Instruction,                               
##     Food,                                                
##     Go Karts,                                            
##     Golf,                                                
##     Golf Lessons,                                        
##     Gun/Rifle Ranges,                                    
##     Gymnastics,                                          
##     Gyms,                                                
##     Hair Salons,                                         
##     Health & Medical,                                    
##     Hiking,                                              
##     Horse Boarding,                                      
##     Horse Racing,                                        
##     Horseback Riding,                                    
##     Hot Air Balloons,                                    
##     Hotels & Travel,                                     
##     Italian,                                             
##     Kids Activities,                                     
##     Lakes,                                               
##     Landmarks & Historical Buildings,                    
##     Landscaping,                                         
##     Laser Tag,                                           
##     Leisure Centers,                                     
##     Local Flavor,                                        
##     Local Services,                                      
##     Martial Arts,                                        
##     Massage Therapy,                                     
##     Mini Golf,                                           
##     Mountain Biking,                                     
##     Museums,                                             
##     Music & DVDs,                                        
##     Nightlife,                                           
##     Nutritionists,                                       
##     Paddleboarding,                                      
##     Paintball,                                           
##     Parks,                                               
##     Party & Event Planning,                              
##     Party Supplies,                                      
##     Performing Arts,                                     
##     Persian/Iranian,                                     
##     Pets,                                                
##     Pilates,                                             
##     Playgrounds,                                         
##     Pool Cleaners,                                       
##     Pool Halls,                                          
##     Preschools,                                          
##     Races & Competitions,                                
##     Rafting/Kayaking,                                    
##     Recreation Centers,                                  
##     Resorts,                                             
##     Restaurants,                                         
##     Saunas,                                              
##     Shopping,                                            
##     Skate Parks,                                         
##     Skating Rinks,                                       
##     Skydiving,                                           
##     Soccer,                                              
##     Specialty Schools,                                   
##     Sporting Goods,                                      
##     Sports Clubs,                                        
##     Summer Camps,                                        
##     Sushi Bars,                                          
##     Swimming Pools,                                      
##     Tai Chi,                                             
##     Tennis,                                              
##     Thai,                                                
##     Trainers,                                            
##     Trampoline Parks,                                    
##     Venues & Event Spaces,                               
##     Videos & Video Game Rental,                          
##     Vocational & Technical School,                       
##     Wedding Planning,                                    
##     Weight Loss Centers,                                 
##     Yoga,                                                
##     Zoos}                             Active Life        
## 4  {Beauty & Spas,                                       
##     Colonics,                                            
##     Day Spas,                                            
##     Doctors,                                             
##     Hair Removal,                                        
##     Health & Medical,                                    
##     Massage Therapy,                                     
##     Medical Centers,                                     
##     Skin Care,                                           
##     Traditional Chinese Medicine}     Acupuncture        
## 5  {Arts & Entertainment,                                
##     Bars,                                                
##     Breakfast & Brunch,                                  
##     Nightlife,                                           
##     Party & Event Planning}           Adult Entertainment
## 6  {Halal,                                               
##     Mediterranean,                                       
##     Pakistani,                                           
##     Persian/Iranian,                                     
##     Restaurants}                      Afghan             
## 7  {Caribbean,                                           
##     Moroccan,                                            
##     Southern}                         African            
## 8  {Bars}                             Airport Lounges    
## 9  {Limos}                            Airport Shuttles   
## 10 {Doctors}                          Allergists

length(transDat) # get number of observations

## [1] 950

size(transDat[1:10]) # number of items in each observation

##  [1]   1   1 118  10   5   5   3   1   1   1

## LIST() kept running for 3 hours because of the length of the list, so here I omit LIST() and only run inspect().
#'@ LIST(transDat) # convert 'transactions' to a list, note the LIST in CAPS

inspect(transDat2[1:100]) # view the observations

##     items                             
## 1   {American (Traditional),          
##      Breakfast & Brunch,              
##      Restaurants}                     
## 2   {Restaurants,                     
##      Sandwiches}                      
## 3   {IT Services & Computer Repair,   
##      Local Services}                  
## 4   {Italian,                         
##      Restaurants}                     
## 5   {Coffee & Tea,                    
##      Food}                            
## 6   {Fast Food,                       
##      Restaurants}                     
## 7   {Home Services,                   
##      Mortgage Brokers,                
##      Real Estate}                     
## 8   {Brasseries,                      
##      Restaurants}                     
## 9   {American (New),                  
##      Bars,                            
##      Chicken Wings,                   
##      Nightlife,                       
##      Restaurants,                     
##      Sports Bars}                     
## 10  {Auto Detailing,                  
##      Automotive,                      
##      Wheel & Rim Repair,              
##      Windshield Installation & Repair}
## 11  {Auto Parts & Supplies,           
##      Automotive}                      
## 12  {CSA,                             
##      Farmers Market,                  
##      Food,                            
##      Grocery}                         
## 13  {CPR Classes,                     
##      Education,                       
##      First Aid Classes,               
##      Specialty Schools}               
## 14  {Event Planning & Services,       
##      Venues & Event Spaces}           
## 15  {Furniture Stores,                
##      Home & Garden,                   
##      Home Decor,                      
##      Shopping}                        
## 16  {Books, Mags, Music & Video,      
##      Bookstores,                      
##      Shopping}                        
## 17  {Auto Repair,                     
##      Automotive}                      
## 18  {Dry Cleaning & Laundry,          
##      Local Services}                  
## 19  {American (New),                  
##      Burgers,                         
##      Restaurants}                     
## 20  {Pizza,                           
##      Restaurants}                     
## 21  {Beauty & Spas,                   
##      Massage}                         
## 22  {Food,                            
##      Juice Bars & Smoothies}          
## 23  {Pizza,                           
##      Restaurants}                     
## 24  {Bars,                            
##      Lounges,                         
##      Nightlife}                       
## 25  {Bars,                            
##      Champagne Bars,                  
##      Lounges,                         
##      Nightlife}                       
## 26  {Burgers,                         
##      Restaurants}                     
## 27  {American (Traditional),          
##      Bars,                            
##      Nightlife,                       
##      Restaurants}                     
## 28  {Event Photography,               
##      Event Planning & Services,       
##      Photographers,                   
##      Session Photography}             
## 29  {Beauty & Spas,                   
##      Convenience Stores,              
##      Cosmetics & Beauty Supply,       
##      Drugstores,                      
##      Food,                            
##      Shopping}                        
## 30  {Active Life,                     
##      Parks}                           
## 31  {Food,                            
##      Ice Cream & Frozen Yogurt}       
## 32  {Fast Food,                       
##      Restaurants}                     
## 33  {Beauty & Spas,                   
##      Hair Salons}                     
## 34  {Doctors,                         
##      Eyewear & Opticians,             
##      Health & Medical,                
##      Shopping}                        
## 35  {Eyewear & Opticians,             
##      Shopping}                        
## 36  {Beauty & Spas,                   
##      Dermatologists,                  
##      Doctors,                         
##      Health & Medical,                
##      Skin Care}                       
## 37  {Community Service/Non-Profit,    
##      Local Services}                  
## 38  {Art Galleries,                   
##      Arts & Entertainment,            
##      Shopping}                        
## 39  {Cafes,                           
##      Coffee & Tea,                    
##      Food,                            
##      Restaurants}                     
## 40  {Barbers,                         
##      Hair Salons,                     
##      Mens Hair Salons;Beauty & Spas}  
## 41  {Beauty & Spas,                   
##      Blow Dry/Out Services,           
##      Hair Extensions,                 
##      Hair Salons}                     
## 42  {Bakeries,                        
##      Desserts,                        
##      Food}                            
## 43  {Automotive,                      
##      Car Wash}                        
## 44  {Barbers,                         
##      Beauty & Spas,                   
##      Hair Extensions,                 
##      Hair Salons}                     
## 45  {Arts & Entertainment,            
##      Casinos,                         
##      Event Planning & Services,       
##      Hotels,                          
##      Hotels & Travel}                 
## 46  {Pizza,                           
##      Restaurants}                     
## 47  {Bagels,                          
##      Coffee & Tea,                    
##      Food,                            
##      Restaurants,                     
##      Sandwiches}                      
## 48  {Beauty & Spas,                   
##      Hair Salons}                     
## 49  {Delis,                           
##      Fast Food,                       
##      Restaurants,                     
##      Sandwiches}                      
## 50  {Active Life,                     
##      Fitness & Instruction,           
##      Yoga}                            
## 51  {Books, Mags, Music & Video,      
##      Shopping,                        
##      Video Game Stores,               
##      Videos & Video Game Rental}      
## 52  {Beauty & Spas,                   
##      Nail Salons}                     
## 53  {Beauty & Spas,                   
##      Nail Salons}                     
## 54  {Department Stores,               
##      Fashion,                         
##      Shopping}                        
## 55  {Food,                            
##      Ice Cream & Frozen Yogurt}       
## 56  {Portuguese,                      
##      Restaurants}                     
## 57  {Beauty & Spas,                   
##      Spray Tanning,                   
##      Tanning}                         
## 58  {British,                         
##      Restaurants}                     
## 59  {Creperies,                       
##      Restaurants}                     
## 60  {Food,                            
##      Ice Cream & Frozen Yogurt}       
## 61  {Bakeries,                        
##      Food}                            
## 62  {Department Stores,               
##      Fashion,                         
##      Shopping}                        
## 63  {American (New),                  
##      Restaurants}                     
## 64  {Pizza,                           
##      Restaurants}                     
## 65  {Italian,                         
##      Pizza,                           
##      Restaurants}                     
## 66  {Doctors,                         
##      Health & Medical}                
## 67  {Drugstores,                      
##      Shopping}                        
## 68  {Chinese,                         
##      Restaurants}                     
## 69  {Beauty & Spas,                   
##      Hair Removal,                    
##      Makeup Artists,                  
##      Skin Care}                       
## 70  {Home Services,                   
##      Real Estate,                     
##      Real Estate Agents,              
##      Real Estate Services}            
## 71  {Arts & Crafts,                   
##      Fabric Stores,                   
##      Home & Garden,                   
##      Home Decor,                      
##      Shopping}                        
## 72  {Womens Clothing;Fashion;Shopping}
## 73  {Auto Repair,                     
##      Automotive,                      
##      Tires}                           
## 74  {Italian,                         
##      Restaurants}                     
## 75  {Contractors,                     
##      Home Services}                   
## 76  {Dance Clubs,                     
##      Nightlife}                       
## 77  {Active Life,                     
##      Aquarium Services,               
##      Aquariums,                       
##      Pet Services,                    
##      Pet Stores,                      
##      Pets}                            
## 78  {Health & Medical,                
##      Massage Therapy}                 
## 79  {Beer, Wine & Spirits,            
##      Food,                            
##      Grocery}                         
## 80  {Active Life,                     
##      Fitness & Instruction,           
##      Gyms,                            
##      Trainers,                        
##      Yoga}                            
## 81  {Fast Food,                       
##      Restaurants}                     
## 82  {Bars,                            
##      Italian,                         
##      Nightlife,                       
##      Restaurants,                     
##      Wine Bars}                       
## 83  {Bakeries,                        
##      Candy Stores,                    
##      Coffee & Tea,                    
##      Food,                            
##      Specialty Food}                  
## 84  {Food,                            
##      Meat Shops,                      
##      Specialty Food}                  
## 85  {Coffee & Tea,                    
##      Food,                            
##      Restaurants,                     
##      Sandwiches}                      
## 86  {Buffets,                         
##      Restaurants}                     
## 87  {American (New),                  
##      Restaurants}                     
## 88  {Active Life,                     
##      Sports Clubs}                    
## 89  {Italian,                         
##      Pizza,                           
##      Restaurants}                     
## 90  {Coffee & Tea,                    
##      Food}                            
## 91  {Doctors,                         
##      Family Practice,                 
##      Health & Medical}                
## 92  {Restaurants,                     
##      Steakhouses}                     
## 93  {Active Life,                     
##      Fitness & Instruction,           
##      Gyms}                            
## 94  {Automotive,                      
##      Marinas}                         
## 95  {Auto Repair,                     
##      Automotive,                      
##      Tires}                           
## 96  {Restaurants,                     
##      Thai}                            
## 97  {Mexican,                         
##      Restaurants}                     
## 98  {Active Life,                     
##      Fitness & Instruction,           
##      Yoga}                            
## 99  {Doctors,                         
##      Health & Medical,                
##      Medical Centers}                 
## 100 {Mexican,                         
##      Restaurants}

length(transDat2) # get number of observations

## [1] 77185

size(transDat2[1:100]) # number of items in each observation

##   [1] 3 2 2 2 2 2 3 2 6 4 2 4 4 2 4 3 2 2 3 2 2 2 2 3 4 2 4 4 6 2 2 2 2 4 2
##  [36] 5 2 3 4 3 4 3 2 4 5 2 5 2 4 3 4 2 2 3 2 2 3 2 2 2 2 3 2 2 3 2 2 2 4 4
##  [71] 5 1 3 2 2 2 6 2 3 5 2 5 5 3 4 2 2 2 3 2 3 2 3 2 3 2 2 3 3 2

## LIST() kept running for 3 hours because of the length of the list, so here I omit LIST() and only run inspect().
#'@ LIST(transDat) # convert 'transactions' to a list, note the LIST in CAPS

head(transDat)

## transactions in sparse format with
##  6 transactions (rows) and
##  830 items (columns)

head(transDat2)

## transactions in sparse format with
##  6 transactions (rows) and
##  1048 items (columns)

frequentItems <- eclat(transDat, parameter = list(supp = 0.07, maxlen = 15)) # calculates support for frequent items

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target   ext
##     FALSE    0.07      1     15 frequent itemsets FALSE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 66 
## 
## create itemset ... 
## set transactions ...[830 item(s), 950 transaction(s)] done [0.00s].
## sorting and recoding items ... [3 item(s)] done [0.00s].
## creating bit matrix ... [3 row(s), 950 column(s)] done [0.00s].
## writing  ... [3 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].

itemFrequencyPlot(transDat, topN = 10, type = 'absolute', col = rainbow(4)) # plot frequent items

graph 2.1.2a : top 10 items in dataset.

frequentItems <- eclat(transDat2, parameter = list(supp = 0.07, maxlen = 15)) # calculates support for frequent items

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target   ext
##     FALSE    0.07      1     15 frequent itemsets FALSE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 5402 
## 
## create itemset ... 
## set transactions ...[1048 item(s), 77185 transaction(s)] done [0.03s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating bit matrix ... [4 row(s), 77185 column(s)] done [0.00s].
## writing  ... [4 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].

itemFrequencyPlot(transDat2, topN = 10, type = 'absolute', col = rainbow(4)) # plot frequent items

graph 2.1.2b : top 10 items in dataset.

2.2 Frequent Itemset Mining using Apriori

# Get the rules
rules <- apriori(transDat, parameter = list(supp = 0.01, conf = 0.5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport support minlen maxlen
##         0.5    0.1    1 none FALSE            TRUE    0.01      1     10
##  target   ext
##   rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[830 item(s), 950 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## writing ... [2236 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

# Of the first 5 rules, show those with lift > 6, sorted by confidence, using 2 significant digits
options(digits=2)
inspect(sort(subset(rules[1:5], subset = lift > 6), by = 'confidence'))

##   lhs                rhs           support confidence lift
## 4 {Salad}         => {Restaurants} 0.011   1.00        9.3
## 1 {Buffets}       => {Restaurants} 0.013   0.92        8.6
## 3 {Chicken Wings} => {Restaurants} 0.012   0.92        8.5
## 2 {Chicken Wings} => {Barbeque}    0.011   0.83       24.7
## 5 {Caribbean}     => {Restaurants} 0.011   0.83        7.8
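
To make the three columns concrete, the base R sketch below computes support, confidence and lift for a single rule X => Y on made-up toy transactions, using the standard definitions: support is the share of transactions containing both sides, confidence is that share divided by the support of X, and lift is confidence divided by the support of Y. The data and helper names are illustrative only.

```r
# Made-up toy transactions, not taken from the data set.
toy <- list(
  c("Buffets", "Restaurants"),
  c("Buffets", "Restaurants"),
  c("Restaurants", "Pizza"),
  c("Bars", "Nightlife")
)

# TRUE for every transaction that contains all of `items`.
contains <- function(items) {
  vapply(toy, function(t) all(items %in% t), logical(1))
}

lhs <- "Buffets"
rhs <- "Restaurants"
supp_rule <- mean(contains(c(lhs, rhs)))  # share with both sides: 2/4
supp_lhs <- mean(contains(lhs))           # share with the left-hand side: 2/4
supp_rhs <- mean(contains(rhs))           # share with the right-hand side: 3/4
confidence <- supp_rule / supp_lhs
lift <- confidence / supp_rhs
cat("support:", supp_rule, "confidence:", confidence, "lift:", lift, "\n")
```

Read the same way, a rule with confidence 1.00 and lift 9.3 implies that its right-hand side occurs in roughly 1/9.3, or about 11%, of the transactions.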

# Get the rules
rules <- apriori(transDat2, parameter = list(supp = 0.01, conf = 0.5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport support minlen maxlen
##         0.5    0.1    1 none FALSE            TRUE    0.01      1     10
##  target   ext
##   rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 771 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[1048 item(s), 77185 transaction(s)] done [0.02s].
## sorting and recoding items ... [49 item(s)] done [0.00s].
## creating transaction tree ... done [0.04s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [57 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].

# Of the first 5 rules, show those with lift > 6, sorted by confidence, using 2 significant digits
options(digits=2)
inspect(sort(subset(rules[1:5], subset = lift > 6), by = 'confidence'))

##   lhs                            rhs                 support confidence
## 3 {Ice Cream & Frozen Yogurt} => {Food}              0.013   1.00      
## 4 {General Dentistry}         => {Dentists}          0.011   1.00      
## 5 {Dentists}                  => {General Dentistry} 0.011   0.69      
##   lift
## 3  8.3
## 4 64.6
## 5 64.6

Pattern Discovery in Data Mining Programming Assignment: Frequent Itemset Mining Using Apriori

Data Mining by University of Illinois at Urbana-Champaign

®γσ, Eng Lian Hu 白戸則道®

2016-09-22
