Segmentation at TiVo

This analysis looks at customer segmentation at TiVo.

Data Preparation

Load, prepare, explore, and analyze the TiVo data.

# Read data

setwd("C:/Users/monic/OneDrive/Desktop/MSBA/Marketing Analytics/TiVo")
tivo_orig<- read_csv("tivo_orig.csv")

## Parsed with column specification:
## cols(
##   ID = col_double(),
##   Gender = col_character(),
##   `Marital Status` = col_character(),
##   `Work Status` = col_character(),
##   Education = col_character(),
##   `Annual Income (x1000 $)` = col_double(),
##   Age = col_double(),
##   Location = col_character(),
##   `Purchasing Decision-maker` = col_character(),
##   `Purchasing Location` = col_character(),
##   `Monthly Electronics Spend` = col_double(),
##   `Monthly Household Spend` = col_double(),
##   `Purchasing Frequency (every x months)` = col_double(),
##   `Technology Adoption` = col_character(),
##   `TV Viewing (hours/day)` = col_double(),
##   `Favorite feature` = col_character()
## )

Question 3: Data Exploration

Create a table for each attribute (e.g., Gender) and record the percentage of responses for each answer (e.g., % of respondents answering “male” and % answering “female”).

# Gender
genderdata <-(prop.table(table(tivo_orig$Gender)))
genderdata

## 
## female   male 
##  0.465  0.535

## % of responses based on Gender: female 46.5% and male 53.5%
tivo_orig %>%
  group_by(Gender) %>%
  summarize(Gender_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 2 x 2
##   Gender Gender_distribution
##   <chr>  <chr>              
## 1 female 46.5%              
## 2 male   53.5%

tivo_orig %>%
  group_by(Gender) %>%
  summarize(Gender_distribution=n())

## # A tibble: 2 x 2
##   Gender Gender_distribution
##   <chr>                <int>
## 1 female                 465
## 2 male                   535

# Marital Status
Marital_Status_data <-prop.table(table(tivo_orig$'Marital Status'))
Marital_Status_data

## 
## married  single 
##    0.72    0.28

## % of responses based on Marital Status: married 72.0% and single 28.0%
tivo_orig %>%
  group_by(`Marital Status`) %>%
  summarize(Marital_Status_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 2 x 2
##   `Marital Status` Marital_Status_distribution
##   <chr>            <chr>                      
## 1 married          72.0%                      
## 2 single           28.0%

tivo_orig %>%
  group_by(`Marital Status`) %>%
  summarize(Marital_Status_distribution=n())

## # A tibble: 2 x 2
##   `Marital Status` Marital_Status_distribution
##   <chr>                                  <int>
## 1 married                                  720
## 2 single                                   280

# Work Status

Work_Status_data <-prop.table(table(tivo_orig$'Work Status'))
Work_Status_data

## 
##         none professional 
##         0.35         0.65

## % of responses based on Work Status: none 35.0% and professional 65.0%
tivo_orig %>%
  group_by(`Work Status`) %>%
  summarize(Work_Status_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 2 x 2
##   `Work Status` Work_Status_distribution
##   <chr>         <chr>                   
## 1 none          35.0%                   
## 2 professional  65.0%

tivo_orig %>%
  group_by(`Work Status`) %>%
  summarize(Work_Status_distribution=n())

## # A tibble: 2 x 2
##   `Work Status` Work_Status_distribution
##   <chr>                            <int>
## 1 none                               350
## 2 professional                       650

# Education
educationdata <-prop.table(table(tivo_orig$Education))
educationdata

## 
##    BA    MA  none   PhD 
## 0.260 0.124 0.506 0.110

## % of responses based on Education: BA 26.0%, MA 12.4%, none 50.6% and PhD 11.0%
tivo_orig %>%
  group_by(Education) %>%
  summarize(Education_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 4 x 2
##   Education Education_distribution
##   <chr>     <chr>                 
## 1 BA        26.0%                 
## 2 MA        12.4%                 
## 3 none      50.6%                 
## 4 PhD       11.0%

tivo_orig %>%
  group_by(Education) %>%
  summarize(Education_distribution=n())

## # A tibble: 4 x 2
##   Education Education_distribution
##   <chr>                      <int>
## 1 BA                           260
## 2 MA                           124
## 3 none                         506
## 4 PhD                          110

# Annual Income
annualincomedata <-prop.table(table(tivo_orig$`Annual Income (x1000 $)`))
annualincomedata

## 
##    21    22    23    24    25    26    27    28    29    30    31    32 
## 0.001 0.002 0.004 0.018 0.016 0.047 0.055 0.075 0.073 0.080 0.066 0.062 
##    33    34    35    36    37    39    40    41    42    43    44    45 
## 0.046 0.038 0.026 0.014 0.005 0.002 0.001 0.002 0.004 0.004 0.011 0.022 
##    46    47    48    49    50    51    52    53    54    55    56    57 
## 0.028 0.022 0.026 0.037 0.019 0.025 0.022 0.020 0.029 0.029 0.021 0.009 
##    58    59    60    61    62    63    64   640   730 
## 0.013 0.006 0.009 0.003 0.002 0.002 0.002 0.001 0.001

## % of responses based on Annual Income: it has 45 different values presented below
tivo_orig %>%
  group_by(`Annual Income (x1000 $)`) %>%
  summarize(Annual_Income_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 45 x 2
##    `Annual Income (x1000 $)` Annual_Income_distribution
##                        <dbl> <chr>                     
##  1                        21 0.100%                    
##  2                        22 0.200%                    
##  3                        23 0.400%                    
##  4                        24 1.80%                     
##  5                        25 1.60%                     
##  6                        26 4.70%                     
##  7                        27 5.50%                     
##  8                        28 7.50%                     
##  9                        29 7.30%                     
## 10                        30 8.00%                     
## # ... with 35 more rows

tivo_orig %>%
  group_by(`Annual Income (x1000 $)`) %>%
  summarize(Annual_Income_distribution=n())

## # A tibble: 45 x 2
##    `Annual Income (x1000 $)` Annual_Income_distribution
##                        <dbl>                      <int>
##  1                        21                          1
##  2                        22                          2
##  3                        23                          4
##  4                        24                         18
##  5                        25                         16
##  6                        26                         47
##  7                        27                         55
##  8                        28                         75
##  9                        29                         73
## 10                        30                         80
## # ... with 35 more rows

# Age
agedata <-prop.table(table(tivo_orig$Age))
agedata

## 
##    18    19    20    21    22    23    24    25    26    27    28    29 
## 0.006 0.009 0.015 0.012 0.022 0.020 0.016 0.025 0.025 0.018 0.019 0.018 
##    30    31    32    33    34    35    36    37    38    39    40    41 
## 0.021 0.014 0.017 0.020 0.013 0.014 0.017 0.016 0.015 0.013 0.016 0.013 
##    42    43    44    45    46    47    48    49    50    51    52    53 
## 0.015 0.013 0.017 0.012 0.008 0.016 0.022 0.014 0.019 0.020 0.019 0.013 
##    54    55    56    57    58    59    60    61    62    63    64    65 
## 0.018 0.018 0.018 0.013 0.019 0.014 0.019 0.019 0.013 0.018 0.019 0.015 
##    66    67    68    69    70    71    72    73    74    75    76    77 
## 0.013 0.027 0.010 0.019 0.012 0.006 0.017 0.009 0.012 0.017 0.021 0.010 
##    78    79    80 
## 0.011 0.011 0.020

## % of responses based on Age: it has 63 different values presented below
tivo_orig %>%
  group_by(Age) %>%
  summarize(Age_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 63 x 2
##      Age Age_distribution
##    <dbl> <chr>           
##  1    18 0.600%          
##  2    19 0.900%          
##  3    20 1.50%           
##  4    21 1.20%           
##  5    22 2.20%           
##  6    23 2.00%           
##  7    24 1.60%           
##  8    25 2.50%           
##  9    26 2.50%           
## 10    27 1.80%           
## # ... with 53 more rows

tivo_orig %>%
  group_by(Age) %>%
  summarize(Age_distribution=n())

## # A tibble: 63 x 2
##      Age Age_distribution
##    <dbl>            <int>
##  1    18                6
##  2    19                9
##  3    20               15
##  4    21               12
##  5    22               22
##  6    23               20
##  7    24               16
##  8    25               25
##  9    26               25
## 10    27               18
## # ... with 53 more rows

# Location
locationdata <-prop.table(table(tivo_orig$Location))
locationdata

## 
##        Alabama         Alaska        Arizona       Arkansas     California 
##          0.006          0.005          0.043          0.004          0.041 
##       Colorado    Connecticut       Delaware        Florida        Georgia 
##          0.039          0.037          0.045          0.035          0.027 
##         Hawaii          Idaho       Illinois        Indiana           Iowa 
##          0.045          0.043          0.034          0.006          0.003 
##         Kansas       Kentucky      Louisiana          Maine       Maryland 
##          0.006          0.006          0.005          0.041          0.039 
##  Massachusetts       Michigan      Minnesota    Mississippi       Missouri 
##          0.038          0.006          0.002          0.002          0.008 
##        Montana       Nebraska         Nevada  New Hampshire     New Jersey 
##          0.007          0.001          0.037          0.035          0.035 
##     New Mexico       New York North Carolina   North Dakota           Ohio 
##          0.005          0.039          0.008          0.010          0.003 
##       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina 
##          0.003          0.010          0.050          0.027          0.006 
##   South Dakota      Tennessee          Texas           Utah        Vermont 
##          0.004          0.032          0.007          0.009          0.043 
##       Virginia     Washington  West Virginia      Wisconsin        Wyoming 
##          0.004          0.043          0.008          0.004          0.004

## % of responses based on Annual Income: it has 50 different values presented below
tivo_orig %>%
  group_by(Location) %>%
  summarize(Location_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 50 x 2
##    Location    Location_distribution
##    <chr>       <chr>                
##  1 Alabama     0.600%               
##  2 Alaska      0.500%               
##  3 Arizona     4.30%                
##  4 Arkansas    0.400%               
##  5 California  4.10%                
##  6 Colorado    3.90%                
##  7 Connecticut 3.70%                
##  8 Delaware    4.50%                
##  9 Florida     3.50%                
## 10 Georgia     2.70%                
## # ... with 40 more rows

tivo_orig %>%
  group_by(Location) %>%
  summarize(Location_distribution=n())

## # A tibble: 50 x 2
##    Location    Location_distribution
##    <chr>                       <int>
##  1 Alabama                         6
##  2 Alaska                          5
##  3 Arizona                        43
##  4 Arkansas                        4
##  5 California                     41
##  6 Colorado                       39
##  7 Connecticut                    37
##  8 Delaware                       45
##  9 Florida                        35
## 10 Georgia                        27
## # ... with 40 more rows

# Purchasing decision maker
purchasedecisionmakerdata <-prop.table(table(tivo_orig$`Purchasing Decision-maker`))
purchasedecisionmakerdata

## 
## family single 
##   0.56   0.44

## % of responses based on Purchasing Decision-maker: family 56.0% and single 44.0%
tivo_orig %>%
  group_by(`Purchasing Decision-maker`) %>%
  summarize(Purchasedecisionmaker_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 2 x 2
##   `Purchasing Decision-maker` Purchasedecisionmaker_distribution
##   <chr>                       <chr>                             
## 1 family                      56.0%                             
## 2 single                      44.0%

tivo_orig %>%
  group_by(`Purchasing Decision-maker`) %>%
  summarize(Purchasedecisionmaker_distribution=n())

## # A tibble: 2 x 2
##   `Purchasing Decision-maker` Purchasedecisionmaker_distribution
##   <chr>                                                    <int>
## 1 family                                                     560
## 2 single                                                     440

# Purchasing location
purchaselocationdata <-prop.table(table(tivo_orig$`Purchasing Location`))
purchaselocationdata

## 
##                  discount mass-consumer electronics 
##                     0.293                     0.200 
##                    retail          specialty stores 
##                     0.294                     0.170 
##                web (ebay) 
##                     0.043

## % of responses based on Purchasing Location: discount 29.3%, mass-consumer electronics 20.0%, retail 29.4%, specialty stores 17% and web(ebay) 4.03%
tivo_orig %>%
  group_by(`Purchasing Location`) %>%
  summarize(Purchaselocation_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 5 x 2
##   `Purchasing Location`     Purchaselocation_distribution
##   <chr>                     <chr>                        
## 1 discount                  29.3%                        
## 2 mass-consumer electronics 20.0%                        
## 3 retail                    29.4%                        
## 4 specialty stores          17.0%                        
## 5 web (ebay)                4.30%

tivo_orig %>%
  group_by(`Purchasing Location`) %>%
  summarize(Purchasing_Location_distribution=n())

## # A tibble: 5 x 2
##   `Purchasing Location`     Purchasing_Location_distribution
##   <chr>                                                <int>
## 1 discount                                               293
## 2 mass-consumer electronics                              200
## 3 retail                                                 294
## 4 specialty stores                                       170
## 5 web (ebay)                                              43

# Monthly electronics spend
monthlyelectronicsdata <-prop.table(table(tivo_orig$`Monthly Electronics Spend`))
monthlyelectronicsdata

## 
##     7     9    10    11    12    13    14    15    16    17    18    19 
## 0.001 0.002 0.008 0.007 0.021 0.029 0.044 0.055 0.057 0.055 0.066 0.058 
##    20    21    22    23    24    25    26    27    28    29    30    31 
## 0.041 0.025 0.015 0.009 0.005 0.002 0.004 0.006 0.007 0.007 0.006 0.015 
##    32    33    34    35    36    37    38    39    40    41    42    43 
## 0.021 0.017 0.027 0.034 0.026 0.015 0.015 0.016 0.017 0.011 0.012 0.020 
##    44    45    46    47    48    49    50    51    52    53    54    55 
## 0.018 0.017 0.012 0.018 0.009 0.005 0.009 0.010 0.008 0.002 0.010 0.010 
##    57    58    59    60    61    62    63    64    65    66    67    69 
## 0.005 0.004 0.003 0.013 0.005 0.004 0.001 0.004 0.008 0.003 0.008 0.004 
##    70    71    72    73    74    75    76    77    78    79    80    81 
## 0.001 0.005 0.004 0.002 0.002 0.003 0.002 0.002 0.001 0.001 0.002 0.001 
##    82    83    84    85    88 
## 0.001 0.002 0.002 0.002 0.001

## % of responses based on Annual Income: it has 77 different values presented below
tivo_orig %>%
  group_by(`Monthly Electronics Spend`) %>%
  summarize(monthlyelectronicsspend_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 77 x 2
##    `Monthly Electronics Spend` monthlyelectronicsspend_distribution
##                          <dbl> <chr>                               
##  1                           7 0.100%                              
##  2                           9 0.200%                              
##  3                          10 0.800%                              
##  4                          11 0.700%                              
##  5                          12 2.10%                               
##  6                          13 2.90%                               
##  7                          14 4.40%                               
##  8                          15 5.50%                               
##  9                          16 5.70%                               
## 10                          17 5.50%                               
## # ... with 67 more rows

tivo_orig %>%
  group_by(`Monthly Electronics Spend`) %>%
  summarize(Monthly_Electronics_Spend_distribution=n())

## # A tibble: 77 x 2
##    `Monthly Electronics Spend` Monthly_Electronics_Spend_distribution
##                          <dbl>                                  <int>
##  1                           7                                      1
##  2                           9                                      2
##  3                          10                                      8
##  4                          11                                      7
##  5                          12                                     21
##  6                          13                                     29
##  7                          14                                     44
##  8                          15                                     55
##  9                          16                                     57
## 10                          17                                     55
## # ... with 67 more rows

# Purchasing Frequency
purchasingfrequencydata <-prop.table(table(tivo_orig$`Purchasing Frequency (every x months)`))
purchasingfrequencydata

## 
##     1     2     3     4     5     6     7     8     9    10    11    12 
## 0.016 0.026 0.027 0.026 0.028 0.016 0.023 0.024 0.027 0.028 0.027 0.028 
##    13    14    15    16    17    18    19    20    21    22    23    24 
## 0.021 0.027 0.028 0.020 0.026 0.015 0.021 0.018 0.025 0.017 0.019 0.019 
##    25    26    27    28    29    30    31    32    33    34    35    36 
## 0.020 0.019 0.020 0.011 0.019 0.018 0.020 0.017 0.021 0.020 0.018 0.020 
##    37    38    39    40    41    42    43    44    45    46    47    48 
## 0.022 0.015 0.018 0.021 0.027 0.014 0.022 0.015 0.018 0.024 0.019 0.010

## % of responses based on Annual Income: it has 48 different values presented below
tivo_orig %>%
  group_by(`Purchasing Frequency (every x months)`) %>%
  summarize(purchasing_freq_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 48 x 2
##    `Purchasing Frequency (every x months)` purchasing_freq_distribution
##                                      <dbl> <chr>                       
##  1                                       1 1.60%                       
##  2                                       2 2.60%                       
##  3                                       3 2.70%                       
##  4                                       4 2.60%                       
##  5                                       5 2.80%                       
##  6                                       6 1.60%                       
##  7                                       7 2.30%                       
##  8                                       8 2.40%                       
##  9                                       9 2.70%                       
## 10                                      10 2.80%                       
## # ... with 38 more rows

tivo_orig %>%
  group_by(`Purchasing Frequency (every x months)`) %>%
  summarize(Purchasing_Frequency_distribution=n())

## # A tibble: 48 x 2
##    `Purchasing Frequency (every x months)` Purchasing_Frequency_distributi~
##                                      <dbl>                            <int>
##  1                                       1                               16
##  2                                       2                               26
##  3                                       3                               27
##  4                                       4                               26
##  5                                       5                               28
##  6                                       6                               16
##  7                                       7                               23
##  8                                       8                               24
##  9                                       9                               27
## 10                                      10                               28
## # ... with 38 more rows

# tech adoption
techadoptiondata <-prop.table(table(tivo_orig$`Technology Adoption`))
techadoptiondata

## 
## early  late 
##   0.8   0.2

## % of responses based on Technology Adoption: early 80% and late 20%
tivo_orig %>%
  group_by(`Technology Adoption`) %>%
  summarize(techadoption_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 2 x 2
##   `Technology Adoption` techadoption_distribution
##   <chr>                 <chr>                    
## 1 early                 80.0%                    
## 2 late                  20.0%

tivo_orig %>%
  group_by(`Technology Adoption`) %>%
  summarize(Technology_Adoption_distribution=n())

## # A tibble: 2 x 2
##   `Technology Adoption` Technology_Adoption_distribution
##   <chr>                                            <int>
## 1 early                                              800
## 2 late                                               200

# TV viewing
TVviewingdata <-prop.table(table(tivo_orig$'TV Viewing (hours/day)'))
TVviewingdata

## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
## 0.175 0.362 0.232 0.032 0.045 0.044 0.022 0.009 0.010 0.009 0.015 0.020 
##    12    13    14 
## 0.007 0.014 0.004

## % of responses based on TV Viewing (hours/day): it has 15 different values presented below
tivo_orig %>%
  group_by(`TV Viewing (hours/day)`) %>%
  summarize(TVviewing_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 15 x 2
##    `TV Viewing (hours/day)` TVviewing_distribution
##                       <dbl> <chr>                 
##  1                        0 17.5%                 
##  2                        1 36.2%                 
##  3                        2 23.2%                 
##  4                        3 3.20%                 
##  5                        4 4.50%                 
##  6                        5 4.40%                 
##  7                        6 2.20%                 
##  8                        7 0.900%                
##  9                        8 1.00%                 
## 10                        9 0.900%                
## 11                       10 1.50%                 
## 12                       11 2.00%                 
## 13                       12 0.700%                
## 14                       13 1.40%                 
## 15                       14 0.400%

tivo_orig %>%
  group_by('TV Viewing (hours/day)') %>%
  summarize(TV_Viewing_distribution=n())

## # A tibble: 1 x 2
##   `"TV Viewing (hours/day)"` TV_Viewing_distribution
##   <chr>                                        <int>
## 1 TV Viewing (hours/day)                        1000

# Favorite feature
favoritefeatdata <-prop.table(table(tivo_orig$`Favorite feature`))
favoritefeatdata

## 
##                                cool gadget 
##                                      0.228 
##           programming/interactive features 
##                                      0.130 
## saving favorite shows to watch as a family 
##                                      0.200 
##                           schedule control 
##                                      0.221 
##                              time shifting 
##                                      0.221

## % of responses based on Favorite feature: cool gadget 22.8%, programming/interactive features 13.0%, saving favorite shows to watch as a family 20.0%, schedule control 22.1% and time shifting 22.1%
tivo_orig %>%
  group_by(`Favorite feature`) %>%
  summarize(Favoritefeat_distribution=percent(n()/nrow(tivo_orig)))

## # A tibble: 5 x 2
##   `Favorite feature`                         Favoritefeat_distribution
##   <chr>                                      <chr>                    
## 1 cool gadget                                22.8%                    
## 2 programming/interactive features           13.0%                    
## 3 saving favorite shows to watch as a family 20.0%                    
## 4 schedule control                           22.1%                    
## 5 time shifting                              22.1%

tivo_orig %>%
  group_by(`Favorite feature`) %>%
  summarize(Favorite_Feature_distribution=n())

## # A tibble: 5 x 2
##   `Favorite feature`                         Favorite_Feature_distribution
##   <chr>                                                              <int>
## 1 cool gadget                                                          228
## 2 programming/interactive features                                     130
## 3 saving favorite shows to watch as a family                           200
## 4 schedule control                                                     221
## 5 time shifting                                                        221

Question 4

Answer the following questions.

4a. In the next two years, how many married men who are early adopters can afford to purchase a TiVo for $499, have enough money to purchase another electronic gadget, and are likely to do so?

# filter for males who are married and adopt technology early and who can afford to purchase TiVo ($499) + one more electronic gadget (additional $50 needed) within 2-year period plus their purchasing frequency indicates that they purchase more than 1 item every two years; 85 individuals will meet the previously listed criteria
married_male_early.adopters_population <- tivo_orig %>% 
  filter(Gender == "male" & `Marital Status`== "married" & `Technology Adoption`== "early" & (`Monthly Electronics Spend`*24>549) & `Purchasing Frequency (every x months)`< 13)
         

count(married_male_early.adopters_population)

## # A tibble: 1 x 1
##       n
##   <int>
## 1    85

4b. How many women with education of MA or PhD are making purchasing decisions for electronics without discussing them with a spouse, either because they are single, or because they are making purchasing decisions without the involvement of their spouses?

# filter for women with education of MA or PhD and who are purchasing decision makers; 55 women meet these criteria
women_education.PhD.MA.decisionmakers_population <- tivo_orig %>% 
  filter(Gender == "female" & Education %in% c("PhD", "MA") & `Purchasing Decision-maker` == "single") 


count(women_education.PhD.MA.decisionmakers_population)

## # A tibble: 1 x 1
##       n
##   <int>
## 1    55

4c. Among early adopters, how many purchase electronics at least once every year and do so in stores that specialize in electronics?

# filter for early adopters who purchase electronics at least once every year (Purchasing Frequency is at least every 12 months) in specialty stores; 132 subjects meet the specified criteria

earlyadopters_atleastonceayear_specialtystores_population <- tivo_orig %>% 
  filter(`Technology Adoption` == "early" & `Purchasing Frequency (every x months)`<= 12 & `Purchasing Location` == "specialty stores")

count(earlyadopters_atleastonceayear_specialtystores_population)

## # A tibble: 1 x 1
##       n
##   <int>
## 1   132

4d. How many seniors (above the age of 65) spend more than six hours a day watching TV?

# filter for seniors (above the age of 65) who spend more than six hours a day watching TV; 20 subjects meets the outlined criteria

seniors_watchingTVmorethan6hours_population <- tivo_orig %>% 
  filter(Age > 65 & `TV Viewing (hours/day)` > 6)

count(seniors_watchingTVmorethan6hours_population)

## # A tibble: 1 x 1
##       n
##   <int>
## 1    20

4e. What is the income range for seniors (above the age of 65) who spend more than six hours a day watching TV?

# use the dataset from the previous question, 4d, seniors_watchingTVmorethan6hours_population and pull their income range information; Income rage for seniors (above the age of 65) who spend more than six hours a day watching TV have income range from $41,000 to $55,000

range(seniors_watchingTVmorethan6hours_population$`Annual Income (x1000 $)`)

## [1] 41 55

4f. What is the average annual income for seniors (above the age of 65) who spend more than six hours a day watching TV?

# use the dataset from the question, 4d, seniors_watchingTVmorethan6hours_population and pull their income mean information; Income mean for seniors (above the age of 65) who spend more than six hours a day watching TV have income range from $48,600

mean(seniors_watchingTVmorethan6hours_population$`Annual Income (x1000 $)`)

## [1] 48.6

Question 5: Correlate Annual Income with Age

What is the R^2?

# R^2 of annual income and age

## Multiple R^2 is 0.01755
annualincome_age <- lm(`Annual Income (x1000 $)` ~ Age, data=tivo_orig)

summary(annualincome_age)

## 
## Call:
## lm(formula = `Annual Income (x1000 $)` ~ Age, data = tivo_orig)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -22.29 -10.24  -4.04   8.87 683.72 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) 27.90012    2.80448   9.948 < 0.0000000000000002 ***
## Age          0.22975    0.05442   4.222            0.0000264 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.73 on 998 degrees of freedom
## Multiple R-squared:  0.01755,    Adjusted R-squared:  0.01656 
## F-statistic: 17.82 on 1 and 998 DF,  p-value: 0.00002643

Question 6: Correlate Gender and Annual Income

# R^2 of annual income and gender

## Multiple R^2 is 0.004553

annualincome_gender <- lm(`Annual Income (x1000 $)` ~ Gender, data=tivo_orig)

summary(annualincome_gender)

## 
## Call:
## lm(formula = `Annual Income (x1000 $)` ~ Gender, data = tivo_orig)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -19.96  -9.96  -5.77   9.23 689.04 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   36.766      1.434  25.632 <0.0000000000000002 ***
## Gendermale     4.190      1.961   2.136              0.0329 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.93 on 998 degrees of freedom
## Multiple R-squared:  0.004553,   Adjusted R-squared:  0.003555 
## F-statistic: 4.564 on 1 and 998 DF,  p-value: 0.03289

Question 7: Correlation Analysis, Continued

7a: Age and Purchasing Frequency

# R^2 
## Multiple R-squared:  1.671e-06

age_purchasingfrequency <- lm(Age ~ `Purchasing Frequency (every x months)`, data=tivo_orig)

summary(age_purchasingfrequency)

## 
## Call:
## lm(formula = Age ~ `Purchasing Frequency (every x months)`, data = tivo_orig)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.362 -16.332   0.644  14.688  31.688 
## 
## Coefficients:
##                                          Estimate Std. Error t value
## (Intercept)                             48.382247   1.093953  44.227
## `Purchasing Frequency (every x months)` -0.001664   0.040747  -0.041
##                                                    Pr(>|t|)    
## (Intercept)                             <0.0000000000000002 ***
## `Purchasing Frequency (every x months)`               0.967    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.87 on 998 degrees of freedom
## Multiple R-squared:  1.671e-06,  Adjusted R-squared:  -0.001 
## F-statistic: 0.001668 on 1 and 998 DF,  p-value: 0.9674

7b: Annual Income and TV Viewing

# R^2 
## Multiple R-squared:  0.007112
annual_income_tv_viewing <- lm(`Annual Income (x1000 $)` ~ `TV Viewing (hours/day)`, data=tivo_orig)

summary(annual_income_tv_viewing)

## 
## Call:
## lm(formula = `Annual Income (x1000 $)` ~ `TV Viewing (hours/day)`, 
##     data = tivo_orig)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -16.78  -9.47  -5.67   7.22 692.22 
## 
## Coefficients:
##                          Estimate Std. Error t value             Pr(>|t|)
## (Intercept)               36.8789     1.2600  29.268 < 0.0000000000000002
## `TV Viewing (hours/day)`   0.8979     0.3358   2.674              0.00762
##                             
## (Intercept)              ***
## `TV Viewing (hours/day)` ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.89 on 998 degrees of freedom
## Multiple R-squared:  0.007112,   Adjusted R-squared:  0.006118 
## F-statistic: 7.149 on 1 and 998 DF,  p-value: 0.007623

7c: Education and Favorite Feature

# R^2 
## Multiple R-squared:  0.2136
education_favorite_feature <- tivo_orig %>% 
  mutate(Education = recode(Education, "none" = "0")) %>% 
  mutate(Education = recode(Education, "BA" = "1")) %>% 
  mutate(Education = recode(Education, "MA" = "2")) %>% 
  mutate(Education = recode(Education, "PhD" = "3"))

education_favorite_feature <- lm(education_favorite_feature$Education ~ education_favorite_feature$`Favorite feature`)

summary(education_favorite_feature)

## 
## Call:
## lm(formula = education_favorite_feature$Education ~ education_favorite_feature$`Favorite feature`)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5231 -0.5231 -0.4661  0.5068  2.5339 
## 
## Coefficients:
##                                                                                         Estimate
## (Intercept)                                                                              0.56140
## education_favorite_feature$`Favorite feature`programming/interactive features            0.96167
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family  0.93860
## education_favorite_feature$`Favorite feature`schedule control                           -0.09534
## education_favorite_feature$`Favorite feature`time shifting                              -0.06819
##                                                                                         Std. Error
## (Intercept)                                                                                0.06015
## education_favorite_feature$`Favorite feature`programming/interactive features              0.09982
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family    0.08799
## education_favorite_feature$`Favorite feature`schedule control                              0.08574
## education_favorite_feature$`Favorite feature`time shifting                                 0.08574
##                                                                                         t value
## (Intercept)                                                                               9.333
## education_favorite_feature$`Favorite feature`programming/interactive features             9.634
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family  10.667
## education_favorite_feature$`Favorite feature`schedule control                            -1.112
## education_favorite_feature$`Favorite feature`time shifting                               -0.795
##                                                                                                    Pr(>|t|)
## (Intercept)                                                                             <0.0000000000000002
## education_favorite_feature$`Favorite feature`programming/interactive features           <0.0000000000000002
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family <0.0000000000000002
## education_favorite_feature$`Favorite feature`schedule control                                         0.266
## education_favorite_feature$`Favorite feature`time shifting                                            0.427
##                                                                                            
## (Intercept)                                                                             ***
## education_favorite_feature$`Favorite feature`programming/interactive features           ***
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family ***
## education_favorite_feature$`Favorite feature`schedule control                              
## education_favorite_feature$`Favorite feature`time shifting                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9083 on 995 degrees of freedom
## Multiple R-squared:  0.2136, Adjusted R-squared:  0.2104 
## F-statistic: 67.57 on 4 and 995 DF,  p-value: < 0.00000000000000022

7d: Monthly Electronics Spend and Monthly Household Spend

# R^2

## Multiple R-squared:  0.6461

monthly_electronics_spend_household_spend <- lm(`Monthly Electronics Spend` ~ `Monthly Household Spend`, data=tivo_orig)

summary(monthly_electronics_spend_household_spend)

## 
## Call:
## lm(formula = `Monthly Electronics Spend` ~ `Monthly Household Spend`, 
##     data = tivo_orig)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.651  -7.345  -2.802   3.939  41.515 
## 
## Coefficients:
##                           Estimate Std. Error t value            Pr(>|t|)
## (Intercept)               11.73487    0.55306   21.22 <0.0000000000000002
## `Monthly Household Spend`  0.20402    0.00478   42.69 <0.0000000000000002
##                              
## (Intercept)               ***
## `Monthly Household Spend` ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.15 on 998 degrees of freedom
## Multiple R-squared:  0.6461, Adjusted R-squared:  0.6457 
## F-statistic:  1822 on 1 and 998 DF,  p-value: < 0.00000000000000022

7e: Of the four correlations, are any high enough to make one of the attributes redundant?

Your text answer here

The correlations for the four attributes range from 1.671e-06 (Age and Purchasing Frequency) to 0.6461 (Monthly Electronics Spend and Monthly Household Spend). The last correlation is high and could make one of the attributes redundant.

Question 8: Segmentation

Select one or more attributes to use as the basis for generating two segmentation schemes. For example, you could choose Age as the basis for creating segments based on age ranges. Or you could select two attributes that would help you segment by willingness to buy versus ability to pay.

As you generate segmentation schemes, apply it to the provided data to describe market size, average annual income, most appealing feature(s)/benefit(s), purchase location(s), and average electronics purchase. Remember to try to create segments that are homogenous internally and heterogeneous across segments for the indicated number of segments. Not all attributes will be used.

8a: Segmentation Scheme A

## assumptions: single people watch more TV

glimpse(tivo_orig)

## Observations: 1,000
## Variables: 16
## $ ID                                      <dbl> 1, 2, 3, 4, 5, 6, 7, 8...
## $ Gender                                  <chr> "male", "male", "male"...
## $ `Marital Status`                        <chr> "married", "single", "...
## $ `Work Status`                           <chr> "professional", "none"...
## $ Education                               <chr> "none", "none", "BA", ...
## $ `Annual Income (x1000 $)`               <dbl> 49, 46, 58, 51, 46, 31...
## $ Age                                     <dbl> 30, 36, 66, 78, 52, 72...
## $ Location                                <chr> "Florida", "Alabama", ...
## $ `Purchasing Decision-maker`             <chr> "family", "single", "f...
## $ `Purchasing Location`                   <chr> "mass-consumer electro...
## $ `Monthly Electronics Spend`             <dbl> 35, 35, 64, 33, 45, 14...
## $ `Monthly Household Spend`               <dbl> 150, 163, 103, 154, 16...
## $ `Purchasing Frequency (every x months)` <dbl> 13, 26, 13, 22, 47, 32...
## $ `Technology Adoption`                   <chr> "late", "late", "early...
## $ `TV Viewing (hours/day)`                <dbl> 2, 10, 0, 5, 2, 1, 0, ...
## $ `Favorite feature`                      <chr> "saving favorite shows...

summary(tivo_orig)

##        ID            Gender          Marital Status     Work Status       
##  Min.   :   1.0   Length:1000        Length:1000        Length:1000       
##  1st Qu.: 250.8   Class :character   Class :character   Class :character  
##  Median : 500.5   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 500.5                                                           
##  3rd Qu.: 750.2                                                           
##  Max.   :1000.0                                                           
##   Education         Annual Income (x1000 $)      Age       
##  Length:1000        Min.   : 21.00          Min.   :18.00  
##  Class :character   1st Qu.: 29.00          1st Qu.:32.00  
##  Mode  :character   Median : 33.00          Median :49.00  
##                     Mean   : 39.01          Mean   :48.34  
##                     3rd Qu.: 48.25          3rd Qu.:63.00  
##                     Max.   :730.00          Max.   :80.00  
##    Location         Purchasing Decision-maker Purchasing Location
##  Length:1000        Length:1000               Length:1000        
##  Class :character   Class :character          Class :character   
##  Mode  :character   Mode  :character          Mode  :character   
##                                                                  
##                                                                  
##                                                                  
##  Monthly Electronics Spend Monthly Household Spend
##  Min.   : 7.00             Min.   : 10.00         
##  1st Qu.:17.00             1st Qu.: 47.00         
##  Median :25.50             Median : 73.00         
##  Mean   :30.96             Mean   : 94.24         
##  3rd Qu.:42.00             3rd Qu.:124.00         
##  Max.   :88.00             Max.   :390.00         
##  Purchasing Frequency (every x months) Technology Adoption
##  Min.   : 1.00                         Length:1000        
##  1st Qu.:11.00                         Class :character   
##  Median :22.00                         Mode  :character   
##  Mean   :22.99                                            
##  3rd Qu.:35.00                                            
##  Max.   :48.00                                            
##  TV Viewing (hours/day) Favorite feature  
##  Min.   : 0.00          Length:1000       
##  1st Qu.: 1.00          Class :character  
##  Median : 1.00          Mode  :character  
##  Mean   : 2.37                            
##  3rd Qu.: 2.00                            
##  Max.   :14.00

## no missing values

## eliminate 2 outliers in the Annual Income attribute; ID# 441 corresponds to $730,000 and ID# 923 corresponds to $680,000 while the mean Annual Income is $39,000 and median is $33,000;(they dominate the Annual Income attributes) 


tivo1_orig <- tivo_orig %>%
  mutate(`Annual Income (x1000 $)` = ifelse(`Annual Income (x1000 $)` < 100,`Annual Income (x1000 $)`, NA))

summary(tivo1_orig)

##        ID            Gender          Marital Status     Work Status       
##  Min.   :   1.0   Length:1000        Length:1000        Length:1000       
##  1st Qu.: 250.8   Class :character   Class :character   Class :character  
##  Median : 500.5   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 500.5                                                           
##  3rd Qu.: 750.2                                                           
##  Max.   :1000.0                                                           
##                                                                           
##   Education         Annual Income (x1000 $)      Age       
##  Length:1000        Min.   :21.00           Min.   :18.00  
##  Class :character   1st Qu.:29.00           1st Qu.:32.00  
##  Mode  :character   Median :32.50           Median :49.00  
##                     Mean   :37.71           Mean   :48.34  
##                     3rd Qu.:48.00           3rd Qu.:63.00  
##                     Max.   :64.00           Max.   :80.00  
##                     NA's   :2                              
##    Location         Purchasing Decision-maker Purchasing Location
##  Length:1000        Length:1000               Length:1000        
##  Class :character   Class :character          Class :character   
##  Mode  :character   Mode  :character          Mode  :character   
##                                                                  
##                                                                  
##                                                                  
##                                                                  
##  Monthly Electronics Spend Monthly Household Spend
##  Min.   : 7.00             Min.   : 10.00         
##  1st Qu.:17.00             1st Qu.: 47.00         
##  Median :25.50             Median : 73.00         
##  Mean   :30.96             Mean   : 94.24         
##  3rd Qu.:42.00             3rd Qu.:124.00         
##  Max.   :88.00             Max.   :390.00         
##                                                   
##  Purchasing Frequency (every x months) Technology Adoption
##  Min.   : 1.00                         Length:1000        
##  1st Qu.:11.00                         Class :character   
##  Median :22.00                         Mode  :character   
##  Mean   :22.99                                            
##  3rd Qu.:35.00                                            
##  Max.   :48.00                                            
##                                                           
##  TV Viewing (hours/day) Favorite feature  
##  Min.   : 0.00          Length:1000       
##  1st Qu.: 1.00          Class :character  
##  Median : 1.00          Mode  :character  
##  Mean   : 2.37                            
##  3rd Qu.: 2.00                            
##  Max.   :14.00                            
##

## eliminating 2 outliers in the Annual Income attribute ($680k and $730k) led to meadian of $33k and mean of $38k



mode=function(x){
modeval=names(table(x))[table(x)==max(table(x))]
  return(modeval)
  }

Segment_1_population <- tivo1_orig %>%
  group_by(Gender,`Marital Status`)


summarize(Segment_1_population, Percent=percent(n()/nrow(tivo_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE), digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Most_Fav_Feature=mode(`Favorite feature`), Most_Shopped_Store=mode(`Purchasing Location`), Avg_Monthly_Elec_Spend=format(mean(`Monthly Electronics Spend`),digits=3))

## # A tibble: 4 x 8
## # Groups:   Gender [2]
##   Gender `Marital Status` Percent Avg_Annual_Inco~ Avg_TV_Hoursper~
##   <chr>  <chr>            <chr>   <chr>            <chr>           
## 1 female married          32.3%   37.5             2.55            
## 2 female single           14.2%   35.1             2.51            
## 3 male   married          39.7%   40               2.16            
## 4 male   single           13.8%   34.4             2.39            
## # ... with 3 more variables: Most_Fav_Feature <chr>,
## #   Most_Shopped_Store <chr>, Avg_Monthly_Elec_Spend <chr>

## my initial premise that single people watch more TV than married people turned out to be unfounded; however, I noticed a totally unexpected pattern that Purchasing Location is based on Gender; I noted that females in general prefer shopping at retail stores, regardless of their Marital Status (married and single) and men in general prefer to shop at discount stores, irrespective of their Marital Status (also married and single); hence, I removed Marital Status and only used Gender as a grouping criterion, see below; this information could be used to prepare retail store marketing geared towards women and discount store advertising targeting men

Segment_1_population <- tivo1_orig %>%
  group_by(Gender)

summarize(Segment_1_population, Percent=percent(n()/nrow(tivo_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE), digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Most_Fav_Feature=mode(`Favorite feature`), Most_Shopped_Store=mode(`Purchasing Location`), Avg_Monthly_Elec_Spend=format(mean(`Monthly Electronics Spend`),digits=3))

## # A tibble: 2 x 7
##   Gender Percent Avg_Annual_Inco~ Avg_TV_Hoursper~ Most_Fav_Feature
##   <chr>  <chr>   <chr>            <chr>            <chr>           
## 1 female 46.5%   36.8             2.54             saving favorite~
## 2 male   53.5%   38.5             2.22             cool gadget     
## # ... with 2 more variables: Most_Shopped_Store <chr>,
## #   Avg_Monthly_Elec_Spend <chr>

Market size (% of TV-involved households) Average annual income Description of segment Most appealing feature/benefit Stores shopped for electronics Average electronics purchase ($)

8b: Segmentation Scheme B

mode=function(x){
modeval=names(table(x))[table(x)==max(table(x))]
  return(modeval)
  }


## created Age Brackets; assumption: there should be some differentiation based on age

tivo1_orig <- tivo_orig %>%
  mutate(`Age Brackets`=ifelse(Age<26,"18-25", ifelse(Age<46,"26-45",ifelse(Age<66,"46-65","over66"))))


Segment_2_population <- tivo1_orig %>% 
  group_by(`Age Brackets`)


summarize(Segment_2_population, Percent=percent(n()/nrow(tivo_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE),digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Avg_Monthly_Electronics_Spend=format(mean(`Monthly Electronics Spend`),digits = 3),Trend_Work_Status = mode(`Work Status`))

## # A tibble: 4 x 6
##   `Age Brackets` Percent Avg_Annual_Inco~ Avg_TV_Hoursper~ Avg_Monthly_Ele~
##   <chr>          <chr>   <chr>            <chr>            <chr>           
## 1 18-25          12.5%   32.4             2.46             33.4            
## 2 26-45          32.6%   37.5             2.41             31.1            
## 3 46-65          33.4%   39.3             2.42             30.8            
## 4 over66         21.5%   44.7             2.17             29.6            
## # ... with 1 more variable: Trend_Work_Status <chr>

## not much diversity based on the Age Brackets but by segmenting the population by Age Brackets, I observed that professional work status constitutes majority in all four age groups (18-25, 26-45, 46-65, over66); hence, marketing should be created accordingly (for busy, working individuals who have careers and may want help with aligning the TV programming to their busy lives; for the retired professionals marketing should take a slightly different approach; overall marketing should be created for savvy audience)



## assumption: there should be some differentiation based on favorite feature

Segment_3_population <- tivo1_orig %>% 
  group_by(`Favorite feature`)


summarize(Segment_3_population, Percent=percent(n()/nrow(tivo1_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE),digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Avg_Monthly_Electronics_Spend=format(mean(`Monthly Electronics Spend`),digits = 3) )

## # A tibble: 5 x 5
##   `Favorite featu~ Percent Avg_Annual_Inco~ Avg_TV_Hoursper~
##   <chr>            <chr>   <chr>            <chr>           
## 1 cool gadget      22.8%   38.4             1.05            
## 2 programming/int~ 13.0%   29.8             3.11            
## 3 saving favorite~ 20.0%   48.1             6.33            
## 4 schedule control 22.1%   39.6             1.05            
## 5 time shifting    22.1%   36.3             1.04            
## # ... with 1 more variable: Avg_Monthly_Electronics_Spend <chr>

## segmentation by favorite feature clearly illustrates that subjects who listed 'saving favorite shows to watch as a family' on average watch TV about 6.33 hours a day which is significantly higher than the second highest score of 3.11 for 'programming/interactive features', in the third place with 1.05 TV viewing hours per day are subjects who chose 'cool gadget' and 'schedule control', in the last place with 1.04 hours of daily TV are those who selected 'time shifting'

## adding Gender to see if my findings will be different for men vs women who chose Favorite Feature 'Saving favorite shows to watch as a family'; the result is slightly higher for females (6.36 vs 6.29); hence, gender does not matter, both groups are watching TV over 6 hours per day


Segment_4_population <- tivo1_orig %>% 
  group_by(`Favorite feature`,`Gender`)


summarize(Segment_4_population, Percent=percent(n()/nrow(tivo1_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE),digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Avg_Monthly_Electronics_Spend=format(mean(`Monthly Electronics Spend`),digits = 3) )

## # A tibble: 10 x 6
## # Groups:   Favorite feature [5]
##    `Favorite featu~ Gender Percent Avg_Annual_Inco~ Avg_TV_Hoursper~
##    <chr>            <chr>  <chr>   <chr>            <chr>           
##  1 cool gadget      female 9.50%   33.2             1.08            
##  2 cool gadget      male   13.3%   42.1             1.03            
##  3 programming/int~ female 6.80%   30.3             3.24            
##  4 programming/int~ male   6.20%   29.3             2.97            
##  5 saving favorite~ female 10.6%   47.9             6.36            
##  6 saving favorite~ male   9.40%   48.3             6.29            
##  7 schedule control female 10.2%   34.9             0.922           
##  8 schedule control male   11.9%   43.5             1.16            
##  9 time shifting    female 9.40%   34.5             0.968           
## 10 time shifting    male   12.7%   37.6             1.09            
## # ... with 1 more variable: Avg_Monthly_Electronics_Spend <chr>

Market size (% of TV-involved households) Average annual income Description of segment Most appealing feature/benefit Stores shopped for electronics Average electronics purchase ($)

Question 9

Write a 150-word summary of your selected segmentation scheme and how you arrived at the segmentation. Include descriptive names for your segments that would be understandable to the marketing manager at TiVo.

##In 8a, I considered Gender and Marital Status. The only significant finding was that females, both married as well as single, mostly purchase their electronics at retails stores while males, married and single, mostly visit electronics stores to make their purchases. Hence, the retail store might concentrate their advertising and product display on female audience while electronics store on male shoppers.

##In 8b, I started testing using age brackets (18-25, 26-45, 46-65, over 65). I assumed that age should be a critical factor in segmenting TiVo market. However, after looking at various attributes segmented by age brackets, I did not see any significant differences. Next, I grouped by Favorite Feature which showed that subjects who listed ‘saving favorite shows to watch as a family’ on average tend to watch TV 6.3 hours every day which is significantly higher than the second group that on average watches TV 3.1 hours daily. Hence, we need to learn more about the 6.3 hours group in order to create appropriate marketing strategy that will resonate with their needs, likes and dislikes. This could be accomplished by using existing data or if not enough information is available, conducting additional surveys of this group in order to collect more information about them.

Your text here

Question 10

As the analyst, what questions would have you have for TiVo to aid you in your analysis?

Your text here I would like to learn more about the data set.

#1. How old is the data? #2. Who compiled the information? Was this data set generated internally by the TiVo company or purchased from a third party or some other source? #3. What is the source? (surveys of TiVo users, electronic store shoppers versus information extracted from a system) #4. Is this a complete set? #5. Were there any changes made to the original set? If changes were made who made them and what methodologies were used for substitution? (It is unusual to see data that is so clean with no missing values, erroneous entries or blank fields.) #6.Has this data set already been used to conduct research? If so, how was it used? Was it helpful? Where can I access the documentation pertaining to the work performed and results? Could I talk to the employee who conducted this analysis? #7. Are there any more data available such as: #7a. What is the family size? #7b. Are the viewing hours pertaining to all the TV watching (live TV and Tivo)? If so, it would be good to have the breakdown between live TV and Tivo. #7c. Can we obtain a Zip code for each of the participant? (rural vs metropolitan, vicinity to electronic store, etc.) #7d.TiVo purchase date? How old is the Tivo unit the individual is using? #7e.What kind of programming is the individual recording, e.g. sitcoms, sports, news, etc.

Segmentation at TiVo

Monika Leedom

8/31/2019

Segmentation at TiVo

Data Preparation

Question 3: Data Exploration

Question 4

4a. In the next two years, how many married men who are early adopters can afford to purchase a TiVo for $499, have enough money to purchase another electronic gadget, and are likely to do so?

4b. How many women with education of MA or PhD are making purchasing decisions for electronics without discussing them with a spouse, either because they are single, or because they are making purchasing decisions without the involvement of their spouses?

4c. Among early adopters, how many purchase electronics at least once every year and do so in stores that specialize in electronics?

4d. How many seniors (above the age of 65) spend more than six hours a day watching TV?

4e. What is the income range for seniors (above the age of 65) who spend more than six hours a day watching TV?

4f. What is the average annual income for seniors (above the age of 65) who spend more than six hours a day watching TV?

Question 5: Correlate Annual Income with Age

Question 6: Correlate Gender and Annual Income

Question 7: Correlation Analysis, Continued

7a: Age and Purchasing Frequency

7b: Annual Income and TV Viewing

7c: Education and Favorite Feature

7d: Monthly Electronics Spend and Monthly Household Spend

7e: Of the four correlations, are any high enough to make one of the attributes redundant?

Question 8: Segmentation

8a: Segmentation Scheme A

8b: Segmentation Scheme B

Question 9

Question 10