This analysis looks at customer segmentation at TiVo.
Load, prepare, explore, and analyze the TiVo data.
# Read data
setwd("C:/Users/monic/OneDrive/Desktop/MSBA/Marketing Analytics/TiVo")
tivo_orig<- read_csv("tivo_orig.csv")
## Parsed with column specification:
## cols(
## ID = col_double(),
## Gender = col_character(),
## `Marital Status` = col_character(),
## `Work Status` = col_character(),
## Education = col_character(),
## `Annual Income (x1000 $)` = col_double(),
## Age = col_double(),
## Location = col_character(),
## `Purchasing Decision-maker` = col_character(),
## `Purchasing Location` = col_character(),
## `Monthly Electronics Spend` = col_double(),
## `Monthly Household Spend` = col_double(),
## `Purchasing Frequency (every x months)` = col_double(),
## `Technology Adoption` = col_character(),
## `TV Viewing (hours/day)` = col_double(),
## `Favorite feature` = col_character()
## )
Create a table for each attribute (e.g., Gender) and record the percentage of responses for each answer (e.g., % of respondents answering “male” and % answering “female”).
# Gender
genderdata <-(prop.table(table(tivo_orig$Gender)))
genderdata
##
## female male
## 0.465 0.535
## % of responses based on Gender: female 46.5% and male 53.5%
tivo_orig %>%
group_by(Gender) %>%
summarize(Gender_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 2 x 2
## Gender Gender_distribution
## <chr> <chr>
## 1 female 46.5%
## 2 male 53.5%
tivo_orig %>%
group_by(Gender) %>%
summarize(Gender_distribution=n())
## # A tibble: 2 x 2
## Gender Gender_distribution
## <chr> <int>
## 1 female 465
## 2 male 535
# Marital Status
Marital_Status_data <-prop.table(table(tivo_orig$'Marital Status'))
Marital_Status_data
##
## married single
## 0.72 0.28
## % of responses based on Marital Status: married 72.0% and single 28.0%
tivo_orig %>%
group_by(`Marital Status`) %>%
summarize(Marital_Status_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 2 x 2
## `Marital Status` Marital_Status_distribution
## <chr> <chr>
## 1 married 72.0%
## 2 single 28.0%
tivo_orig %>%
group_by(`Marital Status`) %>%
summarize(Marital_Status_distribution=n())
## # A tibble: 2 x 2
## `Marital Status` Marital_Status_distribution
## <chr> <int>
## 1 married 720
## 2 single 280
# Work Status
Work_Status_data <-prop.table(table(tivo_orig$'Work Status'))
Work_Status_data
##
## none professional
## 0.35 0.65
## % of responses based on Work Status: none 35.0% and professional 65.0%
tivo_orig %>%
group_by(`Work Status`) %>%
summarize(Work_Status_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 2 x 2
## `Work Status` Work_Status_distribution
## <chr> <chr>
## 1 none 35.0%
## 2 professional 65.0%
tivo_orig %>%
group_by(`Work Status`) %>%
summarize(Work_Status_distribution=n())
## # A tibble: 2 x 2
## `Work Status` Work_Status_distribution
## <chr> <int>
## 1 none 350
## 2 professional 650
# Education
educationdata <-prop.table(table(tivo_orig$Education))
educationdata
##
## BA MA none PhD
## 0.260 0.124 0.506 0.110
## % of responses based on Education: BA 26.0%, MA 12.4%, none 50.6% and PhD 11.0%
tivo_orig %>%
group_by(Education) %>%
summarize(Education_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 4 x 2
## Education Education_distribution
## <chr> <chr>
## 1 BA 26.0%
## 2 MA 12.4%
## 3 none 50.6%
## 4 PhD 11.0%
tivo_orig %>%
group_by(Education) %>%
summarize(Education_distribution=n())
## # A tibble: 4 x 2
## Education Education_distribution
## <chr> <int>
## 1 BA 260
## 2 MA 124
## 3 none 506
## 4 PhD 110
# Annual Income
annualincomedata <-prop.table(table(tivo_orig$`Annual Income (x1000 $)`))
annualincomedata
##
## 21 22 23 24 25 26 27 28 29 30 31 32
## 0.001 0.002 0.004 0.018 0.016 0.047 0.055 0.075 0.073 0.080 0.066 0.062
## 33 34 35 36 37 39 40 41 42 43 44 45
## 0.046 0.038 0.026 0.014 0.005 0.002 0.001 0.002 0.004 0.004 0.011 0.022
## 46 47 48 49 50 51 52 53 54 55 56 57
## 0.028 0.022 0.026 0.037 0.019 0.025 0.022 0.020 0.029 0.029 0.021 0.009
## 58 59 60 61 62 63 64 640 730
## 0.013 0.006 0.009 0.003 0.002 0.002 0.002 0.001 0.001
## % of responses based on Annual Income: it has 45 different values presented below
tivo_orig %>%
group_by(`Annual Income (x1000 $)`) %>%
summarize(Annual_Income_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 45 x 2
## `Annual Income (x1000 $)` Annual_Income_distribution
## <dbl> <chr>
## 1 21 0.100%
## 2 22 0.200%
## 3 23 0.400%
## 4 24 1.80%
## 5 25 1.60%
## 6 26 4.70%
## 7 27 5.50%
## 8 28 7.50%
## 9 29 7.30%
## 10 30 8.00%
## # ... with 35 more rows
tivo_orig %>%
group_by(`Annual Income (x1000 $)`) %>%
summarize(Annual_Income_distribution=n())
## # A tibble: 45 x 2
## `Annual Income (x1000 $)` Annual_Income_distribution
## <dbl> <int>
## 1 21 1
## 2 22 2
## 3 23 4
## 4 24 18
## 5 25 16
## 6 26 47
## 7 27 55
## 8 28 75
## 9 29 73
## 10 30 80
## # ... with 35 more rows
# Age
agedata <-prop.table(table(tivo_orig$Age))
agedata
##
## 18 19 20 21 22 23 24 25 26 27 28 29
## 0.006 0.009 0.015 0.012 0.022 0.020 0.016 0.025 0.025 0.018 0.019 0.018
## 30 31 32 33 34 35 36 37 38 39 40 41
## 0.021 0.014 0.017 0.020 0.013 0.014 0.017 0.016 0.015 0.013 0.016 0.013
## 42 43 44 45 46 47 48 49 50 51 52 53
## 0.015 0.013 0.017 0.012 0.008 0.016 0.022 0.014 0.019 0.020 0.019 0.013
## 54 55 56 57 58 59 60 61 62 63 64 65
## 0.018 0.018 0.018 0.013 0.019 0.014 0.019 0.019 0.013 0.018 0.019 0.015
## 66 67 68 69 70 71 72 73 74 75 76 77
## 0.013 0.027 0.010 0.019 0.012 0.006 0.017 0.009 0.012 0.017 0.021 0.010
## 78 79 80
## 0.011 0.011 0.020
## % of responses based on Age: it has 63 different values presented below
tivo_orig %>%
group_by(Age) %>%
summarize(Age_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 63 x 2
## Age Age_distribution
## <dbl> <chr>
## 1 18 0.600%
## 2 19 0.900%
## 3 20 1.50%
## 4 21 1.20%
## 5 22 2.20%
## 6 23 2.00%
## 7 24 1.60%
## 8 25 2.50%
## 9 26 2.50%
## 10 27 1.80%
## # ... with 53 more rows
tivo_orig %>%
group_by(Age) %>%
summarize(Age_distribution=n())
## # A tibble: 63 x 2
## Age Age_distribution
## <dbl> <int>
## 1 18 6
## 2 19 9
## 3 20 15
## 4 21 12
## 5 22 22
## 6 23 20
## 7 24 16
## 8 25 25
## 9 26 25
## 10 27 18
## # ... with 53 more rows
# Location
locationdata <-prop.table(table(tivo_orig$Location))
locationdata
##
## Alabama Alaska Arizona Arkansas California
## 0.006 0.005 0.043 0.004 0.041
## Colorado Connecticut Delaware Florida Georgia
## 0.039 0.037 0.045 0.035 0.027
## Hawaii Idaho Illinois Indiana Iowa
## 0.045 0.043 0.034 0.006 0.003
## Kansas Kentucky Louisiana Maine Maryland
## 0.006 0.006 0.005 0.041 0.039
## Massachusetts Michigan Minnesota Mississippi Missouri
## 0.038 0.006 0.002 0.002 0.008
## Montana Nebraska Nevada New Hampshire New Jersey
## 0.007 0.001 0.037 0.035 0.035
## New Mexico New York North Carolina North Dakota Ohio
## 0.005 0.039 0.008 0.010 0.003
## Oklahoma Oregon Pennsylvania Rhode Island South Carolina
## 0.003 0.010 0.050 0.027 0.006
## South Dakota Tennessee Texas Utah Vermont
## 0.004 0.032 0.007 0.009 0.043
## Virginia Washington West Virginia Wisconsin Wyoming
## 0.004 0.043 0.008 0.004 0.004
## % of responses based on Annual Income: it has 50 different values presented below
tivo_orig %>%
group_by(Location) %>%
summarize(Location_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 50 x 2
## Location Location_distribution
## <chr> <chr>
## 1 Alabama 0.600%
## 2 Alaska 0.500%
## 3 Arizona 4.30%
## 4 Arkansas 0.400%
## 5 California 4.10%
## 6 Colorado 3.90%
## 7 Connecticut 3.70%
## 8 Delaware 4.50%
## 9 Florida 3.50%
## 10 Georgia 2.70%
## # ... with 40 more rows
tivo_orig %>%
group_by(Location) %>%
summarize(Location_distribution=n())
## # A tibble: 50 x 2
## Location Location_distribution
## <chr> <int>
## 1 Alabama 6
## 2 Alaska 5
## 3 Arizona 43
## 4 Arkansas 4
## 5 California 41
## 6 Colorado 39
## 7 Connecticut 37
## 8 Delaware 45
## 9 Florida 35
## 10 Georgia 27
## # ... with 40 more rows
# Purchasing decision maker
purchasedecisionmakerdata <-prop.table(table(tivo_orig$`Purchasing Decision-maker`))
purchasedecisionmakerdata
##
## family single
## 0.56 0.44
## % of responses based on Purchasing Decision-maker: family 56.0% and single 44.0%
tivo_orig %>%
group_by(`Purchasing Decision-maker`) %>%
summarize(Purchasedecisionmaker_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 2 x 2
## `Purchasing Decision-maker` Purchasedecisionmaker_distribution
## <chr> <chr>
## 1 family 56.0%
## 2 single 44.0%
tivo_orig %>%
group_by(`Purchasing Decision-maker`) %>%
summarize(Purchasedecisionmaker_distribution=n())
## # A tibble: 2 x 2
## `Purchasing Decision-maker` Purchasedecisionmaker_distribution
## <chr> <int>
## 1 family 560
## 2 single 440
# Purchasing location
purchaselocationdata <-prop.table(table(tivo_orig$`Purchasing Location`))
purchaselocationdata
##
## discount mass-consumer electronics
## 0.293 0.200
## retail specialty stores
## 0.294 0.170
## web (ebay)
## 0.043
## % of responses based on Purchasing Location: discount 29.3%, mass-consumer electronics 20.0%, retail 29.4%, specialty stores 17% and web(ebay) 4.03%
tivo_orig %>%
group_by(`Purchasing Location`) %>%
summarize(Purchaselocation_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 5 x 2
## `Purchasing Location` Purchaselocation_distribution
## <chr> <chr>
## 1 discount 29.3%
## 2 mass-consumer electronics 20.0%
## 3 retail 29.4%
## 4 specialty stores 17.0%
## 5 web (ebay) 4.30%
tivo_orig %>%
group_by(`Purchasing Location`) %>%
summarize(Purchasing_Location_distribution=n())
## # A tibble: 5 x 2
## `Purchasing Location` Purchasing_Location_distribution
## <chr> <int>
## 1 discount 293
## 2 mass-consumer electronics 200
## 3 retail 294
## 4 specialty stores 170
## 5 web (ebay) 43
# Monthly electronics spend
monthlyelectronicsdata <-prop.table(table(tivo_orig$`Monthly Electronics Spend`))
monthlyelectronicsdata
##
## 7 9 10 11 12 13 14 15 16 17 18 19
## 0.001 0.002 0.008 0.007 0.021 0.029 0.044 0.055 0.057 0.055 0.066 0.058
## 20 21 22 23 24 25 26 27 28 29 30 31
## 0.041 0.025 0.015 0.009 0.005 0.002 0.004 0.006 0.007 0.007 0.006 0.015
## 32 33 34 35 36 37 38 39 40 41 42 43
## 0.021 0.017 0.027 0.034 0.026 0.015 0.015 0.016 0.017 0.011 0.012 0.020
## 44 45 46 47 48 49 50 51 52 53 54 55
## 0.018 0.017 0.012 0.018 0.009 0.005 0.009 0.010 0.008 0.002 0.010 0.010
## 57 58 59 60 61 62 63 64 65 66 67 69
## 0.005 0.004 0.003 0.013 0.005 0.004 0.001 0.004 0.008 0.003 0.008 0.004
## 70 71 72 73 74 75 76 77 78 79 80 81
## 0.001 0.005 0.004 0.002 0.002 0.003 0.002 0.002 0.001 0.001 0.002 0.001
## 82 83 84 85 88
## 0.001 0.002 0.002 0.002 0.001
## % of responses based on Annual Income: it has 77 different values presented below
tivo_orig %>%
group_by(`Monthly Electronics Spend`) %>%
summarize(monthlyelectronicsspend_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 77 x 2
## `Monthly Electronics Spend` monthlyelectronicsspend_distribution
## <dbl> <chr>
## 1 7 0.100%
## 2 9 0.200%
## 3 10 0.800%
## 4 11 0.700%
## 5 12 2.10%
## 6 13 2.90%
## 7 14 4.40%
## 8 15 5.50%
## 9 16 5.70%
## 10 17 5.50%
## # ... with 67 more rows
tivo_orig %>%
group_by(`Monthly Electronics Spend`) %>%
summarize(Monthly_Electronics_Spend_distribution=n())
## # A tibble: 77 x 2
## `Monthly Electronics Spend` Monthly_Electronics_Spend_distribution
## <dbl> <int>
## 1 7 1
## 2 9 2
## 3 10 8
## 4 11 7
## 5 12 21
## 6 13 29
## 7 14 44
## 8 15 55
## 9 16 57
## 10 17 55
## # ... with 67 more rows
# Purchasing Frequency
purchasingfrequencydata <-prop.table(table(tivo_orig$`Purchasing Frequency (every x months)`))
purchasingfrequencydata
##
## 1 2 3 4 5 6 7 8 9 10 11 12
## 0.016 0.026 0.027 0.026 0.028 0.016 0.023 0.024 0.027 0.028 0.027 0.028
## 13 14 15 16 17 18 19 20 21 22 23 24
## 0.021 0.027 0.028 0.020 0.026 0.015 0.021 0.018 0.025 0.017 0.019 0.019
## 25 26 27 28 29 30 31 32 33 34 35 36
## 0.020 0.019 0.020 0.011 0.019 0.018 0.020 0.017 0.021 0.020 0.018 0.020
## 37 38 39 40 41 42 43 44 45 46 47 48
## 0.022 0.015 0.018 0.021 0.027 0.014 0.022 0.015 0.018 0.024 0.019 0.010
## % of responses based on Annual Income: it has 48 different values presented below
tivo_orig %>%
group_by(`Purchasing Frequency (every x months)`) %>%
summarize(purchasing_freq_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 48 x 2
## `Purchasing Frequency (every x months)` purchasing_freq_distribution
## <dbl> <chr>
## 1 1 1.60%
## 2 2 2.60%
## 3 3 2.70%
## 4 4 2.60%
## 5 5 2.80%
## 6 6 1.60%
## 7 7 2.30%
## 8 8 2.40%
## 9 9 2.70%
## 10 10 2.80%
## # ... with 38 more rows
tivo_orig %>%
group_by(`Purchasing Frequency (every x months)`) %>%
summarize(Purchasing_Frequency_distribution=n())
## # A tibble: 48 x 2
## `Purchasing Frequency (every x months)` Purchasing_Frequency_distributi~
## <dbl> <int>
## 1 1 16
## 2 2 26
## 3 3 27
## 4 4 26
## 5 5 28
## 6 6 16
## 7 7 23
## 8 8 24
## 9 9 27
## 10 10 28
## # ... with 38 more rows
# tech adoption
techadoptiondata <-prop.table(table(tivo_orig$`Technology Adoption`))
techadoptiondata
##
## early late
## 0.8 0.2
## % of responses based on Technology Adoption: early 80% and late 20%
tivo_orig %>%
group_by(`Technology Adoption`) %>%
summarize(techadoption_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 2 x 2
## `Technology Adoption` techadoption_distribution
## <chr> <chr>
## 1 early 80.0%
## 2 late 20.0%
tivo_orig %>%
group_by(`Technology Adoption`) %>%
summarize(Technology_Adoption_distribution=n())
## # A tibble: 2 x 2
## `Technology Adoption` Technology_Adoption_distribution
## <chr> <int>
## 1 early 800
## 2 late 200
# TV viewing
TVviewingdata <-prop.table(table(tivo_orig$'TV Viewing (hours/day)'))
TVviewingdata
##
## 0 1 2 3 4 5 6 7 8 9 10 11
## 0.175 0.362 0.232 0.032 0.045 0.044 0.022 0.009 0.010 0.009 0.015 0.020
## 12 13 14
## 0.007 0.014 0.004
## % of responses based on TV Viewing (hours/day): it has 15 different values presented below
tivo_orig %>%
group_by(`TV Viewing (hours/day)`) %>%
summarize(TVviewing_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 15 x 2
## `TV Viewing (hours/day)` TVviewing_distribution
## <dbl> <chr>
## 1 0 17.5%
## 2 1 36.2%
## 3 2 23.2%
## 4 3 3.20%
## 5 4 4.50%
## 6 5 4.40%
## 7 6 2.20%
## 8 7 0.900%
## 9 8 1.00%
## 10 9 0.900%
## 11 10 1.50%
## 12 11 2.00%
## 13 12 0.700%
## 14 13 1.40%
## 15 14 0.400%
tivo_orig %>%
group_by('TV Viewing (hours/day)') %>%
summarize(TV_Viewing_distribution=n())
## # A tibble: 1 x 2
## `"TV Viewing (hours/day)"` TV_Viewing_distribution
## <chr> <int>
## 1 TV Viewing (hours/day) 1000
# Favorite feature
favoritefeatdata <-prop.table(table(tivo_orig$`Favorite feature`))
favoritefeatdata
##
## cool gadget
## 0.228
## programming/interactive features
## 0.130
## saving favorite shows to watch as a family
## 0.200
## schedule control
## 0.221
## time shifting
## 0.221
## % of responses based on Favorite feature: cool gadget 22.8%, programming/interactive features 13.0%, saving favorite shows to watch as a family 20.0%, schedule control 22.1% and time shifting 22.1%
tivo_orig %>%
group_by(`Favorite feature`) %>%
summarize(Favoritefeat_distribution=percent(n()/nrow(tivo_orig)))
## # A tibble: 5 x 2
## `Favorite feature` Favoritefeat_distribution
## <chr> <chr>
## 1 cool gadget 22.8%
## 2 programming/interactive features 13.0%
## 3 saving favorite shows to watch as a family 20.0%
## 4 schedule control 22.1%
## 5 time shifting 22.1%
tivo_orig %>%
group_by(`Favorite feature`) %>%
summarize(Favorite_Feature_distribution=n())
## # A tibble: 5 x 2
## `Favorite feature` Favorite_Feature_distribution
## <chr> <int>
## 1 cool gadget 228
## 2 programming/interactive features 130
## 3 saving favorite shows to watch as a family 200
## 4 schedule control 221
## 5 time shifting 221
Answer the following questions.
# filter for males who are married and adopt technology early and who can afford to purchase TiVo ($499) + one more electronic gadget (additional $50 needed) within 2-year period plus their purchasing frequency indicates that they purchase more than 1 item every two years; 85 individuals will meet the previously listed criteria
married_male_early.adopters_population <- tivo_orig %>%
filter(Gender == "male" & `Marital Status`== "married" & `Technology Adoption`== "early" & (`Monthly Electronics Spend`*24>549) & `Purchasing Frequency (every x months)`< 13)
count(married_male_early.adopters_population)
## # A tibble: 1 x 1
## n
## <int>
## 1 85
# filter for women with education of MA or PhD and who are purchasing decision makers; 55 women meet these criteria
women_education.PhD.MA.decisionmakers_population <- tivo_orig %>%
filter(Gender == "female" & Education %in% c("PhD", "MA") & `Purchasing Decision-maker` == "single")
count(women_education.PhD.MA.decisionmakers_population)
## # A tibble: 1 x 1
## n
## <int>
## 1 55
# filter for early adopters who purchase electronics at least once every year (Purchasing Frequency is at least every 12 months) in specialty stores; 132 subjects meet the specified criteria
earlyadopters_atleastonceayear_specialtystores_population <- tivo_orig %>%
filter(`Technology Adoption` == "early" & `Purchasing Frequency (every x months)`<= 12 & `Purchasing Location` == "specialty stores")
count(earlyadopters_atleastonceayear_specialtystores_population)
## # A tibble: 1 x 1
## n
## <int>
## 1 132
# filter for seniors (above the age of 65) who spend more than six hours a day watching TV; 20 subjects meets the outlined criteria
seniors_watchingTVmorethan6hours_population <- tivo_orig %>%
filter(Age > 65 & `TV Viewing (hours/day)` > 6)
count(seniors_watchingTVmorethan6hours_population)
## # A tibble: 1 x 1
## n
## <int>
## 1 20
# use the dataset from the previous question, 4d, seniors_watchingTVmorethan6hours_population and pull their income range information; Income rage for seniors (above the age of 65) who spend more than six hours a day watching TV have income range from $41,000 to $55,000
range(seniors_watchingTVmorethan6hours_population$`Annual Income (x1000 $)`)
## [1] 41 55
# use the dataset from the question, 4d, seniors_watchingTVmorethan6hours_population and pull their income mean information; Income mean for seniors (above the age of 65) who spend more than six hours a day watching TV have income range from $48,600
mean(seniors_watchingTVmorethan6hours_population$`Annual Income (x1000 $)`)
## [1] 48.6
What is the R^2?
# R^2 of annual income and age
## Multiple R^2 is 0.01755
annualincome_age <- lm(`Annual Income (x1000 $)` ~ Age, data=tivo_orig)
summary(annualincome_age)
##
## Call:
## lm(formula = `Annual Income (x1000 $)` ~ Age, data = tivo_orig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.29 -10.24 -4.04 8.87 683.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.90012 2.80448 9.948 < 0.0000000000000002 ***
## Age 0.22975 0.05442 4.222 0.0000264 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.73 on 998 degrees of freedom
## Multiple R-squared: 0.01755, Adjusted R-squared: 0.01656
## F-statistic: 17.82 on 1 and 998 DF, p-value: 0.00002643
# R^2 of annual income and gender
## Multiple R^2 is 0.004553
annualincome_gender <- lm(`Annual Income (x1000 $)` ~ Gender, data=tivo_orig)
summary(annualincome_gender)
##
## Call:
## lm(formula = `Annual Income (x1000 $)` ~ Gender, data = tivo_orig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.96 -9.96 -5.77 9.23 689.04
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.766 1.434 25.632 <0.0000000000000002 ***
## Gendermale 4.190 1.961 2.136 0.0329 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.93 on 998 degrees of freedom
## Multiple R-squared: 0.004553, Adjusted R-squared: 0.003555
## F-statistic: 4.564 on 1 and 998 DF, p-value: 0.03289
# R^2
## Multiple R-squared: 1.671e-06
age_purchasingfrequency <- lm(Age ~ `Purchasing Frequency (every x months)`, data=tivo_orig)
summary(age_purchasingfrequency)
##
## Call:
## lm(formula = Age ~ `Purchasing Frequency (every x months)`, data = tivo_orig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.362 -16.332 0.644 14.688 31.688
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 48.382247 1.093953 44.227
## `Purchasing Frequency (every x months)` -0.001664 0.040747 -0.041
## Pr(>|t|)
## (Intercept) <0.0000000000000002 ***
## `Purchasing Frequency (every x months)` 0.967
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.87 on 998 degrees of freedom
## Multiple R-squared: 1.671e-06, Adjusted R-squared: -0.001
## F-statistic: 0.001668 on 1 and 998 DF, p-value: 0.9674
# R^2
## Multiple R-squared: 0.007112
annual_income_tv_viewing <- lm(`Annual Income (x1000 $)` ~ `TV Viewing (hours/day)`, data=tivo_orig)
summary(annual_income_tv_viewing)
##
## Call:
## lm(formula = `Annual Income (x1000 $)` ~ `TV Viewing (hours/day)`,
## data = tivo_orig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.78 -9.47 -5.67 7.22 692.22
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.8789 1.2600 29.268 < 0.0000000000000002
## `TV Viewing (hours/day)` 0.8979 0.3358 2.674 0.00762
##
## (Intercept) ***
## `TV Viewing (hours/day)` **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 30.89 on 998 degrees of freedom
## Multiple R-squared: 0.007112, Adjusted R-squared: 0.006118
## F-statistic: 7.149 on 1 and 998 DF, p-value: 0.007623
# R^2
## Multiple R-squared: 0.2136
education_favorite_feature <- tivo_orig %>%
mutate(Education = recode(Education, "none" = "0")) %>%
mutate(Education = recode(Education, "BA" = "1")) %>%
mutate(Education = recode(Education, "MA" = "2")) %>%
mutate(Education = recode(Education, "PhD" = "3"))
education_favorite_feature <- lm(education_favorite_feature$Education ~ education_favorite_feature$`Favorite feature`)
summary(education_favorite_feature)
##
## Call:
## lm(formula = education_favorite_feature$Education ~ education_favorite_feature$`Favorite feature`)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5231 -0.5231 -0.4661 0.5068 2.5339
##
## Coefficients:
## Estimate
## (Intercept) 0.56140
## education_favorite_feature$`Favorite feature`programming/interactive features 0.96167
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family 0.93860
## education_favorite_feature$`Favorite feature`schedule control -0.09534
## education_favorite_feature$`Favorite feature`time shifting -0.06819
## Std. Error
## (Intercept) 0.06015
## education_favorite_feature$`Favorite feature`programming/interactive features 0.09982
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family 0.08799
## education_favorite_feature$`Favorite feature`schedule control 0.08574
## education_favorite_feature$`Favorite feature`time shifting 0.08574
## t value
## (Intercept) 9.333
## education_favorite_feature$`Favorite feature`programming/interactive features 9.634
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family 10.667
## education_favorite_feature$`Favorite feature`schedule control -1.112
## education_favorite_feature$`Favorite feature`time shifting -0.795
## Pr(>|t|)
## (Intercept) <0.0000000000000002
## education_favorite_feature$`Favorite feature`programming/interactive features <0.0000000000000002
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family <0.0000000000000002
## education_favorite_feature$`Favorite feature`schedule control 0.266
## education_favorite_feature$`Favorite feature`time shifting 0.427
##
## (Intercept) ***
## education_favorite_feature$`Favorite feature`programming/interactive features ***
## education_favorite_feature$`Favorite feature`saving favorite shows to watch as a family ***
## education_favorite_feature$`Favorite feature`schedule control
## education_favorite_feature$`Favorite feature`time shifting
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9083 on 995 degrees of freedom
## Multiple R-squared: 0.2136, Adjusted R-squared: 0.2104
## F-statistic: 67.57 on 4 and 995 DF, p-value: < 0.00000000000000022
# R^2
## Multiple R-squared: 0.6461
monthly_electronics_spend_household_spend <- lm(`Monthly Electronics Spend` ~ `Monthly Household Spend`, data=tivo_orig)
summary(monthly_electronics_spend_household_spend)
##
## Call:
## lm(formula = `Monthly Electronics Spend` ~ `Monthly Household Spend`,
## data = tivo_orig)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.651 -7.345 -2.802 3.939 41.515
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.73487 0.55306 21.22 <0.0000000000000002
## `Monthly Household Spend` 0.20402 0.00478 42.69 <0.0000000000000002
##
## (Intercept) ***
## `Monthly Household Spend` ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.15 on 998 degrees of freedom
## Multiple R-squared: 0.6461, Adjusted R-squared: 0.6457
## F-statistic: 1822 on 1 and 998 DF, p-value: < 0.00000000000000022
Your text answer here
The correlations for the four attributes range from 1.671e-06 (Age and Purchasing Frequency) to 0.6461 (Monthly Electronics Spend and Monthly Household Spend). The last correlation is high and could make one of the attributes redundant.
Select one or more attributes to use as the basis for generating two segmentation schemes. For example, you could choose Age as the basis for creating segments based on age ranges. Or you could select two attributes that would help you segment by willingness to buy versus ability to pay.
As you generate segmentation schemes, apply it to the provided data to describe market size, average annual income, most appealing feature(s)/benefit(s), purchase location(s), and average electronics purchase. Remember to try to create segments that are homogenous internally and heterogeneous across segments for the indicated number of segments. Not all attributes will be used.
## assumptions: single people watch more TV
glimpse(tivo_orig)
## Observations: 1,000
## Variables: 16
## $ ID <dbl> 1, 2, 3, 4, 5, 6, 7, 8...
## $ Gender <chr> "male", "male", "male"...
## $ `Marital Status` <chr> "married", "single", "...
## $ `Work Status` <chr> "professional", "none"...
## $ Education <chr> "none", "none", "BA", ...
## $ `Annual Income (x1000 $)` <dbl> 49, 46, 58, 51, 46, 31...
## $ Age <dbl> 30, 36, 66, 78, 52, 72...
## $ Location <chr> "Florida", "Alabama", ...
## $ `Purchasing Decision-maker` <chr> "family", "single", "f...
## $ `Purchasing Location` <chr> "mass-consumer electro...
## $ `Monthly Electronics Spend` <dbl> 35, 35, 64, 33, 45, 14...
## $ `Monthly Household Spend` <dbl> 150, 163, 103, 154, 16...
## $ `Purchasing Frequency (every x months)` <dbl> 13, 26, 13, 22, 47, 32...
## $ `Technology Adoption` <chr> "late", "late", "early...
## $ `TV Viewing (hours/day)` <dbl> 2, 10, 0, 5, 2, 1, 0, ...
## $ `Favorite feature` <chr> "saving favorite shows...
summary(tivo_orig)
## ID Gender Marital Status Work Status
## Min. : 1.0 Length:1000 Length:1000 Length:1000
## 1st Qu.: 250.8 Class :character Class :character Class :character
## Median : 500.5 Mode :character Mode :character Mode :character
## Mean : 500.5
## 3rd Qu.: 750.2
## Max. :1000.0
## Education Annual Income (x1000 $) Age
## Length:1000 Min. : 21.00 Min. :18.00
## Class :character 1st Qu.: 29.00 1st Qu.:32.00
## Mode :character Median : 33.00 Median :49.00
## Mean : 39.01 Mean :48.34
## 3rd Qu.: 48.25 3rd Qu.:63.00
## Max. :730.00 Max. :80.00
## Location Purchasing Decision-maker Purchasing Location
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## Monthly Electronics Spend Monthly Household Spend
## Min. : 7.00 Min. : 10.00
## 1st Qu.:17.00 1st Qu.: 47.00
## Median :25.50 Median : 73.00
## Mean :30.96 Mean : 94.24
## 3rd Qu.:42.00 3rd Qu.:124.00
## Max. :88.00 Max. :390.00
## Purchasing Frequency (every x months) Technology Adoption
## Min. : 1.00 Length:1000
## 1st Qu.:11.00 Class :character
## Median :22.00 Mode :character
## Mean :22.99
## 3rd Qu.:35.00
## Max. :48.00
## TV Viewing (hours/day) Favorite feature
## Min. : 0.00 Length:1000
## 1st Qu.: 1.00 Class :character
## Median : 1.00 Mode :character
## Mean : 2.37
## 3rd Qu.: 2.00
## Max. :14.00
## no missing values
## eliminate 2 outliers in the Annual Income attribute; ID# 441 corresponds to $730,000 and ID# 923 corresponds to $680,000 while the mean Annual Income is $39,000 and median is $33,000;(they dominate the Annual Income attributes)
tivo1_orig <- tivo_orig %>%
mutate(`Annual Income (x1000 $)` = ifelse(`Annual Income (x1000 $)` < 100,`Annual Income (x1000 $)`, NA))
summary(tivo1_orig)
## ID Gender Marital Status Work Status
## Min. : 1.0 Length:1000 Length:1000 Length:1000
## 1st Qu.: 250.8 Class :character Class :character Class :character
## Median : 500.5 Mode :character Mode :character Mode :character
## Mean : 500.5
## 3rd Qu.: 750.2
## Max. :1000.0
##
## Education Annual Income (x1000 $) Age
## Length:1000 Min. :21.00 Min. :18.00
## Class :character 1st Qu.:29.00 1st Qu.:32.00
## Mode :character Median :32.50 Median :49.00
## Mean :37.71 Mean :48.34
## 3rd Qu.:48.00 3rd Qu.:63.00
## Max. :64.00 Max. :80.00
## NA's :2
## Location Purchasing Decision-maker Purchasing Location
## Length:1000 Length:1000 Length:1000
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Monthly Electronics Spend Monthly Household Spend
## Min. : 7.00 Min. : 10.00
## 1st Qu.:17.00 1st Qu.: 47.00
## Median :25.50 Median : 73.00
## Mean :30.96 Mean : 94.24
## 3rd Qu.:42.00 3rd Qu.:124.00
## Max. :88.00 Max. :390.00
##
## Purchasing Frequency (every x months) Technology Adoption
## Min. : 1.00 Length:1000
## 1st Qu.:11.00 Class :character
## Median :22.00 Mode :character
## Mean :22.99
## 3rd Qu.:35.00
## Max. :48.00
##
## TV Viewing (hours/day) Favorite feature
## Min. : 0.00 Length:1000
## 1st Qu.: 1.00 Class :character
## Median : 1.00 Mode :character
## Mean : 2.37
## 3rd Qu.: 2.00
## Max. :14.00
##
## eliminating 2 outliers in the Annual Income attribute ($680k and $730k) led to meadian of $33k and mean of $38k
mode=function(x){
modeval=names(table(x))[table(x)==max(table(x))]
return(modeval)
}
Segment_1_population <- tivo1_orig %>%
group_by(Gender,`Marital Status`)
summarize(Segment_1_population, Percent=percent(n()/nrow(tivo_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE), digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Most_Fav_Feature=mode(`Favorite feature`), Most_Shopped_Store=mode(`Purchasing Location`), Avg_Monthly_Elec_Spend=format(mean(`Monthly Electronics Spend`),digits=3))
## # A tibble: 4 x 8
## # Groups: Gender [2]
## Gender `Marital Status` Percent Avg_Annual_Inco~ Avg_TV_Hoursper~
## <chr> <chr> <chr> <chr> <chr>
## 1 female married 32.3% 37.5 2.55
## 2 female single 14.2% 35.1 2.51
## 3 male married 39.7% 40 2.16
## 4 male single 13.8% 34.4 2.39
## # ... with 3 more variables: Most_Fav_Feature <chr>,
## # Most_Shopped_Store <chr>, Avg_Monthly_Elec_Spend <chr>
## my initial premise that single people watch more TV than married people turned out to be unfounded; however, I noticed a totally unexpected pattern that Purchasing Location is based on Gender; I noted that females in general prefer shopping at retail stores, regardless of their Marital Status (married and single) and men in general prefer to shop at discount stores, irrespective of their Marital Status (also married and single); hence, I removed Marital Status and only used Gender as a grouping criterion, see below; this information could be used to prepare retail store marketing geared towards women and discount store advertising targeting men
Segment_1_population <- tivo1_orig %>%
group_by(Gender)
summarize(Segment_1_population, Percent=percent(n()/nrow(tivo_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE), digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Most_Fav_Feature=mode(`Favorite feature`), Most_Shopped_Store=mode(`Purchasing Location`), Avg_Monthly_Elec_Spend=format(mean(`Monthly Electronics Spend`),digits=3))
## # A tibble: 2 x 7
## Gender Percent Avg_Annual_Inco~ Avg_TV_Hoursper~ Most_Fav_Feature
## <chr> <chr> <chr> <chr> <chr>
## 1 female 46.5% 36.8 2.54 saving favorite~
## 2 male 53.5% 38.5 2.22 cool gadget
## # ... with 2 more variables: Most_Shopped_Store <chr>,
## # Avg_Monthly_Elec_Spend <chr>
Market size (% of TV-involved households) Average annual income Description of segment Most appealing feature/benefit Stores shopped for electronics Average electronics purchase ($)
mode=function(x){
modeval=names(table(x))[table(x)==max(table(x))]
return(modeval)
}
## created Age Brackets; assumption: there should be some differentiation based on age
tivo1_orig <- tivo_orig %>%
mutate(`Age Brackets`=ifelse(Age<26,"18-25", ifelse(Age<46,"26-45",ifelse(Age<66,"46-65","over66"))))
Segment_2_population <- tivo1_orig %>%
group_by(`Age Brackets`)
summarize(Segment_2_population, Percent=percent(n()/nrow(tivo_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE),digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Avg_Monthly_Electronics_Spend=format(mean(`Monthly Electronics Spend`),digits = 3),Trend_Work_Status = mode(`Work Status`))
## # A tibble: 4 x 6
## `Age Brackets` Percent Avg_Annual_Inco~ Avg_TV_Hoursper~ Avg_Monthly_Ele~
## <chr> <chr> <chr> <chr> <chr>
## 1 18-25 12.5% 32.4 2.46 33.4
## 2 26-45 32.6% 37.5 2.41 31.1
## 3 46-65 33.4% 39.3 2.42 30.8
## 4 over66 21.5% 44.7 2.17 29.6
## # ... with 1 more variable: Trend_Work_Status <chr>
## not much diversity based on the Age Brackets but by segmenting the population by Age Brackets, I observed that professional work status constitutes majority in all four age groups (18-25, 26-45, 46-65, over66); hence, marketing should be created accordingly (for busy, working individuals who have careers and may want help with aligning the TV programming to their busy lives; for the retired professionals marketing should take a slightly different approach; overall marketing should be created for savvy audience)
## assumption: there should be some differentiation based on favorite feature
Segment_3_population <- tivo1_orig %>%
group_by(`Favorite feature`)
summarize(Segment_3_population, Percent=percent(n()/nrow(tivo1_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE),digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Avg_Monthly_Electronics_Spend=format(mean(`Monthly Electronics Spend`),digits = 3) )
## # A tibble: 5 x 5
## `Favorite featu~ Percent Avg_Annual_Inco~ Avg_TV_Hoursper~
## <chr> <chr> <chr> <chr>
## 1 cool gadget 22.8% 38.4 1.05
## 2 programming/int~ 13.0% 29.8 3.11
## 3 saving favorite~ 20.0% 48.1 6.33
## 4 schedule control 22.1% 39.6 1.05
## 5 time shifting 22.1% 36.3 1.04
## # ... with 1 more variable: Avg_Monthly_Electronics_Spend <chr>
## segmentation by favorite feature clearly illustrates that subjects who listed 'saving favorite shows to watch as a family' on average watch TV about 6.33 hours a day which is significantly higher than the second highest score of 3.11 for 'programming/interactive features', in the third place with 1.05 TV viewing hours per day are subjects who chose 'cool gadget' and 'schedule control', in the last place with 1.04 hours of daily TV are those who selected 'time shifting'
## adding Gender to see if my findings will be different for men vs women who chose Favorite Feature 'Saving favorite shows to watch as a family'; the result is slightly higher for females (6.36 vs 6.29); hence, gender does not matter, both groups are watching TV over 6 hours per day
Segment_4_population <- tivo1_orig %>%
group_by(`Favorite feature`,`Gender`)
summarize(Segment_4_population, Percent=percent(n()/nrow(tivo1_orig)), Avg_Annual_Income=format(mean(`Annual Income (x1000 $)`,na.rm = TRUE),digits = 3),Avg_TV_HoursperDay=format(mean(`TV Viewing (hours/day)`),digits = 3), Avg_Monthly_Electronics_Spend=format(mean(`Monthly Electronics Spend`),digits = 3) )
## # A tibble: 10 x 6
## # Groups: Favorite feature [5]
## `Favorite featu~ Gender Percent Avg_Annual_Inco~ Avg_TV_Hoursper~
## <chr> <chr> <chr> <chr> <chr>
## 1 cool gadget female 9.50% 33.2 1.08
## 2 cool gadget male 13.3% 42.1 1.03
## 3 programming/int~ female 6.80% 30.3 3.24
## 4 programming/int~ male 6.20% 29.3 2.97
## 5 saving favorite~ female 10.6% 47.9 6.36
## 6 saving favorite~ male 9.40% 48.3 6.29
## 7 schedule control female 10.2% 34.9 0.922
## 8 schedule control male 11.9% 43.5 1.16
## 9 time shifting female 9.40% 34.5 0.968
## 10 time shifting male 12.7% 37.6 1.09
## # ... with 1 more variable: Avg_Monthly_Electronics_Spend <chr>
Market size (% of TV-involved households) Average annual income Description of segment Most appealing feature/benefit Stores shopped for electronics Average electronics purchase ($)
Write a 150-word summary of your selected segmentation scheme and how you arrived at the segmentation. Include descriptive names for your segments that would be understandable to the marketing manager at TiVo.
##In 8a, I considered Gender and Marital Status. The only significant finding was that females, both married as well as single, mostly purchase their electronics at retails stores while males, married and single, mostly visit electronics stores to make their purchases. Hence, the retail store might concentrate their advertising and product display on female audience while electronics store on male shoppers.
##In 8b, I started testing using age brackets (18-25, 26-45, 46-65, over 65). I assumed that age should be a critical factor in segmenting TiVo market. However, after looking at various attributes segmented by age brackets, I did not see any significant differences. Next, I grouped by Favorite Feature which showed that subjects who listed ‘saving favorite shows to watch as a family’ on average tend to watch TV 6.3 hours every day which is significantly higher than the second group that on average watches TV 3.1 hours daily. Hence, we need to learn more about the 6.3 hours group in order to create appropriate marketing strategy that will resonate with their needs, likes and dislikes. This could be accomplished by using existing data or if not enough information is available, conducting additional surveys of this group in order to collect more information about them.
Your text here
As the analyst, what questions would have you have for TiVo to aid you in your analysis?
Your text here I would like to learn more about the data set.
#1. How old is the data? #2. Who compiled the information? Was this data set generated internally by the TiVo company or purchased from a third party or some other source? #3. What is the source? (surveys of TiVo users, electronic store shoppers versus information extracted from a system) #4. Is this a complete set? #5. Were there any changes made to the original set? If changes were made who made them and what methodologies were used for substitution? (It is unusual to see data that is so clean with no missing values, erroneous entries or blank fields.) #6.Has this data set already been used to conduct research? If so, how was it used? Was it helpful? Where can I access the documentation pertaining to the work performed and results? Could I talk to the employee who conducted this analysis? #7. Are there any more data available such as: #7a. What is the family size? #7b. Are the viewing hours pertaining to all the TV watching (live TV and Tivo)? If so, it would be good to have the breakdown between live TV and Tivo. #7c. Can we obtain a Zip code for each of the participant? (rural vs metropolitan, vicinity to electronic store, etc.) #7d.TiVo purchase date? How old is the Tivo unit the individual is using? #7e.What kind of programming is the individual recording, e.g. sitcoms, sports, news, etc.