The Data

The data set comes from market research I did for a client who was developing a device that reads how well done a piece of meat is.

This is an excessively wide dataframe with nested column headers, many blank and unnecessary columns, and Likert-scale results spread across columns.

Plan of Attack

  1. import data

  2. trim unnecessary columns

  3. separate the Likert columns into a new DF and unify the responses, then join them to the remaining columns of the first DF

  4. convert columns that are “which of these would you want” types of questions to logical/binary

  5. analysis and conclusions


Load data

## [1] 308  98
##   Respondent.ID Collector.ID    Start.Date      End.Date IP.Address
## 1            NA           NA                                     NA
## 2    6480248644    164398360 10/27/17 6:21 10/27/17 6:25         NA
## 3    6480245815    164398360 10/27/17 6:19 10/27/17 6:22         NA
## 4    6480223781    164398360 10/27/17 6:03 10/27/17 6:06         NA
## 5    6480184966    164398360 10/27/17 5:35 10/27/17 5:38         NA
## 6    6480150943    164398360 10/27/17 5:08 10/27/17 5:13         NA
##   Email.Address First.Name Last.Name              Custom.Data.1
## 1            NA         NA        NA                           
## 2            NA         NA        NA 6c0ff50b90493d245d05e76820
## 3            NA         NA        NA a4d0316f64e7241723a4d4d748
## 4            NA         NA        NA 5313424229a9a797ba2e62dd01
## 5            NA         NA        NA 75a8e42d1f310a62c339455e54
## 6            NA         NA        NA ac46d6d26096ecc135d0b809bf
##   What.is.your.first.reaction.to.the.product.                 X     X.1
## 1                               Very positive Somewhat positive Neutral
## 2                                           1                          
## 3                                                             2        
## 4                                                                     3
## 5                                           1                          
## 6                                                             2        
##                 X.2           X.3 How.innovative.is.the.product.
## 1 Somewhat negative Very negative           Extremely innovative
## 2                                                              1
## 3                                                              1
## 4                                                               
## 5                                                               
## 6                                                               
##               X.4                 X.5               X.6
## 1 Very innovative Somewhat innovative Not so innovative
## 2                                                      
## 3                                                      
## 4                                   3                  
## 5                                   3                  
## 6               2                                      
##                     X.7
## 1 Not at all innovative
## 2                      
## 3                      
## 4                      
## 5                      
## 6                      
##   When.you.think.about.the.product..do.you.think.of.it.as.something.you.need.or.don.t.need.
## 1                                                                           Definitely need
## 2                                                                                          
## 3                                                                                          
## 4                                                                                          
## 5                                                                                          
## 6                                                                                          
##             X.8     X.9                X.10                  X.11
## 1 Probably need Neutral Probably don’t need Definitely don’t need
## 2             2                                                  
## 3             2                                                  
## 4                                                               5
## 5                     3                                          
## 6                                         4                      
##   If.the.product.were.available.today..how.likely.would.you.be.to.buy.the.product.
## 1                                                                 Extremely likely
## 2                                                                                1
## 3                                                                                 
## 4                                                                                 
## 5                                                                                 
## 6                                                                                 
##          X.12            X.13          X.14              X.15
## 1 Very likely Somewhat likely Not so likely Not at all likely
## 2                                                            
## 3           2                                                
## 4                                                           5
## 5                           3                                
## 6                           3                                
##   How.likely.is.it.that.you.would.recommend.the.Steak.Rite.to.a.friend.or.colleague.
## 1                                                                                 NA
## 2                                                                                  8
## 3                                                                                  8
## 4                                                                                  1
## 5                                                                                  8
## 6                                                                                  7
##   How.much.would.you.expect.to.pay.for.an.item.like.this.
## 1                                     Open-Ended Response
## 2                                                      25
## 3                                                      15
## 4                                                      10
## 5                                                      15
## 6                                                      13
##   What.are.the.most.important.features.for.an.item.like.this.one..choose.at.least.3..
## 1                                                                        No batteries
## 2                                                                                   1
## 3                                                                                   1
## 4                                                                                   1
## 5                                                                                   1
## 6                                                                                   1
##              X.16          X.17         X.18        X.19      X.20
## 1 Water resistant Easy to clean Easy to read Easy to use High tech
## 2               2             3            4           5         6
## 3                             3                                   
## 4               2             3                        5          
## 5                                                      5          
## 6               2             3            4           5          
##         X.21         X.22                X.23                X.24
## 1 Attractive Easy to hold Doesn't damage food Doesn't lose juices
## 2                       8                   9                  10
## 3                                                                
## 4                                                                
## 5                                           9                  10
## 6                                                                
##                   X.25               X.26            X.27            X.28
## 1 Quality of materials Usable for poultry Usable for fish Usable for pork
## 2                   11                                                   
## 3                   11                                                   
## 4                   11                                                   
## 5                                                                        
## 6                   11                                                   
##   What.is.the.single.most.important.feature.            X.29          X.30
## 1                               No batteries Water resistant Easy to clean
## 2                                                                         
## 3                                                                         
## 4                                                                         
## 5                                                                         
## 6                                          1                              
##           X.31        X.32      X.33       X.34         X.35
## 1 Easy to read Easy to use High tech Attractive Easy to hold
## 2                                                           
## 3                                                           
## 4                                                           
## 5                                                           
## 6                                                           
##                  X.36                X.37                 X.38
## 1 Doesn't damage food Doesn't lose juices Quality of materials
## 2                   9                                         
## 3                   9                                         
## 4                                                           11
## 5                                      10                     
## 6                                                             
##                 X.39            X.40            X.41
## 1 Usable for poultry Usable for fish Usable for pork
## 2                                                   
## 3                                                   
## 4                                                   
## 5                                                   
## 6                                                   
##   Why.do.you.think.someone.would.buy.this.product.          X.42
## 1                         Use for self / household Birthday gift
## 2                                                1              
## 3                                                1              
## 4                                                1              
## 5                                                1              
## 6                                                               
##           X.43              X.44       X.45                     X.46
## 1 Holiday gift Housewarming gift Other gift For cooking professional
## 2                                                                   
## 3                                                                   
## 4                                                                   
## 5                                                                   
## 6                              4                                    
##   What.is.your.age.    X.47    X.48    X.49 X.50 What.is.your.gender. X.51
## 1          Under 18 18 - 29 30 - 44 45 - 59  60+               Female Male
## 2                                 3                                      2
## 3                                 3                                      2
## 4                                 3                                      2
## 5                                 3                                      2
## 6                                 3                                      2
##   How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year.
## 1                                                                    $0 to $9,999
## 2                                                                                
## 3                                                                                
## 4                                                                                
## 5                                                                                
## 6                                                                               1
##                 X.52               X.53               X.54
## 1 $10,000 to $24,999 $25,000 to $49,999 $50,000 to $74,999
## 2                                     3                   
## 3                                                         
## 4                                                         
## 5                                                         
## 6                                                         
##                 X.55                 X.56                 X.57
## 1 $75,000 to $99,999 $100,000 to $124,999 $125,000 to $149,999
## 2                                                             
## 3                  5                                          
## 4                                                             
## 5                  5                                          
## 6                                                             
##                   X.58                 X.59            X.60
## 1 $150,000 to $174,999 $175,000 to $199,999 $200,000 and up
## 2                                                          
## 3                                                          
## 4                                                        10
## 5                                                          
## 6                                                          
##                   X.61   US.Region            X.62               X.63
## 1 Prefer not to answer New England Middle Atlantic East North Central
## 2                                                                    
## 3                                                                   3
## 4                                                                    
## 5                                                2                   
## 6                                                                    
##                 X.64           X.65               X.66               X.67
## 1 West North Central South Atlantic East South Central West South Central
## 2                                                    6                   
## 3                                                                        
## 4                                                                        
## 5                                                                        
## 6                                                                        
##       X.68    X.69       Device.Types                   X.70
## 1 Mountain Pacific iOS Phone / Tablet Android Phone / Tablet
## 2                                                          2
## 3                                                           
## 4                9                                          
## 5                                                           
## 6                9                                          
##                   X.71                     X.72                   X.73
## 1 Other Phone / Tablet Windows Desktop / Laptop MacOS Desktop / Laptop
## 2                                                                     
## 3                                             4                       
## 4                                                                     
## 5                                             4                       
## 6                                             4                       
##    X.74
## 1 Other
## 2      
## 3      
## 4     6
## 5      
## 6
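The column names in the output above (`X`, `X.1`, and so on) are what `read.csv()` produces when header cells are blank. Here is a minimal sketch of the import step on a toy CSV. The real file name and the exact arguments I used are not shown here, so `csv_text` and `raw` are illustrative stand-ins.

```r
# Toy CSV text standing in for the real survey export (hypothetical data).
# Row 1 is the question header, row 2 holds the answer labels.
csv_text <- paste(
  "Respondent.ID,What is your first reaction to the product?,,",
  ",Very positive,Somewhat positive,Neutral",
  "6480248644,1,,",
  "6480245815,,2,",
  sep = "\n"
)

# With a real file, pass its path as the first argument instead of `text =`.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(raw)    # rows and columns, as in the first line of the output above
names(raw)  # blank header cells are auto-named X, X.1, ...
head(raw)
```

Because the answer-label row is read as data, every answer column comes in as character rather than numeric, which is why later steps have to drop that row and convert.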

Trim columns

Some of the columns here only identify individual responses, and they are not necessary. I also don’t need to know when the survey was taken.
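A minimal sketch of dropping columns by name, on a toy dataframe. The dimensions reported further down (98 columns at import, 90 later) suggest eight columns were removed, but which eight is an assumption here, as is the `drop_cols` vector.

```r
# Toy stand-in for the imported data (hypothetical values).
df <- data.frame(
  Respondent.ID = c(NA, 6480248644, 6480245815),
  Start.Date    = c("", "10/27/17 6:21", "10/27/17 6:19"),
  End.Date      = c("", "10/27/17 6:25", "10/27/17 6:22"),
  What.is.your.first.reaction.to.the.product. = c("Very positive", "1", ""),
  stringsAsFactors = FALSE
)

# Identifier and timestamp columns to remove (assumed list).
drop_cols <- c("Respondent.ID", "Collector.ID", "Start.Date", "End.Date",
               "IP.Address", "Email.Address", "First.Name", "Last.Name")

# Keep every column whose name is not in the drop list.
# setdiff() ignores names in drop_cols that are absent from df.
df_trimmed <- df[, setdiff(names(df), drop_cols), drop = FALSE]
names(df_trimmed)
```

Dropping by name rather than by position keeps the step readable and still works if the column order changes.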


Create likert df

Create the initial dataframe

##   What.is.your.first.reaction.to.the.product.  X X.1 X.2 X.3
## 2                                           1 NA  NA  NA  NA
## 3                                          NA  2  NA  NA  NA
## 4                                          NA NA   3  NA  NA
## 5                                           1 NA  NA  NA  NA
## 6                                          NA  2  NA  NA  NA
## 7                                           1 NA  NA  NA  NA
##   How.innovative.is.the.product. X.4 X.5 X.6 X.7
## 2                              1  NA  NA  NA  NA
## 3                              1  NA  NA  NA  NA
## 4                             NA  NA   3  NA  NA
## 5                             NA  NA   3  NA  NA
## 6                             NA   2  NA  NA  NA
## 7                              1  NA  NA  NA  NA
##   When.you.think.about.the.product..do.you.think.of.it.as.something.you.need.or.don.t.need.
## 2                                                                                        NA
## 3                                                                                        NA
## 4                                                                                        NA
## 5                                                                                        NA
## 6                                                                                        NA
## 7                                                                                         1
##   X.8 X.9 X.10 X.11
## 2   2  NA   NA   NA
## 3   2  NA   NA   NA
## 4  NA  NA   NA    5
## 5  NA   3   NA   NA
## 6  NA  NA    4   NA
## 7  NA  NA   NA   NA
##   If.the.product.were.available.today..how.likely.would.you.be.to.buy.the.product.
## 2                                                                                1
## 3                                                                               NA
## 4                                                                               NA
## 5                                                                               NA
## 6                                                                               NA
## 7                                                                               NA
##   X.12 X.13 X.14 X.15
## 2   NA   NA   NA   NA
## 3    2   NA   NA   NA
## 4   NA   NA   NA    5
## 5   NA    3   NA   NA
## 6   NA    3   NA   NA
## 7    2   NA   NA   NA
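The output above starts at row 2 and shows `NA` where the raw import had blanks, which is consistent with dropping the label row and converting to numeric. A minimal sketch of that step on toy data follows; the column selection in `likert_cols` is an assumption.

```r
# Toy stand-in for the trimmed data: row 1 holds answer labels, not responses.
df <- data.frame(
  What.is.your.first.reaction.to.the.product. = c("Very positive", "1", "", ""),
  X        = c("Somewhat positive", "", "2", ""),
  X.1      = c("Neutral", "", "", "3"),
  ExpPrice = c("Open-Ended Response", "25", "15", "10"),
  stringsAsFactors = FALSE
)

# Columns that belong to the Likert questions (assumed selection).
likert_cols <- c("What.is.your.first.reaction.to.the.product.", "X", "X.1")

# Drop the label row and keep only the Likert columns.
likert <- df[-1, likert_cols]

# Convert every column to numeric; blank strings become NA.
likert[] <- lapply(likert, as.numeric)
head(likert)
```

Row names are preserved, so the first remaining row is still labelled 2, as in the output above.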

Condense the answers into a single column per question

## [1] 307  24
## [1] 308  90
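Each question spreads its answer over five columns, with at most one of them filled in per respondent. One way to condense them, sketched here on toy data, is a row sum that ignores `NA`. This is an illustration of the idea, not necessarily the exact code I ran.

```r
# Toy version of one Likert question spread over five columns.
likert <- data.frame(
  What.is.your.first.reaction.to.the.product. = c(1, NA, NA, 1),
  X   = c(NA, 2, NA, NA),
  X.1 = c(NA, NA, 3, NA),
  X.2 = rep(NA_real_, 4),
  X.3 = rep(NA_real_, 4)
)

reaction_cols <- 1:5

# TRUE where the respondent answered the question at all.
answered <- rowSums(!is.na(likert[, reaction_cols])) > 0

# With at most one non-NA value per row, the NA-ignoring row sum is that value.
# Unanswered rows stay NA instead of becoming 0.
likert$Reaction <- ifelse(answered,
                          rowSums(likert[, reaction_cols], na.rm = TRUE),
                          NA)
likert$Reaction
```

Adding one condensed column per question to the original 20 gives 24 columns, which matches the first dimension line above.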

Because the main DF has 308 rows and likert has only 307, I have to either insert an artificial row into likert or remove a row from the main df.

Since I will eventually be cutting the secondary header row out of df anyway, I will do that work now and bring likert back in later.
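A minimal sketch of that alignment on toy data, assuming the secondary header is the first row of `df`:

```r
# Toy main dataframe whose first row is the secondary header (answer labels).
df <- data.frame(Q = c("label", "1", "2"), stringsAsFactors = FALSE)

# Toy condensed Likert dataframe, already one row shorter.
likert <- data.frame(Reaction = c(1, 2), row.names = c("2", "3"))

# Remove the secondary header row so both dataframes cover the same respondents.
df_body <- df[-1, , drop = FALSE]

# Guard against a silent misalignment before binding columns.
stopifnot(nrow(df_body) == nrow(likert))
combined <- cbind(df_body, likert)
combined
```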


Create important features df

While the Likert-scale columns needed to be condensed, the columns that ask which features are most important need to be converted to logical/binary.

##   No.batteries Water.resistant Easy.to.clean Easy.to.read Easy.to.use
## 1            1               1             1            1           1
## 2            1               2             1            2           2
## 3            1               1             1            2           1
## 4            1               2             2            2           1
## 5            1               1             1            1           1
## 6            2               2             2            1           2
##   High.tech Attractive Easy.to.hold Doesn.t.damage.food
## 1         1          2            1                   1
## 2         2          2            2                   2
## 3         2          2            2                   2
## 4         2          2            2                   1
## 5         2          2            2                   2
## 6         2          2            1                   2
##   Doesn.t.lose.juices Quality.of.materials Usable.for.poultry
## 1                   1                    1                  2
## 2                   2                    1                  2
## 3                   2                    1                  2
## 4                   1                    2                  2
## 5                   2                    1                  2
## 6                   2                    2                  2
##   Usable.for.fish Usable.for.pork
## 1               2               2
## 2               2               2
## 3               2               2
## 4               2               2
## 5               2               2
## 6               1               2
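In the output above, 1 appears to mark a selected feature and 2 an unselected one. A minimal sketch of the same idea with plain `TRUE`/`FALSE` values follows, on toy data; the coding choice here is an assumption, not the one used above.

```r
# Toy "choose at least 3" columns: a code where selected, blank where not.
features <- data.frame(
  No.batteries    = c("1", "1", ""),
  Water.resistant = c("2", "", "2"),
  Attractive      = c("", "", ""),
  stringsAsFactors = FALSE
)

# A feature counts as selected when its cell is neither NA nor blank.
features_lgl <- as.data.frame(
  lapply(features, function(x) !is.na(x) & x != "")
)

# Logical columns sum directly to the number of respondents who chose each feature.
colSums(features_lgl)
```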


Single most important feature, who would buy, and demographics: a decision must be made

For this section there are two ways to think about storing the data.

If you are going to do any kind of modeling on it, it is best to keep each of these features in a separate column, as they have no relationship to each other.

If you are going to create a relational database or just generate summary data, it is better to consolidate them in the same way that was done with the Likert scales.

As I will not be modeling on this any time soon, and as it is easy enough to dummy the variables out if I decide to later, I’ll go with consolidation.

##   What.is.the.single.most.important.feature. X.29 X.30 X.31 X.32 X.33 X.34
## 2                                         NA   NA   NA   NA   NA   NA   NA
## 3                                         NA   NA   NA   NA   NA   NA   NA
## 4                                         NA   NA   NA   NA   NA   NA   NA
## 5                                         NA   NA   NA   NA   NA   NA   NA
## 6                                          1   NA   NA   NA   NA   NA   NA
## 7                                         NA   NA   NA   NA   NA   NA   NA
##   X.35 X.36 X.37 X.38 X.39 X.40 X.41 Most_Important_Feature
## 2   NA    9   NA   NA   NA   NA   NA                      9
## 3   NA    9   NA   NA   NA   NA   NA                      9
## 4   NA   NA   NA   11   NA   NA   NA                     11
## 5   NA   NA   10   NA   NA   NA   NA                     10
## 6   NA   NA   NA   NA   NA   NA   NA                      1
## 7   NA    9   NA   NA   NA   NA   NA                      9
##   Why.do.you.think.someone.would.buy.this.product. X.42 X.43 X.44 X.45
## 2                                                1   NA   NA   NA   NA
## 3                                                1   NA   NA   NA   NA
## 4                                                1   NA   NA   NA   NA
## 5                                                1   NA   NA   NA   NA
## 6                                               NA   NA   NA    4   NA
## 7                                               NA    2   NA   NA   NA
##   X.46 Who_Would_Buy
## 2   NA             1
## 3   NA             1
## 4   NA             1
## 5   NA             1
## 6   NA             4
## 7   NA             2
##   What.is.your.age. X.47 X.48 X.49 X.50 AgeRange
## 2                NA   NA    3   NA   NA        3
## 3                NA   NA    3   NA   NA        3
## 4                NA   NA    3   NA   NA        3
## 5                NA   NA    3   NA   NA        3
## 6                NA   NA    3   NA   NA        3
## 7                NA    2   NA   NA   NA        2
##   What.is.your.gender. X.51 Gender
## 2                   NA    2      2
## 3                   NA    2      2
## 4                   NA    2      2
## 5                   NA    2      2
## 6                   NA    2      2
## 7                   NA    2      2
##   How.much.total.combined.money.did.all.members.of.your.HOUSEHOLD.earn.last.year.
## 2                                                                              NA
## 3                                                                              NA
## 4                                                                              NA
## 5                                                                              NA
## 6                                                                               1
## 7                                                                               1
##   X.52 X.53 X.54 X.55 X.56 X.57 X.58 X.59 X.60 X.61 IncomeRange
## 2   NA    3   NA   NA   NA   NA   NA   NA   NA   NA           3
## 3   NA   NA   NA    5   NA   NA   NA   NA   NA   NA           5
## 4   NA   NA   NA   NA   NA   NA   NA   NA   10   NA          10
## 5   NA   NA   NA    5   NA   NA   NA   NA   NA   NA           5
## 6   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA           1
## 7   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA           1
##   US.Region X.62 X.63 X.64 X.65 X.66 X.67 X.68 X.69 Region
## 2         0    0    0    0    0    6    0    0    0      6
## 3         0    0    3    0    0    0    0    0    0      3
## 4         0    0    0    0    0    0    0    0    9      9
## 5         0    2    0    0    0    0    0    0    0      2
## 6         0    0    0    0    0    0    0    0    9      9
## 7         0    2    0    0    0    0    0    0    0      2
##   Device.Types X.70 X.71 X.72 X.73 X.74 Device
## 2           NA    2   NA   NA   NA   NA      2
## 3           NA   NA   NA    4   NA   NA      4
## 4           NA   NA   NA   NA   NA    6      6
## 5           NA   NA   NA    4   NA   NA      4
## 6           NA   NA   NA    4   NA   NA      4
## 7            1   NA   NA   NA   NA   NA      1
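If modeling does become relevant later, a consolidated column can be expanded back into indicator columns. Here is a minimal sketch using `model.matrix()` on a toy `AgeRange` column; the `survey` dataframe is a hypothetical stand-in.

```r
# Toy consolidated column: one age-range code per respondent.
survey <- data.frame(AgeRange = c(3, 3, 2, 4))

# Declare all five possible codes so unused levels still get a column.
survey$AgeRange <- factor(survey$AgeRange, levels = 1:5)

# One indicator column per level; "- 1" removes the intercept so that
# no level is absorbed as the reference category.
dummies <- model.matrix(~ AgeRange - 1, data = survey)
colnames(dummies)
dummies
```

Each row of `dummies` has a single 1, in the column for that respondent's age range.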


Create short dfs for remaining columns

At this point the majority of columns have been processed so that only 307 rows remain. I’ll do the same for the 2 remaining columns and then reconstitute the dataframe.

##   How.likely.is.it.that.you.would.recommend.the.Steak.Rite.to.a.friend.or.colleague.
## 2                                                                                  8
## 3                                                                                  8
## 4                                                                                  1
## 5                                                                                  8
## 6                                                                                  7
## 7                                                                                  4
##   How.much.would.you.expect.to.pay.for.an.item.like.this.
## 2                                                      25
## 3                                                      15
## 4                                                      10
## 5                                                      15
## 6                                                      13
## 7                                                      53
##   Reaction Innovative Need Likely2Buy
## 2        1          1    2          1
## 3        2          1    2          2
## 4        3          3    5          5
## 5        1          3    3          3
## 6        2          2    4          3
## 7        1          1    1          2
##   How.likely.is.it.that.you.would.recommend.the.Steak.Rite.to.a.friend.or.colleague.
## 2                                                                                  8
## 3                                                                                  8
## 4                                                                                  1
## 5                                                                                  8
## 6                                                                                  7
## 7                                                                                  4
##   How.much.would.you.expect.to.pay.for.an.item.like.this. No.batteries
## 2                                                      25            1
## 3                                                      15            1
## 4                                                      10            1
## 5                                                      15            1
## 6                                                      13            1
## 7                                                      53            2
##   Water.resistant Easy.to.clean Easy.to.read Easy.to.use High.tech
## 2               1             1            1           1         1
## 3               2             1            2           2         2
## 4               1             1            2           1         2
## 5               2             2            2           1         2
## 6               1             1            1           1         2
## 7               2             2            1           2         2
##   Attractive Easy.to.hold Doesn.t.damage.food Doesn.t.lose.juices
## 2          2            1                   1                   1
## 3          2            2                   2                   2
## 4          2            2                   2                   2
## 5          2            2                   1                   1
## 6          2            2                   2                   2
## 7          2            1                   2                   2
##   Quality.of.materials Usable.for.poultry Usable.for.fish Usable.for.pork
## 2                    1                  2               2               2
## 3                    1                  2               2               2
## 4                    1                  2               2               2
## 5                    2                  2               2               2
## 6                    1                  2               2               2
## 7                    2                  2               1               2
##   mif.1.307..15. who.1.307..7. age.1.307..6. gen.1.307..3.
## 2              9             1             3             2
## 3              9             1             3             2
## 4             11             1             3             2
## 5             10             1             3             2
## 6              1             4             3             2
## 7              9             2             2             2
##   income.1.307..12. region.1.307..10. device.1.307..7.
## 2                 3                 6                2
## 3                 5                 3                4
## 4                10                 9                6
## 5                 5                 2                4
## 6                 1                 9                4
## 7                 1                 2                1
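Column names such as `mif.1.307..15.` in the output above are consistent with passing an unnamed expression like `mif[1:307, 15]` to `data.frame()`, which builds the name from the expression text. A minimal sketch on toy data follows, showing that behaviour and how naming the argument avoids the later rename. The toy dataframes and indices are assumptions.

```r
# Toy pieces standing in for the processed sub-dataframes.
likert_short <- data.frame(Reaction = c(1, 2, 3), Innovative = c(1, 1, 3))
mif <- data.frame(filler = NA, Most_Important_Feature = c(9, 9, 11))
age <- data.frame(filler = NA, AgeRange = c(3, 3, 3))

# Unnamed argument: the column name is derived from the expression itself.
unnamed <- data.frame(mif[1:3, 2])
names(unnamed)

# Named arguments give clean column names at the point of assembly.
survey <- data.frame(likert_short,
                     MostImpFeat = mif[1:3, 2],
                     AgeRange    = age[1:3, 2])
names(survey)
```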


Polishing

We now have a solid dataframe we can use for analysis, but there are a few tweaks to make.

  • Many column headings need to be cleaned

  • Codes in many columns can be replaced by their values
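For the second point, a minimal sketch of replacing codes with their labels using `factor()`. The labels are the ones from the secondary header row of the first-reaction question; the toy `survey` dataframe is a stand-in.

```r
# Toy coded column, using the first six Reaction values from the data.
survey <- data.frame(Reaction = c(1, 2, 3, 1, 2, 1))

# Labels in code order, taken from the answer-label row of the raw import.
reaction_labels <- c("Very positive", "Somewhat positive", "Neutral",
                     "Somewhat negative", "Very negative")

# An ordered factor keeps the scale order for tables and plots.
survey$Reaction <- factor(survey$Reaction,
                          levels = 1:5,
                          labels = reaction_labels,
                          ordered = TRUE)
table(survey$Reaction)
```

Setting `levels = 1:5` explicitly keeps answer options that nobody chose in the table with a count of zero.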


Clean Column Names

##  [1] "Reaction"             "Innovative"           "Need"                
##  [4] "Likely2Buy"           "Recommend"            "ExpPrice"            
##  [7] "No.batteries"         "Water.resistant"      "Easy.to.clean"       
## [10] "Easy.to.read"         "Easy.to.use"          "High.tech"           
## [13] "Attractive"           "Easy.to.hold"         "Doesn.t.damage.food" 
## [16] "Doesn.t.lose.juices"  "Quality.of.materials" "Usable.for.poultry"  
## [19] "Usable.for.fish"      "Usable.for.pork"      "MostImpFeat"         
## [22] "WhyBuy"               "AgeRange"             "Gender"              
## [25] "IncomeRange"          "Region"               "DeviceUsed"
##   Reaction Innovative Need Likely2Buy Recommend ExpPrice No.batteries
## 2        1          1    2          1         8       25            1
## 3        2          1    2          2         8       15            1
## 4        3          3    5          5         1       10            1
## 5        1          3    3          3         8       15            1
## 6        2          2    4          3         7       13            1
## 7        1          1    1          2         4       53            2
##   Water.resistant Easy.to.clean Easy.to.read Easy.to.use High.tech
## 2               1             1            1           1         1
## 3               2             1            2           2         2
## 4               1             1            2           1         2
## 5               2             2            2           1         2
## 6               1             1            1           1         2
## 7               2             2            1           2         2
##   Attractive Easy.to.hold Doesn.t.damage.food Doesn.t.lose.juices
## 2          2            1                   1                   1
## 3          2            2                   2                   2
## 4          2            2                   2                   2
## 5          2            2                   1                   1
## 6          2            2                   2                   2
## 7          2            1                   2                   2
##   Quality.of.materials Usable.for.poultry Usable.for.fish Usable.for.pork
## 2                    1                  2               2               2
## 3                    1                  2               2               2
## 4                    1                  2               2               2
## 5                    2                  2               2               2
## 6                    1                  2               2               2
## 7                    2                  2               1               2
##   MostImpFeat WhyBuy AgeRange Gender IncomeRange Region DeviceUsed
## 2           9      1        3      2           3      6          2
## 3           9      1        3      2           5      3          4
## 4          11      1        3      2          10      9          6
## 5          10      1        3      2           5      2          4
## 6           1      4        3      2           1      9          4
## 7           9      2        2      2           1      2          1
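
As a minimal sketch of the renaming step, the snippet below maps three of the raw export names (the ones visible in the earlier output) to their clean counterparts. The data frame name `survey` and the lookup vector `new_names` are assumptions made for illustration; the report does not show how the renaming was actually done.

```r
# Hypothetical stand-in for the trimmed survey data frame (first three rows,
# last three columns only); the real one has 27 columns.
survey <- data.frame(
  income.1.307..12. = c(3, 5, 10),
  region.1.307..10. = c(6, 3, 9),
  device.1.307..7.  = c(2, 4, 6)
)

# Lookup vector: raw export name -> clean name
new_names <- c(
  "income.1.307..12." = "IncomeRange",
  "region.1.307..10." = "Region",
  "device.1.307..7."  = "DeviceUsed"
)

# Rename by matching on the old names, so column order does not matter
idx <- match(names(new_names), names(survey))
names(survey)[idx] <- unname(new_names)

print(names(survey))
```

Matching on the old names instead of assigning one long vector by position means a reordered or dropped column cannot silently shift every label by one.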


Replace number keys with values

## The following `from` values were not present in `x`: 1
## The following `from` values were not present in `x`: 3
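
The two messages above have the wording that `plyr::mapvalues()` prints when a code listed in `from` never occurs in the column, so they are most likely harmless. As a dependency-free sketch of the same idea, base R's `factor()` can attach labels to the codes. It assumes that code k corresponds to the k-th label in the order the WhyBuy table prints later in this report; that ordering is an assumption, not something the output confirms.

```r
# WhyBuy codes from the six rows printed above
why_codes <- c(1, 1, 1, 1, 4, 2)

# Labels in the order they appear in the WhyBuy summary table
# (assumed to line up with codes 1 to 6)
why_labels <- c(
  "Use for self / household",
  "Birthday gift",
  "Holiday gift",
  "Housewarming gift",
  "Other gift",
  "For cooking professional"
)

# Codes that never occur simply become empty factor levels
why_buy <- factor(why_codes,
                  levels = seq_along(why_labels),
                  labels = why_labels)

print(why_buy)
print(table(why_buy))
```

Declaring all levels up front keeps unused categories in later counts with a value of zero instead of dropping them.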


Analysis & Conclusions

1 - Pull averages for Reaction, Innovative, Need, Likely to Buy, Would Recommend, and Expected Price

##   Reaction Innovative     Need Likely2Buy Recommend ExpPrice
## 1 2.228013    2.34202 3.175896   3.254072  5.504886       NA


For Reaction through Likely2Buy, 1 is best and 5 is worst.

For Recommend, 10 is best and 1 is worst.

Reaction was positive (2.23 on average) and people saw the product as innovative (2.34), but not overwhelmingly so in either case.

They do not perceive the product as something they need (3.18), they are not likely to buy it (3.25), and they aren’t eager to recommend it (5.5 out of 10).

The client had hoped to sell the item for around $20-$25, so the expected price isn’t far off.
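
The ExpPrice average prints as NA in the output above. In R that usually means the column contains missing values (or is not numeric), since `mean()` and `colMeans()` return NA unless told to skip them. The sketch below shows the difference on a made-up six-row frame; the name `scores` is hypothetical, and one ExpPrice value has been replaced with NA purely for illustration.

```r
# Illustrative data only: Reaction and Recommend follow the six rows printed
# earlier; one ExpPrice value is set to NA to reproduce the symptom.
scores <- data.frame(
  Reaction  = c(1, 2, 3, 1, 2, 1),
  Recommend = c(8, 8, 1, 8, 7, 4),
  ExpPrice  = c(25, 15, NA, 15, 13, 53)
)

# Default behaviour: a single NA makes the whole column mean NA
plain <- colMeans(scores)

# na.rm = TRUE drops missing values column by column before averaging
clean <- colMeans(scores, na.rm = TRUE)

print(plain)
print(clean)

# How many answers are missing in each column
print(colSums(is.na(scores)))
```

The numbers this prints describe the toy rows only, not the survey. Reporting the missing count next to the average makes clear how many respondents the price figure rests on.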


2 - What are the most important features?

## # A tibble: 14 x 2
##    MostImpFeat          no_rows
##    <fct>                  <int>
##  1 No batteries              30
##  2 Water resistant            8
##  3 Easy to clean             25
##  4 Easy to read              28
##  5 Easy to use               84
##  6 High tech                  8
##  7 Attractive                 2
##  8 Easy to hold              13
##  9 Doesn't damage food       38
## 10 Doesn't lose juices       31
## 11 Quality of materials      20
## 12 Usable for poultry        12
## 13 Usable for fish            3
## 14 Usable for pork            5

The most important feature, by far, is that the unit is easy to use (84 of 307 responses). People also like that it wouldn’t damage the food (38) or lose juices (31), and that it doesn’t need batteries (30).
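
A count table like the one above can be sketched in base R with `table()`. The snippet uses only the six MostImpFeat codes printed earlier, and assumes code k maps to the k-th label in the table; the names `feat` and `counts` are hypothetical.

```r
# Feature labels in the order the summary table prints them
feat_labels <- c(
  "No batteries", "Water resistant", "Easy to clean", "Easy to read",
  "Easy to use", "High tech", "Attractive", "Easy to hold",
  "Doesn't damage food", "Doesn't lose juices", "Quality of materials",
  "Usable for poultry", "Usable for fish", "Usable for pork"
)

# MostImpFeat codes from the six rows printed earlier (not the full survey)
feat_codes <- c(9, 9, 11, 10, 1, 9)
feat <- factor(feat_codes,
               levels = seq_along(feat_labels),
               labels = feat_labels)

# One row per feature, with the count column named no_rows
counts <- as.data.frame(table(MostImpFeat = feat), responseName = "no_rows")

# Sort by count, largest first, and add each feature's share of responses
counts <- counts[order(-counts$no_rows), ]
counts$share <- counts$no_rows / sum(counts$no_rows)

print(head(counts, 3))
```

Sorting by count and adding a share column makes the ranking readable at a glance, which matters more once all 307 responses are included.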


3 - Who would someone buy this for?

## # A tibble: 6 x 2
##   WhyBuy                   no_rows
##   <fct>                      <int>
## 1 Use for self / household     143
## 2 Birthday gift                 22
## 3 Holiday gift                  42
## 4 Housewarming gift             29
## 5 Other gift                    25
## 6 For cooking professional      46

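The largest group (143 of 307) thinks this is something people would buy for themselves or their household. The four gift occasions together account for 118 responses, and 46 respondents see it as a purchase for a cooking professional.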
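
To compare buying for yourself against buying as a gift, the four gift rows can be collapsed into one group. The sketch below starts from the counts printed in the table above; the grouping rule (any label containing "gift") and the names `why_counts`, `grouped`, and `shares` are assumptions made for illustration.

```r
# Counts copied from the WhyBuy table above
why_counts <- c(
  "Use for self / household" = 143,
  "Birthday gift"            = 22,
  "Holiday gift"             = 42,
  "Housewarming gift"        = 29,
  "Other gift"               = 25,
  "For cooking professional" = 46
)

# Collapse every label containing "gift" into a single "Gift" group
group <- ifelse(grepl("gift", names(why_counts)), "Gift", names(why_counts))

# Sum the counts within each group, then convert to percentages
grouped <- tapply(why_counts, group, sum)
shares  <- round(100 * grouped / sum(grouped), 1)

print(grouped)
print(shares)
```

Grouped this way, self-purchase is the single largest category but falls just short of half of all responses.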