
Data Exploration

  1. Sales of Riding Mowers

A company that manufactures riding mowers wants to identify the best sales prospects for an intensive sales campaign. In particular, the manufacturer is interested in classifying households as prospective owners or nonowners on the basis of Income (in $1000s) and Lot Size (in 1000 ft²). The marketing expert looked at a random sample of 24 households, included in the file RidingMowers.xls.

  1. Laptop Sales at a London Computer Chain: Bar Charts and Boxplots.

The file LaptopSalesJanuary2008.xls contains data for all sales of laptops at a computer chain in London in January 2008. This is a subset of the full dataset that includes data for the entire year.

  1. Breakfast Cereals.

Data file: Cereals.csv. "Data were collected on the nutritional information and consumer rating of 77 breakfast cereals. The consumer rating is a rating of cereal 'healthiness' for consumer information (not a rating by consumers). For each cereal, the data include 13 numerical variables. The information is based on a bowl of cereal rather than a serving size, because most people simply fill a cereal bowl (resulting in constant volume, but not weight). The description of the different variables is given in the following table."

INSTRUCTIONS

Part I: Numerical Exploration

  1. For each of the datasets described above,

  2. Create a variable dictionary (description, Type of Variable, number of null (missing) values, unit)

This part is provided in a separate Word document.
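Only the type and missing-value columns of the dictionary can be derived from the data themselves. A minimal sketch in R, assuming each file has already been read into a data frame; `make_dictionary` is a hypothetical helper and the toy data are illustrative, not taken from the course files:

```r
# Hypothetical helper (not from the assignment): one row per column with its R
# class and number of missing values. Description and unit cannot be inferred
# from the data, so they are left blank to be filled in by hand.
make_dictionary <- function(df) {
  data.frame(
    variable    = names(df),
    type        = vapply(df, function(x) class(x)[1], character(1)),
    n_missing   = vapply(df, function(x) sum(is.na(x)), integer(1)),
    description = "",
    unit        = "",
    row.names   = NULL,
    stringsAsFactors = FALSE
  )
}

# Illustrative toy data only, not taken from the course files.
toy <- data.frame(Income    = c(60, NA, 85.5),
                  Lot_Size  = c(18.4, 16.8, NA),
                  Ownership = c("Owner", "Nonowner", "Owner"))
print(make_dictionary(toy))
```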

  1. Create a table with descriptive statistics: mean, median, standard deviation, minimum and maximum values.
Laptop sales:

##                                mean     median      Std Dev       max      min
## Retail Price           4.879357e+02    490.000   61.5305867    665.00    300.0
## Screen Size (Inches)   1.500000e+01     15.000    0.0000000     15.00     15.0
## Battery Life (Hours)   5.138707e+00      5.000    0.8152124      6.00      4.0
## RAM (GB)               1.547409e+00      2.000    0.4977786      2.00      1.0
## Processor Speeds (GHz) 1.757482e+00      2.000    0.2499037      2.00      1.5
## HD Size (GB)           1.503823e+02    120.000  102.5037081    300.00     40.0
## OS X Customer          5.308678e+05 531150.500 4414.4817427 549065.00 512253.0
## OS Y Customer          1.798869e+05 181106.000 4647.0432672 199846.00 164886.0
## OS X Store             5.307478e+05 529902.000 4159.4985492 541428.00 517917.0
## OS Y Store             1.798077e+05 179641.000 3995.1250317 190628.00 168302.0
## CustomerStoreDistance  3.679882e+03   3382.458 2068.9050186  19892.14      0.0
Riding mowers:

##             mean median   Std Dev   max min
## Income   68.4375   64.8 19.793144 110.1  33
## Lot_Size 18.9500   19.0  2.428275  23.6  14
Breakfast cereals:

##                 mean    median    Std Dev       max      min
## calories 107.0270270 110.00000 19.8438928 160.00000 50.00000
## protein    2.5135135   2.50000  1.0758016   6.00000  1.00000
## fat        1.0000000   1.00000  1.0068260   5.00000  0.00000
## sodium   162.3648649 180.00000 82.7697871 320.00000  0.00000
## fiber      2.1756757   2.00000  2.4233912  14.00000  0.00000
## carbo     14.7297297  14.50000  3.8916746  23.00000  5.00000
## sugars     7.1081081   7.00000  4.3591113  15.00000  0.00000
## potass    98.5135135  90.00000 70.8786815 330.00000 15.00000
## vitamins  29.0540541  25.00000 22.2943521 100.00000  0.00000
## shelf      2.2162162   2.00000  0.8320674   3.00000  1.00000
## weight     1.0308108   1.00000  0.1534155   1.50000  0.50000
## cups       0.8216216   0.75000  0.2357153   1.50000  0.25000
## rating    42.3717869  40.25309 14.0337125  93.70491 18.04285
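A table like the three above can be produced with base R alone. The sketch below assumes the data are in a data frame and keeps only the numeric columns; `describe_numeric` is a hypothetical helper, and its column order mirrors the tables above (mean, median, Std Dev, max, min):

```r
# Hypothetical helper: mean, median, sd, max and min for every numeric column,
# one row per variable. NAs are dropped column by column.
describe_numeric <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  t(vapply(num, function(x) c(mean      = mean(x, na.rm = TRUE),
                              median    = median(x, na.rm = TRUE),
                              "Std Dev" = sd(x, na.rm = TRUE),
                              max       = max(x, na.rm = TRUE),
                              min       = min(x, na.rm = TRUE)),
           numeric(5)))
}

# Illustrative toy data only; the character column is skipped automatically.
toy <- data.frame(Income    = c(60, 85.5, 64.8),
                  Lot_Size  = c(18.4, 16.8, 21.6),
                  Ownership = c("Owner", "Nonowner", "Owner"))
print(round(describe_numeric(toy), 3))
```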
  1. Create a correlation table. Comment on the relationship between one pair of variables.
Laptop sales:

## Warning in cor(LaptopSalesJanuary2008): the standard deviation is zero
##                        Retail Price Screen Size (Inches)
## Retail Price            1.000000000                   NA
## Screen Size (Inches)             NA                    1
## Battery Life (Hours)    0.491384279                   NA
## RAM (GB)                0.288121734                   NA
## Processor Speeds (GHz)  0.151104411                   NA
## HD Size (GB)            0.486015284                   NA
## OS X Customer           0.003470388                   NA
## OS Y Customer          -0.005723961                   NA
## OS X Store             -0.005961426                   NA
## OS Y Store             -0.010147738                   NA
## CustomerStoreDistance   0.012849144                   NA
##                        Battery Life (Hours)     RAM (GB)
## Retail Price                    0.491384279  0.288121734
## Screen Size (Inches)                     NA           NA
## Battery Life (Hours)            1.000000000 -0.080518951
## RAM (GB)                       -0.080518951  1.000000000
## Processor Speeds (GHz)         -0.028400218 -0.013973477
## HD Size (GB)                   -0.165880556 -0.059340743
## OS X Customer                  -0.007483103  0.015278243
## OS Y Customer                  -0.001145769 -0.009150758
## OS X Store                     -0.013375189 -0.002301917
## OS Y Store                     -0.003097381 -0.011536512
## CustomerStoreDistance           0.002879215  0.003550472
##                        Processor Speeds (GHz)  HD Size (GB) OS X Customer
## Retail Price                      0.151104411  0.4860152838   0.003470388
## Screen Size (Inches)                       NA            NA            NA
## Battery Life (Hours)             -0.028400218 -0.1658805562  -0.007483103
## RAM (GB)                         -0.013973477 -0.0593407434   0.015278243
## Processor Speeds (GHz)            1.000000000 -0.0249308433  -0.006822880
## HD Size (GB)                     -0.024930843  1.0000000000  -0.001047220
## OS X Customer                    -0.006822880 -0.0010472202   1.000000000
## OS Y Customer                     0.015387950 -0.0003341631   0.127147965
## OS X Store                       -0.002632854 -0.0003881928   0.791713058
## OS Y Store                        0.003843190  0.0012811391   0.128052036
## CustomerStoreDistance            -0.011324659  0.0128208013  -0.085233169
##                        OS Y Customer    OS X Store   OS Y Store
## Retail Price           -0.0057239606 -0.0059614262 -0.010147738
## Screen Size (Inches)              NA            NA           NA
## Battery Life (Hours)   -0.0011457691 -0.0133751891 -0.003097381
## RAM (GB)               -0.0091507576 -0.0023019169 -0.011536512
## Processor Speeds (GHz)  0.0153879499 -0.0026328536  0.003843190
## HD Size (GB)           -0.0003341631 -0.0003881928  0.001281139
## OS X Customer           0.1271479651  0.7917130579  0.128052036
## OS Y Customer           1.0000000000  0.2295129854  0.739738605
## OS X Store              0.2295129854  1.0000000000  0.214007940
## OS Y Store              0.7397386048  0.2140079397  1.000000000
## CustomerStoreDistance  -0.2649425080 -0.1049821186 -0.073016796
##                        CustomerStoreDistance
## Retail Price                     0.012849144
## Screen Size (Inches)                      NA
## Battery Life (Hours)             0.002879215
## RAM (GB)                         0.003550472
## Processor Speeds (GHz)          -0.011324659
## HD Size (GB)                     0.012820801
## OS X Customer                   -0.085233169
## OS Y Customer                   -0.264942508
## OS X Store                      -0.104982119
## OS Y Store                      -0.073016796
## CustomerStoreDistance            1.000000000

Riding mowers:

##            Income Lot_Size
## Income   1.000000 0.172151
## Lot_Size 0.172151 1.000000

Breakfast cereals:

##             calories     protein           fat        sodium       fiber
## calories  1.00000000  0.03399166  0.5073732397  0.2962474981 -0.29521183
## protein   0.03399166  1.00000000  0.2023533963  0.0115588913  0.51400610
## fat       0.50737324  0.20235340  1.0000000000  0.0008219036  0.01403587
## sodium    0.29624750  0.01155889  0.0008219036  1.0000000000 -0.07073492
## fiber    -0.29521183  0.51400610  0.0140358654 -0.0707349230  1.00000000
## carbo     0.27060605 -0.03674326 -0.2849336855  0.3284091857 -0.37908370
## sugars    0.56912054 -0.28658397  0.2871524866  0.0370589612 -0.15094850
## potass   -0.07136125  0.57874284  0.1996367171 -0.0394380876  0.91150392
## vitamins  0.25984556  0.05479952 -0.0305139099  0.3315759640 -0.03871734
## shelf     0.08924278  0.19563468  0.2779797246 -0.1218968162  0.31378736
## weight    0.69645215  0.23067141  0.2217141647  0.3125335701  0.24629218
## cups      0.08919615 -0.24209861 -0.1575787041  0.1195841083 -0.51369716
## rating   -0.69378466  0.46716218 -0.4050501988 -0.3830123581  0.60341090
##                carbo       sugars       potass    vitamins       shelf
## calories  0.27060605  0.569120535 -0.071361247  0.25984556  0.08924278
## protein  -0.03674326 -0.286583967  0.578742837  0.05479952  0.19563468
## fat      -0.28493369  0.287152487  0.199636717 -0.03051391  0.27797972
## sodium    0.32840919  0.037058961 -0.039438088  0.33157596 -0.12189682
## fiber    -0.37908370 -0.150948502  0.911503921 -0.03871734  0.31378736
## carbo     1.00000000 -0.452069189 -0.365002934  0.25357897 -0.18899627
## sugars   -0.45206919  1.000000000  0.001413982  0.07295438  0.06144909
## potass   -0.36500293  0.001413982  1.000000000 -0.00263583  0.39458548
## vitamins  0.25357897  0.072954382 -0.002635830  1.00000000  0.28440479
## shelf    -0.18899627  0.061449088  0.394585485  0.28440479  1.00000000
## weight    0.14480528  0.460547135  0.420561534  0.32043480  0.19284304
## cups      0.35828371 -0.032436100 -0.501688318  0.13362965 -0.35103354
## rating    0.05594129 -0.755955089  0.415782443 -0.21448095  0.05103975
##              weight        cups      rating
## calories  0.6964521  0.08919615 -0.69378466
## protein   0.2306714 -0.24209861  0.46716218
## fat       0.2217142 -0.15757870 -0.40505020
## sodium    0.3125336  0.11958411 -0.38301236
## fiber     0.2462922 -0.51369716  0.60341090
## carbo     0.1448053  0.35828371  0.05594129
## sugars    0.4605471 -0.03243610 -0.75595509
## potass    0.4205615 -0.50168832  0.41578244
## vitamins  0.3204348  0.13362965 -0.21448095
## shelf     0.1928430 -0.35103354  0.05103975
## weight    1.0000000 -0.20171465 -0.30046104
## cups     -0.2017146  1.00000000 -0.22250440
## rating   -0.3004610 -0.22250440  1.00000000

For the cereals, we observe a strong positive correlation between potassium (potass) and fiber (r = 0.91), while calories and rating show a strong negative correlation (r = -0.69).
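The warning in the laptop output arises because Screen Size (Inches) is constant (its standard deviation is 0 in the descriptive statistics), so its correlations are undefined and print as NA. The sketch below drops constant columns before calling cor() and then lists the pairs with the largest absolute correlation, which is one way to pick the pair to comment on. Both helpers are hypothetical, `use = "pairwise.complete.obs"` is an assumed choice for handling missing values, and the toy data are illustrative:

```r
# Hypothetical helper: correlation matrix of the numeric columns, dropping
# constant (or all-NA) columns first so cor() does not warn and emit NA rows.
cor_table <- function(df) {
  num  <- df[vapply(df, is.numeric, logical(1))]
  keep <- vapply(num, function(x) isTRUE(sd(x, na.rm = TRUE) > 0), logical(1))
  if (any(!keep)) {
    message("Dropping constant columns: ", paste(names(num)[!keep], collapse = ", "))
  }
  cor(num[keep], use = "pairwise.complete.obs")
}

# Hypothetical helper: the n variable pairs with the largest |r|, each pair once.
strongest_pairs <- function(m, n = 3) {
  m[lower.tri(m, diag = TRUE)] <- NA          # keep only the upper triangle
  idx <- which(!is.na(m), arr.ind = TRUE)
  out <- data.frame(var1 = rownames(m)[idx[, 1]],
                    var2 = colnames(m)[idx[, 2]],
                    r    = m[idx],
                    stringsAsFactors = FALSE)
  head(out[order(-abs(out$r)), ], n)
}

# Illustrative toy data only; k is constant, like Screen Size (Inches).
toy <- data.frame(a = 1:5, b = c(2, 4, 6, 8, 10), c = c(5, 3, 4, 1, 2), k = rep(15, 5))
m <- cor_table(toy)
print(round(m, 2))
print(strongest_pairs(m))
```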

Part II: Graphical Exploration

  1. Sales of Riding Mowers: Scatterplots.
    Create a scatterplot of Lot Size vs. Income, color-coded by the outcome variable owner/nonowner. Create legible labels and a legend.

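Using the ggplot2 package directly, and assuming the owner/nonowner outcome column is named Ownership (an assumed name, since that column is not shown above), the plot can be drawn with:

library(ggplot2)

ggplot(RidingMowers_, aes(x = Lot_Size, y = Income, color = Ownership)) + geom_point(size = 2) + labs(title = "Riding Mowers data", x = "Lot size (1000 ft2)", y = "Income ($1000s)", color = "Owner/nonowner")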

  1. Laptop Sales at a London Computer Chain: Bar Charts and Boxplots.
  1. Create a bar chart, showing the average retail price by store. Which store has the highest average? Which has the lowest?

## [1] 168302
## [1] 517917
## [1] 190628
## [1] 541428

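OS X Store has the lowest average (168302), while OS Y Store has the highest average (541428). Note, however, that these four printed values match the minimum and maximum of the OS Y Store (168302, 190628) and OS X Store (517917, 541428) columns in the descriptive statistics table, which describe store locations rather than retail prices.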
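The question asks for the mean of Retail Price within each store. A minimal sketch with tapply(); the store-identifier column is not among the numeric columns summarised above, so its name (Store Postcode) is an assumption, and the toy prices are illustrative only:

```r
# Hypothetical helper: mean retail price per store, sorted from highest to lowest.
avg_price_by_store <- function(price, store) {
  sort(tapply(price, store, mean, na.rm = TRUE), decreasing = TRUE)
}

# Illustrative toy data only. With the real file, price would be
# LaptopSalesJanuary2008$`Retail Price` and store an assumed store-ID column
# such as LaptopSalesJanuary2008$`Store Postcode`.
price <- c(490, 520, 455, 610, 395)
store <- c("S1", "S1", "S2", "S3", "S2")

avg <- avg_price_by_store(price, store)
barplot(avg, xlab = "Store", ylab = "Average retail price")
cat("Highest average:", names(avg)[1], "\n")
cat("Lowest average:", names(avg)[length(avg)], "\n")
```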

  1. To better compare retail prices across stores, create side-by-side boxplots of retail price by store. Now compare the prices in the two stores above. Do you see a difference between their price distributions? Explain.

The OS Y Store distribution appears closer to normal, with relatively symmetric outliers, although it still shows some left skew. The OS X Store distribution has outliers only on the right and is markedly right-skewed.
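Side-by-side boxplots can be drawn with the formula interface of base R's boxplot(). In this sketch the store-identifier column is an assumption (it is not among the numeric columns listed earlier) and the toy prices are illustrative only:

```r
# Illustrative toy data only; the real analysis would use Retail Price and an
# assumed store-ID column from LaptopSalesJanuary2008.
sales <- data.frame(price = c(490, 520, 455, 395, 610, 505),
                    store = c("S1", "S1", "S2", "S2", "S3", "S3"))

# boxplot() draws one box per store and invisibly returns the summary statistics.
bp <- boxplot(price ~ store, data = sales, xlab = "Store", ylab = "Retail price")

# Rows of bp$stats: lower whisker, first quartile, median, third quartile, upper whisker.
print(bp$stats)
```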

  1. Breakfast Cereals.

Use the breakfast cereal data (cereals2).

  1. Plot a histogram for each of the quantitative variables. Answer the following questions:

  1. Which variables have the largest variability?

The variables with the most variability include “shelf”, “protein”, “weight”, “vitamins”, and “sugars”.
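How much a variable "varies" depends on the measure used: raw standard deviations are dominated by large-scale variables such as sodium and potass, so one common alternative is the coefficient of variation (sd / mean), which puts variables on different scales on a comparable footing. A sketch that also draws one histogram per column; `variability_table` is a hypothetical helper and the toy data are illustrative, not the Cereals.csv values:

```r
# Hypothetical helper: standard deviation and coefficient of variation (sd / mean)
# for each numeric column, sorted by CV. CV is undefined (Inf/NaN) when the mean is 0.
variability_table <- function(df) {
  num   <- df[vapply(df, is.numeric, logical(1))]
  sds   <- vapply(num, sd, numeric(1), na.rm = TRUE)
  means <- vapply(num, mean, numeric(1), na.rm = TRUE)
  out <- data.frame(variable = names(num), sd = sds, cv = sds / means,
                    row.names = NULL, stringsAsFactors = FALSE)
  out[order(-out$cv), ]
}

# Illustrative toy data only.
toy <- data.frame(x = c(1, 2, 3), y = c(10, 11, 12))

# One histogram per numeric column, as the exercise asks.
for (v in names(toy)) hist(toy[[v]], main = v, xlab = v)

print(variability_table(toy))
```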

  1. Which variables seem skewed?

The skewed variables include “fiber”, “fat”, “potass”, and “rating”.
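Skewness can be judged from the histograms, but a numeric summary helps rank the variables. The sketch uses the moment-based sample skewness g1; the helper name is hypothetical, other skewness definitions (with small-sample corrections) give slightly different values, and the toy data are illustrative only:

```r
# Hypothetical helper: moment-based sample skewness
# g1 = mean((x - m)^3) / mean((x - m)^2)^(3/2).
# Positive values suggest a long right tail, negative a long left tail;
# a constant column gives NaN (0 / 0).
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Illustrative toy data only.
toy <- data.frame(symmetric  = c(1, 2, 3, 4, 5),
                  right_tail = c(0, 1, 1, 2, 14))
print(sort(vapply(toy, sample_skewness, numeric(1)), decreasing = TRUE))
```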

  1. Plot a side-by-side boxplot comparing the calories in hot versus cold cereals. What does this plot show us?

The distribution for “hot-cereals” has its center shifted to the left, although the rest of the distribution appears relatively symmetric apart from the outliers. These outliers deserve a closer look because there are relatively many of them. The distribution for “cold-cereals” seems to have too few observations to be analyzed with a boxplot; it shows neither variability nor outliers.
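The boxplot itself is one line of base R; counting observations per group first shows whether a box is worth reading. The column name type and its C/H coding are assumptions about Cereals.csv, and the toy calories are illustrative only:

```r
# Illustrative toy data only; the hot/cold indicator in Cereals.csv is assumed
# here to be a column named `type` coded "C" (cold) and "H" (hot).
cereal <- data.frame(calories = c(110, 120, 100, 70, 110, 100, 90),
                     type     = c("C", "C", "C", "C", "C", "H", "H"))

# Check group sizes first: a group with only a handful of cereals gives a box
# that says little about its distribution.
print(table(cereal$type))

# Groups are ordered alphabetically ("C", "H"), so the labels follow that order.
bp <- boxplot(calories ~ type, data = cereal,
              names = c("Cold", "Hot"), ylab = "Calories")
```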

  1. Plot a side-by-side boxplot of consumer rating as a function of shelf height. If we were to predict consumer rating from shelf height, does it appear that we need to keep all three categories of shelf height?
## [1] 3 1 2

Examining the boxplots, we observe that the rating distributions do not differ significantly across shelf heights; therefore, shelf height does not appear to influence consumer rating.
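Comparing rating across shelf heights requires treating shelf as a category rather than a number. A minimal sketch with illustrative toy ratings (not the real data); rating and shelf are the column names shown in the statistics above:

```r
# Illustrative toy data only, not the Cereals.csv ratings.
cereal <- data.frame(rating = c(40, 46, 25, 31, 44, 50),
                     shelf  = c(1, 1, 2, 2, 3, 3))

# factor() makes boxplot treat shelf height as three categories.
boxplot(rating ~ factor(shelf), data = cereal,
        xlab = "Shelf height", ylab = "Consumer rating")

# Median rating per shelf: similar medians hint that some categories could be merged.
med <- tapply(cereal$rating, cereal$shelf, median)
print(med)
```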