Data Exploration

This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.
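One way to go beyond summary() is to put the mean, standard deviation, and quartiles of every numeric column into a single matrix. The sketch below uses base R only. `toy` is a hypothetical stand-in with made-up values, not rows from the data set, so replace it with the real data frame before drawing any conclusions.

```r
# Hypothetical stand-in with made-up values; swap in the real data frame
toy <- data.frame(
  price = c(1500, 1800, 2100, 2300, 2600, 3300),
  speed = c(25, 33, 50, 66, 66, 100),
  ram   = c(2, 4, 8, 8, 16, 32),
  cd    = c("no", "no", "yes", "no", "yes", "yes")
)

# Keep only numeric columns; character columns such as cd have no quartiles
numCols <- toy[sapply(toy, is.numeric)]

# Build a matrix with one row per statistic and one column per variable
stats <- sapply(numCols, function(x) c(
  mean   = mean(x),
  sd     = sd(x),
  min    = min(x),
  q1     = unname(quantile(x, 0.25)),
  median = median(x),
  q3     = unname(quantile(x, 0.75)),
  max    = max(x)
))
round(stats, 2)
```

quantile() defaults to type 7, which is also what summary() uses, so the q1 and q3 rows should agree with the 1st Qu. and 3rd Qu. values that summary() reports.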

Summary Data

##        X            price          speed              hd        
##  Min.   :   1   Min.   : 949   Min.   : 25.00   Min.   :  80.0  
##  1st Qu.:1566   1st Qu.:1794   1st Qu.: 33.00   1st Qu.: 214.0  
##  Median :3130   Median :2144   Median : 50.00   Median : 340.0  
##  Mean   :3130   Mean   :2220   Mean   : 52.01   Mean   : 416.6  
##  3rd Qu.:4694   3rd Qu.:2595   3rd Qu.: 66.00   3rd Qu.: 528.0  
##  Max.   :6259   Max.   :5399   Max.   :100.00   Max.   :2100.0  
##       ram             screen           cd               multi          
##  Min.   : 2.000   Min.   :14.00   Length:6259        Length:6259       
##  1st Qu.: 4.000   1st Qu.:14.00   Class :character   Class :character  
##  Median : 8.000   Median :14.00   Mode  :character   Mode  :character  
##  Mean   : 8.287   Mean   :14.61                                        
##  3rd Qu.: 8.000   3rd Qu.:15.00                                        
##  Max.   :32.000   Max.   :17.00                                        
##    premium               ads            trend      
##  Length:6259        Min.   : 39.0   Min.   : 1.00  
##  Class :character   1st Qu.:162.5   1st Qu.:10.00  
##  Mode  :character   Median :246.0   Median :16.00  
##                     Mean   :221.3   Mean   :15.93  
##                     3rd Qu.:275.0   3rd Qu.:21.50  
##                     Max.   :339.0   Max.   :35.00

Summary Conclusion

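  • From the summary data, the mean price of a computer in this data set is $2,220 (median $2,144).
  • At the column means, that $2,220 corresponds to a computer with a 52 MHz clock speed, a 417 MB hard drive, 8.3 MB of RAM, and a screen of about 14.6 in (median 14 in).
  • summary() reports only the length and class of the character columns cd, multi, and premium, so the claims that the typical computer has no CD drive, has no multimedia kit, and comes from a premium manufacturer need to be confirmed with frequency counts such as table().
  • The ads variable averages 221.3 (median 246); it records the number of price listings in each month, not the number of ads a single manufacturer places.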
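Because summary() says nothing about the contents of character columns, frequency tables are needed to check the cd, multi, and premium conclusions. A base-R sketch follows; `toy` holds made-up yes/no values purely for illustration and should be replaced with the real data frame.

```r
# Hypothetical stand-in with made-up values; swap in the real data frame
toy <- data.frame(
  cd      = c("no", "no", "yes", "no", "yes", "yes", "no"),
  multi   = c("no", "no", "no", "no", "yes", "no", "no"),
  premium = c("yes", "yes", "no", "yes", "yes", "yes", "yes")
)
yesNoCols <- c("cd", "multi", "premium")

# Print the share of each level in every yes/no column
for (col in yesNoCols) {
  cat("\n", col, "\n")
  print(round(prop.table(table(toy[[col]])), 3))
}

# Most common level per column, i.e. the value a "typical" computer has
modes <- sapply(toy[yesNoCols], function(x) names(which.max(table(x))))
modes
```

which.max() returns the first level on a tie, so check the proportions as well as the mode when a split is close to 50/50.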

Data wrangling

Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example, if it makes sense, you could sum two columns together).
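Two of the transformations listed above, replacing values and deriving new columns, can be sketched with dplyr's mutate(). The derived columns `price_per_mhz` and `extras` are hypothetical illustrations, not part of the original analysis, and `toy` contains made-up values using the data set's column names.

```r
library(dplyr)

# Hypothetical stand-in with made-up values, using the data set's column names
toy <- data.frame(
  price = c(1500, 1800, 2100, 2600),
  speed = c(25, 33, 66, 100),
  cd    = c("no", "yes", "no", "yes"),
  multi = c("no", "no", "yes", "no")
)

derived <- toy %>%
  mutate(
    # Replace "yes"/"no" strings with TRUE/FALSE
    cd    = cd == "yes",
    multi = multi == "yes",
    # Hypothetical derived column: dollars paid per MHz of clock speed
    price_per_mhz = round(price / speed, 2),
    # Sum of two columns: number of extras included (CD drive + multimedia kit)
    extras = cd + multi
  )
derived
```

Later expressions inside a single mutate() see the columns updated earlier in the same call, which is why `extras` can add the already converted logical `cd` and `multi` columns.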

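library(dplyr)
# Drop the ads and trend columns, then give the rest descriptive, unit-bearing names
compactData <- data %>%
    select(-(ads:trend)) %>%
    rename(pcid = X, speed_mhz = speed, hd_mb = hd, ram_mb = ram,
           screen_size = screen, premium_manufacturer = premium)
# Keep only computers faster than the mean clock speed (about 52.01 MHz)
abvAvgSpeed <- filter(compactData, speed_mhz > mean(speed_mhz))
summary(abvAvgSpeed)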
##       pcid          price        speed_mhz          hd_mb       
##  Min.   :   6   Min.   :1245   Min.   : 66.00   Min.   :  85.0  
##  1st Qu.:2444   1st Qu.:1949   1st Qu.: 66.00   1st Qu.: 340.0  
##  Median :4041   Median :2335   Median : 66.00   Median : 450.0  
##  Mean   :3792   Mean   :2412   Mean   : 72.99   Mean   : 510.7  
##  3rd Qu.:5439   3rd Qu.:2799   3rd Qu.: 66.00   3rd Qu.: 545.0  
##  Max.   :6259   Max.   :5399   Max.   :100.00   Max.   :2100.0  
##      ram_mb        screen_size        cd               multi          
##  Min.   : 2.000   Min.   :14.0   Length:2666        Length:2666       
##  1st Qu.: 4.000   1st Qu.:14.0   Class :character   Class :character  
##  Median : 8.000   Median :15.0   Mode  :character   Mode  :character  
##  Mean   : 9.623   Mean   :14.8                                        
##  3rd Qu.:16.000   3rd Qu.:15.0                                        
##  Max.   :32.000   Max.   :17.0                                        
##  premium_manufacturer
##  Length:2666         
##  Class :character    
##  Mode  :character    
##                      
##                      
## 
round(mean(abvAvgSpeed$price),2)
## [1] 2412.49
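Rather than looking only at the faster group, the mean price of above-average and below-average speed computers can be compared side by side with group_by() and summarise(). In this sketch `toyCompact` is a hypothetical stand-in with made-up values using the renamed compactData columns; run on the real `compactData`, the TRUE row should reproduce the 2412.49 above.

```r
library(dplyr)

# Hypothetical stand-in with made-up values, using the renamed compactData columns
toyCompact <- data.frame(
  price     = c(1500, 1800, 2100, 2400, 2900, 3300),
  speed_mhz = c(25, 33, 50, 66, 66, 100)
)

# Flag computers faster than the overall mean speed, then compare the two groups
priceBySpeed <- toyCompact %>%
  mutate(above_avg_speed = speed_mhz > mean(speed_mhz)) %>%
  group_by(above_avg_speed) %>%
  summarise(n = n(), mean_price = round(mean(price), 2))
priceBySpeed
```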

Graphics

Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.
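The three required plot types can be drawn with base R graphics alone; ggplot2 offers equivalents through geom_histogram(), geom_boxplot(), and geom_point(). This is a minimal sketch, and `toy` is a hypothetical stand-in with made-up values for the price, speed, and ram columns.

```r
# Hypothetical stand-in with made-up values; swap in the real data frame
toy <- data.frame(
  price = c(1500, 1800, 2100, 2300, 2600, 3300, 1950, 2450),
  speed = c(25, 33, 50, 66, 66, 100, 33, 50),
  ram   = c(2, 4, 8, 8, 16, 32, 4, 8)
)

# Three panels side by side; save the old settings so they can be restored
op <- par(mfrow = c(1, 3))

# Histogram: distribution of price
h <- hist(toy$price, main = "Price distribution", xlab = "Price (USD)")

# Box plot: price for each RAM size
b <- boxplot(price ~ ram, data = toy, main = "Price by RAM",
             xlab = "RAM (MB)", ylab = "Price (USD)")

# Scatter plot: price against clock speed, with a least-squares line
plot(price ~ speed, data = toy, main = "Price vs speed",
     xlab = "Speed (MHz)", ylab = "Price (USD)")
abline(lm(price ~ speed, data = toy))

par(op)
```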

Meaningful question for analysis

Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end.
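As an illustration only, one possible question is "Which specifications are most strongly associated with price?" The sketch below answers it with correlations and a multiple linear regression. Both the question and `toy` (made-up values for price, speed, ram, and hd) are hypothetical; substitute the real data frame and a question of your own.

```r
# Hypothetical stand-in with made-up values; swap in the real data frame
toy <- data.frame(
  price = c(1500, 1800, 2100, 2300, 2600, 3300, 1950, 2450),
  speed = c(25, 33, 50, 66, 66, 100, 33, 50),
  ram   = c(2, 4, 8, 8, 16, 32, 4, 8),
  hd    = c(80, 214, 340, 420, 528, 1000, 250, 340)
)

# Pairwise correlation between price and each specification
round(cor(toy)[, "price"], 3)

# Multiple linear regression: each coefficient is the estimated change in price
# per unit of that specification, holding the other specifications fixed
fit <- lm(price ~ speed + ram + hd, data = toy)
summary(fit)
```

Correlations show each specification on its own, while the regression separates their effects; on the full data set the two can tell different stories because RAM, speed, and hard drive size tend to rise together.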