Project title: Project on Product Category (Cameras)

Name: N. Girish

Email: ch15b084@smail.iitm.ac.in

College: IIT Madras

Introduction

A product category covers a wide range of related products in the market. Here we take cameras as our product category. Camera prices today range from a few thousand rupees to several lakhs (INR), so it is worth asking what the price actually depends on. Which features the price changes with is the subject of this study. We use a dataset from an online source, Kaggle, that lists about 1000 camera models with their features, including a column with the price of each model; 13 properties are listed for each camera. We look at which features are most strongly correlated with price and ask whether features that are only weakly correlated with it can be neglected. From this study we can explain to someone who wants to buy a camera why prices differ between models, and we can also answer questions such as which company offers the model that is strongest on a particular feature.

Overview of our study.

We first consider a dataset of about 1000 cameras with 13 features each. Going straight into the dataset, the 13 properties recorded for each camera are:

Model

Release date

Max resolution

Low resolution

Effective pixels

Zoom wide (W)

Zoom tele (T)

Normal focus range

Macro focus range

Storage included

Weight (inc. batteries)

Dimensions

Price

Our analysis first gives summary statistics and a description of each feature. We then concentrate on how a camera's price depends on these features and how prices change as the features change. Along the way we may also identify the models that offer the best value of a particular feature.

Acknowledgements.

This dataset was taken from an online source: https://www.kaggle.com/crawford/1000-cameras-dataset. The datasets were gathered and cleaned up by Petra Isenberg, Pierre Dragicevic and Yvonne Jansen. The original source can be found at https://perso.telecom-paristech.fr/eagan/class/igr204/datasets.

This dataset has been converted to CSV.

Cameras dataset.

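setwd("C:/Users/Kalyan/Downloads")
# stringsAsFactors=TRUE keeps Company and Model as factors (the default before R 4.0), matching the output below
cameras<-read.csv("cameras_dataset.csv",stringsAsFactors=TRUE)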
View(cameras)
dim(cameras)
## [1] 1038   14

The dataset has 1038 rows, i.e. 1038 observations, and 14 columns, i.e. 14 variables: the 13 properties listed above plus the Company column.

Summarizing the dataset.

summary(cameras)
##      Company                       Model       Release.date 
##  Olympus :122   Agfa ePhoto 1280      :   1   Min.   :1994  
##  Sony    :116   Agfa ePhoto 1680      :   1   1st Qu.:2002  
##  Canon   :115   Agfa ePhoto CL18      :   1   Median :2004  
##  Kodak   :102   Agfa ePhoto CL30      :   1   Mean   :2004  
##  Fujifilm: 99   Agfa ePhoto CL30 Clik!:   1   3rd Qu.:2006  
##  Nikon   : 90   Agfa ePhoto CL45      :   1   Max.   :2007  
##  (Other) :394   (Other)               :1032                 
##  Max.resolution Low.resolution Effective.pixels Zoom.wide..W.  
##  Min.   :   0   Min.   :   0   Min.   : 0.000   Min.   : 0.00  
##  1st Qu.:2048   1st Qu.:1120   1st Qu.: 3.000   1st Qu.:35.00  
##  Median :2560   Median :2048   Median : 4.000   Median :36.00  
##  Mean   :2475   Mean   :1774   Mean   : 4.596   Mean   :32.96  
##  3rd Qu.:3072   3rd Qu.:2560   3rd Qu.: 7.000   3rd Qu.:38.00  
##  Max.   :5616   Max.   :4992   Max.   :21.000   Max.   :52.00  
##                                                                
##  Zoom.tele..T.   Normal.focus.range Macro.focus.range Storage.included
##  Min.   :  0.0   Min.   :  0.00     Min.   : 0.000    Min.   :  0.00  
##  1st Qu.: 96.0   1st Qu.: 30.00     1st Qu.: 3.000    1st Qu.:  8.00  
##  Median :108.0   Median : 50.00     Median : 6.000    Median : 16.00  
##  Mean   :121.5   Mean   : 44.15     Mean   : 7.788    Mean   : 17.45  
##  3rd Qu.:117.0   3rd Qu.: 60.00     3rd Qu.:10.000    3rd Qu.: 20.00  
##  Max.   :518.0   Max.   :120.00     Max.   :85.000    Max.   :450.00  
##                                     NA's   :1         NA's   :2       
##  Weight..inc..batteries.   Dimensions        Price       
##  Min.   :   0.0          Min.   :  0.0   Min.   :  14.0  
##  1st Qu.: 180.0          1st Qu.: 92.0   1st Qu.: 149.0  
##  Median : 226.0          Median :101.0   Median : 199.0  
##  Mean   : 319.3          Mean   :105.4   Mean   : 457.4  
##  3rd Qu.: 350.0          3rd Qu.:115.0   3rd Qu.: 399.0  
##  Max.   :1860.0          Max.   :240.0   Max.   :7999.0  
##  NA's   :2               NA's   :2
str(cameras)
## 'data.frame':    1038 obs. of  14 variables:
##  $ Company                : Factor w/ 21 levels "Agfa","Canon",..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ Model                  : Factor w/ 1038 levels "Agfa ePhoto 1280",..: 1 2 3 4 5 6 7 26 27 28 ...
##  $ Release.date           : int  1997 1998 2000 1999 1999 2001 1999 1997 1996 2001 ...
##  $ Max.resolution         : int  1024 1280 640 1152 1152 1600 1280 640 832 1280 ...
##  $ Low.resolution         : int  640 640 0 640 640 640 640 0 640 1024 ...
##  $ Effective.pixels       : int  0 1 0 0 0 1 1 0 0 1 ...
##  $ Zoom.wide..W.          : int  38 38 45 35 43 51 34 42 50 35 ...
##  $ Zoom.tele..T.          : int  114 114 45 35 43 51 102 42 50 105 ...
##  $ Normal.focus.range     : int  70 50 0 0 50 50 0 70 40 76 ...
##  $ Macro.focus.range      : int  40 0 0 0 0 20 0 3 10 16 ...
##  $ Storage.included       : int  4 4 2 4 40 8 8 2 1 8 ...
##  $ Weight..inc..batteries.: int  420 420 0 0 300 270 0 320 460 375 ...
##  $ Dimensions             : num  95 158 0 0 128 119 0 93 160 110 ...
##  $ Price                  : int  179 179 179 269 1299 179 179 149 139 139 ...
library(psych)
describe(cameras)
##                         vars    n    mean     sd median trimmed    mad
## Company*                   1 1038   10.92   5.80   12.0   10.89   7.41
## Model*                     2 1038  519.50 299.79  519.5  519.50 384.73
## Release.date               3 1038 2003.59   2.72 2004.0 2003.83   2.97
## Max.resolution             4 1038 2474.67 759.51 2560.0 2478.18 759.09
## Low.resolution             5 1038 1773.94 830.90 2048.0 1806.58 782.81
## Effective.pixels           6 1038    4.60   2.84    4.0    4.43   2.97
## Zoom.wide..W.              7 1038   32.96  10.33   36.0   35.55   2.97
## Zoom.tele..T.              8 1038  121.53  93.46  108.0  106.26  13.34
## Normal.focus.range         9 1038   44.15  24.14   50.0   44.96  14.83
## Macro.focus.range         10 1037    7.79   8.10    6.0    6.63   5.93
## Storage.included          11 1036   17.45  27.44   16.0   14.78  11.86
## Weight..inc..batteries.   12 1036  319.27 260.41  226.0  266.73  92.66
## Dimensions                13 1036  105.36  24.26  101.0  104.31  14.83
## Price                     14 1038  457.38 760.45  199.0  288.25 103.78
##                          min  max range  skew kurtosis    se
## Company*                   1   21    20 -0.03    -1.08  0.18
## Model*                     1 1038  1037  0.00    -1.20  9.31
## Release.date            1994 2007    13 -0.61    -0.47  0.08
## Max.resolution             0 5616  5616  0.01     0.06 23.57
## Low.resolution             0 4992  4992 -0.30    -0.40 25.79
## Effective.pixels           0   21    21  0.63     0.77  0.09
## Zoom.wide..W.              0   52    52 -2.57     5.58  0.32
## Zoom.tele..T.              0  518   518  1.87     3.93  2.90
## Normal.focus.range         0  120   120 -0.41    -0.43  0.75
## Macro.focus.range          0   85    85  3.64    25.51  0.25
## Storage.included           0  450   450 10.69   147.05  0.85
## Weight..inc..batteries.    0 1860  1860  2.82     9.75  8.09
## Dimensions                 0  240   240 -0.31     5.33  0.75
## Price                     14 7999  7985  5.17    36.82 23.60
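Several columns in the summary above have a minimum of 0 (for example weight and dimensions), and a few contain NAs. A zero weight cannot be a real measurement, so such zeros may be placeholders for unrecorded values; this is an assumption, not something stated by the dataset. A minimal sketch of how zeros and NAs could be counted per column, using a hypothetical `toy` data frame with made-up values:

```r
# Toy data frame standing in for the cameras data (values are made up).
toy <- data.frame(
  Weight = c(420, 0, 300, NA, 270),
  Dimensions = c(95, 0, 128, 119, 0),
  Price = c(179, 269, 1299, 179, 149)
)

# Count zeros and NAs per column; na.rm = TRUE keeps NAs out of the zero count.
zero_counts <- sapply(toy, function(x) sum(x == 0, na.rm = TRUE))
na_counts <- sapply(toy, function(x) sum(is.na(x)))

print(rbind(zeros = zero_counts, missing = na_counts))
```

The same two `sapply` calls could be applied to the numeric columns of `cameras` to see how many zeros each feature contains before interpreting its mean or correlation.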

One-way contingency table showing the number of models from each company.

company<-with(cameras,table(Company))
company
## Company
##      Agfa     Canon     Casio    Contax     Epson  Fujifilm        HP 
##         7       115        63         2        15        99        46 
##    JVC GC     Kodak   Kyocera     Leica     Nikon   Olympus Panasonic 
##         2       102        15        11        90       122        55 
##    Pentax     Ricoh   Samsung     Sanyo     Sigma      Sony   Toshiba 
##        68        26        54         8         4       116        18
prop.table(company)*100
## Company
##       Agfa      Canon      Casio     Contax      Epson   Fujifilm 
##  0.6743738 11.0789981  6.0693642  0.1926782  1.4450867  9.5375723 
##         HP     JVC GC      Kodak    Kyocera      Leica      Nikon 
##  4.4315992  0.1926782  9.8265896  1.4450867  1.0597303  8.6705202 
##    Olympus  Panasonic     Pentax      Ricoh    Samsung      Sanyo 
## 11.7533719  5.2986513  6.5510597  2.5048170  5.2023121  0.7707129 
##      Sigma       Sony    Toshiba 
##  0.3853565 11.1753372  1.7341040

Olympus, Sony and Canon provide the largest variety of products in this dataset, together making up about 34% of the models.
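The combined share of the three largest companies can be checked directly from the counts in the one-way table. The sketch below copies those counts into a named vector (the variable names are only illustrative) and sorts the percentage shares:

```r
# Model counts per company, copied from the one-way table above.
company_counts <- c(Agfa = 7, Canon = 115, Casio = 63, Contax = 2, Epson = 15,
                    Fujifilm = 99, HP = 46, "JVC GC" = 2, Kodak = 102,
                    Kyocera = 15, Leica = 11, Nikon = 90, Olympus = 122,
                    Panasonic = 55, Pentax = 68, Ricoh = 26, Samsung = 54,
                    Sanyo = 8, Sigma = 4, Sony = 116, Toshiba = 18)

# Percentage share of each company, largest first.
shares <- sort(100 * company_counts / sum(company_counts), decreasing = TRUE)

# Three largest companies and their combined share.
top3 <- head(shares, 3)
top3_total <- sum(top3)
print(round(top3, 2))
print(round(top3_total, 1))
```

With the full data frame, the same result should follow from `sort(prop.table(company) * 100, decreasing = TRUE)`.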

Two-way contingency table showing the number of models at each effective pixel count for each company.

companycost<-xtabs(~Effective.pixels+Company,data=cameras)
companycost
##                 Company
## Effective.pixels Agfa Canon Casio Contax Epson Fujifilm HP JVC GC Kodak
##               0     4     4     3      0     4        0  0      0     7
##               1     3    15    13      0     4       21  8      0     6
##               2     0     1     0      0     0        9  3      0    17
##               3     0    26    10      0     3       20  7      2    12
##               4     0     4    10      1     3        6  5      0    13
##               5     0    15     5      0     0       10 12      0    15
##               6     0     9     6      1     1       19  6      0    13
##               7     0    14     9      0     0        2  2      0     7
##               8     0    14     3      0     0        5  3      0     7
##               9     0     0     0      0     0        6  0      0     0
##               10    0     6     3      0     0        0  0      0     1
##               11    0     1     0      0     0        0  0      0     0
##               12    0     4     1      0     0        1  0      0     1
##               13    0     0     0      0     0        0  0      0     3
##               16    0     1     0      0     0        0  0      0     0
##               21    0     1     0      0     0        0  0      0     0
##                 Company
## Effective.pixels Kyocera Leica Nikon Olympus Panasonic Pentax Ricoh
##               0        0     0     3       3         0      0     0
##               1        0     2    10      24         2      4     2
##               2        0     1     2       1         2      0     3
##               3       10     1    15      26         6     13     6
##               4        3     1     6      11        13      9     5
##               5        2     0     9       8         5      8     0
##               6        0     1    15       9         8     17     3
##               7        0     1    12      22        10      9     2
##               8        0     1     9      11         5      4     4
##               9        0     0     0       0         0      0     0
##               10       0     3     4       5         3      4     1
##               11       0     0     0       0         0      0     0
##               12       0     0     5       2         1      0     0
##               13       0     0     0       0         0      0     0
##               16       0     0     0       0         0      0     0
##               21       0     0     0       0         0      0     0
##                 Company
## Effective.pixels Samsung Sanyo Sigma Sony Toshiba
##               0        0     1     0    6       0
##               1        4     2     0   25       7
##               2        0     0     0    8       3
##               3        9     5     2   17       7
##               4        2     0     2    6       1
##               5        7     0     0   18       0
##               6        5     0     0    6       0
##               7       11     0     0   14       0
##               8        9     0     0   11       0
##               9        0     0     0    0       0
##               10       6     0     0    3       0
##               11       0     0     0    0       0
##               12       1     0     0    2       0
##               13       0     0     0    0       0
##               16       0     0     0    0       0
##               21       0     0     0    0       0

Canon has models with effective pixel counts ranging from 0 to 21. Canon is also the only company with models at the high values of 16 and 21, with one model at each.
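Reading such facts off a wide two-way table by eye is error-prone. A small sketch of how they could be extracted programmatically, using a hypothetical `toy` data frame (made-up values) and an illustrative helper `companies_at`:

```r
# Toy data standing in for the cameras data frame (values are made up).
toy <- data.frame(
  Company = c("Canon", "Canon", "Canon", "Nikon", "Nikon", "Sony"),
  Effective.pixels = c(3, 16, 21, 5, 12, 8)
)

pixel_table <- xtabs(~ Effective.pixels + Company, data = toy)

# Companies with at least one model at a given pixel value.
companies_at <- function(tab, pixels) {
  row <- tab[as.character(pixels), ]
  names(row)[row > 0]
}

# Lowest and highest pixel value per company (a 2 x companies matrix).
pixel_range <- sapply(split(toy$Effective.pixels, toy$Company), range)

print(companies_at(pixel_table, 16))
print(pixel_range[, "Canon"])
```

Applied to the real table, `companies_at(companycost, 16)` would list every company with a model at 16 effective pixels.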

Visualizing the variables independently.

For Maximum Resolution

boxplot(cameras$Max.resolution,horizontal = TRUE,main="Maximum Resolution",xlab="Maximum Resolution",col="yellow")

hist(cameras$Max.resolution,main="Maximum Resolution",xlab="Maximum Resolution",col="yellow")

For Minimum Resolution

boxplot(cameras$Low.resolution,horizontal = TRUE,main="Minimum Resolution",xlab="Minimum Resolution",col="yellow")

hist(cameras$Low.resolution,main="Minimum Resolution",xlab="Minimum Resolution",col="yellow")

For effective pixels

boxplot(cameras$Effective.pixels,horizontal = TRUE,main="Effective Pixels",xlab="Effective pixels",col="yellow")

hist(cameras$Effective.pixels,main="Effective pixels",xlab="Effective Pixels",col="yellow")

For Zoom Wide

boxplot(cameras$Zoom.wide..W.,horizontal = TRUE,main="Zoom Wide angle",xlab="Zoom Wide angle",col="yellow")

hist(cameras$Zoom.wide..W.,main="Zoom Wide angle",xlab="Zoom Wide angle",col="yellow")

For Zoom Tele

boxplot(cameras$Zoom.tele..T.,horizontal = TRUE,main="Zoom Tele angle",xlab="Zoom Tele angle",col="yellow")

hist(cameras$Zoom.tele..T.,main="Zoom Tele angle",xlab="Zoom Tele angle",col="yellow")

For Normal focus range

boxplot(cameras$Normal.focus.range,horizontal = TRUE,main="Normal Focus Range ",xlab="Normal Focus Range",col="yellow")

hist(cameras$Normal.focus.range,main="Normal Focus Range",xlab="Normal Focus Range",col="yellow")

For Macro focus range

boxplot(cameras$Macro.focus.range,horizontal = TRUE,main="Macro Focus Range ",xlab="Macro Focus Range",col="yellow")

hist(cameras$Macro.focus.range,main="Macro Focus Range",xlab="Macro Focus Range",col="yellow")

For Storage

boxplot(cameras$Storage.included,horizontal = TRUE,main="Storage Capacity",xlab="Storage capacity",col="yellow")

hist(cameras$Storage.included,main="Storage Capacity",xlab="Storage Capacity",xlim=c(0,120),col="yellow")

For Weight

boxplot(cameras$Weight..inc..batteries.,horizontal = TRUE,main="Camera Weight",xlab="Camera Weight",col="yellow")

hist(cameras$Weight..inc..batteries.,main="Camera Weight",xlab="Camera Weight",col="yellow")

For Dimensions

boxplot(cameras$Dimensions,horizontal = TRUE,main="Camera Dimensions",xlab="Camera Dimensions",col="yellow")

hist(cameras$Dimensions,main="Camera Dimensions",xlab="Camera Dimensions",col="yellow")

For Price

boxplot(cameras$Price,horizontal = TRUE,main="Price in $",xlab="Price",col="yellow")

hist(cameras$Price,main="Price",xlab="Price in $",col="yellow")
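The boxplot and histogram calls above repeat the same pattern for every variable. As a sketch, the repetition could be folded into one loop over the numeric columns; the function name `plot_numeric_columns` and the `toy` data are hypothetical, and the column names are used as plot titles instead of the hand-written labels:

```r
# Draw a horizontal boxplot and a histogram for every numeric column.
# Returns (invisibly) the names of the columns that were plotted.
plot_numeric_columns <- function(df, col = "yellow") {
  numeric_cols <- names(df)[sapply(df, is.numeric)]
  for (name in numeric_cols) {
    boxplot(df[[name]], horizontal = TRUE, main = name, xlab = name, col = col)
    hist(df[[name]], main = name, xlab = name, col = col)
  }
  invisible(numeric_cols)
}

# Toy data standing in for the cameras data frame (values are made up).
toy <- data.frame(
  Model = c("A", "B", "C", "D", "E"),
  Weight = c(420, 300, 270, 320, 460),
  Price = c(179, 1299, 179, 149, 139)
)
plotted <- plot_numeric_columns(toy)
print(plotted)
```

Non-numeric columns such as the model name are skipped automatically, so the function could be called on the whole data frame.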

Visualizing the variables pairwise against Price.

For the changes in Price with Maximum Resolution.

boxplot(cameras$Price~cameras$Max.resolution,ylab = "Maximum Resolution",xlab ="Price",main = "Changes in price with maximum resolution",horizontal=TRUE,col="yellow")

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
# Price is the response in the formula, so it is drawn on the y-axis
scatterplot(cameras$Price~cameras$Max.resolution,xlab = "Maximum Resolution",ylab ="Price",main = "Changes in price with maximum resolution",spread=FALSE)

For the changes in Price with Minimum Resolution.

boxplot(cameras$Price~cameras$Low.resolution,ylab = "Minimum Resolution",xlab ="Price",main = "Changes in price with minimum resolution",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Low.resolution,xlab = "Minimum Resolution",ylab ="Price",main = "Changes in price with minimum resolution",spread=FALSE)

For the changes in Price with Effective pixels.

boxplot(cameras$Price~cameras$Effective.pixels,ylab = "Effective pixels",xlab ="Price",main = "Changes in price with Effective pixels",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Effective.pixels,xlab = "Effective pixels",ylab ="Price",main = "Changes in price with effective pixels",spread=FALSE)

For the changes in Price with Wide Zoom Angle.

boxplot(cameras$Price~cameras$Zoom.wide..W.,ylab = "Zoom Wide angle",xlab ="Price",main = "Changes in price with zoom wide angle",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Zoom.wide..W.,xlab = "Zoom wide angle",ylab ="Price",main = "Changes in price with Zoom wide angle",spread=FALSE)

For the changes in Price with Zoom Tele Angle.

boxplot(cameras$Price~cameras$Zoom.tele..T.,ylab = "Zoom Tele angle",xlab ="Price",main = "Changes in price with zoom Tele angle",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Zoom.tele..T.,xlab = "Zoom Tele angle",ylab ="Price",main = "Changes in price with Zoom Tele angle",spread=FALSE)

For the changes in Price with Normal Focus Range.

boxplot(cameras$Price~cameras$Normal.focus.range,ylab = "Normal focus range",xlab ="Price",main = "Changes in price with normal focus range",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Normal.focus.range,xlab = "Normal focus range",ylab ="Price",main = "Changes in price with Normal focus range",spread=FALSE)

For the changes in Price with Macro Focus Range.

boxplot(cameras$Price~cameras$Macro.focus.range,ylab = "Macro focus range",xlab ="Price",main = "Changes in price with Macro focus range",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Macro.focus.range,xlab = "Macro focus range",ylab ="Price",main = "Changes in price with Macro focus range",spread=FALSE)

For the changes in Price with Storage capacity.

boxplot(cameras$Price~cameras$Storage.included,ylab = "Storage Capacity",xlab ="Price",main = "Changes in price with Storage Capacity",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Storage.included,xlab = "Storage Capacity",ylab ="Price",main = "Changes in price with Storage Capacity",spread=FALSE)

For the changes in Price with Weight(inclusive of batteries).

boxplot(cameras$Price~cameras$Weight..inc..batteries.,ylab = "Weight inclusive of batteries",xlab ="Price",main = "Changes in price with Weight inclusive of batteries",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Weight..inc..batteries.,xlab = "Weight inclusive of batteries",ylab ="Price",main = "Changes in price with Weight inclusive of batteries",spread=FALSE)

For the changes in Price with Dimensions.

boxplot(cameras$Price~cameras$Dimensions,ylab = "Dimensions",xlab ="Price",main = "Changes in price with Dimensions",horizontal=TRUE,col="yellow")

scatterplot(cameras$Price~cameras$Dimensions,xlab = "Dimensions",ylab ="Price",main = "Changes in price with Dimensions",spread=FALSE)
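In an R formula such as `Price ~ Weight`, the left-hand side is the response and is drawn on the vertical axis. The sketch below builds the formula and the axis labels from the same names so they cannot drift apart. It uses base `plot` with a fitted regression line rather than `car::scatterplot`, purely to stay self-contained; `plot_against` and the `toy` data are hypothetical.

```r
# Scatterplot of one response against each predictor, with a fitted line.
plot_against <- function(df, response, predictors) {
  for (p in predictors) {
    f <- reformulate(p, response = response)  # e.g. Price ~ Weight
    plot(f, data = df, xlab = p, ylab = response,
         main = paste("Changes in", response, "with", p))
    abline(lm(f, data = df), col = "red")
  }
  invisible(predictors)
}

# Toy data standing in for the cameras data frame (values are made up).
toy <- data.frame(
  Weight = c(420, 300, 270, 320, 460, 180),
  Dimensions = c(95, 128, 119, 93, 160, 90),
  Price = c(179, 1299, 179, 149, 139, 99)
)
done <- plot_against(toy, "Price", c("Weight", "Dimensions"))
print(done)
```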

Correlations

cor(cameras[,4:14],use = "complete.obs")
##                         Max.resolution Low.resolution Effective.pixels
## Max.resolution            1.000000e+00     0.84279021      0.953845857
## Low.resolution            8.427902e-01     1.00000000      0.820321877
## Effective.pixels          9.538459e-01     0.82032188      1.000000000
## Zoom.wide..W.            -3.739583e-01    -0.20647701     -0.328514362
## Zoom.tele..T.             6.937710e-02     0.15478456      0.084643956
## Normal.focus.range       -2.004487e-01    -0.12543612     -0.193252402
## Macro.focus.range        -3.473012e-01    -0.31578690     -0.321930012
## Storage.included          1.662223e-01     0.15665176      0.157844345
## Weight..inc..batteries.   1.066780e-01    -0.04492655      0.078198194
## Dimensions               -5.660724e-05    -0.10558457     -0.004076899
## Price                     1.842009e-01     0.15420406      0.190284008
##                         Zoom.wide..W. Zoom.tele..T. Normal.focus.range
## Max.resolution             -0.3739583    0.06937710         -0.2004487
## Low.resolution             -0.2064770    0.15478456         -0.1254361
## Effective.pixels           -0.3285144    0.08464396         -0.1932524
## Zoom.wide..W.               1.0000000    0.36464730          0.5385105
## Zoom.tele..T.               0.3646473    1.00000000          0.1676415
## Normal.focus.range          0.5385105    0.16764149          1.0000000
## Macro.focus.range           0.2933308   -0.07455414          0.3978316
## Storage.included            0.1509680    0.11407498          0.1596049
## Weight..inc..batteries.    -0.6987818   -0.06590409         -0.3933585
## Dimensions                 -0.4865417   -0.11817082         -0.2371075
## Price                      -0.4591034   -0.18948005         -0.2738539
##                         Macro.focus.range Storage.included
## Max.resolution                -0.34730122       0.16622226
## Low.resolution                -0.31578690       0.15665176
## Effective.pixels              -0.32193001       0.15784435
## Zoom.wide..W.                  0.29333084       0.15096797
## Zoom.tele..T.                 -0.07455414       0.11407498
## Normal.focus.range             0.39783162       0.15960485
## Macro.focus.range              1.00000000      -0.04376886
## Storage.included              -0.04376886       1.00000000
## Weight..inc..batteries.       -0.21055418      -0.15556845
## Dimensions                    -0.09033582      -0.11428012
## Price                         -0.12757671      -0.10304587
##                         Weight..inc..batteries.    Dimensions      Price
## Max.resolution                       0.10667805 -5.660724e-05  0.1842009
## Low.resolution                      -0.04492655 -1.055846e-01  0.1542041
## Effective.pixels                     0.07819819 -4.076899e-03  0.1902840
## Zoom.wide..W.                       -0.69878180 -4.865417e-01 -0.4591034
## Zoom.tele..T.                       -0.06590409 -1.181708e-01 -0.1894801
## Normal.focus.range                  -0.39335854 -2.371075e-01 -0.2738539
## Macro.focus.range                   -0.21055418 -9.033582e-02 -0.1275767
## Storage.included                    -0.15556845 -1.142801e-01 -0.1030459
## Weight..inc..batteries.              1.00000000  6.778848e-01  0.4647604
## Dimensions                           0.67788481  1.000000e+00  0.2642562
## Price                                0.46476035  2.642562e-01  1.0000000

Corrgram plot

library(corrgram)
corrgram(cameras, order=FALSE, lower.panel=panel.shade,
         upper.panel=panel.pie, text.panel=panel.txt,
         main="Corrgram of variables in cameras")

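The scatterplots and the corrgram show that price is positively correlated with the resolutions, effective pixels, weight and dimensions, whereas it is negatively correlated with the zoom angles, focus ranges and storage. The strongest relationships are with weight (about 0.46) and wide zoom angle (about -0.46); the remaining correlations are weak, with absolute values below 0.3.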
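To see at a glance which features matter most, the Price column of the correlation matrix can be ranked by absolute value. The sketch below copies the values from the matrix printed above into a named vector (the variable names are illustrative):

```r
# Correlations of each feature with Price, copied from the Price column
# of the correlation matrix above.
price_cor <- c(Max.resolution = 0.1842009, Low.resolution = 0.1542041,
               Effective.pixels = 0.1902840, Zoom.wide..W. = -0.4591034,
               Zoom.tele..T. = -0.1894801, Normal.focus.range = -0.2738539,
               Macro.focus.range = -0.1275767, Storage.included = -0.1030459,
               Weight..inc..batteries. = 0.4647604, Dimensions = 0.2642562)

# Rank by strength of association, ignoring the sign.
ranked <- price_cor[order(abs(price_cor), decreasing = TRUE)]
print(round(ranked, 3))
```

With the data loaded, the same vector could be taken from `cor(cameras[,4:14], use = "complete.obs")[, "Price"]` after dropping the Price entry itself.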

Scatterplot Matrix

library(car)
scatterplotMatrix(formula = ~Max.resolution+Low.resolution+Effective.pixels+Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Macro.focus.range+Storage.included+Weight..inc..batteries.+Dimensions+Price,data=cameras)

Correlation tests to check the significance of these correlations.

cor.test(cameras$Max.resolution,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Max.resolution and cameras$Price
## t = 5.9982, df = 1036, p-value = 2.753e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1237345 0.2413592
## sample estimates:
##       cor 
## 0.1832025
cor.test(cameras$Low.resolution,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Low.resolution and cameras$Price
## t = 5.0226, df = 1036, p-value = 5.999e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.09421604 0.21302819
## sample estimates:
##       cor 
## 0.1541794
cor.test(cameras$Effective.pixels,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Effective.pixels and cameras$Price
## t = 6.2001, df = 1036, p-value = 8.139e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1297959 0.2471521
## sample estimates:
##       cor 
## 0.1891493
cor.test(cameras$Zoom.wide..W.,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Zoom.wide..W. and cameras$Price
## t = -16.64, df = 1036, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5059455 -0.4098410
## sample estimates:
##       cor 
## -0.459236
cor.test(cameras$Zoom.tele..T.,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Zoom.tele..T. and cameras$Price
## t = -6.2078, df = 1036, p-value = 7.762e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2473734 -0.1300277
## sample estimates:
##        cor 
## -0.1893766
cor.test(cameras$Normal.focus.range,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Normal.focus.range and cameras$Price
## t = -9.1692, df = 1036, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3293316 -0.2167404
## sample estimates:
##        cor 
## -0.2739745
cor.test(cameras$Macro.focus.range,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Macro.focus.range and cameras$Price
## t = -4.1409, df = 1035, p-value = 3.741e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.18708318 -0.06730696
## sample estimates:
##        cor 
## -0.1276605
cor.test(cameras$Storage.included,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Storage.included and cameras$Price
## t = -3.3313, df = 1034, p-value = 0.0008951
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.16292930 -0.04240602
## sample estimates:
##        cor 
## -0.1030459
cor.test(cameras$Weight..inc..batteries.,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Weight..inc..batteries. and cameras$Price
## t = 16.878, df = 1034, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4156192 0.5111961
## sample estimates:
##       cor 
## 0.4647604
cor.test(cameras$Dimensions,cameras$Price)
## 
##  Pearson's product-moment correlation
## 
## data:  cameras$Dimensions and cameras$Price
## t = 8.8106, df = 1034, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2066766 0.3200116
## sample estimates:
##       cor 
## 0.2642562

In every case the p-value is below 0.05, so we reject the null hypothesis of zero correlation: the correlation of price with each feature is statistically significant. Significance does not imply strength, however; several of these correlations are weak (below 0.2 in absolute value).
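The t statistic reported by `cor.test` follows from the correlation r and the sample size n as t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. A minimal sketch (the helper `cor_t_test` is illustrative, not part of any package) that reproduces the Max.resolution result above and also reports r squared, the share of variance in price that a linear relationship would account for:

```r
# Recompute the Pearson correlation test from r and n alone.
cor_t_test <- function(r, n) {
  df <- n - 2
  t_stat <- r * sqrt(df) / sqrt(1 - r^2)
  p_value <- 2 * pt(-abs(t_stat), df)  # two-sided p-value
  list(t = t_stat, df = df, p = p_value, r_squared = r^2)
}

# r and n taken from the Max.resolution test above (1038 complete pairs).
res <- cor_t_test(r = 0.1832025, n = 1038)
print(res)
```

With over a thousand observations even r = 0.18 gives a very small p-value, although r squared is only about 0.03.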

Chi-squared tests

chisq.test(xtabs(~cameras$Max.resolution+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Max.resolution + cameras$Price)): Chi-
## squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Max.resolution + cameras$Price)
## X-squared = 9455.4, df = 4116, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Low.resolution+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Low.resolution + cameras$Price)): Chi-
## squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Low.resolution + cameras$Price)
## X-squared = 9881.3, df = 2898, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Effective.pixels+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Effective.pixels + cameras$Price)):
## Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Effective.pixels + cameras$Price)
## X-squared = 2123.4, df = 630, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Zoom.wide..W.+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Zoom.wide..W. + cameras$Price)): Chi-
## squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Zoom.wide..W. + cameras$Price)
## X-squared = 1756.7, df = 1008, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Zoom.tele..T.+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Zoom.tele..T. + cameras$Price)): Chi-
## squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Zoom.tele..T. + cameras$Price)
## X-squared = 5931.7, df = 4158, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Normal.focus.range+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Normal.focus.range + cameras$Price)):
## Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Normal.focus.range + cameras$Price)
## X-squared = 2917.1, df = 1302, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Macro.focus.range+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Macro.focus.range + cameras$Price)):
## Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Macro.focus.range + cameras$Price)
## X-squared = 2320.7, df = 1176, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Storage.included+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Storage.included + cameras$Price)):
## Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Storage.included + cameras$Price)
## X-squared = 3898.5, df = 1806, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Weight..inc..batteries.+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Weight..inc..batteries. + cameras
## $Price)): Chi-squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Weight..inc..batteries. + cameras$Price)
## X-squared = 16805, df = 9912, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Dimensions+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Dimensions + cameras$Price)): Chi-
## squared approximation may be incorrect
## 
##  Pearson's Chi-squared test
## 
## data:  xtabs(~cameras$Dimensions + cameras$Price)
## X-squared = 8089.7, df = 4200, p-value < 2.2e-16

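Here too the p-value is below 0.05 in every case, so we reject the null hypothesis of independence and conclude that price is associated with all the features. Note, however, that every test above warns that the chi-squared approximation may be incorrect: cross-tabulating the raw numeric values gives tables with many sparsely filled cells, so these results should be read with caution.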
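One common way to avoid the sparse-table warning is to bin both variables into a few groups before cross-tabulating. This is an alternative to the tests above, not what was run in this study. The sketch uses simulated data (the variables `weight` and `price` and their distributions are made up) and an illustrative helper `quartile_bins`:

```r
set.seed(1)
n <- 400
# Simulated stand-ins for a feature and the price (not the real data).
weight <- rlnorm(n, meanlog = 5.5, sdlog = 0.5)
price <- 0.8 * weight + rnorm(n, sd = 100)

# Cut a numeric vector into four groups at its quartiles.
quartile_bins <- function(x) {
  cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)),
      include.lowest = TRUE, labels = c("Q1", "Q2", "Q3", "Q4"))
}

# 4 x 4 table of binned values; every cell has a reasonable expected count.
binned <- table(weight = quartile_bins(weight), price = quartile_bins(price))
result <- chisq.test(binned)

print(binned)
print(min(result$expected))
print(result$p.value)
```

Quartile breaks assume the variable has enough distinct values for the four break points to differ; a column with many tied values would need hand-chosen breaks instead.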

T-tests to analyze our hypothesis.

t.test(cameras$Max.resolution,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Max.resolution and cameras$Price
## t = 60.471, df = 2074, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1951.866 2082.710
## sample estimates:
## mean of x mean of y 
## 2474.6724  457.3844
t.test(cameras$Low.resolution,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Low.resolution and cameras$Price
## t = 37.658, df = 2057.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1247.990 1385.114
## sample estimates:
## mean of x mean of y 
## 1773.9364  457.3844
t.test(cameras$Effective.pixels,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Effective.pixels and cameras$Price
## t = -19.183, df = 1037, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -499.1042 -406.4720
## sample estimates:
##  mean of x  mean of y 
##   4.596339 457.384393
t.test(cameras$Zoom.wide..W.,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Zoom.wide..W. and cameras$Price
## t = -17.98, df = 1037.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -470.741 -378.101
## sample estimates:
## mean of x mean of y 
##  32.96339 457.38439
t.test(cameras$Zoom.tele..T.,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Zoom.tele..T. and cameras$Price
## t = -14.123, df = 1068.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -382.5220 -289.1967
## sample estimates:
## mean of x mean of y 
##  121.5250  457.3844
t.test(cameras$Normal.focus.range,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Normal.focus.range and cameras$Price
## t = -17.499, df = 1039.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -459.5779 -366.8999
## sample estimates:
## mean of x mean of y 
##  44.14547 457.38439
t.test(cameras$Macro.focus.range,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Macro.focus.range and cameras$Price
## t = -19.047, df = 1037.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -495.9149 -403.2782
## sample estimates:
## mean of x mean of y 
##   7.78785 457.38439
t.test(cameras$Storage.included,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Storage.included and cameras$Price
## t = -18.627, df = 1039.7, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -486.2824 -393.5907
## sample estimates:
## mean of x mean of y 
##  17.44788 457.38439
t.test(cameras$Weight..inc..batteries.,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Weight..inc..batteries. and cameras$Price
## t = -5.5355, df = 1277.3, p-value = 3.762e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -187.06929  -89.16861
## sample estimates:
## mean of x mean of y 
##  319.2654  457.3844
t.test(cameras$Dimensions,cameras$Price)
## 
##  Welch Two Sample t-test
## 
## data:  cameras$Dimensions and cameras$Price
## t = -14.906, df = 1039.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -398.3603 -305.6817
## sample estimates:
## mean of x mean of y 
##  105.3634  457.3844

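Here the p-value is below 0.05 in all cases, so we reject the null hypothesis that the two columns have equal means. Note that a Welch two-sample t-test only compares the mean of a feature with the mean of price, and the two are measured in different units (for example, the mean of Effective pixels is 4.6 while the mean of Price is 457.4). A significant result here therefore does not by itself show that price is correlated with the feature. The dependence of price on the features is assessed more directly by the correlation coefficients and by the regression analysis that follows.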
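The difference between comparing means and testing correlation can be seen in a small sketch. It assumes synthetic data, not the cameras dataset: `x` and `y` are hypothetical columns generated independently of each other, so by construction they are uncorrelated but have very different means.

```r
# Two independent synthetic columns with different means (NOT the cameras data).
set.seed(2)
n <- 1000
x <- rnorm(n, mean = 0)    # hypothetical feature
y <- rnorm(n, mean = 10)   # hypothetical price, generated independently of x

tt <- t.test(x, y)     # Welch two-sample t-test: are the two means equal?
ct <- cor.test(x, y)   # Pearson correlation test: is the correlation zero?

tt$p.value    # tiny, because the means differ
ct$estimate   # sample correlation, close to zero
ct$p.value
```

The t-test rejects its null hypothesis even though the two columns are unrelated, which is why `cor.test` (or a regression) is the more direct check for dependence between a feature and price.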

Regression analysis

We formulate the regression model as: Price = b0 + b1(Max resolution) + b2(Low resolution) + b3(Effective pixels) + b4(Zoom wide angle) + b5(Zoom tele) + b6(Normal focus range) + b7(Macro focus range) + b8(Storage included) + b9(Weight) + b10(Dimensions). Here we include all ten features of the camera as predictors of its price.

fit1<-lm(formula=Price~Max.resolution+Low.resolution+Effective.pixels+Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Macro.focus.range+Storage.included+Weight..inc..batteries.+Dimensions, data = cameras)
summary(fit1)
## 
## Call:
## lm(formula = Price ~ Max.resolution + Low.resolution + Effective.pixels + 
##     Zoom.wide..W. + Zoom.tele..T. + Normal.focus.range + Macro.focus.range + 
##     Storage.included + Weight..inc..batteries. + Dimensions, 
##     data = cameras)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2378.0  -233.1  -105.4    58.9  5562.3 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             967.26665  242.59736   3.987 7.16e-05 ***
## Max.resolution           -0.38814    0.09798  -3.961 7.97e-05 ***
## Low.resolution            0.24205    0.04697   5.153 3.07e-07 ***
## Effective.pixels         78.91679   23.58787   3.346 0.000851 ***
## Zoom.wide..W.            -7.78381    3.72525  -2.089 0.036911 *  
## Zoom.tele..T.            -1.31217    0.25707  -5.104 3.96e-07 ***
## Normal.focus.range       -0.79766    1.03596  -0.770 0.441490    
## Macro.focus.range         3.25358    2.84521   1.144 0.253086    
## Storage.included         -0.68525    0.76011  -0.902 0.367525    
## Weight..inc..batteries.   1.38801    0.13815  10.047  < 2e-16 ***
## Dimensions               -3.28937    1.12876  -2.914 0.003644 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 638.7 on 1025 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.3026, Adjusted R-squared:  0.2958 
## F-statistic: 44.48 on 10 and 1025 DF,  p-value: < 2.2e-16

Now let us consider only those features with which price is positively correlated. For this, the regression model is: Price = b0 + b1(Max resolution) + b2(Low resolution) + b3(Effective pixels) + b4(Weight) + b5(Dimensions).

fit2<-lm(formula=Price~Max.resolution+Low.resolution+Effective.pixels+Weight..inc..batteries.+Dimensions, data = cameras)
summary(fit2)
## 
## Call:
## lm(formula = Price ~ Max.resolution + Low.resolution + Effective.pixels + 
##     Weight..inc..batteries. + Dimensions, data = cameras)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2278.6  -247.1  -104.4    57.3  5648.4 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             288.75059  165.78776   1.742  0.08186 .  
## Max.resolution           -0.29474    0.09775  -3.015  0.00263 ** 
## Low.resolution            0.18667    0.04754   3.926  9.2e-05 ***
## Effective.pixels         70.02774   24.16005   2.898  0.00383 ** 
## Weight..inc..batteries.   1.57042    0.10982  14.299  < 2e-16 ***
## Dimensions               -2.42978    1.15491  -2.104  0.03563 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 658 on 1030 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.2562, Adjusted R-squared:  0.2526 
## F-statistic: 70.97 on 5 and 1030 DF,  p-value: < 2.2e-16

Now let us consider only those features with which price is negatively correlated. For this, the regression model is: Price = b0 + b1(Zoom wide angle) + b2(Zoom tele) + b3(Normal focus range) + b4(Macro focus range) + b5(Storage included).

fit3<-lm(formula=Price~Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Macro.focus.range+Storage.included, data = cameras)
summary(fit3)
## 
## Call:
## lm(formula = Price ~ Zoom.wide..W. + Zoom.tele..T. + Normal.focus.range + 
##     Macro.focus.range + Storage.included, data = cameras)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1561.3  -220.5  -132.2    11.8  6418.7 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1580.2804    70.4360  22.436   <2e-16 ***
## Zoom.wide..W.       -31.5536     2.5994 -12.139   <2e-16 ***
## Zoom.tele..T.        -0.1833     0.2470  -0.742    0.458    
## Normal.focus.range   -1.2221     1.0913  -1.120    0.263    
## Macro.focus.range     1.0019     2.9239   0.343    0.732    
## Storage.included     -0.8069     0.7860  -1.027    0.305    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 676.7 on 1030 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.2134, Adjusted R-squared:  0.2095 
## F-statistic: 55.87 on 5 and 1030 DF,  p-value: < 2.2e-16

Next we consider only those features that are more strongly correlated with price than the others, whether positively or negatively, i.e. those whose correlation with price is about 0.2 or more in magnitude. The regression model is: Price = b0 + b1(Max resolution) + b2(Effective pixels) + b3(Zoom wide angle) + b4(Zoom tele) + b5(Normal focus range) + b6(Weight) + b7(Dimensions).

fit4<-lm(formula=Price~Max.resolution+Effective.pixels+Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Weight..inc..batteries.+Dimensions, data = cameras)
summary(fit4)
## 
## Call:
## lm(formula = Price ~ Max.resolution + Effective.pixels + Zoom.wide..W. + 
##     Zoom.tele..T. + Normal.focus.range + Weight..inc..batteries. + 
##     Dimensions, data = cameras)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2453.7  -213.0  -114.0    32.9  5422.3 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             994.00263  242.83353   4.093 4.59e-05 ***
## Max.resolution           -0.22106    0.09135  -2.420 0.015698 *  
## Effective.pixels         89.88447   23.77622   3.780 0.000166 ***
## Zoom.wide..W.            -8.55472    3.75212  -2.280 0.022814 *  
## Zoom.tele..T.            -1.14531    0.25523  -4.487 8.02e-06 ***
## Normal.focus.range       -0.65687    0.99190  -0.662 0.507971    
## Weight..inc..batteries.   1.27750    0.13803   9.255  < 2e-16 ***
## Dimensions               -3.41371    1.14186  -2.990 0.002860 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 646.5 on 1028 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.2834, Adjusted R-squared:  0.2785 
## F-statistic: 58.08 on 7 and 1028 DF,  p-value: < 2.2e-16

Finally, we fit a model using only the features that are built into the camera and hence play a vital role in it. The regression model is: Price = b0 + b1(Max resolution) + b2(Low resolution) + b3(Effective pixels) + b4(Zoom wide angle) + b5(Zoom tele) + b6(Normal focus range) + b7(Macro focus range) + b8(Storage included).

fit5<-lm(formula=Price~Max.resolution+Low.resolution+Effective.pixels+Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Macro.focus.range+Storage.included, data = cameras)
summary(fit5)
## 
## Call:
## lm(formula = Price ~ Max.resolution + Low.resolution + Effective.pixels + 
##     Zoom.wide..W. + Zoom.tele..T. + Normal.focus.range + Macro.focus.range + 
##     Storage.included, data = cameras)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1533.4  -236.7  -132.3    24.7  6469.1 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1942.68192  175.60691  11.063  < 2e-16 ***
## Max.resolution       -0.37831    0.10254  -3.689 0.000237 ***
## Low.resolution        0.15544    0.04838   3.213 0.001355 ** 
## Effective.pixels     73.52505   24.70609   2.976 0.002989 ** 
## Zoom.wide..W.       -32.71498    2.82258 -11.590  < 2e-16 ***
## Zoom.tele..T.        -0.32390    0.25015  -1.295 0.195679    
## Normal.focus.range   -1.03518    1.08618  -0.953 0.340791    
## Macro.focus.range     2.09637    2.98439   0.702 0.482561    
## Storage.included     -0.89641    0.79764  -1.124 0.261347    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 670.5 on 1027 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:   0.23,  Adjusted R-squared:  0.224 
## F-statistic: 38.34 on 8 and 1027 DF,  p-value: < 2.2e-16

Inferences from the Regression analysis.

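The first regression model gives the highest multiple and adjusted R-squared values (multiple R-squared: 0.3026, adjusted R-squared: 0.2958). The overall F-test p-value is below 0.05 for every model, so each fit is statistically significant, but the first fit is the best of the five. Because the first model contains every predictor used in the other models, its multiple R-squared cannot be lower than theirs. The adjusted R-squared, which penalizes additional predictors, is the fairer comparison, and it is also highest for the first model. Even so, this model explains only about 30% of the variation in price, and within it Normal focus range, Macro focus range and Storage included are not individually significant (p > 0.05). Hence we conclude that the price of a camera depends on the features considered when they are taken together, with weight, resolution, effective pixels, zoom and dimensions contributing most clearly.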
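Nested linear models can also be compared formally rather than by reading off R-squared values. The sketch below is only an illustration under stated assumptions: the data frame `d` and its columns `a`, `b`, `c` and `price` are synthetic and hypothetical, and `c` is constructed to have no effect on `price`.

```r
# Synthetic data (NOT the cameras data); c has no effect on price by construction.
set.seed(3)
n <- 500
d <- data.frame(a = rnorm(n), b = rnorm(n), c = rnorm(n))
d$price <- 5 + 2 * d$a + 1 * d$b + rnorm(n)

full  <- lm(price ~ a + b + c, data = d)   # all predictors
small <- lm(price ~ a + b, data = d)       # nested inside `full`

# Multiple R-squared never decreases when predictors are added ...
summary(small)$r.squared
summary(full)$r.squared
# ... so adjusted R-squared, a partial F-test and AIC are fairer comparisons.
summary(small)$adj.r.squared
summary(full)$adj.r.squared
cmp <- anova(small, full)   # partial F-test for the extra predictor(s)
aic <- AIC(small, full)     # lower AIC indicates the preferred model
cmp
aic
```

Assuming the fits above were all estimated on the same rows, which their degrees of freedom suggest, a call such as `anova(fit2, fit1)` would test whether the extra predictors in the first model improve on the second.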

Final Results and conclusions:

Finally, we can say that the price of a camera depends on the ten features given in the dataset when they are taken together, although the best model explains only about 30% of the variation in price. A buyer should check all of these features before choosing a particular product, and will generally have to pay more for a model that is better on a given feature. We also found that Olympus, Sony and Canon offer the largest variety of products, together making up around 34% of the models in the dataset. The summary statistics show the maximum value of each feature, from which the models offering the best of a particular feature can be identified.