A product category usually covers a wide range of products on the market. Here we have chosen cameras as our product. Camera prices today range from thousands to lakhs of rupees (INR), although the dataset itself lists prices in dollars. What that price actually depends on matters a great deal, and which features make it go up or down is the subject of this study. We have taken a dataset from an online source, Kaggle, which lists about 1000 camera models with 13 properties each, one of which is the price of the model. We will also look at which features are highly correlated with price, and whether features that are weakly or negatively correlated with price can be neglected. The study can tell a prospective buyer why prices differ between models, and can help answer questions such as which company offers a model that excels at a particular feature, setting it apart from other cameras.
We begin with the dataset of about 1000 cameras with 13 features each. Going straight into the dataset, let's look at what each column refers to. The 13 properties of each camera:
Our analysis gives summary statistics and a description of each feature. We then concentrate mainly on how a camera's price depends on these features, that is, how prices change as the features change. Along the way we may also find the models that offer a particular feature at its best.
This dataset was taken from an online source: https://www.kaggle.com/crawford/1000-cameras-dataset. These datasets were gathered and cleaned up by Petra Isenberg, Pierre Dragicevic and Yvonne Jansen. The original source can be found at https://perso.telecom-paristech.fr/eagan/class/igr204/datasets.
This dataset has been converted to CSV.
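setwd("C:/Users/Kalyan/Downloads")  # folder containing the CSV; change to your own path
# stringsAsFactors = TRUE keeps Company and Model as factors, as in the str() output below;
# since R 4.0.0, read.csv() reads text columns as character by default
cameras <- read.csv("cameras_dataset.csv", stringsAsFactors = TRUE)
View(cameras)  # opens the data viewer; only useful in an interactive session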
dim(cameras)
## [1] 1038 14
The data frame has 1038 rows (observations) and 14 columns (variables).
summary(cameras)
## Company Model Release.date
## Olympus :122 Agfa ePhoto 1280 : 1 Min. :1994
## Sony :116 Agfa ePhoto 1680 : 1 1st Qu.:2002
## Canon :115 Agfa ePhoto CL18 : 1 Median :2004
## Kodak :102 Agfa ePhoto CL30 : 1 Mean :2004
## Fujifilm: 99 Agfa ePhoto CL30 Clik!: 1 3rd Qu.:2006
## Nikon : 90 Agfa ePhoto CL45 : 1 Max. :2007
## (Other) :394 (Other) :1032
## Max.resolution Low.resolution Effective.pixels Zoom.wide..W.
## Min. : 0 Min. : 0 Min. : 0.000 Min. : 0.00
## 1st Qu.:2048 1st Qu.:1120 1st Qu.: 3.000 1st Qu.:35.00
## Median :2560 Median :2048 Median : 4.000 Median :36.00
## Mean :2475 Mean :1774 Mean : 4.596 Mean :32.96
## 3rd Qu.:3072 3rd Qu.:2560 3rd Qu.: 7.000 3rd Qu.:38.00
## Max. :5616 Max. :4992 Max. :21.000 Max. :52.00
##
## Zoom.tele..T. Normal.focus.range Macro.focus.range Storage.included
## Min. : 0.0 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 96.0 1st Qu.: 30.00 1st Qu.: 3.000 1st Qu.: 8.00
## Median :108.0 Median : 50.00 Median : 6.000 Median : 16.00
## Mean :121.5 Mean : 44.15 Mean : 7.788 Mean : 17.45
## 3rd Qu.:117.0 3rd Qu.: 60.00 3rd Qu.:10.000 3rd Qu.: 20.00
## Max. :518.0 Max. :120.00 Max. :85.000 Max. :450.00
## NA's :1 NA's :2
## Weight..inc..batteries. Dimensions Price
## Min. : 0.0 Min. : 0.0 Min. : 14.0
## 1st Qu.: 180.0 1st Qu.: 92.0 1st Qu.: 149.0
## Median : 226.0 Median :101.0 Median : 199.0
## Mean : 319.3 Mean :105.4 Mean : 457.4
## 3rd Qu.: 350.0 3rd Qu.:115.0 3rd Qu.: 399.0
## Max. :1860.0 Max. :240.0 Max. :7999.0
## NA's :2 NA's :2
str(cameras)
## 'data.frame': 1038 obs. of 14 variables:
## $ Company : Factor w/ 21 levels "Agfa","Canon",..: 1 1 1 1 1 1 1 2 2 2 ...
## $ Model : Factor w/ 1038 levels "Agfa ePhoto 1280",..: 1 2 3 4 5 6 7 26 27 28 ...
## $ Release.date : int 1997 1998 2000 1999 1999 2001 1999 1997 1996 2001 ...
## $ Max.resolution : int 1024 1280 640 1152 1152 1600 1280 640 832 1280 ...
## $ Low.resolution : int 640 640 0 640 640 640 640 0 640 1024 ...
## $ Effective.pixels : int 0 1 0 0 0 1 1 0 0 1 ...
## $ Zoom.wide..W. : int 38 38 45 35 43 51 34 42 50 35 ...
## $ Zoom.tele..T. : int 114 114 45 35 43 51 102 42 50 105 ...
## $ Normal.focus.range : int 70 50 0 0 50 50 0 70 40 76 ...
## $ Macro.focus.range : int 40 0 0 0 0 20 0 3 10 16 ...
## $ Storage.included : int 4 4 2 4 40 8 8 2 1 8 ...
## $ Weight..inc..batteries.: int 420 420 0 0 300 270 0 320 460 375 ...
## $ Dimensions : num 95 158 0 0 128 119 0 93 160 110 ...
## $ Price : int 179 179 179 269 1299 179 179 149 139 139 ...
library(psych)
describe(cameras)
## vars n mean sd median trimmed mad
## Company* 1 1038 10.92 5.80 12.0 10.89 7.41
## Model* 2 1038 519.50 299.79 519.5 519.50 384.73
## Release.date 3 1038 2003.59 2.72 2004.0 2003.83 2.97
## Max.resolution 4 1038 2474.67 759.51 2560.0 2478.18 759.09
## Low.resolution 5 1038 1773.94 830.90 2048.0 1806.58 782.81
## Effective.pixels 6 1038 4.60 2.84 4.0 4.43 2.97
## Zoom.wide..W. 7 1038 32.96 10.33 36.0 35.55 2.97
## Zoom.tele..T. 8 1038 121.53 93.46 108.0 106.26 13.34
## Normal.focus.range 9 1038 44.15 24.14 50.0 44.96 14.83
## Macro.focus.range 10 1037 7.79 8.10 6.0 6.63 5.93
## Storage.included 11 1036 17.45 27.44 16.0 14.78 11.86
## Weight..inc..batteries. 12 1036 319.27 260.41 226.0 266.73 92.66
## Dimensions 13 1036 105.36 24.26 101.0 104.31 14.83
## Price 14 1038 457.38 760.45 199.0 288.25 103.78
## min max range skew kurtosis se
## Company* 1 21 20 -0.03 -1.08 0.18
## Model* 1 1038 1037 0.00 -1.20 9.31
## Release.date 1994 2007 13 -0.61 -0.47 0.08
## Max.resolution 0 5616 5616 0.01 0.06 23.57
## Low.resolution 0 4992 4992 -0.30 -0.40 25.79
## Effective.pixels 0 21 21 0.63 0.77 0.09
## Zoom.wide..W. 0 52 52 -2.57 5.58 0.32
## Zoom.tele..T. 0 518 518 1.87 3.93 2.90
## Normal.focus.range 0 120 120 -0.41 -0.43 0.75
## Macro.focus.range 0 85 85 3.64 25.51 0.25
## Storage.included 0 450 450 10.69 147.05 0.85
## Weight..inc..batteries. 0 1860 1860 2.82 9.75 8.09
## Dimensions 0 240 240 -0.31 5.33 0.75
## Price 14 7999 7985 5.17 36.82 23.60
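The describe() output shows a minimum of 0 for almost every feature. For some columns a zero cannot be a real measurement (a camera cannot have a maximum resolution of 0 or weigh nothing), so those zeros most likely stand for missing values, and they pull the means and correlations down. Below is a minimal sketch of recoding such zeros to NA, run on a small made-up data frame (`demo` and its values are hypothetical). Which real columns to recode is an assumption: a 0 in `Storage.included` may be genuine, and a 0 in `Effective.pixels` may simply be a sub-megapixel camera rounded down.

```r
# Hypothetical toy data that reuses three of the dataset's column names
demo <- data.frame(
  Max.resolution          = c(1024, 0, 640, 2048),
  Weight..inc..batteries. = c(420, 0, 0, 300),
  Dimensions              = c(95, 158, 0, 128)
)

# How many zeros does each column contain?
zero_counts <- colSums(demo == 0, na.rm = TRUE)
print(zero_counts)

# Recode zeros to NA column by column; x %in% 0 is FALSE (not NA) for
# entries that are already missing, so existing NAs are left untouched
demo[] <- lapply(demo, function(x) replace(x, x %in% 0, NA))
print(colMeans(demo, na.rm = TRUE))
```

On the real data, the same recoding would be applied only to the chosen columns, e.g. `cols <- c("Max.resolution", "Weight..inc..batteries.", "Dimensions")` followed by `cameras[cols] <- lapply(cameras[cols], function(x) replace(x, x %in% 0, NA))`. The correlations and regressions would then need to be re-run.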
company<-with(cameras,table(Company))
company
## Company
## Agfa Canon Casio Contax Epson Fujifilm HP
## 7 115 63 2 15 99 46
## JVC GC Kodak Kyocera Leica Nikon Olympus Panasonic
## 2 102 15 11 90 122 55
## Pentax Ricoh Samsung Sanyo Sigma Sony Toshiba
## 68 26 54 8 4 116 18
prop.table(company)*100
## Company
## Agfa Canon Casio Contax Epson Fujifilm
## 0.6743738 11.0789981 6.0693642 0.1926782 1.4450867 9.5375723
## HP JVC GC Kodak Kyocera Leica Nikon
## 4.4315992 0.1926782 9.8265896 1.4450867 1.0597303 8.6705202
## Olympus Panasonic Pentax Ricoh Samsung Sanyo
## 11.7533719 5.2986513 6.5510597 2.5048170 5.2023121 0.7707129
## Sigma Sony Toshiba
## 0.3853565 11.1753372 1.7341040
So Olympus, Sony and Canon offer the largest variety of models. Together they account for about 34% of the cameras in the dataset.
# Number of models from each company at each effective-pixel count
pixels_by_company <- xtabs(~Effective.pixels+Company, data=cameras)
pixels_by_company
## Company
## Effective.pixels Agfa Canon Casio Contax Epson Fujifilm HP JVC GC Kodak
## 0 4 4 3 0 4 0 0 0 7
## 1 3 15 13 0 4 21 8 0 6
## 2 0 1 0 0 0 9 3 0 17
## 3 0 26 10 0 3 20 7 2 12
## 4 0 4 10 1 3 6 5 0 13
## 5 0 15 5 0 0 10 12 0 15
## 6 0 9 6 1 1 19 6 0 13
## 7 0 14 9 0 0 2 2 0 7
## 8 0 14 3 0 0 5 3 0 7
## 9 0 0 0 0 0 6 0 0 0
## 10 0 6 3 0 0 0 0 0 1
## 11 0 1 0 0 0 0 0 0 0
## 12 0 4 1 0 0 1 0 0 1
## 13 0 0 0 0 0 0 0 0 3
## 16 0 1 0 0 0 0 0 0 0
## 21 0 1 0 0 0 0 0 0 0
## Company
## Effective.pixels Kyocera Leica Nikon Olympus Panasonic Pentax Ricoh
## 0 0 0 3 3 0 0 0
## 1 0 2 10 24 2 4 2
## 2 0 1 2 1 2 0 3
## 3 10 1 15 26 6 13 6
## 4 3 1 6 11 13 9 5
## 5 2 0 9 8 5 8 0
## 6 0 1 15 9 8 17 3
## 7 0 1 12 22 10 9 2
## 8 0 1 9 11 5 4 4
## 9 0 0 0 0 0 0 0
## 10 0 3 4 5 3 4 1
## 11 0 0 0 0 0 0 0
## 12 0 0 5 2 1 0 0
## 13 0 0 0 0 0 0 0
## 16 0 0 0 0 0 0 0
## 21 0 0 0 0 0 0 0
## Company
## Effective.pixels Samsung Sanyo Sigma Sony Toshiba
## 0 0 1 0 6 0
## 1 4 2 0 25 7
## 2 0 0 0 8 3
## 3 9 5 2 17 7
## 4 2 0 2 6 1
## 5 7 0 0 18 0
## 6 5 0 0 6 0
## 7 11 0 0 14 0
## 8 9 0 0 11 0
## 9 0 0 0 0 0
## 10 6 0 0 3 0
## 11 0 0 0 0 0
## 12 1 0 0 2 0
## 13 0 0 0 0 0
## 16 0 0 0 0 0
## 21 0 0 0 0 0
Canon's models cover the whole range of effective pixels in the dataset, from 0 to 21 megapixels. Canon is also the only company with models at 16 and 21 megapixels (one model each).
For Maximum Resolution
boxplot(cameras$Max.resolution,horizontal = TRUE,main="Maximum Resolution",xlab="Maximum Resolution",col="yellow")
hist(cameras$Max.resolution,main="Maximum Resolution",xlab="Maximum Resolution",col="yellow")
For Minimum Resolution
boxplot(cameras$Low.resolution,horizontal = TRUE,main="Minimum Resolution",xlab="Minimum Resolution",col="yellow")
hist(cameras$Low.resolution,main="Minimum Resolution",xlab="Minimum Resolution",col="yellow")
For effective pixels
boxplot(cameras$Effective.pixels,horizontal = TRUE,main="Effective Pixels",xlab="Effective pixels",col="yellow")
hist(cameras$Effective.pixels,main="Effective pixels",xlab="Effective Pixels",col="yellow")
For Zoom Wide
boxplot(cameras$Zoom.wide..W.,horizontal = TRUE,main="Zoom Wide angle",xlab="Zoom Wide angle",col="yellow")
hist(cameras$Zoom.wide..W.,main="Zoom Wide angle",xlab="Zoom Wide angle",col="yellow")
For Zoom Tele
boxplot(cameras$Zoom.tele..T.,horizontal = TRUE,main="Zoom Tele angle",xlab="Zoom Tele angle",col="yellow")
hist(cameras$Zoom.tele..T.,main="Zoom Tele angle",xlab="Zoom Tele angle",col="yellow")
For Normal focus range
boxplot(cameras$Normal.focus.range,horizontal = TRUE,main="Normal Focus Range ",xlab="Normal Focus Range",col="yellow")
hist(cameras$Normal.focus.range,main="Normal Focus Range",xlab="Normal Focus Range",col="yellow")
For Macro focus range
boxplot(cameras$Macro.focus.range,horizontal = TRUE,main="Macro Focus Range ",xlab="Macro Focus Range",col="yellow")
hist(cameras$Macro.focus.range,main="Macro Focus Range",xlab="Macro Focus Range",col="yellow")
For Storage
boxplot(cameras$Storage.included,horizontal = TRUE,main="Storage Capacity",xlab="Storage capacity",col="yellow")
hist(cameras$Storage.included,main="Storage Capacity",xlab="Storage Capacity",xlim=c(0,120),col="yellow")
For Weight
boxplot(cameras$Weight..inc..batteries.,horizontal = TRUE,main="Camera Weight",xlab="Camera Weight",col="yellow")
hist(cameras$Weight..inc..batteries.,main="Camera Weight",xlab="Camera Weight",col="yellow")
For Dimensions
boxplot(cameras$Dimensions,horizontal = TRUE,main="Camera Dimensions",xlab="Camera Dimensions",col="yellow")
hist(cameras$Dimensions,main="Camera Dimensions",xlab="Camera Dimensions",col="yellow")
For Price
boxplot(cameras$Price,horizontal = TRUE,main="Price in $",xlab="Price",col="yellow")
hist(cameras$Price,main="Price",xlab="Price in $",col="yellow")
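Each boxplot/histogram pair above repeats the same two calls with a different column and label. A small helper can remove that repetition. Below is a sketch, assuming a hypothetical `plot_feature()` helper and a simulated stand-in for the `cameras` data frame. Extra arguments such as `xlim` are passed through to `hist()`.

```r
# One helper replaces each boxplot/histogram pair; extra arguments go to hist()
plot_feature <- function(x, label, col = "yellow", ...) {
  boxplot(x, horizontal = TRUE, main = label, xlab = label, col = col)
  hist(x, main = label, xlab = label, col = col, ...)  # returns the histogram object invisibly
}

# Hypothetical stand-in for the cameras data frame (simulated values)
set.seed(10)
demo <- data.frame(
  Max.resolution = round(rnorm(100, mean = 2500, sd = 750)),
  Price          = round(rlnorm(100, meanlog = log(200), sdlog = 0.8))
)
labels <- c(Max.resolution = "Maximum Resolution", Price = "Price in $")

pdf(NULL)  # null graphics device so the sketch also runs without a display
hists <- lapply(names(labels), function(v) plot_feature(demo[[v]], labels[[v]]))
invisible(dev.off())
```

With the real data, `for (v in names(labels)) plot_feature(cameras[[v]], labels[[v]])` would draw every pair, given a `labels` vector that covers all eleven columns. The storage plot would become `plot_feature(cameras$Storage.included, "Storage Capacity", xlim = c(0, 120))`.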
cor(cameras[,4:14],use = "complete.obs")
## Max.resolution Low.resolution Effective.pixels
## Max.resolution 1.000000e+00 0.84279021 0.953845857
## Low.resolution 8.427902e-01 1.00000000 0.820321877
## Effective.pixels 9.538459e-01 0.82032188 1.000000000
## Zoom.wide..W. -3.739583e-01 -0.20647701 -0.328514362
## Zoom.tele..T. 6.937710e-02 0.15478456 0.084643956
## Normal.focus.range -2.004487e-01 -0.12543612 -0.193252402
## Macro.focus.range -3.473012e-01 -0.31578690 -0.321930012
## Storage.included 1.662223e-01 0.15665176 0.157844345
## Weight..inc..batteries. 1.066780e-01 -0.04492655 0.078198194
## Dimensions -5.660724e-05 -0.10558457 -0.004076899
## Price 1.842009e-01 0.15420406 0.190284008
## Zoom.wide..W. Zoom.tele..T. Normal.focus.range
## Max.resolution -0.3739583 0.06937710 -0.2004487
## Low.resolution -0.2064770 0.15478456 -0.1254361
## Effective.pixels -0.3285144 0.08464396 -0.1932524
## Zoom.wide..W. 1.0000000 0.36464730 0.5385105
## Zoom.tele..T. 0.3646473 1.00000000 0.1676415
## Normal.focus.range 0.5385105 0.16764149 1.0000000
## Macro.focus.range 0.2933308 -0.07455414 0.3978316
## Storage.included 0.1509680 0.11407498 0.1596049
## Weight..inc..batteries. -0.6987818 -0.06590409 -0.3933585
## Dimensions -0.4865417 -0.11817082 -0.2371075
## Price -0.4591034 -0.18948005 -0.2738539
## Macro.focus.range Storage.included
## Max.resolution -0.34730122 0.16622226
## Low.resolution -0.31578690 0.15665176
## Effective.pixels -0.32193001 0.15784435
## Zoom.wide..W. 0.29333084 0.15096797
## Zoom.tele..T. -0.07455414 0.11407498
## Normal.focus.range 0.39783162 0.15960485
## Macro.focus.range 1.00000000 -0.04376886
## Storage.included -0.04376886 1.00000000
## Weight..inc..batteries. -0.21055418 -0.15556845
## Dimensions -0.09033582 -0.11428012
## Price -0.12757671 -0.10304587
## Weight..inc..batteries. Dimensions Price
## Max.resolution 0.10667805 -5.660724e-05 0.1842009
## Low.resolution -0.04492655 -1.055846e-01 0.1542041
## Effective.pixels 0.07819819 -4.076899e-03 0.1902840
## Zoom.wide..W. -0.69878180 -4.865417e-01 -0.4591034
## Zoom.tele..T. -0.06590409 -1.181708e-01 -0.1894801
## Normal.focus.range -0.39335854 -2.371075e-01 -0.2738539
## Macro.focus.range -0.21055418 -9.033582e-02 -0.1275767
## Storage.included -0.15556845 -1.142801e-01 -0.1030459
## Weight..inc..batteries. 1.00000000 6.778848e-01 0.4647604
## Dimensions 0.67788481 1.000000e+00 0.2642562
## Price 0.46476035 2.642562e-01 1.0000000
library(corrgram)
corrgram(cameras, order=FALSE, lower.panel=panel.shade,
upper.panel=panel.pie, text.panel=panel.txt,
main="Corrgram of variables in cameras")
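The correlation matrix and the corrgram show that price is positively correlated with the two resolutions, effective pixels, weight and dimensions. It is negatively correlated with the zoom angles, the focus ranges and storage. Most of these correlations are weak. Only weight (r = 0.46) and zoom wide angle (r = -0.46) have an absolute correlation above 0.3, followed by normal focus range (-0.27) and dimensions (0.26). The scatterplot matrix below shows the same pairwise relationships.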
library(car)
scatterplotMatrix(formula = ~Max.resolution+Low.resolution+Effective.pixels+Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Macro.focus.range+Storage.included+Weight..inc..batteries.+Dimensions+Price,data=cameras)
cor.test(cameras$Max.resolution,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Max.resolution and cameras$Price
## t = 5.9982, df = 1036, p-value = 2.753e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1237345 0.2413592
## sample estimates:
## cor
## 0.1832025
cor.test(cameras$Low.resolution,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Low.resolution and cameras$Price
## t = 5.0226, df = 1036, p-value = 5.999e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.09421604 0.21302819
## sample estimates:
## cor
## 0.1541794
cor.test(cameras$Effective.pixels,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Effective.pixels and cameras$Price
## t = 6.2001, df = 1036, p-value = 8.139e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1297959 0.2471521
## sample estimates:
## cor
## 0.1891493
cor.test(cameras$Zoom.wide..W.,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Zoom.wide..W. and cameras$Price
## t = -16.64, df = 1036, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.5059455 -0.4098410
## sample estimates:
## cor
## -0.459236
cor.test(cameras$Zoom.tele..T.,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Zoom.tele..T. and cameras$Price
## t = -6.2078, df = 1036, p-value = 7.762e-10
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2473734 -0.1300277
## sample estimates:
## cor
## -0.1893766
cor.test(cameras$Normal.focus.range,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Normal.focus.range and cameras$Price
## t = -9.1692, df = 1036, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.3293316 -0.2167404
## sample estimates:
## cor
## -0.2739745
cor.test(cameras$Macro.focus.range,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Macro.focus.range and cameras$Price
## t = -4.1409, df = 1035, p-value = 3.741e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.18708318 -0.06730696
## sample estimates:
## cor
## -0.1276605
cor.test(cameras$Storage.included,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Storage.included and cameras$Price
## t = -3.3313, df = 1034, p-value = 0.0008951
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.16292930 -0.04240602
## sample estimates:
## cor
## -0.1030459
cor.test(cameras$Weight..inc..batteries.,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Weight..inc..batteries. and cameras$Price
## t = 16.878, df = 1034, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.4156192 0.5111961
## sample estimates:
## cor
## 0.4647604
cor.test(cameras$Dimensions,cameras$Price)
##
## Pearson's product-moment correlation
##
## data: cameras$Dimensions and cameras$Price
## t = 8.8106, df = 1034, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.2066766 0.3200116
## sample estimates:
## cor
## 0.2642562
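In every case the p-value is below 0.05, so we reject the null hypothesis of zero correlation: each feature is significantly correlated with price. Significance is not the same as strength, however. Apart from weight and zoom wide angle, the correlations are all weaker than 0.3 in absolute value.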
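The t statistic reported by cor.test() is t = r·√(n − 2)/√(1 − r²), so with roughly 1036 pairs, even a weak correlation clears p < 0.05. The sketch below recomputes the Storage.included result from its printed r and then finds the smallest |r| that would be significant at this sample size. `cor_t` is a hypothetical helper name.

```r
# t statistic for a Pearson correlation r computed from n pairs
cor_t <- function(r, n) r * sqrt(n - 2) / sqrt(1 - r^2)

# Storage.included vs Price, using r and n from the cor.test output above
t_storage <- cor_t(-0.1030459, n = 1036)
p_storage <- 2 * pt(-abs(t_storage), df = 1036 - 2)  # two-sided p-value
print(c(t = t_storage, p = p_storage))

# Smallest |r| that reaches p < 0.05 with df = 1034,
# obtained by solving t = r * sqrt(df) / sqrt(1 - r^2) for r
t_crit <- qt(0.975, df = 1034)
r_crit <- t_crit / sqrt(1034 + t_crit^2)
print(r_crit)
```

The threshold comes out at about 0.06. All ten features therefore pass easily, even though eight of them have correlations below 0.3 in absolute value.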
chisq.test(xtabs(~cameras$Max.resolution+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Max.resolution + cameras$Price)): Chi-
## squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Max.resolution + cameras$Price)
## X-squared = 9455.4, df = 4116, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Low.resolution+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Low.resolution + cameras$Price)): Chi-
## squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Low.resolution + cameras$Price)
## X-squared = 9881.3, df = 2898, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Effective.pixels+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Effective.pixels + cameras$Price)):
## Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Effective.pixels + cameras$Price)
## X-squared = 2123.4, df = 630, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Zoom.wide..W.+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Zoom.wide..W. + cameras$Price)): Chi-
## squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Zoom.wide..W. + cameras$Price)
## X-squared = 1756.7, df = 1008, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Zoom.tele..T.+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Zoom.tele..T. + cameras$Price)): Chi-
## squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Zoom.tele..T. + cameras$Price)
## X-squared = 5931.7, df = 4158, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Normal.focus.range+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Normal.focus.range + cameras$Price)):
## Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Normal.focus.range + cameras$Price)
## X-squared = 2917.1, df = 1302, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Macro.focus.range+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Macro.focus.range + cameras$Price)):
## Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Macro.focus.range + cameras$Price)
## X-squared = 2320.7, df = 1176, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Storage.included+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Storage.included + cameras$Price)):
## Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Storage.included + cameras$Price)
## X-squared = 3898.5, df = 1806, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Weight..inc..batteries.+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Weight..inc..batteries. + cameras
## $Price)): Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Weight..inc..batteries. + cameras$Price)
## X-squared = 16805, df = 9912, p-value < 2.2e-16
chisq.test(xtabs(~cameras$Dimensions+cameras$Price))
## Warning in chisq.test(xtabs(~cameras$Dimensions + cameras$Price)): Chi-
## squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: xtabs(~cameras$Dimensions + cameras$Price)
## X-squared = 8089.7, df = 4200, p-value < 2.2e-16
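Here too every p-value is below 0.05, which on its face suggests that price is not independent of any of the features. However, every test also prints the warning "Chi-squared approximation may be incorrect". Each distinct value is treated as a separate category, so most cells of these tables hold only one or two cameras. These p-values should therefore be treated with caution.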
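The xtabs() calls above treat every distinct price and feature value as its own category. That leaves the tables mostly filled with tiny cells, which is what R's "Chi-squared approximation may be incorrect" warning flags. One common remedy is to bin both variables into quartiles before testing, so that every expected count is comfortably above 5. Below is a sketch on simulated data; `weight`, `price` and the `quartiles()` helper are hypothetical, not the camera data.

```r
set.seed(1)
n <- 1000
weight <- rlnorm(n, meanlog = log(250), sdlog = 0.5)  # simulated, not the camera data
price  <- 50 + 1.5 * weight + rnorm(n, sd = 300)

# Cut a numeric vector into four equal-count bins at its sample quartiles
quartiles <- function(v) {
  cut(v, breaks = quantile(v, probs = seq(0, 1, 0.25), na.rm = TRUE),
      include.lowest = TRUE)
}

tab <- table(weight_q = quartiles(weight), price_q = quartiles(price))
res <- chisq.test(tab)
print(res)
print(res$expected)  # every expected cell count is about n / 16 here
```

On the real data, the same helper could be applied as `table(quartiles(cameras$Weight..inc..batteries.), quartiles(cameras$Price))`. Heavily tied variables such as Effective.pixels can produce duplicate quantile breaks, which makes `cut()` fail. In that case, fewer bins or `chisq.test(..., simulate.p.value = TRUE)` would be needed.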
t.test(cameras$Max.resolution,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Max.resolution and cameras$Price
## t = 60.471, df = 2074, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1951.866 2082.710
## sample estimates:
## mean of x mean of y
## 2474.6724 457.3844
t.test(cameras$Low.resolution,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Low.resolution and cameras$Price
## t = 37.658, df = 2057.9, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1247.990 1385.114
## sample estimates:
## mean of x mean of y
## 1773.9364 457.3844
t.test(cameras$Effective.pixels,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Effective.pixels and cameras$Price
## t = -19.183, df = 1037, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -499.1042 -406.4720
## sample estimates:
## mean of x mean of y
## 4.596339 457.384393
t.test(cameras$Zoom.wide..W.,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Zoom.wide..W. and cameras$Price
## t = -17.98, df = 1037.4, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -470.741 -378.101
## sample estimates:
## mean of x mean of y
## 32.96339 457.38439
t.test(cameras$Zoom.tele..T.,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Zoom.tele..T. and cameras$Price
## t = -14.123, df = 1068.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -382.5220 -289.1967
## sample estimates:
## mean of x mean of y
## 121.5250 457.3844
t.test(cameras$Normal.focus.range,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Normal.focus.range and cameras$Price
## t = -17.499, df = 1039.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -459.5779 -366.8999
## sample estimates:
## mean of x mean of y
## 44.14547 457.38439
t.test(cameras$Macro.focus.range,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Macro.focus.range and cameras$Price
## t = -19.047, df = 1037.2, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -495.9149 -403.2782
## sample estimates:
## mean of x mean of y
## 7.78785 457.38439
t.test(cameras$Storage.included,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Storage.included and cameras$Price
## t = -18.627, df = 1039.7, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -486.2824 -393.5907
## sample estimates:
## mean of x mean of y
## 17.44788 457.38439
t.test(cameras$Weight..inc..batteries.,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Weight..inc..batteries. and cameras$Price
## t = -5.5355, df = 1277.3, p-value = 3.762e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -187.06929 -89.16861
## sample estimates:
## mean of x mean of y
## 319.2654 457.3844
t.test(cameras$Dimensions,cameras$Price)
##
## Welch Two Sample t-test
##
## data: cameras$Dimensions and cameras$Price
## t = -14.906, df = 1039.1, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -398.3603 -305.6817
## sample estimates:
## mean of x mean of y
## 105.3634 457.3844
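All these p-values are below 0.05 as well, but these tests answer a different question from the earlier ones. Each t.test() call compares the mean of a feature with the mean price. The two quantities are measured in different units, so a small p-value only says that the two means differ, not whether price depends on the feature. The evidence for dependence comes from the correlation tests above, which show that price is correlated with every feature.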
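A two-sample t-test is meaningful when it compares the same quantity (here, price) across two groups of cameras. Below is a sketch of that kind of comparison. It assumes cameras are split at the median weight; the split and the simulated `weight` and `price` values are illustrative only.

```r
set.seed(42)
weight <- runif(200, min = 100, max = 600)            # simulated, not the camera data
price  <- 100 + 1.2 * weight + rnorm(200, sd = 150)

# Label each camera as lighter or heavier than the median weight
group <- factor(ifelse(weight > median(weight), "heavy", "light"),
                levels = c("light", "heavy"))

# Welch two-sample t-test of mean price between the two groups
res <- t.test(price ~ group)
print(res)
```

On the real data, `group` would be built from `cameras$Weight..inc..batteries.` in the same way, using `median(..., na.rm = TRUE)` because the weight column has two missing values. The call would then be `t.test(cameras$Price ~ group)`.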
We formulate a regression model that includes all ten features of the camera: Price = b0 + b1(Max resolution) + b2(Low resolution) + b3(Effective pixels) + b4(Zoom wide angle) + b5(Zoom tele) + b6(Normal focus range) + b7(Macro focus range) + b8(Storage included) + b9(Weight) + b10(Dimensions).
fit1<-lm(formula=Price~Max.resolution+Low.resolution+Effective.pixels+Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Macro.focus.range+Storage.included+Weight..inc..batteries.+Dimensions, data = cameras)
summary(fit1)
##
## Call:
## lm(formula = Price ~ Max.resolution + Low.resolution + Effective.pixels +
## Zoom.wide..W. + Zoom.tele..T. + Normal.focus.range + Macro.focus.range +
## Storage.included + Weight..inc..batteries. + Dimensions,
## data = cameras)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2378.0 -233.1 -105.4 58.9 5562.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 967.26665 242.59736 3.987 7.16e-05 ***
## Max.resolution -0.38814 0.09798 -3.961 7.97e-05 ***
## Low.resolution 0.24205 0.04697 5.153 3.07e-07 ***
## Effective.pixels 78.91679 23.58787 3.346 0.000851 ***
## Zoom.wide..W. -7.78381 3.72525 -2.089 0.036911 *
## Zoom.tele..T. -1.31217 0.25707 -5.104 3.96e-07 ***
## Normal.focus.range -0.79766 1.03596 -0.770 0.441490
## Macro.focus.range 3.25358 2.84521 1.144 0.253086
## Storage.included -0.68525 0.76011 -0.902 0.367525
## Weight..inc..batteries. 1.38801 0.13815 10.047 < 2e-16 ***
## Dimensions -3.28937 1.12876 -2.914 0.003644 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 638.7 on 1025 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.3026, Adjusted R-squared: 0.2958
## F-statistic: 44.48 on 10 and 1025 DF, p-value: < 2.2e-16
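The coefficients b0, ..., b10 reported by summary(fit1) are ordinary least-squares estimates: the values that minimise the sum of squared residuals. The small sketch below uses simulated data (`x1`, `x2` and `y` are hypothetical). It shows that solving the least-squares problem on the design matrix directly reproduces the coefficients from lm().

```r
set.seed(1)
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$y <- 3 + 2 * d$x1 - 1.5 * d$x2 + rnorm(50)

fit <- lm(y ~ x1 + x2, data = d)

# Design matrix: a column of 1s for the intercept b0, then one column per predictor
X <- model.matrix(~ x1 + x2, data = d)
# qr.solve() on a non-square X returns the least-squares solution
b_manual <- qr.solve(X, d$y)

print(cbind(lm = coef(fit), manual = b_manual))
```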
Now let's consider only the features with which price is positively correlated. The regression model is: Price = b0 + b1(Max resolution) + b2(Low resolution) + b3(Effective pixels) + b4(Weight) + b5(Dimensions).
fit2<-lm(formula=Price~Max.resolution+Low.resolution+Effective.pixels+Weight..inc..batteries.+Dimensions, data = cameras)
summary(fit2)
##
## Call:
## lm(formula = Price ~ Max.resolution + Low.resolution + Effective.pixels +
## Weight..inc..batteries. + Dimensions, data = cameras)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2278.6 -247.1 -104.4 57.3 5648.4
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 288.75059 165.78776 1.742 0.08186 .
## Max.resolution -0.29474 0.09775 -3.015 0.00263 **
## Low.resolution 0.18667 0.04754 3.926 9.2e-05 ***
## Effective.pixels 70.02774 24.16005 2.898 0.00383 **
## Weight..inc..batteries. 1.57042 0.10982 14.299 < 2e-16 ***
## Dimensions -2.42978 1.15491 -2.104 0.03563 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 658 on 1030 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.2562, Adjusted R-squared: 0.2526
## F-statistic: 70.97 on 5 and 1030 DF, p-value: < 2.2e-16
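In fit1 and fit2, Max.resolution and Dimensions receive negative coefficients even though each is positively correlated with price on its own. A likely reason is collinearity. The correlation matrix shows r = 0.95 between Max.resolution and Effective.pixels and r = 0.68 between weight and dimensions, so the model cannot cleanly separate their effects. The variance inflation factor, VIF_j = 1/(1 − R_j²), measures this, where R_j² comes from regressing predictor j on the other predictors. Below is a base-R sketch on simulated predictors (all variables are hypothetical). On the real models, `car::vif(fit1)` from the already-loaded car package reports the same quantity.

```r
set.seed(7)
n <- 500
max_res <- rnorm(n, mean = 2500, sd = 750)
eff_px  <- max_res / 550 + rnorm(n, sd = 0.3)  # nearly a rescaled copy of max_res
weight  <- rnorm(n, mean = 300, sd = 100)       # unrelated to the other two
d <- data.frame(max_res, eff_px, weight)

# VIF for each predictor: regress it on the remaining predictors
vif_manual <- sapply(names(d), function(v) {
  others <- setdiff(names(d), v)
  r2 <- summary(lm(reformulate(others, response = v), data = d))$r.squared
  1 / (1 - r2)
})
print(round(vif_manual, 2))
```

VIF depends only on the predictors, so no price variable is needed. Values above roughly 5 to 10 are commonly taken as a sign that a coefficient's sign and size should not be read on their own.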
Now let's consider only the features with which price is negatively correlated. The regression model is: Price = b0 + b1(Zoom wide angle) + b2(Zoom tele) + b3(Normal focus range) + b4(Macro focus range) + b5(Storage included).
fit3<-lm(formula=Price~Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Macro.focus.range+Storage.included, data = cameras)
summary(fit3)
##
## Call:
## lm(formula = Price ~ Zoom.wide..W. + Zoom.tele..T. + Normal.focus.range +
## Macro.focus.range + Storage.included, data = cameras)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1561.3 -220.5 -132.2 11.8 6418.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1580.2804 70.4360 22.436 <2e-16 ***
## Zoom.wide..W. -31.5536 2.5994 -12.139 <2e-16 ***
## Zoom.tele..T. -0.1833 0.2470 -0.742 0.458
## Normal.focus.range -1.2221 1.0913 -1.120 0.263
## Macro.focus.range 1.0019 2.9239 0.343 0.732
## Storage.included -0.8069 0.7860 -1.027 0.305
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 676.7 on 1030 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.2134, Adjusted R-squared: 0.2095
## F-statistic: 55.87 on 5 and 1030 DF, p-value: < 2.2e-16
Next we keep only the features that are more strongly correlated with price than the rest, positively or negatively. These are the features with an absolute correlation of roughly 0.18 or more. Low resolution, Macro focus range and Storage included, at about 0.15, -0.13 and -0.10, are dropped. The regression model is: Price = b0 + b1(Max resolution) + b2(Zoom wide angle) + b3(Zoom tele) + b4(Normal focus range) + b5(Effective pixels) + b6(Weight) + b7(Dimensions).
fit4<-lm(formula=Price~Max.resolution+Effective.pixels+Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Weight..inc..batteries.+Dimensions, data = cameras)
summary(fit4)
##
## Call:
## lm(formula = Price ~ Max.resolution + Effective.pixels + Zoom.wide..W. +
## Zoom.tele..T. + Normal.focus.range + Weight..inc..batteries. +
## Dimensions, data = cameras)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2453.7 -213.0 -114.0 32.9 5422.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 994.00263 242.83353 4.093 4.59e-05 ***
## Max.resolution -0.22106 0.09135 -2.420 0.015698 *
## Effective.pixels 89.88447 23.77622 3.780 0.000166 ***
## Zoom.wide..W. -8.55472 3.75212 -2.280 0.022814 *
## Zoom.tele..T. -1.14531 0.25523 -4.487 8.02e-06 ***
## Normal.focus.range -0.65687 0.99190 -0.662 0.507971
## Weight..inc..batteries. 1.27750 0.13803 9.255 < 2e-16 ***
## Dimensions -3.41371 1.14186 -2.990 0.002860 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 646.5 on 1028 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.2834, Adjusted R-squared: 0.2785
## F-statistic: 58.08 on 7 and 1028 DF, p-value: < 2.2e-16
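fit4's predictors are a subset of fit1's, which adds Low.resolution, Macro.focus.range and Storage.included. The two models can therefore be compared directly with a partial F-test, `anova(fit4, fit1)`, which asks whether the extra terms reduce the residual sum of squares by more than chance would. anova() needs both models fitted to the same rows. Both report 2 observations deleted, and because fit1's incomplete rows must include fit4's, these are the same rows. Below is a sketch of the same comparison on simulated data (`x1`, `x2`, `x3` and `y` are hypothetical, and `x3` has no effect by construction).

```r
set.seed(3)
n <- 300
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- 2 + 1.5 * d$x1 - 1.0 * d$x2 + rnorm(n)  # x3 plays no role in y

small <- lm(y ~ x1 + x2, data = d)
full  <- lm(y ~ x1 + x2 + x3, data = d)

# Partial F-test: does adding x3 significantly reduce the residual sum of squares?
cmp <- anova(small, full)
print(cmp)

# AIC penalises extra parameters; lower is better
print(AIC(small, full))
```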
Finally, we fit a model with only the technical specifications of the camera, leaving out the physical ones (weight and dimensions). The regression model is: Price = b0 + b1(Max resolution) + b2(Low resolution) + b3(Effective pixels) + b4(Zoom wide angle) + b5(Zoom tele) + b6(Normal focus range) + b7(Macro focus range) + b8(Storage included).
fit5<-lm(formula=Price~Max.resolution+Low.resolution+Effective.pixels+Zoom.wide..W.+Zoom.tele..T.+Normal.focus.range+Macro.focus.range+Storage.included, data = cameras)
summary(fit5)
##
## Call:
## lm(formula = Price ~ Max.resolution + Low.resolution + Effective.pixels +
## Zoom.wide..W. + Zoom.tele..T. + Normal.focus.range + Macro.focus.range +
## Storage.included, data = cameras)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1533.4 -236.7 -132.3 24.7 6469.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1942.68192 175.60691 11.063 < 2e-16 ***
## Max.resolution -0.37831 0.10254 -3.689 0.000237 ***
## Low.resolution 0.15544 0.04838 3.213 0.001355 **
## Effective.pixels 73.52505 24.70609 2.976 0.002989 **
## Zoom.wide..W. -32.71498 2.82258 -11.590 < 2e-16 ***
## Zoom.tele..T. -0.32390 0.25015 -1.295 0.195679
## Normal.focus.range -1.03518 1.08618 -0.953 0.340791
## Macro.focus.range 2.09637 2.98439 0.702 0.482561
## Storage.included -0.89641 0.79764 -1.124 0.261347
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 670.5 on 1027 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.23, Adjusted R-squared: 0.224
## F-statistic: 38.34 on 8 and 1027 DF, p-value: < 2.2e-16
Of the five models, the first (fit1, with all ten features) has the highest multiple and adjusted R-squared values: 0.3026 and 0.2958 respectively. Every model's overall F-test has p < 0.05, so each fit is significant, but fit1 fits best, and we take it as the best model. Within fit1, seven of the ten features are individually significant at the 5% level. Normal.focus.range, Macro.focus.range and Storage.included are not (p = 0.44, 0.25 and 0.37) once the other features are accounted for. The model also explains only about 30% of the variation in price, so much of what sets a camera's price is not captured by these features.
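Adjusted R-squared is the fairer basis for comparing these models, because plain R-squared can only rise as predictors are added. For n observations and p predictors, adjusted R-squared is 1 − (1 − R²)(n − 1)/(n − p − 1). Plugging in the printed R-squared values, with n = 1036 complete rows, reproduces the reported adjusted values to within rounding. `adj_r2` is a hypothetical helper name.

```r
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 1036  # 1038 rows minus the 2 deleted for missingness
reported <- data.frame(
  model = c("fit1", "fit2", "fit3", "fit4", "fit5"),
  r2    = c(0.3026, 0.2562, 0.2134, 0.2834, 0.2300),  # multiple R-squared from the summaries
  p     = c(10, 5, 5, 7, 8)                          # number of predictors in each model
)
reported$adj <- adj_r2(reported$r2, n, reported$p)
print(reported, digits = 4)
```

By this measure fit1 is still the best of the five. Even so, every model leaves roughly 70% or more of the variation in price unexplained.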
Overall, price is related to all ten features in the dataset, most strongly to weight and zoom wide angle. A buyer should therefore compare all of these features before choosing a camera, and should usually expect to pay more for a model that is better on a given feature. We also found that Olympus, Sony and Canon offer the largest variety of models, together about 34% of the cameras in the dataset. The summary statistics give the range (minimum and maximum) of each feature.