data <- read.csv("C:/Users/dell/Desktop/Jawwad/dataSet_19122098.csv")
This line reads the CSV file ‘dataSet_19122098.csv’ from the specified path (‘C:/Users/dell/Desktop/Jawwad/’) into an R data frame named “data”.
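The column names `Price_.` and `Brand.1` that appear in the output further down are most likely produced by `read.csv()` itself: with its default `check.names = TRUE` it passes the header through `make.names()`. A minimal sketch, assuming the raw header contained `Price_$` and a second column also named `Brand` (the raw header is not shown in this report, so both are guesses):

```r
# Hypothetical raw header; the real file's header is not shown in this report.
raw_header <- c("Observation", "Brand", "Price_$", "Megapixels",
                "Weight_oz", "Score", "Brand")

# read.csv(check.names = TRUE) applies this conversion to the header:
# invalid characters become "." and duplicated names get a numeric suffix.
clean_header <- make.names(raw_header, unique = TRUE)
print(clean_header)
```

Passing `check.names = FALSE` to `read.csv()` would keep the raw names, at the cost of needing backticks to refer to them.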
summary(data)
## Observation Brand Price_. Megapixels
## Min. : 1.00 Length:28 Min. : 64.0 Min. :10.00
## 1st Qu.: 7.75 Class :character 1st Qu.: 88.0 1st Qu.:12.00
## Median :14.50 Mode :character Median :128.0 Median :12.00
## Mean :14.50 Mean :140.3 Mean :12.86
## 3rd Qu.:21.25 3rd Qu.:160.0 3rd Qu.:14.00
## Max. :28.00 Max. :320.0 Max. :16.00
## Weight_oz Score Brand.1
## Min. :4.000 Min. :50.00 Min. :0.0000
## 1st Qu.:5.000 1st Qu.:60.00 1st Qu.:0.0000
## Median :6.000 Median :64.50 Median :0.0000
## Mean :5.821 Mean :64.36 Mean :0.4643
## 3rd Qu.:7.000 3rd Qu.:69.25 3rd Qu.:1.0000
## Max. :7.000 Max. :74.00 Max. :1.0000
This command summarizes each variable in the “data” dataset: the minimum, quartiles, median, mean, and maximum for the numeric columns, and only the length and class for the character column Brand.
str(data)
## 'data.frame': 28 obs. of 7 variables:
## $ Observation: int 1 2 3 4 5 6 7 8 9 10 ...
## $ Brand : chr "Canon" "Canon" "Canon" "Canon" ...
## $ Price_. : int 264 160 240 160 144 160 160 104 104 88 ...
## $ Megapixels : int 10 12 12 10 12 12 14 10 12 16 ...
## $ Weight_oz : int 7 5 7 6 5 7 5 7 5 5 ...
## $ Score : int 74 74 73 70 70 69 68 68 67 63 ...
## $ Brand.1 : int 1 1 1 1 1 1 1 1 1 1 ...
This command displays the structure of the “data” dataset, showing the data types and the first few observations of each variable. It provides a concise overview of the dataset’s composition.
hist(data$Price_., col = rainbow(length(data$Price_.))) # full column name; data$Price only works through partial matching
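This histogram is positively (right) skewed, as evidenced by
the longer right tail: the mean (140.3) exceeds the median (128), and the
maximum (320) lies much farther from the median than the minimum (64).
Most of the data points are concentrated at the lower prices, with a few
higher values extending the right tail.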
hist(data$Megapixels, col = heat.colors(length(data$Megapixels)))
The histogram of Megapixels appears to be mildly right-skewed, with a
higher frequency of lower Megapixel values (the mean, 12.86, is above the
median, 12). The bar colors are decorative only and do not encode any
variable. There are no apparent outliers, but the distribution is not
perfectly symmetric, suggesting some deviation from a normal distribution.
colors <- rainbow(length(data$Weight_oz)) # full column name; data$Weight only works through partial matching
hist(data$Weight_oz, col = colors, main = "Histogram of Weight", xlab = "Weight (oz)")
The distribution is not positively skewed: the mean (5.82) is slightly
below the median (6). Weights cluster at 5 and 7 oz, and only a few
cameras weigh 4 oz.
hist(data$Score, col = rainbow(10)) # Adjust the number in rainbow() as needed
The distribution is roughly symmetric, with the mean (64.36) almost equal
to the median (64.5). If anything, the lower tail (down to 50) is slightly
longer than the upper tail (up to 74).
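Skewness read off a histogram can be checked numerically. The following is a minimal sketch that re-enters the values from the data listing printed later in this report and uses a hand-written moment-based helper (`skewness()` is defined here for illustration and is not a base R function). A positive value points to a longer right tail, a negative value to a longer left tail.

```r
# Values re-entered from the data listing printed later in this report.
price <- c(264, 160, 240, 160, 144, 160, 160, 104, 104, 88, 72, 80, 72, 216,
           240, 160, 320, 96, 136, 120, 184, 144, 104, 64, 64, 80, 88, 104)
megapixels <- c(10, 12, 12, 10, 12, 12, 14, 10, 12, 16, 14, 10, 12, 16,
                16, 14, 14, 14, 16, 12, 14, 12, 12, 12, 14, 12, 12, 14)
weight_oz <- c(7, 5, 7, 6, 5, 7, 5, 7, 5, 5, 5, 6, 7, 5,
               7, 6, 7, 5, 6, 5, 6, 6, 6, 7, 7, 4, 5, 4)
score <- c(74, 74, 73, 70, 70, 69, 68, 68, 67, 63, 60, 59, 54, 73,
           71, 69, 67, 65, 64, 64, 63, 61, 61, 60, 58, 54, 53, 50)

# Moment-based skewness: mean cubed deviation divided by the
# (population) standard deviation cubed.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skews <- sapply(list(Price = price, Megapixels = megapixels,
                     Weight_oz = weight_oz, Score = score), skewness)
print(round(skews, 2))
```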
pairs(~Price_. + Megapixels + Weight_oz + Score, data = data,
col = c("red", "green", "blue", "purple"))
cor_matrix <- cor(data[, c("Price_.", "Megapixels", "Weight_oz", "Score")])
print(cor_matrix)
## Price_. Megapixels Weight_oz Score
## Price_. 1.0000000 0.138906307 0.3488151 0.683211844
## Megapixels 0.1389063 1.000000000 -0.1988338 -0.007729723
## Weight_oz 0.3488151 -0.198833809 1.0000000 0.285688204
## Score 0.6832118 -0.007729723 0.2856882 1.000000000
Price_. and Megapixels (0.1389063): There is a weak positive correlation between Price_. and Megapixels. As Megapixels increase, Price_. shows only a slight tendency to increase.
Price_. and Weight_oz (0.3488151): There is a moderate positive correlation between Price_. and Weight_oz, indicating a moderate tendency for Price_. to increase as Weight_oz increases.
Price_. and Score (0.6832118): There is a fairly strong positive correlation between Price_. and Score, the strongest in the matrix. Price_. tends to increase as Score increases.
Megapixels and Weight_oz (-0.1988338): There is a weak negative correlation between Megapixels and Weight_oz, indicating a slight tendency for Megapixels to decrease as Weight_oz increases.
Megapixels and Score (-0.007729723): The correlation is essentially zero, suggesting little to no linear relationship between these two variables.
Weight_oz and Score (0.2856882): There is a weak-to-moderate positive correlation between Weight_oz and Score, indicating a modest tendency for Score to increase as Weight_oz increases.
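Whether a sample correlation differs significantly from zero depends on the sample size as well as on its magnitude. The following is a rough sketch, assuming n = 28 complete observations and using only the coefficients printed in the matrix. `cor_t_test()` is a helper written for this illustration; calling `cor.test()` on the raw columns performs the same test directly.

```r
# t-test for H0: correlation = 0, computed from r and n alone.
cor_t_test <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  p_value <- 2 * pt(-abs(t_stat), df = n - 2)
  c(r = r, t = t_stat, p = p_value)
}

n <- 28
r_values <- c(Price_Megapixels  = 0.1389063,
              Price_Weight      = 0.3488151,
              Price_Score       = 0.6832118,
              Megapixels_Weight = -0.1988338,
              Megapixels_Score  = -0.007729723,
              Weight_Score      = 0.2856882)

# One row per pair of variables: r, t statistic, two-sided p-value.
results <- t(sapply(r_values, cor_t_test, n = n))
print(round(results, 4))
```

With 26 degrees of freedom, only the correlation between Price_. and Score clears the 0.05 threshold under this test.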
m1 <- lm(Price_.~Megapixels, data=data)
summary(m1)
##
## Call:
## lm(formula = Price_. ~ Megapixels, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82.0 -48.5 -18.0 26.5 174.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.000 90.766 0.837 0.410
## Megapixels 5.000 6.991 0.715 0.481
##
## Residual standard error: 66.85 on 26 degrees of freedom
## Multiple R-squared: 0.01929, Adjusted R-squared: -0.01842
## F-statistic: 0.5115 on 1 and 26 DF, p-value: 0.4808
The model equation is: $\widehat{\text{Price}} = 76.000 + 5.000 \times \text{Megapixels}$
The residuals represent the differences between the observed and predicted values.
The minimum residual is -82.0, and the maximum residual is 174.0.
The intercept is 76.000, indicating the estimated mean Price_. when Megapixels is zero.
The coefficient for Megapixels is 5.000, suggesting that, on average, Price_. increases by 5.000 units for each one-unit increase in Megapixels.
The t-test for the coefficient of Megapixels checks whether it is significantly different from zero.
The p-value associated with Megapixels is 0.481, which is greater than the conventional significance level of 0.05. This suggests that we fail to reject the null hypothesis, indicating that the coefficient for Megapixels is not statistically different from zero.
The F-statistic tests the overall significance of the model. The p-value associated with the F-statistic is 0.4808, indicating that the model as a whole is not statistically significant.
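The fitted coefficients can be used for a quick worked prediction, and in a simple regression the R-squared equals the squared correlation between the two variables. Taking a 12-megapixel camera as an arbitrary example, and the correlation of 0.1389063 from the correlation matrix:

```latex
\begin{align*}
\widehat{\text{Price}} &= 76.000 + 5.000 \times 12 = 136.000 \\
R^2 &= r^2 = (0.1389063)^2 \approx 0.01929
\end{align*}
```

The second line reproduces the Multiple R-squared reported by `summary(m1)`.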
m2 <- lm(Price_. ~ Megapixels + Weight_oz, data=data)
summary(m2)
##
## Call:
## lm(formula = Price_. ~ Megapixels + Weight_oz, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -116.321 -36.408 -0.915 33.685 139.679
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -113.756 124.148 -0.916 0.3683
## Megapixels 7.805 6.705 1.164 0.2554
## Weight_oz 26.401 12.548 2.104 0.0456 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 62.83 on 25 degrees of freedom
## Multiple R-squared: 0.1668, Adjusted R-squared: 0.1002
## F-statistic: 2.503 on 2 and 25 DF, p-value: 0.1021
The coefficient for Intercept is -113.756, and its p-value is 0.3683. This suggests that the intercept is not significantly different from zero.
The coefficient for Megapixels is 7.805 with a p-value of 0.2554. This coefficient is not statistically significant at conventional significance levels (e.g., 0.05).
The coefficient for Weight_oz is 26.401 with a p-value of 0.0456, indicating that Weight_oz is statistically significant at a significance level of 0.05.
The F-statistic is 2.503 with a p-value of 0.1021. This tests the overall significance of the model. Because the p-value exceeds 0.05, the model as a whole is not statistically significant at that level, even though Weight_oz is individually significant.
m3 <- lm(Price_. ~ Megapixels + Weight_oz + Score, data=data)
summary(m3)
##
## Call:
## lm(formula = Price_. ~ Megapixels + Weight_oz + Score, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -60.97 -27.98 -11.45 29.50 139.33
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -424.658 120.352 -3.528 0.001717 **
## Megapixels 6.655 5.173 1.286 0.210573
## Weight_oz 13.935 10.101 1.379 0.180467
## Score 6.188 1.454 4.256 0.000275 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 48.41 on 24 degrees of freedom
## Multiple R-squared: 0.5252, Adjusted R-squared: 0.4659
## F-statistic: 8.85 on 3 and 24 DF, p-value: 0.0003961
The overall model is statistically significant (F-statistic:
8.85, p-value: 0.0003961), indicating that at least one of the
predictors has a significant effect on the dependent variable
(Price_.).
Intercept: The intercept is -424.658, representing the estimated
Price_. when all predictor variables are zero.
Megapixels: The coefficient is 6.655, but it is not statistically significant (p-value: 0.210573). There is insufficient evidence of a relationship between Megapixels and Price_. once the other predictors are in the model.
Weight_oz: The coefficient is 13.935, but it is not statistically significant (p-value: 0.180467). There is insufficient evidence of a relationship between Weight_oz and Price_. once the other predictors are in the model.
Score: The coefficient is 6.188, and it is statistically significant (p-value: 0.000275). There is strong evidence to suggest a positive relationship between Score and Price_..
The residual standard error is 48.41, an estimate of the standard deviation of the residuals, i.e., the typical size of the prediction error in units of Price_..
The distribution of residuals shows that they range from -60.97 to 139.33.
Adjusted R-squared is 0.4659, considering the number of predictors.
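The adjustment can be reproduced by hand. As a sketch, using the standard adjusted R-squared formula with n = 28 observations and p = 3 predictors, and the rounded Multiple R-squared from the output:

```latex
\begin{align*}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \\
          &= 1 - (1 - 0.5252) \times \frac{27}{24} \\
          &= 1 - 0.4748 \times 1.125 \approx 0.466
\end{align*}
```

This agrees with the reported 0.4659 up to rounding of the inputs.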
anova(m1, m3)
## Analysis of Variance Table
##
## Model 1: Price_. ~ Megapixels
## Model 2: Price_. ~ Megapixels + Weight_oz + Score
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 116176
## 2 24 56244 2 59932 12.787 0.0001658 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
F (F-statistic): The F-statistic tests the null hypothesis that the coefficients of all the additional predictors are equal to zero (i.e., these predictors do not contribute to explaining the variation in the response variable). In this case, the F-statistic is 12.787.
Pr(>F) (p-value for F-statistic): The p-value associated with the F-statistic is 0.0001658, which is below the usual significance levels (0.05, 0.01, and 0.001). This indicates that at least one of the added predictors (Weight_oz or Score) in Model 2 has a nonzero coefficient.
Signif. codes: The stars indicate the level of significance. In this case, three stars (’***’) mean highly significant (p-value < 0.001), suggesting that the additional predictors in Model 2 significantly improve the model fit compared to Model 1.
The ANOVA table suggests that Model 2, which includes Megapixels, Weight_oz, and Score, is a significantly better fit than Model 1, which only includes Megapixels.
The predictors Weight_oz and Score together contribute significantly to explaining the variation in the response variable Price_.
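The F-statistic in this table can be recomputed from the residual sums of squares alone. The following is a minimal sketch using the rounded values printed in the ANOVA table; the variable names are chosen here for illustration.

```r
# Values copied from the ANOVA table above.
rss_reduced <- 116176  # Model 1: Price_. ~ Megapixels
rss_full    <- 56244   # Model 2: Price_. ~ Megapixels + Weight_oz + Score
df_reduced  <- 26      # residual degrees of freedom, Model 1
df_full     <- 24      # residual degrees of freedom, Model 2

# Partial F-test: drop in RSS per added predictor, relative to the
# residual mean square of the larger model.
df_added <- df_reduced - df_full
f_stat <- ((rss_reduced - rss_full) / df_added) / (rss_full / df_full)
p_value <- pf(f_stat, df_added, df_full, lower.tail = FALSE)

cat("F =", round(f_stat, 3), " p =", signif(p_value, 4), "\n")
```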
anova(m2, m3)
## Analysis of Variance Table
##
## Model 1: Price_. ~ Megapixels + Weight_oz
## Model 2: Price_. ~ Megapixels + Weight_oz + Score
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 25 98699
## 2 24 56244 1 42455 18.116 0.0002752 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F-statistic tests the hypothesis that adding the variable Score to the model does not significantly reduce the amount of variability left unexplained.
The p-value associated with the F-statistic is 0.0002752. It is the probability of observing an F-statistic at least this extreme under the null hypothesis that the added variable does not contribute to explaining the variability in the response variable.
Signif. codes: (***) highly significant (p < 0.001); (**) significant
(0.001 < p < 0.01); (*) significant (0.01 < p < 0.05); (.) marginal
(0.05 < p < 0.1); ( ) not significant (p > 0.1). The result here
carries three stars.
The low p-value (0.0002752) associated with the F-statistic in the ANOVA table suggests that including the variable Score in the model significantly improves the model fit.
We can reject the null hypothesis that the coefficient of Score is zero (i.e., that Score does not contribute to explaining the variability in Price_.).
Consider keeping the variable Score in your model, as it
appears to be a significant predictor.
The output suggests that Model 2 is better than Model 1 in
explaining the variability in Price_.
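When the two models differ by a single predictor, the partial F-test is equivalent to the t-test on that predictor in the larger model: F equals t squared, and the two p-values coincide. A small check, using the rounded t value for Score from `summary(m3)` and the rounded F value from the ANOVA table:

```r
t_score  <- 4.256   # t value for Score in summary(m3)
f_anova  <- 18.116  # F from anova(m2, m3)
df_resid <- 24      # residual degrees of freedom of m3

f_from_t <- t_score^2                                  # should be close to f_anova
p_from_t <- 2 * pt(-abs(t_score), df = df_resid)       # two-sided t-test p-value
p_from_f <- pf(f_anova, 1, df_resid, lower.tail = FALSE)

cat("t^2 =", round(f_from_t, 3), " F =", f_anova, "\n")
cat("p (t-test) =", signif(p_from_t, 3), " p (F-test) =", signif(p_from_f, 3), "\n")
```

Small differences between the two columns come only from the rounding of the printed statistics.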
# (PART D9) Add an indicator variable named Nikon, which is 1 if Brand.1 = 0 (a Nikon camera) and 0 otherwise
data$Nikon <- ifelse(data$Brand.1 == 0, 1, 0)
data
## Observation Brand Price_. Megapixels Weight_oz Score Brand.1 Nikon
## 1 1 Canon 264 10 7 74 1 0
## 2 2 Canon 160 12 5 74 1 0
## 3 3 Canon 240 12 7 73 1 0
## 4 4 Canon 160 10 6 70 1 0
## 5 5 Canon 144 12 5 70 1 0
## 6 6 Canon 160 12 7 69 1 0
## 7 7 Canon 160 14 5 68 1 0
## 8 8 Canon 104 10 7 68 1 0
## 9 9 Canon 104 12 5 67 1 0
## 10 10 Canon 88 16 5 63 1 0
## 11 11 Canon 72 14 5 60 1 0
## 12 12 Canon 80 10 6 59 1 0
## 13 13 Canon 72 12 7 54 1 0
## 14 14 Nikon 216 16 5 73 0 1
## 15 15 Nikon 240 16 7 71 0 1
## 16 16 Nikon 160 14 6 69 0 1
## 17 17 Nikon 320 14 7 67 0 1
## 18 18 Nikon 96 14 5 65 0 1
## 19 19 Nikon 136 16 6 64 0 1
## 20 20 Nikon 120 12 5 64 0 1
## 21 21 Nikon 184 14 6 63 0 1
## 22 22 Nikon 144 12 6 61 0 1
## 23 23 Nikon 104 12 6 61 0 1
## 24 24 Nikon 64 12 7 60 0 1
## 25 25 Nikon 64 14 7 58 0 1
## 26 26 Nikon 80 12 4 54 0 1
## 27 27 Nikon 88 12 5 53 0 1
## 28 28 Nikon 104 14 4 50 0 1
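The indicator can also be derived from the text column Brand rather than from Brand.1, which makes the coding explicit (Brand.1 is 1 for Canon, so Nikon is its complement). A minimal sketch on a stand-in vector with the same 13 Canon and 15 Nikon rows as the listing above:

```r
# Stand-in for data$Brand: 13 Canon rows followed by 15 Nikon rows.
brand <- rep(c("Canon", "Nikon"), times = c(13, 15))

# TRUE/FALSE comparison converted to a 1/0 indicator.
nikon <- as.integer(brand == "Nikon")

# Cross-tabulation to confirm that every row is coded as intended.
print(table(brand, nikon))
```

On the real data frame the equivalent call would be `data$Nikon <- as.integer(data$Brand == "Nikon")`.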
m4 <- lm(Price_. ~ Megapixels + Weight_oz + Score + Nikon, data=data)
summary(m4)
##
## Call:
## lm(formula = Price_. ~ Megapixels + Weight_oz + Score + Nikon,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -73.226 -25.490 -2.939 19.960 127.943
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -439.938 117.215 -3.753 0.001036 **
## Megapixels 2.334 5.723 0.408 0.687152
## Weight_oz 12.152 9.870 1.231 0.230666
## Score 7.166 1.542 4.648 0.000112 ***
## Nikon 34.123 21.684 1.574 0.129228
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 46.99 on 23 degrees of freedom
## Multiple R-squared: 0.5714, Adjusted R-squared: 0.4968
## F-statistic: 7.665 on 4 and 23 DF, p-value: 0.0004448
The linear regression model (m4) predicts “Price_.” from “Megapixels,” “Weight_oz,” “Score,” and “Nikon.” The only significant predictor is “Score” (p < 0.001), which has a positive relationship with price; Megapixels, Weight_oz, and Nikon are not significant at the 0.05 level. The model explains 57.14% of the variance (adjusted R-squared 0.4968), suggesting moderate predictive power. The residual standard error is 46.99. The F-statistic (p = 0.0004448) indicates that the model as a whole is significant. The residuals should be examined further for a comprehensive assessment.
# Fit linear models
m4_nikon <- lm(Price_. ~ Megapixels + Weight_oz + Score, data = subset(data, Nikon == 1))
m4_canon <- lm(Price_. ~ Megapixels + Weight_oz + Score, data = subset(data, Nikon == 0))
# Function to create scatter plots colored by brand (red = Nikon, blue = Canon).
# x_col, y_col and color_column are column names, so they can be used as axis labels.
create_scatter_plot <- function(x_col, y_col, color_column, title) {
  plot(data[[x_col]], data[[y_col]],
       col = ifelse(data[[color_column]] == 1, "red", "blue"),
       main = title, xlab = x_col, ylab = y_col)
  # Legend labels are listed in the same order as the fill colors
  legend("topright", legend = c("Nikon", "Canon"), fill = c("red", "blue"))
}
# Create scatter plots
create_scatter_plot("Megapixels", "Price_.", "Nikon", "Scatter Plot: Price_. vs. Megapixels")
create_scatter_plot("Weight_oz", "Price_.", "Nikon", "Scatter Plot: Price_. vs. Weight_oz")
create_scatter_plot("Score", "Price_.", "Nikon", "Scatter Plot: Price_. vs. Score")
# Plotting
plot(data$Price_. ~ data$Megapixels, col = ifelse(data$Nikon == 1, "red", "blue"))
plot(data$Price_. ~ data$Weight_oz, col = ifelse(data$Nikon == 1, "red", "blue"))
plot(data$Price_. ~ data$Score, col = ifelse(data$Nikon == 1, "red", "blue"))
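The per-brand models fitted above are not drawn on any of these plots. One way to compare the brands visually is to overlay a fitted line per brand on the Price vs. Score scatter plot. The sketch below swaps in simple one-predictor regressions of price on score, since a three-predictor fit cannot be drawn as a single line. The data frame `cameras` and its column names are introduced here for illustration, with values re-entered from the data listing.

```r
# Values re-entered from the data listing; rows 1-13 are Canon, 14-28 Nikon.
cameras <- data.frame(
  Price = c(264, 160, 240, 160, 144, 160, 160, 104, 104, 88, 72, 80, 72, 216,
            240, 160, 320, 96, 136, 120, 184, 144, 104, 64, 64, 80, 88, 104),
  Score = c(74, 74, 73, 70, 70, 69, 68, 68, 67, 63, 60, 59, 54, 73,
            71, 69, 67, 65, 64, 64, 63, 61, 61, 60, 58, 54, 53, 50),
  Nikon = rep(c(0, 1), times = c(13, 15))
)

# Simple per-brand regressions (a simplification of m4_nikon / m4_canon).
fit_nikon <- lm(Price ~ Score, data = subset(cameras, Nikon == 1))
fit_canon <- lm(Price ~ Score, data = subset(cameras, Nikon == 0))

# Scatter plot colored by brand, with one fitted line per brand.
plot(Price ~ Score, data = cameras, pch = 19,
     col = ifelse(cameras$Nikon == 1, "red", "blue"),
     main = "Price vs. Score by brand")
abline(fit_nikon, col = "red")
abline(fit_canon, col = "blue")
legend("topleft", legend = c("Nikon", "Canon"),
       col = c("red", "blue"), pch = 19, lty = 1)
```

Lines with clearly different slopes would suggest adding a Score-by-Nikon interaction term to the pooled model.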