The p-values in Table 3.4 correspond to hypothesis tests of whether each of the TV, radio, and newspaper advertising budgets has a non-zero coefficient (i.e., a significant association with sales) in the multiple regression of sales on all three budgets.
The p-value for TV advertising is very small, indicating strong evidence that TV ad spending is associated with increased sales, holding the other budgets fixed.
The p-value for radio advertising is also very small, suggesting that radio ads contribute significantly to sales.
The p-value for newspaper advertising is relatively large, meaning there is no strong evidence that newspaper ads affect sales once TV and radio spending are accounted for.
TV and radio ads are effective in increasing sales, so companies should focus more on these media.
Newspaper ads do not have a significant effect on sales, so they may not be a cost-effective way to boost sales.
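These p-values come from t-tests on the individual coefficients in the multiple regression of sales on TV, radio, and newspaper. As a rough sketch of how such a table could be reproduced, assuming a local copy of the ISLR Advertising data named `Advertising.csv` with columns `TV`, `radio`, `newspaper`, and `sales` (the file name and column names are assumptions here):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed local copy of the ISLR Advertising data; adjust the path/columns as needed.
ads = pd.read_csv("Advertising.csv")

# Multiple regression of sales on all three advertising budgets.
fit = smf.ols("sales ~ TV + radio + newspaper", data=ads).fit()

# The coefficient table reports, for each predictor, the t-statistic and p-value
# for H0: beta_j = 0 given the other predictors in the model (as in Table 3.4).
print(fit.summary())
```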
The K-Nearest Neighbors (KNN) method is a non-parametric machine learning algorithm used for both classification and regression tasks. However, its application differs based on whether the target variable is categorical (classification) or continuous (regression).
In classification, KNN identifies the k nearest data points based on a distance measure and assigns the most common class among those k neighbors to the test point. In regression, KNN identifies the k nearest data points and computes the average (or weighted average) of their target values.

| Feature | KNN Classification | KNN Regression |
|---|---|---|
| Target Variable | Categorical (classes) | Continuous (numerical) |
| Prediction Output | Majority class of the k neighbors | Average of the k neighbors' values |
| Example | Predicting if a customer will buy a product (Yes/No) | Predicting the price of a product |
| Decision Rule | Voting among k nearest neighbors | Mean or weighted mean of k nearest values |
KNN works for multi-class classification, where there are more than two possible classes.
Instead of a simple Yes/No (binary classification), it assigns the test point to the most common class among the k neighbors.
Example: Classifying types of flowers as “Setosa”, “Versicolor” or “Virginica” based on petal and sepal measurements.
KNN classification can handle both binary and multi-class problems.
The main difference lies in how predictions are made: majority voting (classification) vs. numerical averaging (regression).
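The two variants differ only in how the neighbors' responses are combined, which a minimal scikit-learn sketch makes concrete (the toy data below are made up purely for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

rng = np.random.default_rng(0)

# Toy training data: 30 observations with 2 features.
X_train = rng.normal(size=(30, 2))
y_class = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)                  # categorical target
y_reg = 2.0 * X_train[:, 0] + X_train[:, 1] + rng.normal(0, 0.1, size=30)  # continuous target

X_test = rng.normal(size=(5, 2))

# Classification: majority vote among the k = 3 nearest neighbors.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_class)
print(clf.predict(X_test))   # predicted class labels

# Regression: average of the k = 3 nearest neighbors' responses.
reg = KNeighborsRegressor(n_neighbors=3).fit(X_train, y_reg)
print(reg.predict(X_test))   # predicted numeric values
```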
We are given a regression model to predict starting salary after graduation (in thousands of dollars) based on the following predictors: \(X_1\) = GPA, \(X_2\) = IQ, \(X_3\) = Level (1 for college graduate, 0 for high school graduate), \(X_4\) = GPA \(\times\) IQ, and \(X_5\) = GPA \(\times\) Level.
The estimated regression equation is:
\[ \hat{Y} = 50 + 20X_1 + 0.07X_2 + 35X_3 + 0.01X_4 - 10X_5 \]
We analyze which of the four given statements about salary differences between high school and college graduates is correct.
Now, let’s compare the salaries:
High School Graduate (\(X_3 = 0\)): \[ \hat{Y}_{HS} = 50 + 20X_1 + 0.07X_2 + 0.01X_4 \]
College Graduate (\(X_3 = 1\), so \(X_5 = X_1 \times X_3 = X_1\)): \[ \hat{Y}_{College} = 50 + 20X_1 + 0.07X_2 + 35 + 0.01X_4 - 10X_1 \] \[ = 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]
For high school graduates to earn more than college graduates:
\[ 50 + 20X_1 + 0.07X_2 + 0.01X_4 > 85 + 10X_1 + 0.07X_2 + 0.01X_4 \]
Cancel common terms:
\[ 50 + 20X_1 > 85 + 10X_1 \]
\[ 10X_1 > 35 \]
\[ X_1 > 3.5 \]
This means that high school graduates earn more than college graduates when GPA is greater than 3.5.
The correct option is (iii): high school graduates earn more than college graduates, provided GPA is high enough (above 3.5).
We substitute:
- \(X_1 = 4.0\) (GPA)
- \(X_2 = 110\) (IQ)
- \(X_3 = 1\) (college graduate)
- \(X_4 = X_1 \times X_2 = 4.0 \times 110 = 440\)
- \(X_5 = X_1 \times X_3 = 4.0 \times 1 = 4.0\)
\[ \hat{Y} = 50 + (20 \times 4.0) + (0.07 \times 110) + (35 \times 1) + (0.01 \times 440) - (10 \times 4.0) \]
\[ = 50 + 80 + 7.7 + 35 + 4.4 - 40 \]
\[ = 137.1 \]
Predicted salary is $137,100 (137.1 thousand dollars).
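Both the GPA-3.5 crossover and this point prediction can be checked directly from the fitted equation; a short sketch (the helper name `predict_salary` is just for illustration):

```python
def predict_salary(gpa, iq, college):
    """Fitted model: salary in $1000s, with GPA x IQ and GPA x Level interactions."""
    level = 1 if college else 0
    return (50 + 20 * gpa + 0.07 * iq + 35 * level
            + 0.01 * gpa * iq - 10 * gpa * level)

# Crossover: for a fixed IQ, high school graduates out-earn college graduates once GPA > 3.5.
for gpa in (3.0, 3.5, 4.0):
    hs = predict_salary(gpa, 110, college=False)
    col = predict_salary(gpa, 110, college=True)
    print(f"GPA {gpa}: HS {hs:.2f} vs College {col:.2f}")

# Point prediction for a college graduate with GPA 4.0 and IQ 110.
print(predict_salary(4.0, 110, college=True))   # 137.1, i.e. $137,100
```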
The magnitude of a coefficient depends on the scale of its predictor: here GPA \(\times\) IQ takes large values (e.g., \(4.0 \times 110 = 440\)), so even a coefficient of 0.01 adds several thousand dollars to the prediction. Moreover, we are not given a standard error or p-value for this term, so we cannot conclude that the interaction is weak.
The statement is therefore false: a small coefficient does not necessarily indicate a weak interaction effect.
We are given the fitted values for a linear regression without an intercept:
\[ \hat{y}_i = x_i \hat{\beta} \]
where:
\[ \hat{\beta} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i'=1}^{n} x_{i'}^2} \]
We need to show that:
\[ \hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'} \]
and determine \(a_{i'}\).
Substituting the formula for \(\hat{\beta}\) into \(\hat{y}_i\):
\[ \hat{y}_i = x_i \times \frac{\sum_{i'=1}^{n} x_{i'} y_{i'}}{\sum_{i''=1}^{n} x_{i''}^2} \]
Rewriting:
\[ \hat{y}_i = \sum_{i'=1}^{n} \left( \frac{x_i x_{i'}}{\sum_{i''=1}^{n} x_{i''}^2} \right) y_{i'} \]
Comparing with:
\[ \hat{y}_i = \sum_{i'=1}^{n} a_{i'} y_{i'} \]
we identify the coefficient \(a_{i'}\) as:
\[ a_{i'} = \frac{x_i x_{i'}}{\sum_{i''=1}^{n} x_{i''}^2} \]
This result confirms that the fitted values in linear regression are linear combinations of the response values \(y_{i'}\), weighted by \(a_{i'}\).
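A quick numerical check of this identity on simulated data (the data are random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=20)
y = 3.0 * x + rng.normal(size=20)

# Least-squares slope for regression through the origin.
beta_hat = np.sum(x * y) / np.sum(x ** 2)

# Rebuild one fitted value as a weighted sum of all responses.
i = 4
a = x[i] * x / np.sum(x ** 2)                        # weights a_{i'} = x_i x_{i'} / sum x^2
print(np.isclose(x[i] * beta_hat, np.sum(a * y)))    # True
```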