Q1. For each of parts (a) through (d), indicate whether we would generally expect the performance of a flexible statistical learning method to be better or worse than an inflexible method. Justify your answer.
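(a) Flexible. With an extremely large sample size n and a small number of predictors p, a flexible method can use the abundant data to fit the true form of f closely, with little risk of overfitting.
(b) Inflexible. With an extremely large number of predictors p and a small number of observations n, a flexible method would overfit the few observations available.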
(c) Flexible. Flexible methods are good at fitting nonlinear relationships.
(d) Inflexible. Choosing a flexible model in this case will result in overfitting: a flexible model will end up fitting the noise.
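The last answer's point, that a flexible model ends up fitting the noise, can be illustrated with a small simulation. This is only a sketch under assumed settings: K-nearest-neighbors regression stands in for "flexible" (K=1) versus "inflexible" (K=n), and the true function f(x) = x, the noise level `noise_sd`, the sample size, and the seed are all made up for illustration.

```python
import random

def knn_predict(x0, xs, ys, k):
    """Average the responses of the k training points closest to x0."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def mse(xs_eval, ys_eval, xs_train, ys_train, k):
    """Mean squared error of K-NN predictions on (xs_eval, ys_eval)."""
    errors = [(y - knn_predict(x, xs_train, ys_train, k)) ** 2
              for x, y in zip(xs_eval, ys_eval)]
    return sum(errors) / len(errors)

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # assumed sample size
noise_sd = 2.0           # assumed: noise is large relative to the signal

# Assumed true relationship f(x) = x on an evenly spaced grid in [0, 1].
x_train = [i / (n - 1) for i in range(n)]
y_train = [x + rng.gauss(0, noise_sd) for x in x_train]

# Test points sit halfway between training points, with fresh noise.
x_test = [(i + 0.5) / (n - 1) for i in range(n - 1)]
y_test = [x + rng.gauss(0, noise_sd) for x in x_test]

for k in (1, 10, n):     # K=1 is the most flexible fit, K=n the least
    train = mse(x_train, y_train, x_train, y_train, k)
    test = mse(x_test, y_test, x_train, y_train, k)
    print(f"K={k:2d}  train MSE={train:.3f}  test MSE={test:.3f}")
```

With K=1 every training point is its own nearest neighbor, so the training MSE is exactly zero: the fit has memorized the noise. The test MSE is the number to compare across values of K, and with noise this large one would generally expect the least flexible fits to do better on it.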
Q2. Explain whether each scenario is a classification or regression problem, and indicate whether we are most interested in inference or prediction. Finally, provide n and p.
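(a) Regression; inference; n=500, p=3
(b) Classification; prediction; n=20, p=13
(c) Regression; prediction; n=52, p=3 (the response, % change in USD/Euro, is not counted among the predictors)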
Q3. We now revisit the bias-variance decomposition.
Bias Variance Decomposition
Because the model is fit to the given training data set, the training MSE decreases as the flexibility of the model increases. However, a low training MSE does not mean that the model will fit the test data equally well. The test MSE decreases at first and then starts increasing with the flexibility of the model, because a very flexible model begins to overfit the training data. This explains the U shape of the test MSE curve. Squared bias generally decreases monotonically as flexibility increases, because a more flexible model can approximate the true f more closely. The irreducible error, Var(ε), is a constant that does not depend on the model, so it appears as a horizontal line; ε is assumed to have mean zero, but its variance is not assumed to take any particular value. The expected test MSE can never fall below Var(ε), because by definition ε cannot be predicted by the model; therefore this line lies below the test MSE curve. Variance increases with flexibility because a more flexible model changes more when it is fit to a different training set.
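The curves described above are tied together by the standard bias-variance decomposition of the expected test MSE at a point. The symbols x_0, y_0 and \hat{f} are notation assumed here for a test point, its response, and the fitted model, since the answer itself only names Var(ε):

```latex
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right]
  = \operatorname{Var}\left(\hat{f}(x_0)\right)
  + \left[\operatorname{Bias}\left(\hat{f}(x_0)\right)\right]^2
  + \operatorname{Var}(\varepsilon)
```

The first two terms are non-negative, which is why the expected test MSE cannot fall below Var(ε).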
Q4. You will now think of some real-life applications for statistical learning.
Classification 1 – Is a TV series/movie/ad campaign going to be successful or not? Response: Success/Failure Predictors: Money spent, Talent, Running Time, Producer, TV Channel, Air time slot, etc. Goal: Prediction
Classification 2 – Should this applicant be admitted into Harvard University or not? Response: Admit/Not admit Predictors: SAT Scores, GPA, Socio Economic Strata, Income of parents, Essay effectiveness, Potential, etc. Goal: Prediction
Classification 3 – Salk Polio vaccine trials – Successful/Not Successful. Response: Did the child get polio or not Predictors: Age, Geography, General health condition, Control/Test group, etc. Goal: Prediction
Regression 1 – GDP Growth in European economies Response: What is the GDP of countries predicted to be by 2050 Predictors: Population, Per capita income, Education, Average life expectancy, Tax Revenue, Government Spending etc. Goal: Inference
Regression 2 – What is the average house sale price in XXX neighborhood over the next 5 years? Response: Average house in XXX neighborhood will sell for $Y next year, $Z the year after, $T after that, etc. Predictors: Proximity to transit, Parks, Schools, Average size of family, Average Income of Family, Crime Rate, Price Flux in surrounding neighborhoods etc. Goal: Inference
Regression 3 – Gas mileage that a new car design will result in. Response: With certain parameters being set, X is the mileage we will get out of this car. Predictors: Fuel type, Number of Cylinders, Engine Version, etc. Goal: Inference
Cluster 1 – Division of countries into Developed, Developing and Third World Response: By 2050, countries in Asia can be split into these following clusters Predictors: Per Capita Income, Purchasing power parity, Average birth rate, Average number of years of education received, Average Death Rate, Population etc. Goal: Prediction
Cluster 2 – Division of average working population into income segments for taxation purposes. Response: This worker falls under this taxation bracket. Predictors: Income, Job Industry, Job Segment, Size of Company, etc. Goal: Inference
Cluster 3 – Cluster new movies being produced into ratings G/PG/PG-13/R etc. Response: This movie is an R/PG/PG-13. Predictors: Violent content, Sexual language, Theme, etc. Goal: Prediction
Q5. What are the advantages and disadvantages of a very flexible (versus a less flexible) approach for regression or classification? Under what circumstances might a more flexible approach be preferred to a less flexible approach? When might a less flexible approach be preferred?
Less Flexible
+ Inference: more interpretable.
- The model may not be a good fit if the true relationship is nonlinear.
+ Fits linear relationships well. A GAM may work for testing nonlinear effects, but it makes inference more difficult.
More Flexible
+ Prediction: more accurate predictions when the true relationship is complex.
- Overfitting may result.
+ Excellent for nonlinear relationships.
Q6. Describe the differences between a parametric and a non-parametric statistical learning approach. What are the advantages of a parametric approach to regression or classification (as opposed to a nonparametric approach)? What are its disadvantages?
Parametric
+ Reduces the problem of estimating f to the problem of estimating a set of parameters.
- The model we choose will usually not match the true unknown form of f. To reduce this risk, we can use more flexible models.
- More flexible models can overfit, so be careful.
Non Parametric
+ By avoiding assumptions about the shape of f, we have the option to fit a wider range of possible shapes for f.
- A large number of observations is needed to obtain an accurate estimate.
+ Excellent for nonlinear models.
Q7. The table below provides a training data set containing 6 observations, 3 predictors, and 1 qualitative response variable.
Suppose we wish to use this data set to make a prediction for Y when X1 = X2 = X3 = 0 using K-nearest neighbors. (a) Compute the Euclidean distance between each observation and the test point, X1 = X2 = X3 = 0.
| X1 | X2 | X3 | Y | Distance from origin |
|---|---|---|---|---|
| 0 | 3 | 0 | Red | 3 |
| 2 | 0 | 0 | Red | 2 |
| 0 | 1 | 3 | Red | 3.16227766 |
| 0 | 1 | 2 | Green | 2.236067977 |
| -1 | 0 | 1 | Green | 1.414213562 |
| 1 | 0 | 2 | Red | 2.236067977 |
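The distance column can be reproduced with a few lines of Python. This is a minimal sketch that copies the rows from the table above; the names `observations`, `test_point`, and `euclidean` are chosen here for illustration.

```python
import math

# Rows copied from the table above: (X1, X2, X3, Y)
observations = [
    (0, 3, 0, "Red"),
    (2, 0, 0, "Red"),
    (0, 1, 3, "Red"),
    (0, 1, 2, "Green"),
    (-1, 0, 1, "Green"),
    (1, 0, 2, "Red"),
]
test_point = (0, 0, 0)

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Distance from each training observation to the test point.
distances = [euclidean(row[:3], test_point) for row in observations]

for row, d in zip(observations, distances):
    print(row, round(d, 3))
```

Because the test point is the origin, each distance is simply the square root of the sum of squared coordinates, for example sqrt(0 + 1 + 9) = 3.162 for the third row.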
(b) Our prediction with K=1 is Green, because we pick the single nearest neighbor, (-1, 0, 1) at distance 1.414, and classify according to its color.
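(c) Our prediction with K=3 is Green when the 3 nearest neighbors are taken to be (-1, 0, 1) Green, (2, 0, 0) Red, and (0, 1, 2) Green, classifying according to whichever color occurs most often. Note that (1, 0, 2) Red is tied with (0, 1, 2) Green at distance 2.236, so breaking the tie the other way would give Red.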
(d) As K becomes larger, the decision boundary becomes smoother and less flexible. Therefore, if the boundary is highly nonlinear, we would expect the best value of K to be small.
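The K=1 and K=3 predictions can be checked with a small majority-vote sketch. It is an illustration rather than a reference implementation: `knn_classify` is a name made up here, the data are copied from the table in this question, and ties in distance are broken by table order, which is an assumption and not something the exercise specifies.

```python
import math
from collections import Counter

# Rows copied from the table in this question: (X1, X2, X3, Y)
observations = [
    (0, 3, 0, "Red"),
    (2, 0, 0, "Red"),
    (0, 1, 3, "Red"),
    (0, 1, 2, "Green"),
    (-1, 0, 1, "Green"),
    (1, 0, 2, "Red"),
]

def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_classify(point, data, k):
    """Return the most common label among the k rows nearest to point."""
    # sorted() is stable, so rows at equal distance keep their table order.
    by_distance = sorted(data, key=lambda row: distance(row[:3], point))
    votes = Counter(row[3] for row in by_distance[:k])
    return votes.most_common(1)[0][0]

for k in (1, 3):
    print(k, knn_classify((0, 0, 0), observations, k))
```

With this tie-breaking rule the third neighbor is (0, 1, 2) Green, so K=3 yields two Green votes against one Red. Choosing (1, 0, 2) Red instead, which lies at the same distance, would flip the K=3 vote to Red.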