Answers should be written in clear English and complete sentences with reference to appropriate statistical output. Highlighted output is not an answer; taking the output and translating it into a clear sentence that captures the meaning is your task. When intervals are called for, use them: both boundaries.
The cold has arrived in Chicago, and your investment management firm, MoneyMagic, faces a dilemma. After many years of dedicated and profitable service, the firm’s leading fund manager, Larry LeDuc, has announced his intention to step aside. The management team must find a replacement for the star manager of their two signature funds, MoneyMagic GRI 1 and MoneyMagic GRI 2, both growth and income funds.
There are two primary candidates to replace Larry. Hank Hogan is a thirty-five-year-old graduate of the Ohio State University with five years of experience managing a growth and income fund. He does not hold an MBA and lists his hobbies as games of strategy, military history, and family time. The other primary candidate is Prudence Preudhomme, a thirty-two-year-old graduate of Princeton University with a Harvard MBA who has managed a growth and income fund for two years. She lists Harvard Alumni network events as her hobby.
The hiring team is divided based on their personal perspectives on what makes a good fund manager. The CEO, Robert Maxwell, holds an MBA from Kellogg and was an undergraduate at Yale, and he is impressed by pedigree. He believes that managers with MBAs and top school pedigree are smarter, better educated, better networked, and just better equipped to manage other people’s money. Maxwell believes that Prudence Preudhomme is an ideal hire.
Larry LeDuc leads the other camp. Larry was born in rural Illinois and dropped out of university to trade stocks. His early success trading on his own led to his hire by a fund management company and to his eventual rise to managing MoneyMagic GRI 1 and 2. He has successfully managed these two funds for almost ten years and points to himself as evidence for hiring. “I never finished college and I haven’t needed an MBA to be a success. Age, MBAs, undergraduate pedigree: all meaningless! People with pedigree are too picky, critical, and political!” Needless to say, Larry believes that Hank Hogan should be hired.
Everyone in the firm is being called upon to contribute to this hire. You are a 27-year-old new analyst with MoneyMagic, and you worked in investments for 2 years after Yale and before receiving your MBA. You are young and upwardly mobile with large loans to pay off. It is clearly in your personal best interests for the firm to prefer MBAs from prestigious undergraduate institutions because these are part of your pedigree. But others are aware of your incentives; claim only what the data justify.
Relevant to the problem at hand, you recall that your pre-MBA employer was sued for employment discrimination and that the decisive evidence had been a multiple regression. Controlling for age, education, rank, and specialization, it had been shown that gender (male coded 0 and female coded 1) had a negative effect on gross wages that, with 95% confidence, ranged from -$9,004 to -$6,286. The importance of regression in the disposition of the case, alongside the growth of human resource analytics, led you to focus on decision sciences during your MBA, and you studied the case and the data. Here is your chance to shine by bringing your analytics to a key hiring problem for your company. And a fortuitous opportunity has arisen….
Mr. Maxwell is having his shoes shined in the break room when you enter for a cup of coffee. Not five seconds pass before breaking news flashes across the screen that your previous employer has paid a multibillion-dollar settlement in the aforementioned discrimination case; the statistical evidence was cited. You see a light go on in Maxwell’s head, and he says to you, “Meet me in my office in five minutes!” You arrive in Maxwell’s office, and he presents you with some regression output.
MFP.LM
##
## Call:
## lm(formula = Returns ~ GRI + SAT + MBA + Age + Tenure, data = MFPerform)
##
## Coefficients:
## (Intercept) GRIGRI SAT MBAYes Age Tenure
## -0.297887 -2.788525 0.005379 1.192412 -0.108892 0.010426
Maxwell says, “That news report made me think that some statistical analysis might be in order, so I looked at some data that I have from Morningstar. I wanted to know if pedigree and an MBA matter for fund managers. But the low r-squared made the data useless, or at least that is my vague recollection from analytics in graduate school. I remember that, and that 95% confidence is common; use that if you need it. Perhaps you can have a closer look and convince me that there is something here….” Maxwell continues, “These data come from a random sample of 540 Morningstar mutual funds. For each fund, the following characteristics are measured in MFPerform.RData [an .RData file].
| Variable | Definition |
|---|---|
| Returns | The percent excess returns of the fund in the year of the observation. Excess returns are the returns over or under the percentage return on a benchmark portfolio consisting of all stocks traded on the two major American stock markets [Nasdaq and NYSE]. For example, if a fund returns 7 percent and the benchmark returned 10 percent, then the Returns are -3. |
| GRI | A qualitative variable that is either GRI [a Growth and Income fund] or Growth [a Growth fund] according to Morningstar classifications. |
| SAT | The average composite SAT scores of enrolling students at the institution where the fund manager received his/her undergraduate degree. |
| MBA | The fund manager either holds an MBA (Yes) or does not hold an MBA (No). |
| Age | The age in calendar years of the fund manager at the end of the previous calendar year, e.g. the 2008 data would contain the age of the fund manager on December 31, 2007. |
| Tenure | The tenure of the fund manager in whole numbers of years managing the fund. Note that this is not how long an individual has been a fund manager, it only measures how long this manager has managed this fund. |
Convince Mr. Maxwell why r-squared alone isn’t enough to tell you whether or not the model is useful and come up with an example of high r-squared in a useless regression.
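For this regression, r-squared does not appear in the output at all: -0.297887 is the intercept, not r-squared, and r-squared can never be negative. R-squared is the share of the variation in Returns that the model explains, and summary(MFP.LM) reports it. The stepwise output later in this document shows that dropping Tenure barely changes the residual sum of squares and that the reduced model has r-squared = 0.1812, so this model explains roughly 18% of the variation in excess returns.
A low r-squared does not make the model useless. Fund returns are noisy, so most of their variation will always be unexplained, but with 540 funds the model can still estimate precisely how returns shift with fund type, SAT, MBA, and age. Usefulness is judged by the overall F-test and by each coefficient's standard error, p-value, and confidence interval. The GRIGRI coefficient (-2.79) deserves the most attention: holding the other variables constant, a Growth and Income fund is expected to earn about 2.79 percentage points less than a Growth fund. Coefficients must be read in their units, though; SAT's coefficient (0.005379) looks tiny only because SAT is measured in points, so a 300-point SAT difference is worth about 1.6 percentage points. Conversely, a high r-squared does not make a regression useful. Two unrelated series that both trend upward over time (hypothetically, a country's annual ice cream sales and its number of registered cars) will produce a high r-squared with no real connection, and a regression with as many coefficients as observations fits the data perfectly (r-squared = 1) even when every predictor is pure noise.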
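One way to see that a high r-squared can be meaningless is to simulate it. The sketch below generates pure noise (the seed, the sample size of 10, and the names noise_y and noise_X are arbitrary illustrations, not case data) and fits a regression with as many coefficients as observations. R-squared comes out at essentially 1, yet the predictors are unrelated to the response by construction, so the model has no predictive value.

```r
# Sketch: a near-perfect R-squared from a regression with no real signal.
set.seed(1)                                      # arbitrary seed for reproducibility
n <- 10                                          # number of observations
noise_y <- rnorm(n)                              # response: pure noise
noise_X <- matrix(rnorm(n * (n - 1)), nrow = n)  # 9 predictors: pure noise
overfit <- lm(noise_y ~ noise_X)                 # intercept + 9 slopes = 10 coefficients for 10 points
# R-squared computed directly as 1 - RSS / TSS
r2 <- 1 - sum(residuals(overfit)^2) / sum((noise_y - mean(noise_y))^2)
print(r2)                                        # essentially 1
print(df.residual(overfit))                      # 0: no residual degrees of freedom left for any test
```

The zero residual degrees of freedom is the giveaway: nothing is left to estimate error, so none of the coefficients can be tested, however good the fit looks.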
The following questions will require that you compute things except 1(c).
result1 <- explore(
MFPerform,
vars = "Returns",
fun = c("n_obs", "mean", "min", "max", "sd", "se"),
nr = Inf
)
# summary()
dtab(result1) %>% render()
Looking at Returns alone: the mean excess return is 0.729 percentage points (n = 540, standard error 0.211), with a wide range from -14.056 to 16.865 and a standard deviation of 4.895.
For completeness, the same summary statistics for every variable:
result2 <- explore(
MFPerform,
vars = c("Returns", "GRI", "SAT", "MBA", "Age", "Tenure"),
fun = c("n_obs", "mean", "min", "max", "sd", "se"),
nr = Inf
)
# summary()
dtab(result2) %>% render()
ggplot(MFPerform) +
aes(x = Returns) +
geom_histogram(bins = 30L, fill = "#0C778A") +
theme_minimal()
The histogram shows the spread of Returns. Most funds have excess returns within a few percentage points of zero, on both the negative and the positive side, with the highest counts near 0%; only a few funds gain or lose 10 percentage points or more.
visualize(
MFPerform,
xvar = "Tenure",
yvar = "Returns",
type = "scatter",
nrobs = -1,
check = "line",
custom = FALSE
)
plot(x = MFPerform$Tenure, y = MFPerform$Returns)   # Tenure on the x-axis, Returns on the y-axis
model1 <- lm(MFPerform$Returns ~ MFPerform$Tenure)  # simple regression of Returns on Tenure
model1
##
## Call:
## lm(formula = MFPerform$Returns ~ MFPerform$Tenure)
##
## Coefficients:
## (Intercept) MFPerform$Tenure
## 1.2266 -0.1338
summary(model1)
##
## Call:
## lm(formula = MFPerform$Returns ~ MFPerform$Tenure)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.8808 -3.0615 -0.3684 3.3597 16.1736
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.22662 0.26129 4.694 3.4e-06 ***
## MFPerform$Tenure -0.13385 0.04225 -3.168 0.00162 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.854 on 538 degrees of freedom
## Multiple R-squared: 0.01831, Adjusted R-squared: 0.01649
## F-statistic: 10.04 on 1 and 538 DF, p-value: 0.001622
The scatterplot shows a wide cloud of points with only a slight downward tilt, so the relationship is weak. The regression nevertheless finds a statistically significant relationship at the 95% level: the p-value for Tenure (0.00162) is well below 0.05, so we reject the null hypothesis of no relationship. The slope of -0.134 means that each additional year a manager has managed the fund is associated with about 0.13 percentage points lower excess returns. The effect is small, though: Tenure alone explains under 2% of the variation in returns (R-squared = 0.018).
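The 95% confidence interval for the Tenure slope can be reproduced from the printed estimate and standard error as estimate ± t(0.975, 538) × SE; confint(model1) should give the same result. A sketch using only the numbers shown in the summary above:

```r
# 95% CI for the Tenure slope in model1, from the printed summary
est <- -0.13385    # slope estimate
se  <- 0.04225     # its standard error
resid_df <- 538    # residual degrees of freedom
crit <- qt(0.975, resid_df)
ci <- est + c(-1, 1) * crit * se
round(ci, 3)       # -0.217 -0.051
```

The interval excludes zero, which agrees with the p-value of 0.00162, but both ends are small relative to the 4.854-point residual standard error.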
The Returns data look approximately normal. The earlier histogram had a bell-curve shape, and overlaying a normal density with the sample mean and standard deviation shows the same thing. Most returns are concentrated near the mean, with only a few points out toward the ±10 percentage point range.
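library(ggplot2)
# Density-scaled histogram of Returns with a fitted normal curve overlaid
ggplot(MFPerform, aes(Returns)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  stat_function(fun = dnorm, args = list(mean = mean(MFPerform$Returns), sd = sd(MFPerform$Returns)))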
t.test(MFPerform$Returns, conf.level = 0.90)
##
## One Sample t-test
##
## data: MFPerform$Returns
## t = 3.4627, df = 539, p-value = 0.0005773
## alternative hypothesis: true mean is not equal to 0
## 90 percent confidence interval:
## 0.3823241 1.0764845
## sample estimates:
## mean of x
## 0.7294043
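With 90% confidence, the mean excess return lies between 0.382 and 1.076 percentage points. This is an interval for the average, not a range for an individual fund's return, which varies far more (SD 4.895).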
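The t-test interval above describes where the mean excess return lies. If the question instead concerns where an individual fund's excess return falls with probability 0.9, a sketch assuming Returns are approximately normal with the sample mean (0.729) and SD (4.895):

```r
# Central 90% range for a single fund's excess return, assuming Returns ~ Normal(0.729, 4.895)
mu <- 0.729
sigma <- 4.895
range90 <- qnorm(c(0.05, 0.95), mean = mu, sd = sigma)
round(range90, 2)   # about -7.32 to 8.78
```

The contrast shows why the two intervals must not be confused: the mean is pinned down to within about 0.35 points, while single-fund returns spread over about 16 points.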
(b) What is the probability of excess Returns greater than -3?
result3 <- prob_norm(mean = .729, stdev = 4.895, lb = -3)
summary(result1)
## Explore
## Data : MFPerform
## Functions : n_obs, mean, min, max, sd, se
## Top : Function
##
## variable n_obs mean min max sd se
## Returns 540 0.729 -14.056 16.865 4.895 0.211
plot(result3)
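Assuming excess returns are normally distributed with mean 0.729 and standard deviation 4.895, the probability that a fund's excess return is greater than -3 percentage points is about 0.777 (77.7%).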
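The same probability can be checked with base R's normal distribution function, without the radiant helper. A sketch using the sample mean and SD as the normal parameters:

```r
# P(Returns > -3) under Normal(mean = 0.729, sd = 4.895)
p_above <- pnorm(-3, mean = 0.729, sd = 4.895, lower.tail = FALSE)
round(p_above, 3)   # 0.777
```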
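The Cauchy distribution is named cauchy in R, so the relevant functions are pcauchy, qcauchy, rcauchy, and dcauchy. The parameters here are location = 0 and scale = 1 (the standard Cauchy, which is the distribution of the ratio of two independent standard normal \(z\) variables).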
pcauchy(-1, location = 0, scale = 1, lower.tail = TRUE, log.p = FALSE)
## [1] 0.25
The probability that a standard Cauchy random variable is less than or equal to -1 is 0.25 (25%); equivalently, the probability that it is greater than -1 is 0.75.
(b) With probability 0.75, a Cauchy random variable is no greater than ______.
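The blank can be filled with qcauchy, the Cauchy quantile function. For the standard Cauchy the p-quantile is tan(π(p - 0.5)), so the 0.75 quantile is tan(π/4) = 1. A short check:

```r
# 0.75 quantile of the standard Cauchy (location = 0, scale = 1)
q75 <- qcauchy(0.75, location = 0, scale = 1)
q75            # 1
pcauchy(q75)   # back to 0.75
```

So with probability 0.75, a standard Cauchy random variable is no greater than 1; by symmetry this mirrors the 0.25 probability below -1 found above.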
t.test(MFPerform$Returns)
##
## One Sample t-test
##
## data: MFPerform$Returns
## t = 3.4627, df = 539, p-value = 0.0005773
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 0.3156144 1.1431942
## sample estimates:
## mean of x
## 0.7294043
With 95% confidence, the mean excess return is between 0.316 and 1.143 percentage points.
result4 <- explore(
MFPerform,
vars = "Returns",
byvar = "MBA",
fun = c("n_obs", "mean", "min", "max", "sd", "se", "p95"),
nr = Inf
)
# summary()
dtab(result4) %>% render()
result5 <- compare_means(
MFPerform,
var1 = "MBA",
var2 = "Returns"
)
summary(result5, show = FALSE)
## Pairwise mean comparisons (t-test)
## Data : MFPerform
## Variables : MBA, Returns
## Samples : independent
## Confidence: 0.95
## Adjustment: None
##
## MBA mean n n_missing sd se me
## No -0.293 205 0 4.692 0.328 0.646
## Yes 1.355 335 0 4.918 0.269 0.529
##
## Null hyp. Alt. hyp. diff p.value
## No = Yes No not equal to Yes -1.649 < .001 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result5, plots = "scatter", custom = FALSE)
With 95% confidence, mean returns differ between managers with and without an MBA (p < .001). Managers with an MBA average +1.355 percentage points (SD = 4.918, n = 335), while those without an MBA average -0.293 (SD = 4.692, n = 205), a difference of 1.649 percentage points in favor of MBA holders.
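The output gives the difference in means but not its confidence interval. A sketch that rebuilds an approximate 95% interval from the rounded group summaries above, assuming compare_means used Welch's unequal-variance t-test (the default of R's t.test); because the inputs are rounded, the bounds are approximate:

```r
# Group summaries from the compare_means output
m_yes <- 1.355;  s_yes <- 4.918; n_yes <- 335   # MBA = Yes
m_no  <- -0.293; s_no  <- 4.692; n_no  <- 205   # MBA = No
v_yes <- s_yes^2 / n_yes
v_no  <- s_no^2 / n_no
mean_diff <- m_yes - m_no                       # MBA minus no MBA
se_diff   <- sqrt(v_yes + v_no)
# Welch-Satterthwaite degrees of freedom
welch_df <- (v_yes + v_no)^2 / (v_yes^2 / (n_yes - 1) + v_no^2 / (n_no - 1))
ci <- mean_diff + c(-1, 1) * qt(0.975, welch_df) * se_diff
round(ci, 2)
```

The whole interval lies above zero, so the raw MBA advantage is statistically clear; note that this comparison does not yet control for fund type, SAT, or age.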
result6 <- pivotr(MFPerform, cvars = "GRI", nr = Inf)
plot(result6)
# summary()
dtab(result6) %>% render()
The sample contains 540 funds: 327 Growth funds and 213 GRI funds.
binom.test(327, 540)
##
## Exact binomial test
##
## data: 327 and 540
## number of successes = 327, number of trials = 540, p-value = 1.061e-06
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
## 0.5629224 0.6470270
## sample estimates:
## probability of success
## 0.6055556
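With 95% confidence, the proportion of Growth funds in the population is between 0.563 and 0.647. Because the whole interval lies above 0.5, Growth funds are more common than GRI funds.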
prop.test(table(MFPerform$GRI))
##
## 1-sample proportions test with continuity correction
##
## data: table(MFPerform$GRI), null probability 0.5
## X-squared = 23.646, df = 1, p-value = 1.158e-06
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.5627918 0.6467948
## sample estimates:
## p
## 0.6055556
The 95% confidence interval for the proportion of Growth funds is 0.563 to 0.647, essentially the same as the exact binomial interval.
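binom.test reports the exact (Clopper-Pearson) interval and prop.test a Wilson score interval with continuity correction. The textbook normal-approximation (Wald) interval, p̂ ± z√(p̂(1 - p̂)/n), is simpler, and with 540 funds it gives nearly the same answer. A sketch:

```r
# Wald 95% interval for the share of Growth funds
successes <- 327
n <- 540
p_hat <- successes / n
se <- sqrt(p_hat * (1 - p_hat) / n)
wald <- p_hat + c(-1, 1) * qnorm(0.975) * se
round(wald, 3)
```

The Wald interval is less reliable for small samples or proportions near 0 or 1, which is why the exact and Wilson versions are preferred in general.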
result7 <- pivotr(
MFPerform,
cvars = c("GRI", "MBA"),
nr = Inf
)
# summary()
dtab(result7) %>% render()
result8 <- compare_props(
MFPerform,
var1 = "GRI",
var2 = "MBA",
levs = "No"
)
summary(result8, show = FALSE)
## Pairwise proportion comparisons
## Data : MFPerform
## Variables : GRI, MBA
## Level : No in MBA
## Confidence: 0.95
## Adjustment: None
##
## GRI No Yes p n n_missing sd se me
## Growth 126 201 0.385 327 0 0.487 0.027 0.053
## GRI 79 134 0.371 213 0 0.483 0.033 0.065
##
## Null hyp. Alt. hyp. diff p.value
## Growth = GRI Growth not equal to GRI 0.014 0.736
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result8, plots = "bar", custom = FALSE)
prop.test(table(MFPerform$GRI, MFPerform$MBA))
##
## 2-sample test for equality of proportions with continuity correction
##
## data: table(MFPerform$GRI, MFPerform$MBA)
## X-squared = 0.060988, df = 1, p-value = 0.8049
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.07305678 0.10191495
## sample estimates:
## prop 1 prop 2
## 0.3853211 0.3708920
There is no meaningful difference in MBA status across fund types. The proportion of managers without an MBA is 0.385 in Growth funds and 0.371 in GRI funds, a difference of only 0.014 (p = 0.80). The 95% confidence interval for the difference, -0.073 to 0.102, includes zero, so MBA holders are neither more nor less common in either type of fund.
result9 <- compare_props(
MFPerform,
var1 = "Age",
var2 = "MBA",
levs = "No",
comb = "24:27"
)
summary(result9, show = FALSE)
## Pairwise proportion comparisons
## Data : MFPerform
## Variables : Age, MBA
## Level : No in MBA
## Confidence: 0.95
## Adjustment: None
##
## Age No Yes p n n_missing sd se me
## 24 0 1 0.000 1 0 0.000 0.000 0.000
## 27 1 2 0.333 3 0 0.471 0.272 0.533
## 28 1 4 0.200 5 0 0.400 0.179 0.351
## 29 6 7 0.462 13 0 0.499 0.138 0.271
## 30 5 13 0.278 18 0 0.448 0.106 0.207
## 31 5 13 0.278 18 0 0.448 0.106 0.207
## 32 5 13 0.278 18 0 0.448 0.106 0.207
## 33 5 15 0.250 20 0 0.433 0.097 0.190
## 34 8 9 0.471 17 0 0.499 0.121 0.237
## 35 6 19 0.240 25 0 0.427 0.085 0.167
## 36 4 10 0.286 14 0 0.452 0.121 0.237
## 37 11 5 0.688 16 0 0.464 0.116 0.227
## 38 8 4 0.667 12 0 0.471 0.136 0.267
## 39 5 5 0.500 10 0 0.500 0.158 0.310
## 40 2 8 0.200 10 0 0.400 0.126 0.248
## 41 3 12 0.200 15 0 0.400 0.103 0.202
## 42 8 8 0.500 16 0 0.500 0.125 0.245
## 43 7 14 0.333 21 0 0.471 0.103 0.202
## 44 13 15 0.464 28 0 0.499 0.094 0.185
## 45 6 14 0.300 20 0 0.458 0.102 0.201
## 46 12 14 0.462 26 0 0.499 0.098 0.192
## 47 8 18 0.308 26 0 0.462 0.091 0.177
## 48 9 19 0.321 28 0 0.467 0.088 0.173
## 49 8 23 0.258 31 0 0.438 0.079 0.154
## 50 3 11 0.214 14 0 0.410 0.110 0.215
## 51 4 10 0.286 14 0 0.452 0.121 0.237
## 52 6 8 0.429 14 0 0.495 0.132 0.259
## 53 3 3 0.500 6 0 0.500 0.204 0.400
## 54 4 4 0.500 8 0 0.500 0.177 0.346
## 55 3 4 0.429 7 0 0.495 0.187 0.367
## 56 5 5 0.500 10 0 0.500 0.158 0.310
## 57 2 2 0.500 4 0 0.500 0.250 0.490
## 58 2 4 0.333 6 0 0.471 0.192 0.377
## 59 2 2 0.500 4 0 0.500 0.250 0.490
## 60 6 4 0.600 10 0 0.490 0.155 0.304
## 61 2 3 0.400 5 0 0.490 0.219 0.429
## 62 0 5 0.000 5 0 0.000 0.000 0.000
## 63 1 1 0.500 2 0 0.500 0.354 0.693
## 64 2 2 0.500 4 0 0.500 0.250 0.490
## 65 1 1 0.500 2 0 0.500 0.354 0.693
## 66 3 1 0.750 4 0 0.433 0.217 0.424
## 67 3 0 1.000 3 0 0.000 0.000 0.000
## 71 1 0 1.000 1 0 0.000 0.000 0.000
## 73 1 0 1.000 1 0 0.000 0.000 0.000
## 75 2 0 1.000 2 0 0.000 0.000 0.000
## 76 1 0 1.000 1 0 0.000 0.000 0.000
## 77 1 0 1.000 1 0 0.000 0.000 0.000
## 79 1 0 1.000 1 0 0.000 0.000 0.000
##
## Null hyp. Alt. hyp. diff p.value
## 24 = 27 24 not equal to 27 -0.333 1 (2000 replicates)
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result9, plots = "bar", custom = FALSE)
result10 <- pivotr(
MFPerform,
cvars = "MBA",
nvar = "Age",
nr = Inf
)
# summary()
dtab(result10) %>% render()
result11 <- explore(
MFPerform,
vars = "Age",
byvar = "MBA",
fun = c("n_obs", "mean", "min", "max", "sd", "se"),
nr = Inf
)
# summary()
dtab(result11) %>% render()
result12 <- compare_means(MFPerform, var1 = "MBA", var2 = "Age")
summary(result12, show = FALSE)
## Pairwise mean comparisons (t-test)
## Data : MFPerform
## Variables : MBA, Age
## Samples : independent
## Confidence: 0.95
## Adjustment: None
##
## MBA mean n n_missing sd se me
## No 45.473 205 0 11.035 0.771 1.520
## Yes 42.982 335 0 9.001 0.492 0.967
##
## Null hyp. Alt. hyp. diff p.value
## No = Yes No not equal to Yes 2.491 0.007 **
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result12, plots = "scatter", custom = FALSE)
The mean ages differ. Managers with an MBA average 42.98 years (range 24 to 66, SD 9.001), while those without an MBA average 45.47 years (range 27 to 79, SD 11.035). With 95% confidence the difference is real (2.491 years, p = 0.007): MBA holders are younger on average. These are the managers' current ages, so the data do not show when they earned their MBAs; they only show that MBA-holding managers in this sample skew slightly younger, with both groups averaging well over 40.
result13 <- explore(
MFPerform,
vars = "Tenure",
fun = c("n_obs", "mean", "min", "max", "sd"),
nr = Inf
)
# summary()
dtab(result13) %>% render()
result14 <- single_mean(MFPerform, var = "Tenure")
summary(result14)
## Single mean test
## Data : MFPerform
## Variable : Tenure
## Confidence: 0.95
## Null hyp. : the mean of Tenure = 0
## Alt. hyp. : the mean of Tenure is not equal to 0
##
## mean n n_missing sd se me
## 3.715 540 0 4.949 0.213 0.418
##
## diff se t.value p.value df 2.5% 97.5%
## 3.715 0.213 17.442 < .001 539 3.296 4.133 ***
##
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(result14, plots = "hist", custom = FALSE)
Tenure measures how long each manager has managed the current fund, not total experience as a fund manager. The average tenure is 3.715 years (SD 4.949; range 0 to 35 years). With 95% confidence, the mean tenure of fund managers in the population is between 3.296 and 4.133 years.
Use the 90% level of confidence where needed. The following questions relate to the regression model presented previously.
Maxwell continues, “Here are the key pieces. Does having an MBA matter? Does undergraduate SAT matter? Does age matter? Does the type of fund that one is managing matter? Does tenure as a manager matter?” Eventually, you will clearly answer these questions, but let's think about this regression.
source(url("https://raw.githubusercontent.com/robertwwalker/DADMStuff/master/ResidPlotter.R"))
resid.plotter(MFP.LM)
The residuals appear approximately normal. The Shapiro-Wilk test has a high p-value, so normality is not rejected; the residual histogram has a bell shape; and the residuals are spread evenly around zero across the fitted values.
library(gvlma)
gvlma(MFP.LM)
##
## Call:
## lm(formula = Returns ~ GRI + SAT + MBA + Age + Tenure, data = MFPerform)
##
## Coefficients:
## (Intercept) GRIGRI SAT MBAYes Age Tenure
## -0.297887 -2.788525 0.005379 1.192412 -0.108892 0.010426
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = MFP.LM)
##
## Value p-value Decision
## Global Stat 3.53128 0.4731 Assumptions acceptable.
## Skewness 1.03327 0.3094 Assumptions acceptable.
## Kurtosis 0.03059 0.8612 Assumptions acceptable.
## Link Function 0.04675 0.8288 Assumptions acceptable.
## Heteroscedasticity 2.42067 0.1197 Assumptions acceptable.
car::qqPlot(residuals(MFP.LM))
## [1] 8 482
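# P(residual within ±8 percentage points), assuming normal residuals with SD = residual standard error
s <- sigma(MFP.LM)
pnorm(8, mean = 0, sd = s) - pnorm(-8, mean = 0, sd = s)
Assuming normally distributed residuals with a standard deviation equal to the residual standard error (about 4.45 percentage points), the probability that a fund's actual excess return lies within ±8 percentage points of the model's prediction is about 0.93.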
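The probability that a residual lies within ±8 percentage points can also be checked by hand, assuming normal residuals. The step() output below reports a residual sum of squares of 10574 for MFP.LM, which has 540 - 6 = 534 residual degrees of freedom; the residual standard error is the square root of RSS divided by those degrees of freedom. A sketch:

```r
rss <- 10574                     # residual sum of squares of MFP.LM (step() output)
resid_df <- 540 - 6              # n minus the six estimated coefficients
s <- sqrt(rss / resid_df)        # residual standard error, about 4.45
p_within_8 <- pnorm(8, sd = s) - pnorm(-8, sd = s)
round(c(s = s, p = p_within_8), 3)
```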
confint(MFP.LM,'Age',level=0.95)
## 2.5 % 97.5 %
## Age -0.1526457 -0.06513859
With 95% confidence, holding fund type, SAT, MBA, and tenure constant, each additional year of a manager's age changes expected excess returns by between -0.153 and -0.065 percentage points. Because the whole interval is below zero, older managers are associated with lower returns.
result21 <- explore(
MFPerform,
vars = "Returns",
byvar = "GRI",
fun = c("n_obs", "mean", "min", "max", "sd", "p025", "p975"),
nr = Inf
)
# summary()
dtab(result21) %>% render()
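Holding SAT, MBA, age, and tenure constant, Growth and Income funds are expected to earn about 2.79 percentage points less than Growth funds (the GRIGRI coefficient). In the stepwise model, the 95% confidence interval for this difference runs from -3.561 to -2.016 percentage points, so GRI funds earn between about 2.0 and 3.6 points less.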
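The instructions for this part call for 90% confidence where needed. A sketch of the 90% interval for the GRI effect, built from the estimate and standard error in the stepwise model's summary (the full model's GRIGRI estimate, -2.788525, is almost identical):

```r
# 90% CI for the GRIGRI coefficient, stepwise model (Returns ~ GRI + SAT + MBA + Age)
est <- -2.788863      # estimate
se  <- 0.393181       # standard error
resid_df <- 535       # residual degrees of freedom
ci90 <- est + c(-1, 1) * qt(0.95, resid_df) * se
round(ci90, 3)
```

With 90% confidence, GRI funds earn roughly 2.1 to 3.4 percentage points less than comparable Growth funds.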
library(gvlma)
gvlma(MFP.LM)
##
## Call:
## lm(formula = Returns ~ GRI + SAT + MBA + Age + Tenure, data = MFPerform)
##
## Coefficients:
## (Intercept) GRIGRI SAT MBAYes Age Tenure
## -0.297887 -2.788525 0.005379 1.192412 -0.108892 0.010426
##
##
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance = 0.05
##
## Call:
## gvlma(x = MFP.LM)
##
## Value p-value Decision
## Global Stat 3.53128 0.4731 Assumptions acceptable.
## Skewness 1.03327 0.3094 Assumptions acceptable.
## Kurtosis 0.03059 0.8612 Assumptions acceptable.
## Link Function 0.04675 0.8288 Assumptions acceptable.
## Heteroscedasticity 2.42067 0.1197 Assumptions acceptable.
car::qqPlot(residuals(MFP.LM))
## [1] 8 482
result30 <- explore(
MFPerform,
vars = "Returns",
byvar = "GRI",
fun = c("n_obs", "median"),
nr = Inf
)
# summary()
dtab(result30) %>% render()
result31 <- explore(
MFPerform,
vars = "Returns",
fun = c("n_obs", "median"),
nr = Inf
)
# summary()
dtab(result31) %>% render()
(I was not sure what these questions were asking for, specifically about explained versus unexplained variation. I could run code to try to answer them, but I did not know which data or output to use for the unexplained variation.)
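Explained versus unexplained variation refers to splitting the total variation in Returns (the total sum of squares) into the part the regression accounts for and the residual part. A sketch using figures already in this document: the SD of Returns (4.895, n = 540) and MFP.LM's residual sum of squares (10574, from the step() output):

```r
n <- 540
sd_returns <- 4.895              # SD of Returns from the explore() summary
tss <- (n - 1) * sd_returns^2    # total variation in Returns
rss <- 10574                     # unexplained (residual) sum of squares of MFP.LM
ess <- tss - rss                 # explained by the regression
round(c(total = tss, explained = ess, unexplained = rss), 0)
round(c(explained = ess / tss, unexplained = rss / tss), 3)
```

The explained share is the model's R-squared: about 18% of the variation is explained and about 82% is not.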
The following statements are either True or False. If true, explain that it is so. If false, change one word to make it true.
This statement is false. Only one of the top five managers lacks an MBA, and the managers are not young: their ages range from 32 to 51.
## filter and sort the dataset
MFPerform %>%
arrange(desc(Returns)) %>%
dtab(dec = 2, nr = 5) %>% render()
(b) The bottom 5 funds, in terms of expected excess returns, are Growth funds managed by older and long-tenured MBAs from below-average schools.
This statement is false. All but one of these managers lack an MBA, and only two of them have long tenure.
## filter and sort the dataset
MFPerform %>%
arrange(Returns) %>%
dtab(dec = 2, pageLength = 5, nr = 5) %>% render()
What variables explain the most and least variation?
Predict the average and the distribution of excess returns for Hank [Ohio State has average SAT of 1042] and for Prudence [Princeton has average SAT of 1355] in their current funds [all the necessary facts are given in the text]? Who should perform better and is this true on average, overall, both, or neither?
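# New data for the two candidates (facts from the case text; both currently manage growth and income funds)
candidates <- data.frame(GRI = "GRI", SAT = c(1042, 1355), MBA = c("No", "Yes"), Age = c(35, 32), Tenure = c(5, 2), row.names = c("Hank", "Prudence"))
predict(MFP.LM, newdata = candidates, interval = "confidence", level = 0.90)   # average return for managers like each candidate
predict(MFP.LM, newdata = candidates, interval = "prediction", level = 0.90)   # range for a single year's return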
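Both candidates manage GRI funds, which costs about 2.79 percentage points relative to a Growth fund, and their ages (35 and 32) differ too little to matter much (about 0.33 points in Prudence's favor); tenure contributes almost nothing. What separates them is pedigree: Prudence's higher undergraduate SAT (1355 versus 1042) adds about 1.68 points and her MBA adds about 1.19 points. Using the MFP.LM coefficients, Hank's expected excess return is about -1.24 percentage points and Prudence's is about 1.93, so Prudence should perform better on average, by roughly 3.2 points. Individual years are another matter: with a residual standard error of about 4.45 points, a rough 90% range for a single year's return is about ±7.3 points around each prediction, so the two ranges overlap heavily. Prudence is predicted to be better on average, but not reliably better in any given year.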
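The predictions can be checked by hand from the MFP.LM coefficients printed at the start of the case, since a prediction is just the coefficients multiplied by the candidate's values and summed. A sketch (the 90% single-year ranges ignore uncertainty in the coefficients, so they are slightly too narrow):

```r
# Coefficients of MFP.LM, as printed at the start of the case
b <- c(intercept = -0.297887, GRIGRI = -2.788525, SAT = 0.005379,
       MBAYes = 1.192412, Age = -0.108892, Tenure = 0.010426)
# Predictor values in the same order: intercept, GRI fund, SAT, MBA, Age, Tenure
x_hank     <- c(1, 1, 1042, 0, 35, 5)
x_prudence <- c(1, 1, 1355, 1, 32, 2)
fit_hank     <- sum(b * x_hank)
fit_prudence <- sum(b * x_prudence)
round(c(Hank = fit_hank, Prudence = fit_prudence), 2)   # Hank -1.24, Prudence 1.93
# Rough 90% ranges for one year's return, using the residual SE
# implied by RSS = 10574 on 534 df (step() output)
s <- sqrt(10574 / 534)
half_width <- qnorm(0.95) * s
round(rbind(Hank     = fit_hank + c(-1, 1) * half_width,
            Prudence = fit_prudence + c(-1, 1) * half_width), 2)
```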
Sliced.regression <- step(MFP.LM, direction = "both")
## Start: AIC=1618.27
## Returns ~ GRI + SAT + MBA + Age + Tenure
##
## Df Sum of Sq RSS AIC
## - Tenure 1 1.09 10575 1616.3
## <none> 10574 1618.3
## - MBA 1 173.35 10747 1625.0
## - SAT 1 285.50 10859 1630.7
## - Age 1 473.29 11047 1639.9
## - GRI 1 994.22 11568 1664.8
##
## Step: AIC=1616.33
## Returns ~ GRI + SAT + MBA + Age
##
## Df Sum of Sq RSS AIC
## <none> 10575 1616.3
## + Tenure 1 1.09 10574 1618.3
## - MBA 1 172.78 10748 1623.1
## - SAT 1 287.04 10862 1628.8
## - Age 1 581.56 11157 1643.2
## - GRI 1 994.48 11570 1662.9
summary(Sliced.regression)
##
## Call:
## lm(formula = Returns ~ GRI + SAT + MBA + Age, data = MFPerform)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.731 -2.986 0.154 2.761 15.167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.306479 1.780482 -0.172 0.863398
## GRIGRI -2.788863 0.393181 -7.093 4.18e-12 ***
## SAT 0.005327 0.001398 3.811 0.000155 ***
## MBAYes 1.190063 0.402524 2.957 0.003249 **
## Age -0.106429 0.019621 -5.424 8.83e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.446 on 535 degrees of freedom
## Multiple R-squared: 0.1812, Adjusted R-squared: 0.1751
## F-statistic: 29.6 on 4 and 535 DF, p-value: < 2.2e-16
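Stepwise selection with step() adds and removes predictors one at a time, at each step making the change that lowers AIC (Akaike's information criterion) the most, and it stops when no change lowers AIC further. Here it drops Tenure (AIC falls from 1618.27 to 1616.33) and then stops, because re-adding Tenure or dropping MBA, SAT, Age, or GRI would raise AIC. The final model keeps GRI, SAT, MBA, and Age, all significant at the 1% level.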
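For a linear model, step() uses R's extractAIC, which computes n·log(RSS/n) + 2k, where k is the number of estimated coefficients. Plugging in the printed residual sums of squares reproduces the two AIC values (a sketch; the RSS values are rounded in the output, so the last digit may differ):

```r
# AIC as computed by extractAIC() for lm fits
aic_lm <- function(rss, k, n = 540) n * log(rss / n) + 2 * k
aic_full    <- aic_lm(10574, k = 6)   # GRI + SAT + MBA + Age + Tenure + intercept
aic_reduced <- aic_lm(10575, k = 5)   # Tenure dropped
round(c(full = aic_full, reduced = aic_reduced), 2)
```

Dropping Tenure raises RSS by only about 1, while the penalty falls by 2, so AIC improves.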
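lm_resid <- resid(Sliced.regression)   # residuals of the stepwise model
hist(lm_resid, breaks = 30)            # shape of the residual distribution
qqnorm(lm_resid); qqline(lm_resid)     # points close to the line suggest normality
shapiro.test(lm_resid)                 # a large p-value means normality is not rejected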
The residuals of the regression look approximately normal: their histogram is roughly bell-shaped and centered on zero.
(I tried a Q-Q plot and jarque.bera.test but kept getting errors; jarque.bera.test comes from the tseries package, which has to be installed and loaded first.)
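The residual standard error is 4.446 percentage points on 535 degrees of freedom. (The 1.78 in the table is the standard error of the intercept, not the residual standard error.) The standard errors of the individual coefficients are GRIGRI 0.393, SAT 0.0014, MBAYes 0.403, and Age 0.0196.
The residual standard error is the typical distance between a fund's actual excess return and the regression's prediction: it estimates the standard deviation of returns around the fitted regression line. A smaller value means tighter predictions and a better-fitting model; here, actual returns typically fall within about ±4.4 percentage points of the prediction.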
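The residual standard error, R-squared, and overall F-statistic in the stepwise summary can all be rebuilt from sums of squares, which also shows why a low R-squared can coexist with a highly significant model. A sketch using the printed RSS (10575 on 535 df) and the sample SD of Returns (4.895, n = 540); small differences come from rounding:

```r
n <- 540
k <- 4                               # slopes in the stepwise model (GRI, SAT, MBA, Age)
rss <- 10575                         # residual sum of squares (step() output)
tss <- (n - 1) * 4.895^2             # total sum of squares from the SD of Returns
resid_df <- n - k - 1                # 535
rse <- sqrt(rss / resid_df)          # residual standard error
r2  <- 1 - rss / tss                 # share of variation explained
f   <- ((tss - rss) / k) / (rss / resid_df)
round(c(rse = rse, r2 = r2, F = f), 3)
pf(f, k, resid_df, lower.tail = FALSE)   # p-value of the overall F-test
```

An R-squared of only about 0.18 still yields F ≈ 29.6 on 4 and 535 degrees of freedom, with a p-value far below 0.05: the predictors clearly matter even though most of the variation in returns stays unexplained.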
library(dplyr)
# 95% confidence intervals for the coefficients of the stepwise model, one row per term
ci_df <- confint(Sliced.regression) %>% data.frame() %>% rename(Low = 1, High = 2) %>% tibble::rownames_to_column("Term")
# Attach a plain-language interpretation to each interval
Result20 <- ci_df %>%
  mutate(to = "to", Interpretation = paste("Percent point per one unit change in", Term)) %>%
  select(Low, to, High, Interpretation)
Result20[1, "Interpretation"] <- "Percent points if all predictors are zero"
Result20 %>% knitr::kable()
| Low | to | High | Interpretation |
|---|---|---|---|
| -3.8040718 | to | 3.1911131 | Percent points if all predictors are zero |
| -3.5612314 | to | -2.0164956 | Percent point per one unit change in GRIGRI |
| 0.0025809 | to | 0.0080726 | Percent point per one unit change in SAT |
| 0.3993415 | to | 1.9807837 | Percent point per one unit change in MBAYes |
| -0.1449727 | to | -0.0678849 | Percent point per one unit change in Age |
List the factors in order of importance.
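One way to order the factors, sketched here, is by how much the residual sum of squares rises when each one is dropped from the full model (the "Sum of Sq" column in the first step() table). Ranking by absolute t-value in the stepwise summary gives the same order. This is one reasonable importance measure among several, not the only one:

```r
# Increase in RSS when each predictor is removed from MFP.LM (first step() table)
drop_ss <- c(Tenure = 1.09, MBA = 173.35, SAT = 285.50, Age = 473.29, GRI = 994.22)
ranking <- sort(drop_ss, decreasing = TRUE)
print(names(ranking))   # "GRI" "Age" "SAT" "MBA" "Tenure"
# Each factor's unique contribution as a share of total variation in Returns (SD 4.895, n = 540)
tss <- (540 - 1) * 4.895^2
round(ranking / tss, 3)
```

By this measure, fund type matters most, followed by age, SAT, and MBA, while tenure adds essentially nothing once the others are in the model.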
Plot at least one relevant effect from the regression.
library(jtools)
library(ggplot2)   # for labs()
effect_plot(Sliced.regression, pred = SAT, interval = TRUE, int.type = "confidence") +
  labs(y = "Expected Mean Returns", x = "SAT", title = "The Estimated Effect of SAT on Fund Returns", subtitle = "From a model simplified by stepwise selection")
Provide advice on the hiring decision that summarises what you have learned including the construction of at least one appropriate graphic that clearly summarises the case. This should include a discussion of the candidates and how their attributes fit into the results.
When deciding between the two candidates, I favored Hank at the beginning, before doing the analysis. It seemed plausible that someone who had worked his way up would bring a broader perspective.
However, the data I have now analyzed point the other way on expected performance.
Fund type and age matter a great deal in the regression: GRI funds earn about 2.79 percentage points less than Growth funds, and each additional year of age lowers expected returns by about 0.11 points. But these factors barely separate the two candidates. Both manage GRI funds, and their three-year age gap is worth only about 0.33 points in Prudence's favor. Tenure does not matter once the other factors are controlled for (its coefficient is 0.010, and the stepwise procedure drops it), so Hank's longer tenure is no advantage.
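What does separate them is pedigree. Managers with an MBA earn about 1.19 points more, holding other factors constant (95% CI 0.40 to 1.98), and each additional SAT point of the undergraduate institution adds about 0.0053 points (95% CI 0.0026 to 0.0081). The SAT coefficient looks small only because SAT is measured in points: Prudence's 313-point SAT advantage is worth about 1.7 percentage points. The managers of the top five funds also came from high-SAT schools, like Prudence. Together, these effects give Prudence an expected excess return roughly 3.2 points higher than Hank's. Even so, the model explains only about 18% of the variation in returns, so Prudence's advantage holds on average rather than in every year.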
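The advice above calls for at least one summarising graphic. A sketch of a coefficient plot for the stepwise model in base R: the estimates and 95% interval bounds are copied from the regression summary and confint() table above, and rescaling SAT to 100-point units and Age to 10-year units is a presentation choice so that all four effects are visible on one axis.

```r
# 95% CIs from the stepwise model, rescaled for readability
effects <- data.frame(
  term = c("GRI fund (vs Growth)", "SAT (per 100 points)", "MBA (Yes vs No)", "Age (per 10 years)"),
  est  = c(-2.788863, 0.005327 * 100, 1.190063, -0.106429 * 10),
  low  = c(-3.5612314, 0.0025809 * 100, 0.3993415, -0.1449727 * 10),
  high = c(-2.0164956, 0.0080726 * 100, 1.9807837, -0.0678849 * 10)
)
y <- seq_len(nrow(effects))
op <- par(mar = c(4, 11, 3, 1))   # wide left margin for the term labels
plot(effects$est, y, pch = 19, yaxt = "n", ylab = "",
     xlim = range(c(effects$low, effects$high, 0)), ylim = c(0.5, nrow(effects) + 0.5),
     xlab = "Change in expected excess return (percentage points)",
     main = "Estimated effects with 95% confidence intervals")
segments(effects$low, y, effects$high, y)   # interval bars
axis(2, at = y, labels = effects$term, las = 1)
abline(v = 0, lty = 2)                      # no-effect reference line
par(op)
```

No interval crosses the dashed zero line, and the two bars that separate the candidates (SAT and MBA) both point toward Prudence.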
I tried my best; the second half seemed impossible at my level of understanding.