This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.
Import the Growth_Value Excel sheet from the Chapter 3 Excel file
Acetech=c(40000,40000,65000,90000,100000,145000,150000,550000)
mean(Acetech)
[1] 147500
median(Acetech)
[1] 95000
Load and activate the statip package
#mfv=most frequent value
mfv(Acetech)
[1] 40000
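If the statip package is not available, the mode can also be found with base R. The following is a minimal sketch, not the package's own method; the names `counts` and `mode_value` are introduced here for illustration only.

```r
# Same salary data as above
Acetech <- c(40000, 40000, 65000, 90000, 100000, 145000, 150000, 550000)

# Count how often each distinct value occurs
counts <- table(Acetech)

# Keep every value that reaches the highest count (there may be ties)
mode_value <- as.numeric(names(counts)[counts == max(counts)])
mode_value
```

Because ties are kept, this returns more than one value when the data are multimodal.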
Load the epiDisplay package
Sizes=c('S','L','L','M','S','L','M','L','L','M')
tab1(Sizes)
Sizes :
        Frequency Percent Cum. percent
L               5      50           50
M               3      30           80
S               2      20          100
  Total        10     100          100
Import and activate the psych package
#in this output, the SD refers to the sample standard deviation
describe(Growth_Value)
Population SD (if needed)
# this is the code to create the population standard deviation function:
sd.p=function(x){sd(x)*sqrt((length(x)-1)/length(x))}
#Once the above code is input, you can use the following command to produce the population standard deviation
sd.p(Growth_Value$Growth)
[1] 23.46641187
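As a quick check on the `sd.p` function, the population standard deviation can also be computed directly from its definition (the square root of the mean squared deviation). This sketch uses the Acetech salaries from earlier, since they are fully listed in this notebook; `direct` is a name introduced here for illustration.

```r
Acetech <- c(40000, 40000, 65000, 90000, 100000, 145000, 150000, 550000)

# Rescale the sample SD (divisor n - 1) to the population SD (divisor n)
sd.p <- function(x) sd(x) * sqrt((length(x) - 1) / length(x))

# Population SD straight from the definition
direct <- sqrt(mean((Acetech - mean(Acetech))^2))

# TRUE if the two approaches agree up to floating-point tolerance
all.equal(sd.p(Acetech), direct)
```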
options(digits=10)
summary(Growth_Value)
      Year            Growth             Value
 Min.   :1984.00   Min.   :-40.9000   Min.   :-46.5200
 1st Qu.:1992.75   1st Qu.:  2.8600   1st Qu.:  1.7025
 Median :2001.50   Median : 15.2450   Median : 15.3800
 Mean   :2001.50   Mean   : 15.7550   Mean   : 12.0050
 3rd Qu.:2010.25   3rd Qu.: 36.9725   3rd Qu.: 22.4375
 Max.   :2019.00   Max.   : 79.4800   Max.   : 44.0800
Load the matrixStats package
#the answer on the slides (75.5) is incorrect. R will produce the correct answer (72.5)
Score=c(60,70,80)
Weight=c(.25,.25,.5)
weightedMean(Score,Weight)
[1] 72.5
mean(Score)
[1] 70
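The same weighted mean can be obtained without matrixStats. The sketch below uses base R's `weighted.mean()` and also spells out the arithmetic; `by_hand` is a name introduced here for illustration.

```r
Score  <- c(60, 70, 80)
Weight <- c(.25, .25, .5)

# Base R equivalent of weightedMean()
weighted.mean(Score, Weight)

# The same calculation written out: sum of score * weight, divided by total weight
by_hand <- sum(Score * Weight) / sum(Weight)
by_hand
```

Dividing by `sum(Weight)` means the weights do not have to add up to 1.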
The book uses this code:
ClothingMean <- aggregate(Online$Clothing, by=list(Online$Sex), mean)
ClothingMean
This code may be simpler in some cases:
aggregate(. ~ Sex, Online, mean)
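To see how the two `aggregate()` forms differ, here is a small sketch on made-up data. The data frame below is hypothetical (it is not the book's Online data set, and the `Health` column is invented), so only the shape of the output is meaningful.

```r
# Hypothetical stand-in for the Online data
Online <- data.frame(
  Sex      = c("Female", "Male", "Female", "Male"),
  Clothing = c(100, 60, 140, 80),
  Health   = c(20, 40, 30, 50)
)

# Book-style call: one column, grouped by Sex (the result column is named x)
one_column <- aggregate(Online$Clothing, by = list(Sex = Online$Sex), mean)

# Formula-style call: the mean of every other column, grouped by Sex
by_sex <- aggregate(. ~ Sex, data = Online, FUN = mean)
by_sex
```

The formula version averages every column other than `Sex`, so it only works when those remaining columns are all numeric.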
?summary
summary(Growth_Value)
      Year            Growth             Value
 Min.   :1984.00   Min.   :-40.9000   Min.   :-46.5200
 1st Qu.:1992.75   1st Qu.:  2.8600   1st Qu.:  1.7025
 Median :2001.50   Median : 15.2450   Median : 15.3800
 Mean   :2001.50   Mean   : 15.7550   Mean   : 12.0050
 3rd Qu.:2010.25   3rd Qu.: 36.9725   3rd Qu.: 22.4375
 Max.   :2019.00   Max.   : 79.4800   Max.   : 44.0800
#note that the help description specified that quantile type=7
To directly calculate the inter-quartile range:
IQR(Growth_Value$Growth)
[1] 34.1125
IQR(Growth_Value$Value)
[1] 20.735
?IQR
#notice that the help description lists type=7 as the default; R offers 9 different, valid ways to calculate quantiles (their results become more similar the more measurements you have, and in practice they are essentially the same with 100 or more observations)
#see the Sample Quantiles help description in the stats package for a description of the 9 possible methods to calculate quantiles
?quantile
#For example:
quantile(Growth_Value$Growth, probs=c(.2, .3,.4), type=7)
20% 30% 40%
-1.70 6.92 12.12
#For more information, see:
# https://www.jstor.org/stable/2684934
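With a small sample, the choice of quantile type can change the answer noticeably. The sketch below compares type 7 (R's default) with type 6 on the eight Acetech salaries from earlier; the choice of type 6 for the comparison is arbitrary.

```r
Acetech <- c(40000, 40000, 65000, 90000, 100000, 145000, 150000, 550000)

# First quartile under R's default method
q1_type7 <- unname(quantile(Acetech, probs = 0.25, type = 7))

# First quartile under an alternative method
q1_type6 <- unname(quantile(Acetech, probs = 0.25, type = 6))

c(type7 = q1_type7, type6 = q1_type6)
```

Both values are legitimate first quartiles; they differ only because the two methods interpolate between the second and third ordered values differently.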
Boxplot
boxplot(Growth_Value$Growth, Growth_Value$Value, xlab="Annual Returns, 1984-2019
(in percent)", names =c("Growth","Value"), horizontal = TRUE,
col="gold")
This will analyze the data for outliers and extreme values. Load the rstatix package
Value=as.data.frame(Growth_Value$Value)
identify_outliers(Value)
NA
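For comparison, the common boxplot rule (flag anything more than 1.5 × IQR beyond the quartiles) can be written in a few lines of base R. This is an illustrative sketch on the Acetech salaries from earlier, not a reproduction of what `identify_outliers()` does internally; the fence names are introduced here.

```r
Acetech <- c(40000, 40000, 65000, 90000, 100000, 145000, 150000, 550000)

# Quartiles and inter-quartile range (default quantile type 7)
q1  <- unname(quantile(Acetech, 0.25))
q3  <- unname(quantile(Acetech, 0.75))
iqr <- q3 - q1

# Fences: values outside these limits are flagged as outliers
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

outliers <- Acetech[Acetech < lower_fence | Acetech > upper_fence]
outliers
```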
#this code gives the high and low values of the range
range(Growth_Value$Growth)
[1] -40.90 79.48
range(Growth_Value$Value)
[1] -46.52 44.08
Load the pastecs package
#this code will calculate the range itself as well as the standard deviation and variance
stat.desc(Growth_Value$Growth)
nbr.val nbr.null nbr.na min max
36.000000000 0.000000000 0.000000000 -40.900000000 79.480000000
range sum median mean SE.mean
120.380000000 567.180000000 15.245000000 15.755000000 3.966547567
CI.mean.0.95 var std.dev coef.var
8.052519664 566.405985714 23.799285403 1.510586189
Mean Absolute Deviation: Load the DescTools package
MeanAD(Growth_Value$Growth)
[1] 17.49055556
MeanAD(Growth_Value$Value)
[1] 13.66666667
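The mean absolute deviation is simple enough to compute by hand, which helps show what `MeanAD()` measures. A minimal base-R sketch on the Acetech salaries follows; `mean_ad` is a hypothetical helper name. Note that base R's `mad()` is a different statistic (a scaled *median* absolute deviation), so it is not a substitute.

```r
Acetech <- c(40000, 40000, 65000, 90000, 100000, 145000, 150000, 550000)

# Average distance of each observation from the mean
mean_ad <- function(x) mean(abs(x - mean(x)))

mean_ad(Acetech)
```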
Variance and Standard Deviation
var(Growth_Value$Growth)
[1] 566.4059857
sd(Growth_Value$Growth)
[1] 23.7992854
var(Growth_Value$Value)
[1] 323.2511743
sd(Growth_Value$Value)
[1] 17.97918725
Load the EnvStats package
cv(Growth_Value$Growth)
[1] 1.510586189
cv(Growth_Value$Value)
[1] 1.497641587
Sharpe Ratio
(mean(Growth_Value$Growth)-1)/sd(Growth_Value$Growth)
[1] 0.619976598
(mean(Growth_Value$Value)-1)/sd(Growth_Value$Value)
[1] 0.612096634
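The two calculations above follow the Sharpe ratio pattern (mean return minus the risk-free rate, divided by the standard deviation), with 1 standing in for the risk-free rate in percent. Wrapping the formula in a function makes that assumption explicit. In this sketch, `sharpe_ratio` and the return values are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Sharpe ratio: excess return per unit of risk
sharpe_ratio <- function(returns, risk_free) {
  (mean(returns) - risk_free) / sd(returns)
}

# Hypothetical annual returns in percent: mean 10, sample SD 5
example_returns <- c(5, 10, 15)

# Risk-free rate of 1 percent, matching the calculations above
sharpe_ratio(example_returns, risk_free = 1)
```

With the notebook's data loaded, `sharpe_ratio(Growth_Value$Growth, 1)` would perform the same calculation as the first line above.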
k=2
Cheb <- sapply(k, function(k) 1-1/k^2)
Cheb
[1] 0.75
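Because R arithmetic is vectorized, the Chebyshev bound 1 - 1/k^2 can be evaluated for several values of k at once, without `sapply()`. A short sketch, with `k_values` and `cheb_bounds` as names introduced here:

```r
# Number of standard deviations from the mean
k_values <- c(2, 3, 4)

# Minimum proportion of observations within k standard deviations
cheb_bounds <- 1 - 1 / k_values^2
cheb_bounds
```

These are lower bounds that hold for any distribution, which is why they are smaller than the normal-distribution percentages computed with `pnorm()`.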
pnorm(1)-pnorm(-1)
[1] 0.6826894921
pnorm(2)-pnorm(-2)
[1] 0.9544997361
pnorm(3)-pnorm(-3)
[1] 0.9973002039
Note that for this slide, the numbers are slightly off from what the book describes: exactly 95% of a normal distribution lies within +/- 1.96 standard deviations, but the book rounds this to 2 standard deviations in its description.
pnorm(90,74,8)-pnorm(58,74,8)
[1] 0.9544997361
1-pnorm(90,74,8)
[1] 0.02275013195
0.02275013195*280
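The same probabilities can be reached by standardizing first, which shows why 90 corresponds to two standard deviations above the mean. This sketch reuses the mean (74), standard deviation (8), and count (280) from the lines above; the variable names are introduced here for illustration.

```r
mu    <- 74
sigma <- 8

# Convert the cutoff of 90 to a z-score
z_upper <- (90 - mu) / sigma

# Upper-tail probability from the standard normal distribution
p_above <- 1 - pnorm(z_upper)

# Same probability without standardizing, using lower.tail = FALSE
p_above_direct <- pnorm(90, mean = mu, sd = sigma, lower.tail = FALSE)

# Expected number of observations above 90 out of 280
expected_count <- 280 * p_above
expected_count
```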
#this will generate the Z scores for the Growth column
scale(Growth_Value$Growth)
[,1]
[1,] -0.89309404210
[2,] 1.01494644022
[3,] -0.11449923617
[4,] -0.73342538249
[5,] 0.01239533015
[6,] 1.08763769841
[7,] -0.51114980109
[8,] 1.36873857546
[9,] -0.32837120390
[10,] 0.01827785972
[11,] -0.75527477802
[12,] 1.00234101972
[13,] 0.04432906208
[14,] 0.13256700554
[15,] 0.48215733395
[16,] 2.67760140356
[17,] -0.92754885812
[18,] -1.72547197548
[19,] -2.06749905158
[20,] 1.07587263928
[21,] -0.15273567834
[22,] -0.09475074406
[23,] -0.26030193323
[24,] 0.17374471249
[25,] -2.38053366055
[26,] 1.06704884493
[27,] 0.20147663758
[28,] -0.63384256058
[29,] 0.11617995890
[30,] 0.91830488309
[31,] -0.05525375984
[32,] -0.33299319142
[33,] -0.40946607576
[34,] 0.88258952502
[35,] -0.85233651583
[36,] 0.95233951843
attr(,"scaled:center")
[1] 15.755
attr(,"scaled:scale")
[1] 23.7992854
#this will generate the min and max Z-scores for the Growth column
min(scale(Growth_Value$Growth))
[1] -2.380533661
max(scale(Growth_Value$Growth))
[1] 2.677601404
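A z-score is just the distance from the mean measured in sample standard deviations, so `scale()` can be checked against the formula. The sketch below does this on the Acetech salaries, since they are fully listed in this notebook; `z_manual` is a name introduced here.

```r
Acetech <- c(40000, 40000, 65000, 90000, 100000, 145000, 150000, 550000)

# z-scores from the definition, using the sample SD as scale() does
z_manual <- (Acetech - mean(Acetech)) / sd(Acetech)

# scale() returns a one-column matrix; flatten it before comparing
z_scale <- as.numeric(scale(Acetech))

all.equal(z_manual, z_scale)
```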
plot(Growth_Value$Growth, Growth_Value$Value, xlab = "Growth", ylab = "Value", pch=21, bg="red")
cov(Growth_Value)
Year Growth Value
Year 111.000000000 -7.485142857 -4.020571429
Growth -7.485142857 566.405985714 285.605448571
Value -4.020571429 285.605448571 323.251174286
#The covariance between the Growth and Value funds can be found by looking at the number where the Growth column and Value row meet, as well as where the Growth row and Value column meet
cor(Growth_Value)
Year Growth Value
Year 1.00000000000 -0.02985208619 -0.02122541728
Growth -0.02985208619 1.00000000000 0.66747117548
Value -0.02122541728 0.66747117548 1.00000000000
#The correlation between the Growth and Value funds can be found by looking at the number where the Growth column and Value row meet, as well as where the Growth row and Value column meet
#also see https://rpsychologist.com/correlation/
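The correlation is the covariance rescaled by the two standard deviations, which is why it always falls between -1 and 1. A minimal sketch with two short hypothetical vectors (`x` and `y` are made up for illustration and are not taken from Growth_Value):

```r
# Hypothetical paired observations
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 5)

# Sample covariance of the pair
cov_xy <- cov(x, y)

# Correlation rebuilt from the covariance and the two standard deviations
cor_manual <- cov_xy / (sd(x) * sd(y))

all.equal(cor_manual, cor(x, y))
```

Calling `cov(x, y)` or `cor(x, y)` on two columns returns the single number of interest instead of the full matrix.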
Geometric Mean: Load the EnvStats package
The geometric mean formula cannot use negative numbers, so returns are entered as growth factors (1 plus the return) rather than as percentage changes. For example, if you had a 100% increase one year and a 50% loss the next year, you would have exactly what you started with. In R, you would write these as 2.0 (a 100% increase, or doubling) and 0.50 (your amount was cut by 50%):
#this represents a doubling in the first year and losing half in the second year--you are left with 100% of what you started with
#the vector is named returns to avoid masking R's built-in return() function
returns=c(2.0,0.5)
geoMean(returns)
[1] 1
#the arithmetic mean gives you the incorrect answer and wrongly suggests an average gain of 25% per year:
mean(returns)
[1] 1.25
#this code is for the example on slide 49 and represents a 10% increase followed by a 10% decrease
r=c(1.1,0.9)
geoMean(r)
[1] 0.9949874371
#the answer (0.995) shows an average loss of about .005, or -0.5%, per year
For the Connect homework and exams, you are sometimes given (for example) the returns for the first 2 years and the first half of the third year. In this case, you would need to multiply the growth factors and then take the 2.5th root.
#If your return is 10% in the first year and 25% in the second year, with a loss of 15% in the first half of the third year, your calculation would look like:
(1.1*1.25*0.85)^(1/(2.5))
[1] 1.064360257
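The geometric mean can also be computed in base R, which makes the link to the fractional-period calculation clearer: both take a product of growth factors and then a root. In this sketch, `geo_mean` is a hypothetical helper and is not the EnvStats implementation.

```r
# Geometric mean via logs: equivalent to prod(x)^(1/length(x)) for positive x
geo_mean <- function(x) exp(mean(log(x)))

geo_mean(c(2.0, 0.5))   # doubling, then halving
geo_mean(c(1.1, 0.9))   # +10%, then -10%

# With a partial period, take the root by elapsed time (2.5 years)
# instead of by the number of growth factors
factors <- c(1.1, 1.25, 0.85)
annualized <- prod(factors)^(1 / 2.5)
annualized
```

The helper cannot be used for the 2.5-year case because it would take the cube root (three factors) rather than the 2.5th root.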
time1=13322
time2=16915
n=5
#n observations span n-1 growth periods, so take the (n-1)th root of the overall growth factor
G=(time2/time1)^(1/(n-1))
G-1
[1] 0.06151379683
#the result is written as a proportion
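The average growth rate calculation can be wrapped in a small function so the role of each input is explicit. This is a sketch under the same assumption as the code above, namely that `n` counts observations and therefore spans `n - 1` growth periods; `avg_growth_rate` and its argument names are hypothetical.

```r
# Average growth rate per period between a first and last value
avg_growth_rate <- function(first_value, last_value, n_obs) {
  (last_value / first_value)^(1 / (n_obs - 1)) - 1
}

# Same inputs as above: 13322 growing to 16915 across 5 observations
rate <- avg_growth_rate(13322, 16915, n_obs = 5)
rate

# Multiply by 100 to express the proportion as a percentage
rate * 100
```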