We will be using a statistical program called “R” to generate 500 random numbers from the Chi-squared distribution. Among other useful tools, this program allows one to generate random samples and analyze them. You can analyze data with graphs and through calculations to find probabilities of outcomes based on given or generated data. This program is incredibly useful, especially if one is trying to analyze thousands of pieces of data. “R” allows for shortcuts and helps eliminate tedious handwritten calculations through commands that instruct the program.
For example, if I wanted to find the probability of a light bulb lasting 198 hours or less, I would enter that number of hours, the average number of hours a light bulb lasts, and the standard deviation, which is a statistical measure of how far the values in a sample typically fall from the average. The command would look like this: pnorm(198,240,20). When that command is entered, the program gives 0.01786442 as an answer. That means that approximately 1.8% of light bulbs last 198 hours or less, assuming lifetimes are normally distributed with the mean and standard deviation I entered.
As you can see, “R” is very useful and efficient in making statistical calculations.
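The value returned by pnorm is a proportion between 0 and 1, so it has to be multiplied by 100 to read it as a percentage. A minimal sketch of the same light bulb calculation, with the arguments named for clarity (the variable name p is only for illustration):

```r
# P(lifetime <= 198 hours) for a normal distribution with mean 240 and sd 20
p <- pnorm(198, mean = 240, sd = 20)
p                    # the proportion, about 0.0179

# Convert the proportion to a percentage
round(100 * p, 1)    # about 1.8 (percent)

# The complement: probability of lasting longer than 198 hours
1 - p
```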
x=rchisq(500,1)
mean(x)
## [1] 1.1
sd(x)
## [1] 1.453
We just found the mean and the standard deviation of the random sample of 500.
Next I’ll make a histogram of the data:
x=rchisq(500,1)
mean(x)
## [1] 0.9299
sd(x)
## [1] 1.285
hist(x,
col="green",
border="purple")
In statistical terms, this graph is described as skewed to the right, which means that as the x values get bigger, the frequency of those values gets lower.
It is unlikely that we would get the same result twice if we took a sample of 5 random numbers from the data 10 times, averaged the sample means, and then repeated the whole procedure. If we instead took a sample of 5 a total of 500 times and averaged those means, repeating the procedure would give a much closer answer.
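One quick numerical check of right skew, besides looking at the histogram, is to compare the mean with the median: in a right-skewed sample the long tail usually pulls the mean above the median. A small sketch, assuming a fixed seed (123 is an arbitrary choice, used only so the run can be repeated):

```r
set.seed(123)            # assumed seed, only for reproducibility
x <- rchisq(500, 1)      # 500 draws from Chi-squared with 1 degree of freedom

mean(x)                  # pulled upward by the long right tail
median(x)                # less affected by the tail
mean(x) > median(x)      # expected to be TRUE for right-skewed data
```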
x=rchisq(500,1)
n=5
x=rchisq(n,1)
mean(x)
## [1] 1.183
Now we can replicate this action 500 times using “R”. This is a case where the program saves us from tedious calculations.
x=rchisq(500,1)
n=5
x=rchisq(n,1)
mean(x)
## [1] 0.3013
replicate(500,mean(rchisq(n,1)))
## [1] 0.23679 1.26385 0.45588 0.76786 2.23793 0.70607 1.14577 0.83358
## [9] 0.29620 0.17092 0.52055 0.31418 1.46854 0.77648 0.25327 2.30457
## [17] 0.19190 1.40077 1.81567 0.70247 0.24514 1.17125 0.32747 0.61475
## [25] 2.10917 1.97063 0.44643 0.39227 0.40001 0.44239 1.77064 1.05358
## [33] 0.63520 0.28973 0.74608 2.18776 0.62164 0.76126 1.34582 0.03926
## [41] 0.59425 0.30405 2.84880 0.51618 0.27446 0.43566 0.42051 0.40572
## [49] 2.29299 0.64161 1.37965 0.67863 0.49584 2.57608 2.63856 0.30438
## [57] 0.79718 0.64571 0.86802 0.68804 1.29674 0.31878 0.97547 2.17652
## [65] 1.31368 1.67848 1.65601 1.25857 1.38296 1.41697 1.21254 0.85641
## [73] 0.78360 0.59132 3.37438 1.09123 1.62728 0.55746 1.01025 0.99528
## [81] 0.82108 0.68165 1.40693 0.95596 0.37418 0.24306 0.57383 0.33917
## [89] 0.99460 1.43542 0.69454 0.63570 0.68878 2.32938 0.55111 1.28693
## [97] 1.55544 0.63602 0.61728 0.87511 1.04888 1.64346 0.70872 1.50047
## [105] 0.89220 2.26101 0.72162 1.31454 2.14503 0.88167 1.98513 1.50861
## [113] 0.62697 0.27186 1.54316 0.84280 0.11595 0.08153 0.90827 0.86509
## [121] 0.45365 0.34360 1.51611 0.95864 0.55792 2.03538 0.40141 1.97419
## [129] 1.65350 0.35416 0.26019 0.96243 0.65967 0.30558 0.67388 1.95102
## [137] 0.43137 0.96205 2.70110 1.13601 0.91674 1.14277 0.45289 0.37873
## [145] 0.27033 0.61114 0.25591 0.35994 0.36542 1.53007 0.63588 0.34760
## [153] 1.08634 1.04480 0.54987 1.09147 0.53974 1.20921 0.38473 2.07642
## [161] 0.60676 0.82929 1.25807 0.77303 0.49786 1.08279 1.07977 2.10154
## [169] 4.21785 0.51032 1.86124 0.29942 0.62593 0.37629 0.89893 1.13004
## [177] 1.65135 0.80233 0.15983 0.93863 0.89862 1.04603 1.26827 1.03129
## [185] 0.67040 1.79861 0.39960 1.12836 1.02107 1.56312 0.88667 0.17025
## [193] 1.13908 0.10618 0.67219 0.48337 0.79318 0.61224 1.34411 0.41080
## [201] 1.49467 0.86354 0.55957 2.71090 0.54538 1.06132 1.64026 2.94181
## [209] 0.76974 0.40386 0.11599 0.74765 0.88898 0.53207 1.31524 0.92729
## [217] 0.26396 0.74659 1.72183 1.19661 0.95197 2.53201 1.44534 0.69450
## [225] 1.31173 1.13075 1.17214 0.31366 0.55895 0.89768 0.79497 0.38368
## [233] 0.78807 1.20345 1.13288 0.16768 2.29793 0.50495 1.21379 0.72161
## [241] 1.05574 1.20402 0.20358 2.75877 0.74143 0.23554 1.47279 1.19570
## [249] 0.52746 1.90726 0.37413 0.71443 1.10761 1.74561 1.80939 0.46961
## [257] 0.43273 0.44887 1.05709 0.16847 1.31695 1.88523 2.21092 0.78822
## [265] 1.15411 0.69942 1.51183 3.69819 1.44449 0.93577 0.75523 0.47678
## [273] 0.65147 0.38126 0.56278 0.82624 1.49095 0.40629 0.47884 0.93591
## [281] 0.07901 1.17613 1.78130 1.85389 1.10729 2.18054 1.44659 1.25851
## [289] 0.49665 0.48507 0.45135 2.36547 0.92832 0.62021 0.70754 0.20800
## [297] 2.03990 0.60607 0.30170 0.59395 1.64597 0.70756 0.86324 1.30880
## [305] 0.66926 0.20052 0.41206 0.78922 0.28186 1.19550 0.67137 1.68518
## [313] 0.74079 0.76310 1.26597 1.35737 2.45950 1.10638 0.25421 1.07313
## [321] 2.73398 0.66416 0.53845 1.44273 1.50091 1.26983 0.55596 0.66235
## [329] 2.52863 0.46270 1.02028 0.96547 1.28989 1.95917 0.27581 0.95174
## [337] 1.17116 1.34620 1.01621 0.38948 1.05585 0.38553 1.78317 1.18130
## [345] 1.94280 1.67032 1.12065 0.45335 0.06705 0.20149 0.32159 1.07622
## [353] 0.47650 0.11872 1.69739 0.06610 1.23829 0.21280 1.57121 0.72871
## [361] 0.12599 1.33720 2.00906 0.52766 1.21505 0.97473 0.88352 0.44410
## [369] 0.60775 0.57840 0.97506 0.20264 2.72253 0.56151 1.61807 1.17456
## [377] 1.42741 1.26914 0.21757 0.35383 1.08579 0.59725 1.84286 1.31837
## [385] 2.47779 0.76547 0.72571 1.46123 2.33561 0.77058 1.78431 2.18107
## [393] 0.35789 1.12803 1.45992 0.82200 0.82084 1.42811 1.56287 1.16407
## [401] 0.11926 1.93036 0.72659 0.33626 0.58997 2.36727 1.51258 0.50802
## [409] 0.52110 1.65324 1.35796 0.47446 0.37810 1.48999 1.99271 3.14515
## [417] 0.85949 0.46660 1.05442 1.52507 0.57264 0.15380 1.23873 1.42917
## [425] 1.65794 0.51947 0.40555 0.31190 0.73600 1.71228 0.48403 1.02015
## [433] 0.51702 0.54904 1.04162 0.89694 0.95347 1.12501 1.44770 1.89634
## [441] 2.26732 2.34047 0.84417 0.83047 2.01960 0.62797 1.31207 0.42751
## [449] 2.47414 1.27915 0.40001 0.71159 1.05843 0.90869 2.02306 1.54840
## [457] 0.75244 0.27429 1.39686 3.29922 0.26881 0.59868 0.69757 1.12905
## [465] 0.38365 0.18368 0.53920 0.40168 2.75421 0.51894 1.15957 1.55447
## [473] 0.47076 1.12386 1.58329 0.57979 0.95545 0.42825 1.70041 0.90926
## [481] 0.69229 1.01535 1.45734 0.11152 0.31251 0.79431 1.02519 1.47161
## [489] 1.05602 0.08649 0.59602 1.24371 0.92516 0.67189 0.79913 2.77347
## [497] 0.91824 0.15005 1.80486 3.77137
We will repeat this, this time storing the results in the variable xbar5. Then we calculate the standard deviation and create a histogram of the results. Here a sample of 5 is drawn 500 times. If you were to find the average of those 500 sample means again and again, the results would be much closer together than if you only took 10 samples each time.
x=rchisq(500,1)
n=5
x=rchisq(n,1)
mean(x)
## [1] 2.18
xbar5=replicate(500,mean(rchisq(n,1)))
mu5=round(mean(xbar5),2)
sd5=round(sd(xbar5),2)
mu5
## [1] 1.01
sd5
## [1] 0.71
hist(xbar5,xlim=c(0,6),col="blue",
main=paste("Mean = ", mu5, ", Sample Deviation = ",sd5))
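The claim that averaging 500 sample means gives a more repeatable answer than averaging only 10 can be checked directly. The sketch below is one way to do it, under some assumptions: the seed, the helper name avg_of_means, and the choice of 200 repetitions are all made up for illustration.

```r
set.seed(1)   # assumed seed, only for reproducibility
n <- 5        # sample size, as in the text

# Hypothetical helper: draw `reps` samples of size n and average their means
avg_of_means <- function(reps) {
  mean(replicate(reps, mean(rchisq(n, 1))))
}

# Repeat each procedure 200 times and see how much the answers vary
few  <- replicate(200, avg_of_means(10))    # average of only 10 sample means
many <- replicate(200, avg_of_means(500))   # average of 500 sample means

sd(few)    # larger spread: answers differ noticeably from run to run
sd(many)   # smaller spread: answers stay close together
```

Both procedures should center near 1, the mean of the Chi-squared distribution with 1 degree of freedom; only the run-to-run spread differs.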
Now, with larger sample sizes (n) = 10, 20, 30, 40, 50, we’ll see what the average tends to be after pulling (n) pieces of data 500 times. As (n) gets bigger, when I find the mean of those numbers several times I will likely get close numbers again and again. This is because the spread of the sample means shrinks as the sample size goes up, and their distribution also gets closer to normal.
x=rchisq(500,1)
n=5
x=rchisq(n,1)
mean(x)
## [1] 0.9933
xbar5=replicate(500,mean(rchisq(n,1)))
mu5=round(mean(xbar5),2)
sd5=round(sd(xbar5),2)
mu5
## [1] 1.01
sd5
## [1] 0.63
hist(xbar5,xlim=c(0,6),col="blue",
main=paste("Mean = ", mu5, ", Sample Deviation = ",sd5))
x=rchisq(500,1)
n=10
x=rchisq(n,1)
mean(x)
## [1] 1.441
xbar10=replicate(500,mean(rchisq(n,1)))
mu10=round(mean(xbar10),2)
sd10=round(sd(xbar10),2)
mu10
## [1] 0.99
sd10
## [1] 0.48
hist(xbar10,xlim=c(0,6),col="blue",
main=paste("Mean = ", mu10, ", Sample Deviation = ",sd10))
x=rchisq(500,1)
n=20
x=rchisq(n,1)
mean(x)
## [1] 1.58
xbar20=replicate(500,mean(rchisq(n,1)))
mu20=round(mean(xbar20),2)
sd20=round(sd(xbar20),2)
mu20
## [1] 1
sd20
## [1] 0.31
hist(xbar20,xlim=c(0,6),col="blue",
main=paste("Mean = ", mu20, ", Sample Deviation = ",sd20))
x=rchisq(500,1)
n=30
x=rchisq(n,1)
mean(x)
## [1] 0.9252
xbar30=replicate(500,mean(rchisq(n,1)))
mu30=round(mean(xbar30),2)
sd30=round(sd(xbar30),2)
mu30
## [1] 0.98
sd30
## [1] 0.26
hist(xbar30,xlim=c(0,6),col="blue",
main=paste("Mean = ", mu30, ", Sample Deviation = ",sd30))
x=rchisq(500,1)
n=40
x=rchisq(n,1)
mean(x)
## [1] 0.9939
xbar40=replicate(500,mean(rchisq(n,1)))
mu40=round(mean(xbar40),2)
sd40=round(sd(xbar40),2)
mu40
## [1] 1.01
sd40
## [1] 0.23
hist(xbar40,xlim=c(0,6),col="blue",
main=paste("Mean = ", mu40, ", Sample Deviation = ",sd40))
x=rchisq(500,1)
n=50
x=rchisq(n,1)
mean(x)
## [1] 1.085
xbar50=replicate(500,mean(rchisq(n,1)))
mu50=round(mean(xbar50),2)
sd50=round(sd(xbar50),2)
mu50
## [1] 1
sd50
## [1] 0.19
hist(xbar50,xlim=c(0,6),col="blue",
main=paste("Mean = ", mu50, ", Sample Deviation = ",sd50))
par(mfrow=c(3,2))
par(mfrow=c(1,1))
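The six blocks above repeat the same steps with a different n, so they can also be written as a loop. The following is a sketch rather than the original code: the seed and the names sizes and results are assumptions, and the plot title is shortened to "SD". It sets the 3-by-2 plotting grid before drawing the histograms so that all six appear on one page. It also compares each observed standard deviation with sqrt(2/n), the theoretical value, since the Chi-squared distribution with 1 degree of freedom has variance 2.

```r
set.seed(42)                          # assumed seed, only for reproducibility
sizes <- c(5, 10, 20, 30, 40, 50)     # sample sizes used in the text

par(mfrow = c(3, 2))                  # 3 x 2 grid, set BEFORE plotting
results <- sapply(sizes, function(n) {
  # 500 sample means, each from a sample of size n
  xbar <- replicate(500, mean(rchisq(n, 1)))
  hist(xbar, xlim = c(0, 6), col = "blue",
       main = paste("n =", n, ", Mean =", round(mean(xbar), 2),
                    ", SD =", round(sd(xbar), 2)))
  # Observed summary next to the theoretical SD of the sample mean
  c(n = n, mean = mean(xbar), sd = sd(xbar), theory_sd = sqrt(2 / n))
})
par(mfrow = c(1, 1))                  # restore the single-plot layout

round(t(results), 2)                  # one row per sample size
```

The observed sd column should track theory_sd closely, which matches the values reported above (for example 0.63 at n = 5 and 0.19 at n = 50).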
In this activity I have learned that the distribution of sample means becomes approximately normal as the sample size goes up. The original distribution of the data was heavily skewed to the right. The mean was 0.99 and the sd was 1.59. First a sample (n) of 5 pieces of data was pulled 500 times, and that distribution was a little less skewed to the right than the original data distribution. Then a sample (n) of 10 was taken 500 times, and the distribution of those sample means was closer to normal. At (n)=50, the distribution of sample means was very close to normal, and its standard deviation had dropped from 0.63 at (n)=5 to 0.19.
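The histograms show the shape changing, but the change can also be put into a number. One option is the sample skewness, which is 0 for a perfectly symmetric distribution and positive for a right-skewed one. Base R has no built-in skewness function, so the sketch below defines a hypothetical helper using the usual moment formula; the seed and variable names are likewise assumptions.

```r
set.seed(2024)   # assumed seed, only for reproducibility

# Hypothetical helper: moment-based sample skewness (third central moment
# divided by the second central moment raised to the power 3/2)
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

sizes <- c(5, 10, 20, 30, 40, 50)

# Skewness of 500 sample means for each sample size
skews <- sapply(sizes, function(n) {
  skewness(replicate(500, mean(rchisq(n, 1))))
})
names(skews) <- sizes

round(skews, 2)   # values should shrink toward 0 as n grows
```

Because each value comes from a random simulation, the decrease may not be perfectly steady from one n to the next, but the overall trend toward 0 should be visible.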