For this project, you will be working with data that are available through the Department of Education (https://collegescorecard.ed.gov/data/). The data file contains data on various characteristics of 7175 degree-granting U.S. institutions of higher education.
The variables you will be working with are defined in the included pdf (https://www2.stetson.edu/~jrasp/data/CollegeScorecard_variables.pdf). You should use the data contained in this file and your knowledge of R to answer the following questions. You must submit a document that includes your answers and also the R code you used to produce the answers. Your material should be submitted through the Google Classroom site.
Use the data contained in the files.
Answer 16 questions based on your R experimentation.
Covariance and Correlation
Simple Regression
Normal Distributions
Probability
Present 5 relations through scatterplots. In addition to the scatterplots, provide an interpretation of what you think about the direction and magnitude of each of the relations you report.
Report the full covariance and correlation matrix for all variables and provide an interpretation of what variables are most strongly related (either positive or negative) and what variables are relatively unrelated to one another.
What is the strongest relation in the data? What are the variables involved and what is the correlation? What is the weakest relation in the data? What are the variables involved and what is the correlation?
Predict the average cost of attendance from average SAT score
Predict the average cost of attendance from admission rate
Predict the number of students from average SAT score
Predict the number of students from admission rate
Predict completion rate from average SAT score
Predict completion rate from admission rate
Predict the percentage of students with federal loans from average SAT score
Predict the percentage of students with federal loans from admission rate
Run two additional regression equations using any variables you like from the dataset. Provide an interpretation of the results of these additional analyses.
Are the distributions of average SAT score, admission rate, and total number of undergraduate students normal or non-normal? What information did you use to answer this question?
For any of the distributions that were not normal in your previous answer, how could you transform them to be normal? Perform the appropriate transformation and report the mean, standard deviation, and variance of the new transformed variable.
Based on the distribution of scores, what is the probability that the average SAT score for a school is greater than 1400? What is the probability that the average SAT score for a school is less than 800?
Imagine that the distribution of average SAT scores was perfectly normal. Answer both parts of Question 15 again using the observed mean and standard deviation as your parameters.
- Compute basic descriptive statistics for all quantitative variables
# Code
# Prints descriptive statistics of all 24 quantitative variables
psych::describe(alotofdata[7:30])
## vars n mean sd median trimmed mad min
## ADM_RATE 1 2198 0.69 0.21 0.71 0.71 0.21 0
## SAT_AVG 2 1304 1059.07 133.36 1039.50 1048.34 104.52 720
## UGDS 3 6990 2332.16 5438.85 406.00 1052.09 526.32 0
## UGDS_WHITE 4 6990 0.51 0.29 0.56 0.52 0.34 0
## UGDS_BLACK 5 6990 0.19 0.22 0.10 0.14 0.12 0
## UGDS_HISP 6 6990 0.16 0.22 0.07 0.11 0.08 0
## UGDS_ASIAN 7 6990 0.03 0.08 0.01 0.02 0.02 0
## UGDS_AIAN 8 6990 0.01 0.07 0.00 0.00 0.00 0
## UGDS_NHPI 9 6990 0.00 0.03 0.00 0.00 0.00 0
## UGDS_2MOR 10 6990 0.02 0.03 0.02 0.02 0.03 0
## UGDS_NRA 11 6990 0.02 0.05 0.00 0.01 0.00 0
## UGDS_UNKN 12 6990 0.05 0.09 0.01 0.02 0.02 0
## PPTUG_EF 13 6969 0.23 0.25 0.15 0.19 0.22 0
## NPT4_PUB 14 1911 9624.66 4669.67 8751.00 9341.96 4293.61 -2434
## NPT4_PRIV 15 4688 18230.18 7272.13 18254.50 18021.04 7034.94 -581
## COSTT4_A 16 4030 24853.36 12762.63 22881.50 23321.23 12766.67 4610
## TUITFTE 17 7270 10401.08 17375.87 9015.00 9227.39 6676.15 0
## INEXPFTE 18 7270 7360.21 12726.34 5490.00 5875.23 3399.60 0
## PFTFAC 19 4045 0.57 0.31 0.54 0.57 0.40 0
## PCTPELL 20 6966 0.53 0.23 0.52 0.53 0.26 0
## C150_4 21 2481 0.48 0.21 0.47 0.47 0.22 0
## PFTFTUG1_EF 22 3664 0.53 0.26 0.53 0.53 0.31 0
## RET_FT4 23 2293 0.71 0.20 0.74 0.73 0.15 0
## PCTFLOAN 24 6966 0.52 0.28 0.58 0.54 0.28 0
## max range skew kurtosis se
## ADM_RATE 1.00 1.00 -0.58 -0.04 0.00
## SAT_AVG 1545.00 825.00 0.82 1.11 3.69
## UGDS 151558.00 151558.00 6.84 104.77 65.05
## UGDS_WHITE 1.00 1.00 -0.26 -1.08 0.00
## UGDS_BLACK 1.00 1.00 1.73 2.49 0.00
## UGDS_HISP 1.00 1.00 2.24 4.83 0.00
## UGDS_ASIAN 0.97 0.97 6.38 55.98 0.00
## UGDS_AIAN 1.00 1.00 11.16 136.65 0.00
## UGDS_NHPI 1.00 1.00 22.92 608.03 0.00
## UGDS_2MOR 0.53 0.53 4.11 36.08 0.00
## UGDS_NRA 0.93 0.93 8.29 101.58 0.00
## UGDS_UNKN 0.90 0.90 4.37 23.94 0.00
## PPTUG_EF 1.00 1.00 1.02 0.22 0.00
## NPT4_PUB 28201.00 30635.00 0.61 0.10 106.82
## NPT4_PRIV 89406.00 89987.00 0.73 3.89 106.21
## COSTT4_A 79212.00 74602.00 0.97 0.47 201.04
## TUITFTE 1292154.00 1292154.00 55.94 4076.80 203.79
## INEXPFTE 735077.00 735077.00 31.03 1570.11 149.26
## PFTFAC 1.00 1.00 0.06 -1.30 0.00
## PCTPELL 1.00 1.00 0.00 -0.79 0.00
## C150_4 1.00 1.00 0.13 -0.38 0.00
## PFTFTUG1_EF 1.00 1.00 -0.06 -0.99 0.00
## RET_FT4 1.00 1.00 -1.26 2.31 0.00
## PCTFLOAN 1.00 1.00 -0.53 -0.81 0.00
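The se column in the output above is the standard error of the mean, sd / sqrt(n). As a rough check, the sketch below recomputes it for SAT_AVG and UGDS. The inputs are copied from the describe() output, so they are already rounded to two decimals and the result is only approximate.

```r
# Values copied (already rounded) from the describe() output above
sat_sd  <- 133.36
sat_n   <- 1304
ugds_sd <- 5438.85
ugds_n  <- 6990

# Standard error of the mean = standard deviation / sqrt(sample size)
sat_se  <- sat_sd / sqrt(sat_n)
ugds_se <- ugds_sd / sqrt(ugds_n)

# Rounded to two decimals for comparison with the se column
round(c(SAT_AVG = sat_se, UGDS = ugds_se), 2)
```

The much larger standard error for UGDS reflects its very large standard deviation, even though it has far more non-missing observations than SAT_AVG.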
1. Present 5 relations through scatterplots. In addition to the scatterplots, provide an interpretation of what you think about the direction and magnitude of each of the relations you report.
# Code
ggplot2::qplot(COSTT4_A, NPT4_PUB, data=alotofdata, geom = "point", xlim = c(0, 35000), main = 'Scatterplot of Avg. Net Price for Public Title IV and Avg. Cost of Attendance', xlab = 'Average cost of attendance', ylab = 'Average net price for Title IV public institutions')
# Code
ggplot2::qplot(UGDS_NHPI, UGDS_UNKN, data=alotofdata, geom = "point", main = 'Scatterplot of Native Hawaiian/Pacific Isl. Enrollment and Unknown Enrollment', xlab = 'Native Hawaiian/Pacific Islander Enrollment Race', ylab = 'Unknown Enrollment Race')
# Code
ggplot2::qplot(C150_4, SAT_AVG, data=alotofdata, geom = "point", main = 'Scatterplot of Completion Rate and Average SAT score', xlab = 'Completion Rate', ylab = 'Average SAT score')
# Code
ggplot2::qplot(PFTFTUG1_EF, UGDS_2MOR, data=alotofdata, geom = "point", ylim = c(0, .4), main = 'Scatterplot of Full-Time Undergrads and Enrollments Who Are Two or More Races', xlab = 'Full-Time Undergrads', ylab = 'Enrollments Who Are Two or More Races')
# Code
ggplot2::qplot(PFTFTUG1_EF, PPTUG_EF, data=alotofdata, geom = "point", main = 'Scatterplot of Full-Time Undergrads and Part-Time Undergrads', xlab = 'Full-Time Undergrads', ylab = 'Part-Time Undergrads')
2. Report the full covariance and correlation matrix for all variables and provide an interpretation of what variables are most strongly related (either positive or negative) and what variables are relatively unrelated to one another.
# Code
# Prints the covariance matrix for 22 of the quantitative variables (columns 8-20 and 22-30; ADM_RATE in column 7 and NPT4_PRIV in column 21 are excluded)
cov(alotofdata[, c(8:20,22:30)], use = "complete.obs", method = "pearson")
## SAT_AVG UGDS UGDS_WHITE UGDS_BLACK
## SAT_AVG 1.218022e+04 5.329262e+05 7.602445e+00 -1.139528e+01
## UGDS 5.329262e+05 8.285945e+07 -1.742406e+02 -4.046013e+02
## UGDS_WHITE 7.602445e+00 -1.742406e+02 5.742202e-02 -3.180472e-02
## UGDS_BLACK -1.139528e+01 -4.046013e+02 -3.180472e-02 4.293978e-02
## UGDS_HISP -2.289345e-01 2.508627e+02 -1.776047e-02 -5.888088e-03
## UGDS_ASIAN 2.940904e+00 2.582883e+02 -6.168199e-03 -2.583640e-03
## UGDS_AIAN -2.015583e-01 -2.680593e+01 1.468718e-04 -4.383466e-04
## UGDS_NHPI -2.699450e-02 5.489100e-02 -1.776705e-04 -1.011808e-04
## UGDS_2MOR 3.445257e-01 3.163754e+01 -3.089527e-04 -1.002266e-03
## UGDS_NRA 1.229237e+00 9.322055e+01 -6.599578e-04 -1.127995e-03
## UGDS_UNKN -2.638249e-01 -2.840537e+01 -6.884413e-04 5.944754e-06
## PPTUG_EF -3.816668e+00 -7.697761e+01 -3.402001e-03 1.452766e-03
## NPT4_PUB 1.385288e+05 4.890578e+06 3.771110e+02 -1.341249e+02
## COSTT4_A 2.120705e+05 1.128208e+07 1.485043e+02 -1.158920e+02
## TUITFTE 2.200865e+05 1.350737e+07 1.635458e+02 -1.790211e+02
## INEXPFTE 2.413508e+05 1.380195e+07 -4.826708e+01 -1.068839e+02
## PFTFAC 2.097600e+00 6.739357e+01 2.259232e-03 3.873909e-03
## PCTPELL -9.354897e+00 -3.201298e+02 -2.136342e-02 1.743799e-02
## C150_4 1.406813e+01 7.645367e+02 9.136309e-03 -1.343661e-02
## PFTFTUG1_EF 5.079774e+00 1.943247e+02 4.346010e-03 -1.095800e-03
## RET_FT4 7.771277e+00 4.830751e+02 1.129095e-03 -6.430685e-03
## PCTFLOAN -7.287525e+00 -4.466688e+02 1.244968e-03 1.368123e-02
## UGDS_HISP UGDS_ASIAN UGDS_AIAN UGDS_NHPI
## SAT_AVG -2.289345e-01 2.940904e+00 -2.015583e-01 -2.699450e-02
## UGDS 2.508627e+02 2.582883e+02 -2.680593e+01 5.489100e-02
## UGDS_WHITE -1.776047e-02 -6.168199e-03 1.468718e-04 -1.776705e-04
## UGDS_BLACK -5.888088e-03 -2.583640e-03 -4.383466e-04 -1.011808e-04
## UGDS_HISP 2.105206e-02 2.728131e-03 -1.707483e-04 5.362607e-05
## UGDS_ASIAN 2.728131e-03 4.753151e-03 -1.775280e-04 1.039281e-04
## UGDS_AIAN -1.707483e-04 -1.775280e-04 4.861502e-04 1.285416e-06
## UGDS_NHPI 5.362607e-05 1.039281e-04 1.285416e-06 3.194102e-05
## UGDS_2MOR -5.569517e-05 4.881726e-04 1.701521e-04 8.110200e-05
## UGDS_NRA 2.172151e-05 8.658289e-04 3.279742e-06 1.072810e-05
## UGDS_UNKN 1.946616e-05 -1.000288e-05 -2.112756e-05 -3.773000e-06
## PPTUG_EF 2.453818e-03 -4.711632e-04 1.764228e-04 3.888940e-05
## NPT4_PUB -2.162551e+02 -1.431622e+01 -1.342254e+01 -1.909956e+00
## COSTT4_A -1.310979e+02 8.443090e+01 -1.609801e+01 -1.047081e+00
## TUITFTE -8.683237e+01 6.604482e+01 -1.077970e+01 -2.437688e-01
## INEXPFTE -1.724218e+01 1.141844e+02 -8.156141e+00 1.962394e+00
## PFTFAC -3.763366e-03 -2.208007e-03 2.999566e-04 -3.803687e-05
## PCTPELL 5.267153e-03 -3.991713e-04 1.242456e-04 2.696185e-05
## C150_4 -1.008910e-03 4.340904e-03 -6.325488e-04 -2.611656e-05
## PFTFTUG1_EF -1.126385e-03 -8.905536e-04 -3.586837e-04 -1.357610e-04
## RET_FT4 1.765587e-03 3.173605e-03 -4.895431e-04 -1.789240e-05
## PCTFLOAN -9.145455e-03 -3.752316e-03 -1.656458e-04 -1.034155e-04
## UGDS_2MOR UGDS_NRA UGDS_UNKN PPTUG_EF
## SAT_AVG 3.445257e-01 1.229237e+00 -2.638249e-01 -3.816668e+00
## UGDS 3.163754e+01 9.322055e+01 -2.840537e+01 -7.697761e+01
## UGDS_WHITE -3.089527e-04 -6.599578e-04 -6.884413e-04 -3.402001e-03
## UGDS_BLACK -1.002266e-03 -1.127995e-03 5.944754e-06 1.452766e-03
## UGDS_HISP -5.569517e-05 2.172151e-05 1.946616e-05 2.453818e-03
## UGDS_ASIAN 4.881726e-04 8.658289e-04 -1.000288e-05 -4.711632e-04
## UGDS_AIAN 1.701521e-04 3.279742e-06 -2.112756e-05 1.764228e-04
## UGDS_NHPI 8.110200e-05 1.072810e-05 -3.773000e-06 3.888940e-05
## UGDS_2MOR 6.307534e-04 6.640463e-05 -6.965429e-05 1.926695e-05
## UGDS_NRA 6.640463e-05 8.691456e-04 -4.911387e-05 -2.911461e-04
## UGDS_UNKN -6.965429e-05 -4.911387e-05 8.168477e-04 2.251906e-05
## PPTUG_EF 1.926695e-05 -2.911461e-04 2.251906e-05 9.677881e-03
## NPT4_PUB 9.694901e-01 4.783556e-02 1.923607e+00 -1.351751e+02
## COSTT4_A 1.042207e+01 2.229337e+01 -1.486209e+00 -1.749173e+02
## TUITFTE 8.204159e+00 3.388701e+01 5.209673e+00 -1.029201e+02
## INEXPFTE 1.575697e+01 4.359567e+01 5.067068e+00 -1.139963e+02
## PFTFAC 9.234901e-05 -2.319271e-05 -4.925984e-04 -1.664099e-03
## PCTPELL -3.905091e-04 -7.951955e-04 9.181373e-05 2.672191e-03
## C150_4 3.905836e-04 1.420905e-03 -1.838141e-04 -9.527538e-03
## PFTFTUG1_EF -3.235708e-04 7.834947e-05 -4.936012e-04 -1.135068e-02
## RET_FT4 1.561521e-04 8.495269e-04 -1.356578e-04 -4.265497e-03
## PCTFLOAN -6.137635e-04 -1.289573e-03 1.443932e-04 -2.242967e-03
## NPT4_PUB COSTT4_A TUITFTE INEXPFTE
## SAT_AVG 1.385288e+05 2.120705e+05 2.200865e+05 2.413508e+05
## UGDS 4.890578e+06 1.128208e+07 1.350737e+07 1.380195e+07
## UGDS_WHITE 3.771110e+02 1.485043e+02 1.635458e+02 -4.826708e+01
## UGDS_BLACK -1.341249e+02 -1.158920e+02 -1.790211e+02 -1.068839e+02
## UGDS_HISP -2.162551e+02 -1.310979e+02 -8.683237e+01 -1.724218e+01
## UGDS_ASIAN -1.431622e+01 8.443090e+01 6.604482e+01 1.141844e+02
## UGDS_AIAN -1.342254e+01 -1.609801e+01 -1.077970e+01 -8.156141e+00
## UGDS_NHPI -1.909956e+00 -1.047081e+00 -2.437688e-01 1.962394e+00
## UGDS_2MOR 9.694901e-01 1.042207e+01 8.204159e+00 1.575697e+01
## UGDS_NRA 4.783556e-02 2.229337e+01 3.388701e+01 4.359567e+01
## UGDS_UNKN 1.923607e+00 -1.486209e+00 5.209673e+00 5.067068e+00
## PPTUG_EF -1.351751e+02 -1.749173e+02 -1.029201e+02 -1.139963e+02
## NPT4_PUB 1.560800e+07 1.287933e+07 7.263421e+06 3.494579e+06
## COSTT4_A 1.287933e+07 1.677172e+07 9.799257e+06 7.238153e+06
## TUITFTE 7.263421e+06 9.799257e+06 1.170850e+07 8.223882e+06
## INEXPFTE 3.494579e+06 7.238153e+06 8.223882e+06 1.556465e+07
## PFTFAC -1.259444e+01 3.481788e+00 4.339883e+01 3.441474e+01
## PCTPELL -2.510637e+02 -1.975503e+02 -2.263728e+02 -1.478998e+02
## C150_4 3.181454e+02 4.348695e+02 3.466991e+02 3.382111e+02
## PFTFTUG1_EF 2.138848e+02 2.365901e+02 1.606063e+02 1.159236e+02
## RET_FT4 1.173998e+02 1.933343e+02 1.704002e+02 1.861454e+02
## PCTFLOAN 1.707053e+02 9.555363e+01 -3.347646e+01 -1.182968e+02
## PFTFAC PCTPELL C150_4 PFTFTUG1_EF
## SAT_AVG 2.097600e+00 -9.354897e+00 1.406813e+01 5.079774e+00
## UGDS 6.739357e+01 -3.201298e+02 7.645367e+02 1.943247e+02
## UGDS_WHITE 2.259232e-03 -2.136342e-02 9.136309e-03 4.346010e-03
## UGDS_BLACK 3.873909e-03 1.743799e-02 -1.343661e-02 -1.095800e-03
## UGDS_HISP -3.763366e-03 5.267153e-03 -1.008910e-03 -1.126385e-03
## UGDS_ASIAN -2.208007e-03 -3.991713e-04 4.340904e-03 -8.905536e-04
## UGDS_AIAN 2.999566e-04 1.242456e-04 -6.325488e-04 -3.586837e-04
## UGDS_NHPI -3.803687e-05 2.696185e-05 -2.611656e-05 -1.357610e-04
## UGDS_2MOR 9.234901e-05 -3.905091e-04 3.905836e-04 -3.235708e-04
## UGDS_NRA -2.319271e-05 -7.951955e-04 1.420905e-03 7.834947e-05
## UGDS_UNKN -4.925984e-04 9.181373e-05 -1.838141e-04 -4.936012e-04
## PPTUG_EF -1.664099e-03 2.672191e-03 -9.527538e-03 -1.135068e-02
## NPT4_PUB -1.259444e+01 -2.510637e+02 3.181454e+02 2.138848e+02
## COSTT4_A 3.481788e+00 -1.975503e+02 4.348695e+02 2.365901e+02
## TUITFTE 4.339883e+01 -2.263728e+02 3.466991e+02 1.606063e+02
## INEXPFTE 3.441474e+01 -1.478998e+02 3.382111e+02 1.159236e+02
## PFTFAC 3.008119e-02 -9.424151e-04 1.623653e-03 5.401583e-03
## PCTPELL -9.424151e-04 1.651561e-02 -1.255180e-02 -2.966363e-03
## C150_4 1.623653e-03 -1.255180e-02 2.660994e-02 1.157375e-02
## PFTFTUG1_EF 5.401583e-03 -2.966363e-03 1.157375e-02 2.568155e-02
## RET_FT4 -4.473285e-04 -5.685523e-03 1.335546e-02 5.700239e-03
## PCTFLOAN 1.564988e-03 6.671230e-03 -4.494687e-03 3.454553e-03
## RET_FT4 PCTFLOAN
## SAT_AVG 7.771277e+00 -7.287525e+00
## UGDS 4.830751e+02 -4.466688e+02
## UGDS_WHITE 1.129095e-03 1.244968e-03
## UGDS_BLACK -6.430685e-03 1.368123e-02
## UGDS_HISP 1.765587e-03 -9.145455e-03
## UGDS_ASIAN 3.173605e-03 -3.752316e-03
## UGDS_AIAN -4.895431e-04 -1.656458e-04
## UGDS_NHPI -1.789240e-05 -1.034155e-04
## UGDS_2MOR 1.561521e-04 -6.137635e-04
## UGDS_NRA 8.495269e-04 -1.289573e-03
## UGDS_UNKN -1.356578e-04 1.443932e-04
## PPTUG_EF -4.265497e-03 -2.242967e-03
## NPT4_PUB 1.173998e+02 1.707053e+02
## COSTT4_A 1.933343e+02 9.555363e+01
## TUITFTE 1.704002e+02 -3.347646e+01
## INEXPFTE 1.861454e+02 -1.182968e+02
## PFTFAC -4.473285e-04 1.564988e-03
## PCTPELL -5.685523e-03 6.671230e-03
## C150_4 1.335546e-02 -4.494687e-03
## PFTFTUG1_EF 5.700239e-03 3.454553e-03
## RET_FT4 9.335425e-03 -4.348277e-03
## PCTFLOAN -4.348277e-03 2.056833e-02
# Code
# Prints the correlation matrix for 22 of the quantitative variables (columns 8-20 and 22-30; ADM_RATE in column 7 and NPT4_PRIV in column 21 are excluded)
cor(alotofdata[, c(8:20,22:30)], use = "complete.obs", method = "pearson")
## SAT_AVG UGDS UGDS_WHITE UGDS_BLACK UGDS_HISP
## SAT_AVG 1.00000000 0.530479300 0.28746595 -0.498273335 -0.014296713
## UGDS 0.53047930 1.000000000 -0.07988021 -0.214499541 0.189940545
## UGDS_WHITE 0.28746595 -0.079880211 1.00000000 -0.640504825 -0.510819933
## UGDS_BLACK -0.49827333 -0.214499541 -0.64050482 1.000000000 -0.195838031
## UGDS_HISP -0.01429671 0.189940545 -0.51081993 -0.195838031 1.000000000
## UGDS_ASIAN 0.38651147 0.411569154 -0.37336043 -0.180846962 0.272726304
## UGDS_AIAN -0.08283002 -0.133559424 0.02779802 -0.095940606 -0.053373262
## UGDS_NHPI -0.04327861 0.001066979 -0.13119027 -0.086396045 0.065396461
## UGDS_2MOR 0.12429801 0.138389106 -0.05133611 -0.192585413 -0.015284117
## UGDS_NRA 0.37779987 0.347371663 -0.09341802 -0.184642217 0.005078044
## UGDS_UNKN -0.08364071 -0.109183967 -0.10052108 0.001003768 0.004694209
## PPTUG_EF -0.35153340 -0.085961363 -0.14431289 0.071264897 0.171911489
## NPT4_PUB 0.31771591 0.135992698 0.39834227 -0.163834673 -0.377264147
## COSTT4_A 0.46920625 0.302641965 0.15132507 -0.136563560 -0.220627412
## TUITFTE 0.58279415 0.433659734 0.19945712 -0.252478054 -0.174897510
## INEXPFTE 0.55430827 0.384325956 -0.05105544 -0.130741289 -0.030121396
## PFTFAC 0.10958412 0.042687425 0.05435934 0.107788393 -0.149548348
## PCTPELL -0.65957486 -0.273657783 -0.69371993 0.654815936 0.282475796
## C150_4 0.78142427 0.514879202 0.23372738 -0.397500767 -0.042626820
## PFTFTUG1_EF 0.28721436 0.133212902 0.11317248 -0.032998226 -0.048442764
## RET_FT4 0.72878214 0.549258224 0.04876680 -0.321188726 0.125943239
## PCTFLOAN -0.46041860 -0.342148794 0.03622592 0.460358093 -0.439499529
## UGDS_ASIAN UGDS_AIAN UGDS_NHPI UGDS_2MOR
## SAT_AVG 0.38651147 -0.082830022 -0.043278613 0.124298008
## UGDS 0.41156915 -0.133559424 0.001066979 0.138389106
## UGDS_WHITE -0.37336043 0.027798025 -0.131190270 -0.051336112
## UGDS_BLACK -0.18084696 -0.095940606 -0.086396045 -0.192585413
## UGDS_HISP 0.27272630 -0.053373262 0.065396461 -0.015284117
## UGDS_ASIAN 1.00000000 -0.116785970 0.266727380 0.281937487
## UGDS_AIAN -0.11678597 1.000000000 0.010315352 0.307271522
## UGDS_NHPI 0.26672738 0.010315352 1.000000000 0.571383114
## UGDS_2MOR 0.28193749 0.307271522 0.571383114 1.000000000
## UGDS_NRA 0.42598567 0.005045546 0.064387585 0.089685519
## UGDS_UNKN -0.00507649 -0.033526916 -0.023358338 -0.097039279
## PPTUG_EF -0.06946890 0.081335340 0.069946628 0.007798176
## NPT4_PUB -0.05256104 -0.154090495 -0.085541207 0.009771022
## COSTT4_A 0.29903479 -0.178278159 -0.045239423 0.101329394
## TUITFTE 0.27996082 -0.142879791 -0.012605298 0.095467069
## INEXPFTE 0.41980354 -0.093762630 0.088012031 0.159027870
## PFTFAC -0.18465547 0.078437848 -0.038804571 0.021200933
## PCTPELL -0.04505273 0.043847855 0.037121709 -0.120991264
## C150_4 0.38598229 -0.175868004 -0.028328245 0.095337175
## PFTFTUG1_EF -0.08060438 -0.101511613 -0.149895984 -0.080394974
## RET_FT4 0.47642548 -0.229793990 -0.032766291 0.064350361
## PCTFLOAN -0.37949754 -0.052383633 -0.127588575 -0.170400829
## UGDS_NRA UGDS_UNKN PPTUG_EF NPT4_PUB
## SAT_AVG 0.377799869 -0.083640707 -0.351533401 0.317715914
## UGDS 0.347371663 -0.109183967 -0.085961363 0.135992698
## UGDS_WHITE -0.093418015 -0.100521076 -0.144312888 0.398342269
## UGDS_BLACK -0.184642217 0.001003768 0.071264897 -0.163834673
## UGDS_HISP 0.005078044 0.004694209 0.171911489 -0.377264147
## UGDS_ASIAN 0.425985670 -0.005076490 -0.069468896 -0.052561037
## UGDS_AIAN 0.005045546 -0.033526916 0.081335340 -0.154090495
## UGDS_NHPI 0.064387585 -0.023358338 0.069946628 -0.085541207
## UGDS_2MOR 0.089685519 -0.097039279 0.007798176 0.009771022
## UGDS_NRA 1.000000000 -0.058289096 -0.100386320 0.000410706
## UGDS_UNKN -0.058289096 1.000000000 0.008009207 0.017036177
## PPTUG_EF -0.100386320 0.008009207 1.000000000 -0.347802872
## NPT4_PUB 0.000410706 0.017036177 -0.347802872 1.000000000
## COSTT4_A 0.184646253 -0.012697565 -0.434164032 0.796032414
## TUITFTE 0.335920337 0.053270804 -0.305745059 0.537300149
## INEXPFTE 0.374823962 0.044938282 -0.293718218 0.224208270
## PFTFAC -0.004535840 -0.099374508 -0.097530724 -0.018380517
## PCTPELL -0.209884488 0.024997107 0.211363497 -0.494496352
## C150_4 0.295458637 -0.039426317 -0.593702072 0.493662415
## PFTFTUG1_EF 0.016583615 -0.107769296 -0.719981174 0.337828069
## RET_FT4 0.298238582 -0.049125507 -0.448758245 0.307557898
## PCTFLOAN -0.305000206 0.035227068 -0.158976596 0.301282528
## COSTT4_A TUITFTE INEXPFTE PFTFAC PCTPELL
## SAT_AVG 0.469206247 0.58279415 0.55430827 0.109584117 -0.65957486
## UGDS 0.302641965 0.43365973 0.38432596 0.042687425 -0.27365778
## UGDS_WHITE 0.151325068 0.19945712 -0.05105544 0.054359338 -0.69371993
## UGDS_BLACK -0.136563560 -0.25247805 -0.13074129 0.107788393 0.65481594
## UGDS_HISP -0.220627412 -0.17489751 -0.03012140 -0.149548348 0.28247580
## UGDS_ASIAN 0.299034792 0.27996082 0.41980354 -0.184655466 -0.04505273
## UGDS_AIAN -0.178278159 -0.14287979 -0.09376263 0.078437848 0.04384786
## UGDS_NHPI -0.045239423 -0.01260530 0.08801203 -0.038804571 0.03712171
## UGDS_2MOR 0.101329394 0.09546707 0.15902787 0.021200933 -0.12099126
## UGDS_NRA 0.184646253 0.33592034 0.37482396 -0.004535840 -0.20988449
## UGDS_UNKN -0.012697565 0.05327080 0.04493828 -0.099374508 0.02499711
## PPTUG_EF -0.434164032 -0.30574506 -0.29371822 -0.097530724 0.21136350
## NPT4_PUB 0.796032414 0.53730015 0.22420827 -0.018380517 -0.49449635
## COSTT4_A 1.000000000 0.69928402 0.44799088 0.004901918 -0.37535448
## TUITFTE 0.699284022 1.00000000 0.60919524 0.073127354 -0.51478553
## INEXPFTE 0.447990875 0.60919524 1.00000000 0.050295295 -0.29170949
## PFTFAC 0.004901918 0.07312735 0.05029529 1.000000000 -0.04228121
## PCTPELL -0.375354477 -0.51478553 -0.29170949 -0.042281212 1.00000000
## C150_4 0.650950779 0.62112656 0.52552835 0.057388350 -0.59873803
## PFTFTUG1_EF 0.360493250 0.29288779 0.18335455 0.194340342 -0.14403443
## RET_FT4 0.488599391 0.51540941 0.48833245 -0.026693903 -0.45788466
## PCTFLOAN 0.162689205 -0.06821649 -0.20907570 0.062916419 0.36195869
## C150_4 PFTFTUG1_EF RET_FT4 PCTFLOAN
## SAT_AVG 0.78142427 0.28721436 0.72878214 -0.46041860
## UGDS 0.51487920 0.13321290 0.54925822 -0.34214879
## UGDS_WHITE 0.23372738 0.11317248 0.04876680 0.03622592
## UGDS_BLACK -0.39750077 -0.03299823 -0.32118873 0.46035809
## UGDS_HISP -0.04262682 -0.04844276 0.12594324 -0.43949953
## UGDS_ASIAN 0.38598229 -0.08060438 0.47642548 -0.37949754
## UGDS_AIAN -0.17586800 -0.10151161 -0.22979399 -0.05238363
## UGDS_NHPI -0.02832825 -0.14989598 -0.03276629 -0.12758857
## UGDS_2MOR 0.09533718 -0.08039497 0.06435036 -0.17040083
## UGDS_NRA 0.29545864 0.01658361 0.29823858 -0.30500021
## UGDS_UNKN -0.03942632 -0.10776930 -0.04912551 0.03522707
## PPTUG_EF -0.59370207 -0.71998117 -0.44875825 -0.15897660
## NPT4_PUB 0.49366241 0.33782807 0.30755790 0.30128253
## COSTT4_A 0.65095078 0.36049325 0.48859939 0.16268921
## TUITFTE 0.62112656 0.29288779 0.51540941 -0.06821649
## INEXPFTE 0.52552835 0.18335455 0.48833245 -0.20907570
## PFTFAC 0.05738835 0.19434034 -0.02669390 0.06291642
## PCTPELL -0.59873803 -0.14403443 -0.45788466 0.36195869
## C150_4 1.00000000 0.44273254 0.84736375 -0.19212234
## PFTFTUG1_EF 0.44273254 1.00000000 0.36814206 0.15030786
## RET_FT4 0.84736375 0.36814206 1.00000000 -0.31379828
## PCTFLOAN -0.19212234 0.15030786 -0.31379828 1.00000000
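Each entry of the correlation matrix is the matching covariance divided by the product of the two standard deviations. The sketch below checks this for SAT_AVG and UGDS, using the variances (the diagonal entries) and the covariance copied from the covariance matrix printed earlier. The variable names in the code are illustrative only.

```r
# Values copied from the covariance matrix above
cov_sat_ugds <- 5.329262e+05   # cov(SAT_AVG, UGDS)
var_sat      <- 1.218022e+04   # var(SAT_AVG), diagonal entry
var_ugds     <- 8.285945e+07   # var(UGDS), diagonal entry

# Pearson correlation = covariance / (sd_x * sd_y)
r_sat_ugds <- cov_sat_ugds / sqrt(var_sat * var_ugds)
round(r_sat_ugds, 4)
```

The result agrees with the SAT_AVG/UGDS entry of the correlation matrix (0.5305 to four decimals). For a whole matrix, base R's `cov2cor()` performs the same rescaling.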
The top 5 strongest correlations in order are:
1. Average cost of attendance vs. average net price for Title IV public institutions
2. Completion rate vs. average SAT score
3. Full-time undergraduates vs. part-time undergraduates
4. Retention rate vs. average SAT score
5. Retention rate vs. Completion rate
# Code
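# Ranks the variable pairs by absolute value, largest first
# (each pair appears twice, so the first 10 rows are the top 5 pairs)
# Note: cor(z) correlates the columns of the correlation matrix z,
# so the ranked values are not the raw pairwise correlations stored in z
library(reshape)
z <- cor(alotofdata[, c(8:20,22:30)], use = "pairwise.complete.obs", method = "pearson")
x <- subset(melt(cor(z)), value != 1 & !is.na(value))
xl <- x[order(-abs(x$value)), ]
xl[1:10, ]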
## X1 X2 value
## 278 COSTT4_A NPT4_PUB 0.9446102
## 299 NPT4_PUB COSTT4_A 0.9446102
## 19 C150_4 SAT_AVG 0.9364503
## 397 SAT_AVG C150_4 0.9364503
## 262 PFTFTUG1_EF PPTUG_EF -0.9287725
## 430 PPTUG_EF PFTFTUG1_EF -0.9287725
## 21 RET_FT4 SAT_AVG 0.9285524
## 441 SAT_AVG RET_FT4 0.9285524
## 417 RET_FT4 C150_4 0.9148492
## 459 C150_4 RET_FT4 0.9148492
The top 5 weakest correlations in order are:
1. Enrollments of unknown race vs. Enrollments of Native Hawaiian/Pacific Islander
2. Full-time undergraduates vs. Enrollments of students who have two or more races
3. Net tuition revenue per full-time student vs. Enrollments of unknown race
4. Part-time undergraduates vs. Enrollments of students who have two or more races
5. Average cost of attendance vs. Enrollments of unknown race
# Code
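# Ranks the variable pairs by absolute value, smallest first
# (each pair appears twice, so the first 10 rows are the bottom 5 pairs)
# Note: cor(z) correlates the columns of the correlation matrix z,
# so the ranked values are not the raw pairwise correlations stored in z
library(reshape)
z <- cor(alotofdata[, c(8:20,22:30)], use = "pairwise.complete.obs", method = "pearson")
x <- subset(melt(cor(z)), value != 1 & !is.na(value))
xs <- x[order(abs(x$value)), ]
xs[1:10, ]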
## X1 X2 value
## 165 UGDS_UNKN UGDS_NHPI 0.0002322099
## 228 UGDS_NHPI UGDS_UNKN 0.0002322099
## 196 PFTFTUG1_EF UGDS_2MOR -0.0114305185
## 427 UGDS_2MOR PFTFTUG1_EF -0.0114305185
## 235 TUITFTE UGDS_UNKN -0.0127101627
## 319 UGDS_UNKN TUITFTE -0.0127101627
## 188 PPTUG_EF UGDS_2MOR 0.0137433720
## 251 UGDS_2MOR PPTUG_EF 0.0137433720
## 234 COSTT4_A UGDS_UNKN 0.0153987678
## 297 UGDS_UNKN COSTT4_A 0.0153987678
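One detail of the ranking code above is worth noting: calling `cor()` on a correlation matrix correlates the columns of that matrix with each other, so the ranked values are not the pairwise correlations themselves. A minimal sketch of ranking the entries of the matrix directly is shown below. It uses a small simulated data frame (`toy`, with made-up columns `a`, `b`, `c`) rather than the College Scorecard data, and base R's `upper.tri()` so that each pair is listed once.

```r
# Hypothetical toy data; the column names are illustrative only
set.seed(1)
toy <- data.frame(a = rnorm(50))
toy$b <- toy$a + rnorm(50, sd = 0.1)   # strongly related to a
toy$c <- rnorm(50)                     # unrelated noise

# Pairwise Pearson correlations
z <- cor(toy, use = "pairwise.complete.obs", method = "pearson")

# Upper triangle only: drops the diagonal and the mirrored duplicates
idx <- which(upper.tri(z), arr.ind = TRUE)
pairs <- data.frame(
  var1  = rownames(z)[idx[, "row"]],
  var2  = colnames(z)[idx[, "col"]],
  value = z[idx]
)

# Strongest relation first; reverse the sort to get the weakest first
ranked <- pairs[order(-abs(pairs$value)), ]
ranked
```

Applied to the real data, the same pattern would take the matrix returned by `cor(alotofdata[, c(8:20,22:30)], ...)` in place of `z`. The resulting ranking may differ from the one reported above.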
3. What is the strongest relation in the data? What are the variables involved and what is the correlation? What is the weakest relation in the data? What are the variables involved and what is the correlation?
The strongest relation is between average cost of attendance (COSTT4_A) and average net price for Title IV public institutions (NPT4_PUB). The reported correlation is 0.9446102.
# Code
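# Finds the strongest relation (largest absolute value)
# Note: cor(z) correlates the columns of the correlation matrix z,
# so the ranked values are not the raw pairwise correlations stored in z
library(reshape)
z <- cor(alotofdata[, c(8:20,22:30)], use = "pairwise.complete.obs", method = "pearson")
x <- subset(melt(cor(z)), value != 1 & !is.na(value))
xl <- x[order(-abs(x$value)), ]
xl[1, ]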
## X1 X2 value
## 278 COSTT4_A NPT4_PUB 0.9446102
The weakest relation is between enrollment of unknown race (UGDS_UNKN) and enrollment of Native Hawaiian/Pacific Islander students (UGDS_NHPI). The reported correlation is 0.0002322099.
# Code
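# Finds the weakest relation (smallest absolute value)
# Note: cor(z) correlates the columns of the correlation matrix z,
# so the ranked values are not the raw pairwise correlations stored in z
library(reshape)
z <- cor(alotofdata[, c(8:20,22:30)], use = "pairwise.complete.obs", method = "pearson")
x <- subset(melt(cor(z)), value != 1 & !is.na(value))
xs <- x[order(abs(x$value)), ]
xs[1, ]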
## X1 X2 value
## 165 UGDS_UNKN UGDS_NHPI 0.0002322099
Run simple regression analyses as outlined below. For each one, report the results of your analyses, including a visual representation, a numerical summary, and a written interpretation.
- Predict the average cost of attendance from average SAT score.
# Code
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$SAT_AVG, alotofdata$COSTT4_A, ylim = c(0, 70000), main = 'Scatterplot of Average SAT Score and Average Cost of Attendance', xlab = 'Average SAT score', ylab = 'Average cost of attendance')
abline(lm(alotofdata$COSTT4_A ~ alotofdata$SAT_AVG))
# Code
# Run a simple regression analysis in which we predict COSTT4_A from SAT_AVG
library(lm.beta)
reg <- lm(COSTT4_A ~ SAT_AVG, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = COSTT4_A ~ SAT_AVG, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32053 -10000 942 9354 25867
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -22454.499 2520.612 -8.908 <2e-16 ***
## SAT_AVG 51.901 2.361 21.984 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11350 on 1296 degrees of freedom
## (6405 observations deleted due to missingness)
## Multiple R-squared: 0.2716, Adjusted R-squared: 0.2711
## F-statistic: 483.3 on 1 and 1296 DF, p-value: < 2.2e-16
coefficients(reg)
## (Intercept) SAT_AVG
## -22454.49939 51.90104
lm.beta(reg)
##
## Call:
## lm(formula = COSTT4_A ~ SAT_AVG, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) SAT_AVG
## 0.0000000 0.5211713
We are running a simple linear regression to predict average cost of attendance from average SAT score.
The visual representation is a scatterplot of all the data points for average cost of attendance against average SAT score.
The line through the points is the regression line, the line of best fit for predicting average cost of attendance from average SAT score.
The numerical representation is a summary of the values that describe our linear regression model.
Our model for this problem is: Average cost of attendance = B0 + B1 × Average SAT score + e
B0 = -22454.49939. This means that if the average SAT score were 0, the predicted average cost of attendance would be -22454.49939. Because the lowest observed average SAT score is 720, the intercept is an extrapolation and has no practical meaning on its own.
B1 = 51.90104. This is the slope relating Y to X: for a one-point increase in average SAT score, the expected change in average cost of attendance is 51.90104.
The standard error of the slope is 2.361. This measures the precision of the slope estimate: it estimates how much B1 would vary from sample to sample drawn from the same population.
The residual standard error is 11350. This is the typical size of a residual, so predictions of average cost of attendance from average SAT score are off by roughly 11350 on average. The closer to 0, the better the model fits.
The R^2 is 0.2716 (adjusted R^2 = 0.2711). This is the coefficient of determination, the ratio of explained variance to total variance. For our model, 27.16% of the variance in average cost of attendance can be explained by average SAT score.
The p-value for the slope (< 2e-16) indicates that the slope of 51.901 is statistically significantly different from 0.
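To make the fitted equation concrete, the sketch below plugs a hypothetical average SAT score of 1000 into the model, using the coefficients printed by coefficients(reg). It also checks that, in a simple regression, the squared standardized slope from lm.beta(reg) reproduces the multiple R-squared. The SAT value of 1000 is an arbitrary illustration, not a value taken from the data.

```r
# Coefficients copied from the coefficients(reg) output above
b0 <- -22454.49939
b1 <- 51.90104

# Predicted average cost of attendance at a hypothetical average SAT of 1000
sat <- 1000
predicted_cost <- b0 + b1 * sat
predicted_cost

# With a single predictor, R-squared is the squared standardized slope
beta_std  <- 0.5211713          # from the lm.beta(reg) output above
r_squared <- beta_std^2
round(r_squared, 4)
```

The prediction comes out to about 29446.54, and the squared standardized slope rounds to 0.2716, matching the Multiple R-squared line in the summary. With the fitted model object available, `predict(reg, newdata = data.frame(SAT_AVG = 1000))` should give the same prediction.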
- Predict the average cost of attendance from admission rate.
# Code
# Predict COSTT4_A from ADM_RATE
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$ADM_RATE, alotofdata$COSTT4_A, ylim = c(0, 70000), main = 'Scatterplot of Admission Rate and Average Cost of Attendance', xlab = 'Admission Rate', ylab = 'Average Cost of Attendance')
abline(lm(alotofdata$COSTT4_A ~ alotofdata$ADM_RATE))
# Code
# Run a simple regression analysis in which we predict COSTT4_A from ADM_RATE
library(lm.beta)
reg <- lm(COSTT4_A ~ ADM_RATE, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = COSTT4_A ~ ADM_RATE, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31765 -9986 -1323 9705 36241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43930.1 981.5 44.76 <2e-16 ***
## ADM_RATE -17799.4 1380.0 -12.90 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12620 on 1994 degrees of freedom
## (5707 observations deleted due to missingness)
## Multiple R-squared: 0.07701, Adjusted R-squared: 0.07654
## F-statistic: 166.4 on 1 and 1994 DF, p-value: < 2.2e-16
coefficients(reg)
## (Intercept) ADM_RATE
## 43930.13 -17799.40
lm.beta(reg)
##
## Call:
## lm(formula = COSTT4_A ~ ADM_RATE, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) ADM_RATE
## 0.0000000 -0.2775023
We are running a Simple Linear Regression analysis to predict average cost of attendance from admission rate.
The visual representation is a scatterplot all the data points from average cost of attendance and admission rate.
The line in the middle represents the regression line. This line represents the line of best fit for predicting average cost of attendance from admission rate.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: Average cost of attendance = B0 + B1 × Admission rate + e
B0 = 43930.1. This means that if the admission rate is 0, the predicted average cost of attendance is 43930.1.
B1 = -17799.4. This is the slope, the numerical relationship between Y and X. ADM_RATE is a proportion, so a one-unit increase means going from an admission rate of 0 to 1; over that change, the expected average cost of attendance falls by 17799.4 (about 1780 per 10 percentage points).
The standard error of the slope is 1380.0. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 12620. This measures how well the model predicts the average cost of attendance from the admission rate: it is the typical size of a residual, so predictions are off by roughly 12620. The closer to 0, the better the model fits.
The multiple R^2 is 0.07701 and the adjusted R^2 is 0.07654. R^2 is the coefficient of determination, the percentage of the variance in average cost of attendance that can be explained by the admission rate (explained variance divided by total variance). For our model, about 7.7% of the variance in average cost of attendance can be explained by the admission rate.
The p-value for the slope (< 2e-16) indicates that -17799.4 is statistically significantly different from 0.
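To make the slope interpretation concrete, the short sketch below plugs two admission rates into the fitted line. The coefficients are copied from the `coefficients(reg)` output above; the helper name `predict_cost` is hypothetical and only used for illustration.

```r
# Coefficients copied from the coefficients(reg) output above
b0 <- 43930.13   # intercept
b1 <- -17799.40  # slope for ADM_RATE (a proportion between 0 and 1)

# Hypothetical helper, not part of the original analysis
predict_cost <- function(adm_rate) b0 + b1 * adm_rate

pred_50 <- predict_cost(0.50)    # school admitting 50% of applicants
pred_60 <- predict_cost(0.60)    # school admitting 60% of applicants
diff_10pt <- pred_60 - pred_50   # change for a 10-percentage-point increase

print(c(pred_50 = pred_50, pred_60 = pred_60, diff_10pt = diff_10pt))
```

With the fitted model object still in memory, `predict(reg, newdata = data.frame(ADM_RATE = c(0.5, 0.6)))` should give the same predictions without copying coefficients by hand.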
- Predict the number of students from average SAT score
# Code
# Predict UGDS from SAT_AVG
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$SAT_AVG, alotofdata$UGDS, ylim = c(0, 55000), main = 'Scatterplot of Average SAT Score and Number of Students', xlab = 'Average SAT Score', ylab = 'Number of Students')
abline(lm(alotofdata$UGDS ~ alotofdata$SAT_AVG))
# Code
# Run a simple regression analysis in which we predict UGDS from SAT_AVG
library(lm.beta)
reg <- lm(UGDS ~ SAT_AVG, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = UGDS ~ SAT_AVG, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11197 -3981 -2485 1077 45035
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8999.262 1573.214 -5.720 1.32e-08 ***
## SAT_AVG 13.708 1.474 9.301 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7095 on 1302 degrees of freedom
## (6399 observations deleted due to missingness)
## Multiple R-squared: 0.06231, Adjusted R-squared: 0.06159
## F-statistic: 86.51 on 1 and 1302 DF, p-value: < 2.2e-16
coefficients(reg)
## (Intercept) SAT_AVG
## -8999.26201 13.70842
lm.beta(reg)
##
## Call:
## lm(formula = UGDS ~ SAT_AVG, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) SAT_AVG
## 0.0000000 0.2496111
We are running a Simple Linear Regression analysis to predict number of students from average SAT score.
The visual representation is a scatterplot of all the data points from number of students and average SAT score.
The line in the middle represents the regression line. This line represents the line of best fit for predicting number of students from average SAT score.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: Number of students = B0 + B1 × Average SAT score + e
B0 = -8999.262. This means that if the average SAT score is 0, the predicted number of students is -8999.262. An average SAT score of 0 is far outside the observed data and a negative enrollment is impossible, so the intercept only anchors the line and has no practical interpretation.
B1 = 13.708. This is the slope, the numerical relationship between Y and X. For each one-point increase in average SAT score, the expected number of students increases by 13.708 (about 1371 students per 100 SAT points).
The standard error of the slope is 1.474. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 7095. This measures how well the model predicts the number of students from the average SAT score: it is the typical size of a residual, so predictions are off by roughly 7095 students. The closer to 0, the better the model fits.
The multiple R^2 is 0.06231 and the adjusted R^2 is 0.06159. R^2 is the coefficient of determination, the percentage of the variance in number of students that can be explained by the average SAT score (explained variance divided by total variance). For our model, about 6.2% of the variance in number of students can be explained by average SAT score.
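One way to read the standard error of the slope is through a confidence interval. The sketch below rebuilds an approximate 95% interval from the rounded estimate, standard error, and residual degrees of freedom printed in the summary above, so the result may differ slightly from what the full-precision model would give.

```r
# Values copied (rounded) from the summary(reg) output above
est <- 13.708   # slope for SAT_AVG
se  <- 1.474    # standard error of the slope
df  <- 1302     # residual degrees of freedom

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df)
ci <- est + c(-1, 1) * t_crit * se

print(round(ci, 3))
```

The interval excludes 0, which agrees with the small p-value for the slope. With the fitted model available, `confint(reg)` computes the same kind of interval directly.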
- Predict the number of students from admission rate
# Code
# Predict UGDS from ADM_RATE
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$ADM_RATE, alotofdata$UGDS, ylim = c(0, 55000), main = 'Scatterplot of Admission Rate and Number of Students', xlab = 'Admission Rate', ylab = 'Number of Students')
abline(lm(alotofdata$UGDS ~ alotofdata$ADM_RATE))
# Code
# Run a simple regression analysis in which we predict UGDS from ADM_RATE
library(lm.beta)
reg <- lm(UGDS ~ ADM_RATE, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = UGDS ~ ADM_RATE, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6377 -3081 -2259 81 47716
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6474.7 462.7 13.994 < 2e-16 ***
## ADM_RATE -3852.2 640.6 -6.013 2.12e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6273 on 2196 degrees of freedom
## (5505 observations deleted due to missingness)
## Multiple R-squared: 0.0162, Adjusted R-squared: 0.01575
## F-statistic: 36.16 on 1 and 2196 DF, p-value: 2.122e-09
coefficients(reg)
## (Intercept) ADM_RATE
## 6474.726 -3852.210
lm.beta(reg)
##
## Call:
## lm(formula = UGDS ~ ADM_RATE, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) ADM_RATE
## 0.0000000 -0.1272796
We are running a Simple Linear Regression analysis to predict number of students from admission rate.
The visual representation is a scatterplot of all the data points from number of students and admission rate.
The line in the middle represents the regression line. This line represents the line of best fit for predicting number of students from admission rate.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: Number of students = B0 + B1 × Admission rate + e
B0 = 6474.7. This means that if the admission rate is 0, the predicted number of students is 6474.7.
B1 = -3852.2. This is the slope, the numerical relationship between Y and X. ADM_RATE is a proportion, so a one-unit increase means going from an admission rate of 0 to 1; over that change, the expected number of students falls by 3852.2 (about 385 students per 10 percentage points).
The standard error of the slope is 640.6. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 6273. This measures how well the model predicts the number of students from the admission rate: it is the typical size of a residual, so predictions are off by roughly 6273 students. The closer to 0, the better the model fits.
The multiple R^2 is 0.0162 and the adjusted R^2 is 0.01575. R^2 is the coefficient of determination, the percentage of the variance in number of students that can be explained by the admission rate (explained variance divided by total variance). For our model, about 1.6% of the variance in number of students can be explained by admission rate.
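In a simple regression with a single predictor, the standardized coefficient equals the correlation between X and Y, so squaring it should reproduce the multiple R-squared. The check below uses the standardized coefficient copied from the `lm.beta(reg)` output above.

```r
# Standardized slope copied from the lm.beta(reg) output above
beta_std <- -0.1272796

# With one predictor, the standardized slope equals the correlation r,
# so its square is the multiple R-squared
r_squared <- beta_std^2

print(round(r_squared, 4))
```

The result matches the `Multiple R-squared: 0.0162` line in the summary, which is a quick consistency check between the two outputs.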
- Predict completion rate from average SAT score
# Code
# Predict C150_4 from SAT_AVG
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$SAT_AVG, alotofdata$C150_4, ylim = c(0, 1), main = 'Scatterplot of Average SAT Score and Completion Rate', xlab = 'Average SAT Score', ylab = 'Completion Rate')
abline(lm(alotofdata$C150_4 ~ alotofdata$SAT_AVG))
# Code
# Run a simple regression analysis in which we predict C150_4 from SAT_AVG
library(lm.beta)
reg <- lm(C150_4 ~ SAT_AVG, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = C150_4 ~ SAT_AVG, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52185 -0.06537 0.00721 0.06898 0.46134
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5741736 0.0237624 -24.16 <2e-16 ***
## SAT_AVG 0.0010598 0.0000222 47.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1049 on 1269 degrees of freedom
## (6432 observations deleted due to missingness)
## Multiple R-squared: 0.6424, Adjusted R-squared: 0.6421
## F-statistic: 2279 on 1 and 1269 DF, p-value: < 2.2e-16
coefficients(reg)
## (Intercept) SAT_AVG
## -0.574173650 0.001059838
lm.beta(reg)
##
## Call:
## lm(formula = C150_4 ~ SAT_AVG, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) SAT_AVG
## 0.0000000 0.8014783
We are running a Simple Linear Regression analysis to predict completion rate from average SAT score.
The visual representation is a scatterplot of all the data points from completion rate and average SAT score.
The line in the middle represents the regression line. This line represents the line of best fit for predicting completion rate from average SAT score.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: Completion rate = B0 + B1 × Average SAT score + e
B0 = -0.5741736. This means that if the average SAT score is 0, the predicted completion rate is -0.5741736. An average SAT score of 0 is far outside the observed data and a negative completion rate is impossible, so the intercept only anchors the line and has no practical interpretation.
B1 = 0.0010598. This is the slope, the numerical relationship between Y and X. For each one-point increase in average SAT score, the expected completion rate increases by 0.0010598 (about 0.106, or 10.6 percentage points, per 100 SAT points).
The standard error of the slope is 0.0000222. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 0.1049. This measures how well the model predicts the completion rate from average SAT scores: it is the typical size of a residual, so predictions are off by roughly 0.1049. The closer to 0, the better the model fits.
The multiple R^2 is 0.6424 and the adjusted R^2 is 0.6421. R^2 is the coefficient of determination, the percentage of the variance in completion rate that can be explained by the average SAT score (explained variance divided by total variance). For our model, about 64.2% of the variance in completion rate can be explained by average SAT score.
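The summary reports both a multiple and an adjusted R-squared. As a sketch of how the two are related, the adjusted value can be recomputed from the multiple R-squared, the sample size, and the number of predictors. The sample size is inferred here from the residual degrees of freedom (1269 = n - 2).

```r
# Values taken from the summary(reg) output above
r2 <- 0.6424   # Multiple R-squared
n  <- 1271     # observations used: residual df (1269) + 2 estimated coefficients
p  <- 1        # number of predictors

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(adj_r2, 4))
```

With one predictor and more than a thousand observations the penalty is tiny, which is why the two values are nearly identical in every model in this report.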
- Predict completion rate from admission rate
# Code
# Predict C150_4 from ADM_RATE
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$ADM_RATE, alotofdata$C150_4, ylim = c(0, 1), main = 'Scatterplot of Admission Rate and Completion Rate', xlab = 'Admission Rate', ylab = 'Completion Rate')
abline(lm(alotofdata$C150_4 ~ alotofdata$ADM_RATE))
# Code
# Run a simple regression analysis in which we predict C150_4 from ADM_RATE
library(lm.beta)
reg <- lm(C150_4 ~ ADM_RATE, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = C150_4 ~ ADM_RATE, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.58476 -0.12368 0.00175 0.13908 0.57278
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.73479 0.01509 48.68 <2e-16 ***
## ADM_RATE -0.30757 0.02143 -14.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.182 on 1804 degrees of freedom
## (5897 observations deleted due to missingness)
## Multiple R-squared: 0.1025, Adjusted R-squared: 0.102
## F-statistic: 206.1 on 1 and 1804 DF, p-value: < 2.2e-16
coefficients(reg)
## (Intercept) ADM_RATE
## 0.7347908 -0.3075745
lm.beta(reg)
##
## Call:
## lm(formula = C150_4 ~ ADM_RATE, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) ADM_RATE
## 0.0000000 -0.3201865
We are running a Simple Linear Regression analysis to predict completion rate from admission rate.
The visual representation is a scatterplot of all the data points from completion rate and admission rate.
The line in the middle represents the regression line. This line represents the line of best fit for predicting completion rate from admission rate.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: Completion rate = B0 + B1 × Admission rate + e
B0 = 0.73479. This means that if the admission rate is 0, the predicted completion rate is 0.73479.
B1 = -0.30757. This is the slope, the numerical relationship between Y and X. ADM_RATE is a proportion, so a one-unit increase means going from an admission rate of 0 to 1; over that change, the expected completion rate falls by 0.30757 (about 0.031 per 10 percentage points).
The standard error of the slope is 0.02143. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 0.182. This measures how well the model predicts the completion rate from the admission rate: it is the typical size of a residual, so predictions are off by roughly 0.182. The closer to 0, the better the model fits.
The multiple R^2 is 0.1025 and the adjusted R^2 is 0.102. R^2 is the coefficient of determination, the percentage of the variance in completion rate that can be explained by the admission rate (explained variance divided by total variance). For our model, about 10% of the variance in completion rate can be explained by admission rate.
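The t value and p-value in the coefficient table follow directly from the estimate and its standard error. The sketch below recomputes them from the rounded numbers printed above, so the t value differs slightly from the -14.36 that R reports from full-precision values.

```r
# Values copied (rounded) from the summary(reg) output above
est <- -0.30757   # slope for ADM_RATE
se  <- 0.02143    # standard error of the slope
df  <- 1804       # residual degrees of freedom

# t statistic: how many standard errors the estimate is from 0
t_val <- est / se

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_val), df)

print(c(t_value = t_val, p_value = p_val))
```

The p-value is far smaller than 2e-16, which is why the summary prints it as `<2e-16` instead of an exact number.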
- Predict the percentage of students with federal loans from average SAT score
# Predict PCTFLOAN from SAT_AVG
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$SAT_AVG, alotofdata$PCTFLOAN, ylim = c(0, 1), main = 'Scatterplot of Average SAT Score and % of Students with Fed. Loans', xlab = 'Average SAT Score', ylab = '% of Students with Fed. Loans')
abline(lm(alotofdata$PCTFLOAN ~ alotofdata$SAT_AVG))
# Code
# Run a simple regression analysis in which we predict PCTFLOAN from SAT_AVG
library(lm.beta)
reg <- lm(PCTFLOAN ~ SAT_AVG, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = PCTFLOAN ~ SAT_AVG, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.72338 -0.07918 0.01720 0.10087 0.34595
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.276e+00 3.239e-02 39.4 <2e-16 ***
## SAT_AVG -6.493e-04 3.035e-05 -21.4 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.146 on 1300 degrees of freedom
## (6401 observations deleted due to missingness)
## Multiple R-squared: 0.2604, Adjusted R-squared: 0.2599
## F-statistic: 457.8 on 1 and 1300 DF, p-value: < 2.2e-16
coefficients(reg)
## (Intercept) SAT_AVG
## 1.2759380530 -0.0006493094
lm.beta(reg)
##
## Call:
## lm(formula = PCTFLOAN ~ SAT_AVG, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) SAT_AVG
## 0.0000000 -0.5103375
We are running a Simple Linear Regression analysis to predict percentage of students with federal loans from average SAT score.
The visual representation is a scatterplot of all the data points from percentage of students with federal loans and average SAT score.
The line in the middle represents the regression line. This line represents the line of best fit for predicting percentage of students with federal loans from average SAT score.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: % of students with federal loans = B0 + B1 × Average SAT score + e
B0 = 1.27593805. This means that if the average SAT score is 0, the predicted percentage of students with federal loans is 1.27593805. PCTFLOAN is a proportion between 0 and 1 and an average SAT score of 0 is far outside the observed data, so the intercept only anchors the line and has no practical interpretation.
B1 = -0.00064931. This is the slope, the numerical relationship between Y and X. For each one-point increase in the average SAT score, the expected percentage of students with federal loans falls by 0.00064931 (about 0.065, or 6.5 percentage points, per 100 SAT points).
The standard error of the slope is 0.00003035. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 0.146. This measures how well the model predicts the percentage of students with federal loans from average SAT score: it is the typical size of a residual, so predictions are off by roughly 0.146. The closer to 0, the better the model fits.
The multiple R^2 is 0.2604 and the adjusted R^2 is 0.2599. R^2 is the coefficient of determination, the percentage of the variance in the percentage of students with federal loans that can be explained by the average SAT score (explained variance divided by total variance). For our model, about 26% of the variance in percentage of students with federal loans can be explained by average SAT score.
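The intercept above is larger than 1 even though PCTFLOAN is a proportion. The sketch below evaluates the fitted line at a few SAT averages to show that the impossible value only appears when extrapolating to 0. The SAT values chosen are illustrative, and the coefficients are copied from the `coefficients(reg)` output.

```r
# Coefficients copied from the coefficients(reg) output above
b0 <- 1.2759380530
b1 <- -0.0006493094

# Illustrative SAT averages: 0 is an extrapolation, the others are plausible
sat  <- c(0, 800, 1000, 1400)
pred <- setNames(b0 + b1 * sat, paste0("SAT_", sat))

print(round(pred, 4))

# Slope rescaled to a more readable unit: change per 100 SAT points
per_100 <- b1 * 100
print(round(per_100, 4))
```

Within the plausible SAT range the predictions stay between 0 and 1; only the prediction at an SAT average of 0 exceeds 1.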
- Predict the percentage of students with federal loans from admission rate
# Predict PCTFLOAN from ADM_RATE
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$ADM_RATE, alotofdata$PCTFLOAN, ylim = c(0, 1), main = 'Scatterplot of Admission Rate and % of Students with Fed. Loans', xlab = 'Admission Rate', ylab = '% of Students with Fed. Loans')
abline(lm(alotofdata$PCTFLOAN ~ alotofdata$ADM_RATE))
# Code
# Run a simple regression analysis in which we predict PCTFLOAN from ADM_RATE
library(lm.beta)
reg <- lm(PCTFLOAN ~ ADM_RATE, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = PCTFLOAN ~ ADM_RATE, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62435 -0.11274 0.03085 0.15665 0.44315
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.51186 0.01603 31.923 < 2e-16 ***
## ADM_RATE 0.11249 0.02219 5.069 4.32e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2168 on 2193 degrees of freedom
## (5508 observations deleted due to missingness)
## Multiple R-squared: 0.01158, Adjusted R-squared: 0.01113
## F-statistic: 25.7 on 1 and 2193 DF, p-value: 4.324e-07
coefficients(reg)
## (Intercept) ADM_RATE
## 0.5118554 0.1124929
lm.beta(reg)
##
## Call:
## lm(formula = PCTFLOAN ~ ADM_RATE, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) ADM_RATE
## 0.0000000 0.1076242
We are running a Simple Linear Regression analysis to predict percentage of students with federal loans from admission rate.
The visual representation is a scatterplot of all the data points from percentage of students with federal loans and admission rate.
The line in the middle represents the regression line. This line represents the line of best fit for predicting percentage of students with federal loans from admission rate.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: % of students with federal loans = B0 + B1 × Admission rate + e
B0 = 0.51186. This means that if the admission rate is 0, the predicted percentage of students with federal loans is 0.51186.
B1 = 0.11249. This is the slope, the numerical relationship between Y and X. ADM_RATE is a proportion, so a one-unit increase means going from an admission rate of 0 to 1; over that change, the expected percentage of students with federal loans increases by 0.11249 (about 0.011 per 10 percentage points).
The standard error of the slope is 0.02219. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 0.2168. This measures how well the model predicts the percentage of students with federal loans from the admission rate: it is the typical size of a residual, so predictions are off by roughly 0.2168. The closer to 0, the better the model fits.
The multiple R^2 is 0.01158 and the adjusted R^2 is 0.01113. R^2 is the coefficient of determination, the percentage of the variance in the percentage of students with federal loans that can be explained by the admission rate (explained variance divided by total variance). For our model, about 1.2% of the variance in percentage of students with federal loans can be explained by admission rate.
- Run two additional regression equations using any variables you like from the dataset. Provide an interpretation of the results of these additional analyses.
12.1 Predict the completion rate from the retention rate
# Code
# Predict C150_4 from RET_FT4
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$RET_FT4, alotofdata$C150_4, ylim = c(0, 1), main = 'Scatterplot of Retention Rate and Completion Rate', xlab = 'Retention Rate', ylab = 'Completion Rate')
abline(lm(alotofdata$C150_4 ~ alotofdata$RET_FT4))
# Code
# Run a simple regression analysis in which we predict C150_4 from RET_FT4
library(lm.beta)
reg <- lm(C150_4 ~ RET_FT4, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = C150_4 ~ RET_FT4, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.67726 -0.10017 0.00425 0.10588 0.97935
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.02065 0.01443 1.431 0.153
## RET_FT4 0.65661 0.01963 33.455 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1729 on 2196 degrees of freedom
## (5505 observations deleted due to missingness)
## Multiple R-squared: 0.3376, Adjusted R-squared: 0.3373
## F-statistic: 1119 on 1 and 2196 DF, p-value: < 2.2e-16
coefficients(reg)
## (Intercept) RET_FT4
## 0.02065477 0.65660875
lm.beta(reg)
##
## Call:
## lm(formula = C150_4 ~ RET_FT4, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) RET_FT4
## 0.0000000 0.5810345
We are running a Simple Linear Regression analysis to predict completion rate from retention rate.
The visual representation is a scatterplot of all the data points from completion rate and retention rate.
The line in the middle represents the regression line. This line represents the line of best fit for predicting completion rate from retention rate.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: Completion rate = B0 + B1 × Retention rate + e
B0 = 0.02065. This means that if the retention rate is 0, the predicted completion rate is 0.02065. Its p-value (0.153) is above 0.05, so the intercept is not significantly different from 0.
B1 = 0.65661. This is the slope, the numerical relationship between Y and X. RET_FT4 is a proportion, so a one-unit increase means going from a retention rate of 0 to 1; over that change, the expected completion rate increases by 0.65661 (about 0.066 per 10 percentage points).
The standard error of the slope is 0.01963. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 0.1729. This measures how well the model predicts the completion rate from the retention rate: it is the typical size of a residual, so predictions are off by roughly 0.1729. The closer to 0, the better the model fits.
The multiple R^2 is 0.3376 and the adjusted R^2 is 0.3373. R^2 is the coefficient of determination, the percentage of the variance in completion rate that can be explained by the retention rate (explained variance divided by total variance). For our model, about 33.8% of the variance in completion rate can be explained by retention rate.
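The quantities interpreted in these write-ups can also be pulled out of a fitted model programmatically instead of being read off the printed summary. The sketch below uses simulated data (not the College Scorecard file), and the variable names `x`, `y`, and `fit` are placeholders chosen for illustration.

```r
# Simulated data for illustration only (not the College Scorecard data)
set.seed(123)
x <- runif(200)                               # stand-in for a rate predictor
y <- 0.1 + 0.6 * x + rnorm(200, sd = 0.15)    # stand-in for a rate outcome

fit <- lm(y ~ x)
s   <- summary(fit)

slope_se <- coef(s)["x", "Std. Error"]   # standard error of the slope
resid_se <- s$sigma                      # residual standard error
r2       <- s$r.squared                  # multiple R-squared
adj_r2   <- s$adj.r.squared              # adjusted R-squared

# Standardized slope computed by hand: b1 * sd(x) / sd(y)
std_slope <- unname(coef(fit)["x"] * sd(x) / sd(y))

print(round(c(slope_se = slope_se, resid_se = resid_se, r2 = r2,
              adj_r2 = adj_r2, std_slope = std_slope), 4))
```

For a single predictor, the hand-computed standardized slope equals `cor(x, y)`, which should be the same quantity `lm.beta()` prints in the outputs above.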
12.2 Predict the retention rate from average SAT scores
# Code
# Predict RET_FT4 from SAT_AVG
# Visualize the line in the scatterplot of points
library(lm.beta)
plot(alotofdata$SAT_AVG, alotofdata$RET_FT4, ylim = c(0, 1), main = 'Scatterplot of Average SAT Score and Retention Rate', xlab = 'Average SAT Score', ylab = 'Retention Rate')
abline(lm(alotofdata$RET_FT4 ~ alotofdata$SAT_AVG))
# Code
# Run a simple regression analysis in which we predict RET_FT4 from SAT_AVG
library(lm.beta)
reg <- lm(RET_FT4 ~ SAT_AVG, data=alotofdata)
summary(reg)
##
## Call:
## lm(formula = RET_FT4 ~ SAT_AVG, data = alotofdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.51482 -0.04275 0.00746 0.05047 0.22636
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.803e-02 1.727e-02 5.097 3.98e-07 ***
## SAT_AVG 6.355e-04 1.613e-05 39.396 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07611 on 1267 degrees of freedom
## (6434 observations deleted due to missingness)
## Multiple R-squared: 0.5506, Adjusted R-squared: 0.5502
## F-statistic: 1552 on 1 and 1267 DF, p-value: < 2.2e-16
coefficients(reg)
## (Intercept) SAT_AVG
## 0.0880345367 0.0006355223
lm.beta(reg)
##
## Call:
## lm(formula = RET_FT4 ~ SAT_AVG, data = alotofdata)
##
## Standardized Coefficients::
## (Intercept) SAT_AVG
## 0.0000000 0.7419929
We are running a Simple Linear Regression analysis to predict retention rate from average SAT scores.
The visual representation is a scatterplot of all the data points from retention rate and average SAT scores.
The line in the middle represents the regression line. This line represents the line of best fit for predicting retention rate from average SAT scores.
The numerical representation is a summary of all of the values that describe our linear regression model.
Our model for this problem is: Retention rate = B0 + B1 × Average SAT score + e
B0 = 0.08803454. This means that if the average SAT score is 0, the predicted retention rate is 0.08803454. An average SAT score of 0 is far outside the observed data, so the intercept only anchors the line and has no practical interpretation.
B1 = 0.00063552. This is the slope, the numerical relationship between Y and X. For each one-point increase in average SAT score, the expected retention rate increases by 0.00063552 (about 0.064, or 6.4 percentage points, per 100 SAT points).
The standard error of the slope is 0.00001613. This measures how precisely the slope is estimated: it is the estimated variability of B1 across repeated samples from the population, not the accuracy of individual predictions.
The residual standard error is 0.07611. This measures how well the model predicts the retention rate from average SAT scores: it is the typical size of a residual, so predictions are off by roughly 0.07611. The closer to 0, the better the model fits.
The multiple R^2 is 0.5506 and the adjusted R^2 is 0.5502. R^2 is the coefficient of determination, the percentage of the variance in retention rate that can be explained by average SAT score (explained variance divided by total variance). For our model, about 55% of the variance in retention rate can be explained by average SAT scores.
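With only one predictor, the F-statistic at the bottom of the summary is the square of the slope's t value, so the overall F test and the slope's t test carry the same information. A quick check using the rounded t value printed above:

```r
# t value for SAT_AVG copied (rounded) from the coefficient table above
t_val <- 39.396

# In simple regression, F = t^2 on 1 and (n - 2) degrees of freedom
f_val <- t_val^2

print(round(f_val, 1))
```

The result agrees with the `F-statistic: 1552` line up to rounding.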
- Are the distributions of average SAT score, admission rate, and total number of undergraduate students normal or non-normal? What information did you use to answer this question?
# Code
library(ggplot2)
# Density histogram of SAT_AVG with a normal curve (same mean and sd) overlaid
hist.SAT <- ggplot(alotofdata, aes(SAT_AVG)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  labs(x = "Average SAT Score", y = "Density") +
  stat_function(fun = dnorm,
                args = list(mean = mean(alotofdata$SAT_AVG, na.rm = TRUE),
                            sd = sd(alotofdata$SAT_AVG, na.rm = TRUE)),
                colour = "red", linewidth = 1)
hist.SAT
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Code
#Test for Normality
shapiro.test(alotofdata$SAT_AVG)
##
## Shapiro-Wilk normality test
##
## data: alotofdata$SAT_AVG
## W = 0.95536, p-value < 2.2e-16
No, the distribution of average SAT scores is not normal: the Shapiro-Wilk p-value (< 2.2e-16) is less than 0.05.
We test whether the distribution of average SAT scores is normal using the Shapiro-Wilk normality test. The null hypothesis is that the data are normally distributed, so we treat the data as consistent with normality only if the p-value is above our significance level of 0.05.
Visually, we can see this because the histogram bars do not follow the red normal curve closely and the distribution is not symmetric about its mean.
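Because the Shapiro-Wilk p-value depends heavily on sample size, it helps to read the W statistic and the histogram alongside it. The sketch below uses simulated, mildly right-skewed data (not the College Scorecard file) to illustrate that a large sample can reject normality even when W is close to 1. The sample sizes and the gamma shape parameter are arbitrary choices for illustration.

```r
# Simulated, mildly right-skewed data for illustration only
set.seed(2024)
x_small <- rgamma(50, shape = 20)     # small sample
x_large <- rgamma(5000, shape = 20)   # large sample from the same distribution

sw_small <- shapiro.test(x_small)
sw_large <- shapiro.test(x_large)

# Compare W (closeness to normality) and the p-value at each sample size
print(c(W_small = unname(sw_small$statistic), p_small = sw_small$p.value))
print(c(W_large = unname(sw_large$statistic), p_large = sw_large$p.value))
```

A W value near 1 with a very small p-value suggests a real but possibly modest departure from normality, which is the situation reported for SAT_AVG above (W = 0.95536).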
# Code
library(ggplot2)
# Density histogram of ADM_RATE with a normal curve (same mean and sd) overlaid
hist.ADM <- ggplot(alotofdata, aes(ADM_RATE)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  labs(x = "Admission Rate", y = "Density") +
  stat_function(fun = dnorm,
                args = list(mean = mean(alotofdata$ADM_RATE, na.rm = TRUE),
                            sd = sd(alotofdata$ADM_RATE, na.rm = TRUE)),
                colour = "red", linewidth = 1)
hist.ADM
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Code
#Test for Normality
shapiro.test(alotofdata$ADM_RATE)
##
## Shapiro-Wilk normality test
##
## data: alotofdata$ADM_RATE
## W = 0.9651, p-value < 2.2e-16
No, the distribution of admission rates is not normal: the Shapiro-Wilk p-value (< 2.2e-16) is less than 0.05.
We test whether the distribution of admission rates is normal using the Shapiro-Wilk normality test. The null hypothesis is that the data are normally distributed, so we treat the data as consistent with normality only if the p-value is above our significance level of 0.05.
# Code
library(ggplot2)
# Density histogram of UGDS with a normal curve (same mean and sd) overlaid
hist.UGDS <- ggplot(alotofdata, aes(UGDS)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  labs(x = "Total number of undergrads", y = "Density") +
  stat_function(fun = dnorm,
                args = list(mean = mean(alotofdata$UGDS, na.rm = TRUE),
                            sd = sd(alotofdata$UGDS, na.rm = TRUE)),
                colour = "red", linewidth = 1)
hist.UGDS
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Code
#Anderson-Darling normality test
library(nortest)
ad.test(alotofdata$UGDS)
##
## Anderson-Darling normality test
##
## data: alotofdata$UGDS
## A = 1243.9, p-value < 2.2e-16
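No, the distribution of the total number of undergraduate students is not normal: the Anderson-Darling p-value (< 2.2e-16) is less than 0.05.
We test whether the distribution of the total number of undergraduate students is normal using the Anderson-Darling normality test. It works much like the Shapiro-Wilk test used above; shapiro.test() only accepts between 3 and 5000 non-missing values, and UGDS has more than that. The null hypothesis is that the data are normally distributed, so we treat the data as consistent with normality only if the p-value is above our significance level of 0.05.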
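The switch of test for UGDS is consistent with a limit in base R: `shapiro.test()` refuses samples larger than 5000. The sketch below demonstrates this on simulated data; the sample size 6990 is used only because it matches the `n` reported for UGDS by `psych::describe()` later in this report.

```r
# Simulated data for illustration only (not the College Scorecard data)
set.seed(1)
x <- rnorm(6990)

# shapiro.test() stops with an error for n > 5000; capture the message
res <- tryCatch(shapiro.test(x),
                error = function(e) conditionMessage(e))
print(res)

# A random subsample of at most 5000 values is accepted
sub <- shapiro.test(sample(x, 5000))
print(sub$p.value)
```

Tests without this limit, such as `nortest::ad.test()` used above, are one way around it; testing a random subsample is another option, at the cost of discarding data.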
- For any of the distributions that were not normal in your previous answer, how could you transform them to be normal? Perform the appropriate transformation and report the mean, standard deviation, and variance of the new transformed variable.
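For each non-normal distribution, we apply a transformation and then standardize the result. Commonly used transformations include the square, square root, cube root, logarithm, and reciprocal root. Standardizing only rescales a variable to mean 0 and standard deviation 1 (so its variance is also 1); it does not change the shape of the distribution.
For average SAT score, I applied the cube root transformation to see if it would help, and then standardized it. The histogram looks closer to normal, although the Shapiro-Wilk test still rejects normality (p = 2.388e-14).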
# Cube root transformation of SAT_AVG, then standardize (z-score)
alotofdata$CUBE_SAT_AVG <- alotofdata$SAT_AVG^(1/3)
# as.numeric() drops the one-column matrix that scale() returns
alotofdata$zCUBE_SAT_AVG <- as.numeric(scale(alotofdata$CUBE_SAT_AVG, center = TRUE, scale = TRUE))
# Density histogram of the standardized variable with a normal curve overlaid
hist.SAT <- ggplot(alotofdata, aes(zCUBE_SAT_AVG)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  labs(x = "z-score", y = "Density", title = 'Distribution of Average SAT Score') +
  stat_function(fun = dnorm,
                args = list(mean = mean(alotofdata$zCUBE_SAT_AVG, na.rm = TRUE),
                            sd = sd(alotofdata$zCUBE_SAT_AVG, na.rm = TRUE)),
                colour = "red", linewidth = 1)
hist.SAT
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# TEST
shapiro.test(alotofdata$zCUBE_SAT_AVG)
##
## Shapiro-Wilk normality test
##
## data: alotofdata$zCUBE_SAT_AVG
## W = 0.9747, p-value = 2.388e-14
# Describe
psych::describe(alotofdata$zCUBE_SAT_AVG)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1304 0 1 -0.11 -0.06 0.82 -2.9 3.31 6.21 0.53 0.72
## se
## X1 0.03
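For admission rate, raising (ADM_RATE + 1) to the power of 3 should decrease the skew, and the skew of the transformed variable is close to 0 (-0.02). After standardizing, the histogram looks better, but it is still not normal: the Shapiro-Wilk test still rejects normality (p < 2.2e-16).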
# POWER: cube of (ADM_RATE + 1), then standardize (z-score).
# Note: the LOG_ prefix in the variable names is a leftover label; no logarithm is taken here.
alotofdata$LOG_ADM_RATE <- (alotofdata$ADM_RATE + 1)^3
alotofdata$z_LOG_ADM_RATE <- as.numeric(scale(alotofdata$LOG_ADM_RATE, center = TRUE, scale = TRUE))
# Visualize
hist.ADM <- ggplot(alotofdata, aes(z_LOG_ADM_RATE)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  labs(x = "z-score", y = "Density", title = 'Admission Rate') +
  stat_function(fun = dnorm,
                args = list(mean = mean(alotofdata$z_LOG_ADM_RATE, na.rm = TRUE),
                            sd = sd(alotofdata$z_LOG_ADM_RATE, na.rm = TRUE)),
                colour = "red", linewidth = 1)
hist.ADM
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# TEST
shapiro.test(alotofdata$z_LOG_ADM_RATE)
##
## Shapiro-Wilk normality test
##
## data: alotofdata$z_LOG_ADM_RATE
## W = 0.97855, p-value < 2.2e-16
# Describe
psych::describe(alotofdata$z_LOG_ADM_RATE)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 2198 0 1 -0.02 0.01 1.06 -2.37 1.72 4.09 -0.02 -0.76
## se
## X1 0.02
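For number of undergraduate students, I took the logarithm of (UGDS + 1), which made the histogram look much better, and then standardized it. It is still not normal according to the Anderson-Darling test (p < 2.2e-16), but it is closer to normal: the A statistic fell from 1243.9 to 29.854.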
# Log transform (adding 1 so that a value of 0 stays defined), then standardize
alotofdata$t_UGDS <- log(alotofdata$UGDS + 1)
alotofdata$zUGDS <- as.numeric(scale(alotofdata$t_UGDS, center = TRUE, scale = TRUE))
# Plot the standardized variable with a normal curve overlaid
hist.UGDS <- ggplot(alotofdata, aes(zUGDS)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  labs(x = "z-score", y = "Density", title = 'Distribution of Number of Students') +
  stat_function(fun = dnorm,
                args = list(mean = mean(alotofdata$zUGDS, na.rm = TRUE),
                            sd = sd(alotofdata$zUGDS, na.rm = TRUE)),
                colour = "red", linewidth = 1)
hist.UGDS
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# TEST
library(nortest)
ad.test(alotofdata$zUGDS)
##
## Anderson-Darling normality test
##
## data: alotofdata$zUGDS
## A = 29.854, p-value < 2.2e-16
# Describe
psych::describe(alotofdata$zUGDS)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 6990 0 1 -0.09 -0.03 1.12 -3.35 3.12 6.47 0.18 -0.57
## se
## X1 0.01
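The three transformations above combine two steps that do different jobs: the transformation (cube root, power, or log) changes the shape of the distribution, while standardizing only rescales it. The sketch below illustrates this on simulated right-skewed data (not the College Scorecard file); the `skew` helper is a hypothetical, simple moment-based skewness function written for this example.

```r
# Hypothetical helper: moment-based sample skewness, ignoring missing values
skew <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

# Simulated right-skewed data for illustration only
set.seed(42)
x  <- rlnorm(2000, meanlog = 7, sdlog = 1)
z  <- as.numeric(scale(x))   # standardized, not transformed
lx <- log(x)                 # transformed, not standardized

# Standardizing leaves skewness unchanged; the log transform reduces it
print(round(c(raw = skew(x), standardized = skew(z), logged = skew(lx)), 3))

# Mean, standard deviation, and variance of the transformed variable
print(round(c(mean = mean(lx), sd = sd(lx), var = var(lx)), 3))
```

This is also why the standardized variables above all report mean 0 and sd 1 (variance 1): those values come from `scale()`, so the mean, standard deviation, and variance of the transformed variable before standardizing may be the more informative numbers to report.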
- Based on the distribution of scores, what is the probability that the average SAT score for a school is greater than 1400? What is the probability that the average SAT score for a school is less than 800?
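The probability that the average SAT score for a school is greater than 1400 is 2.60736% (1 - 0.9739264, the cumulative proportion at 1400).
The probability that the average SAT score for a school is less than 800 is 1.150307% (the cumulative proportion at 798, the highest observed score below 800). Including the two schools at exactly 800 gives 1.303681%.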
##            level  freq   perc  cumfreq  cumperc
## 1      [700,750]     4   0.3%        4     0.3%
## 2      (750,800]    13   1.0%       17     1.3%
## 3      (800,850]    26   2.0%       43     3.3%
## 4      (850,900]    63   4.8%      106     8.1%
## 5      (900,950]   131  10.0%      237    18.2%
## 6     (950,1000]   201  15.4%      438    33.6%
## 7    (1000,1050]   284  21.8%      722    55.4%
## 8    (1050,1100]   183  14.0%      905    69.4%
## 9    (1100,1150]   146  11.2%     1051    80.6%
## 10   (1150,1200]    85   6.5%     1136    87.1%
## 11   (1200,1250]    55   4.2%     1191    91.3%
## 12   (1250,1300]    31   2.4%     1222    93.7%
## 13   (1300,1350]    27   2.1%     1249    95.8%
## 14   (1350,1400]    21   1.6%     1270    97.4%
## 15   (1400,1450]    16   1.2%     1286    98.6%
## 16   (1450,1500]    15   1.2%     1301    99.8%
## 17   (1500,1550]     3   0.2%     1304   100.0%
## X Freq Prop CumProp
## 1 720 1 0.0007668712 0.0007668712
## 2 735 2 0.0015337423 0.0023006135
## 3 740 1 0.0007668712 0.0030674847
## 4 758 1 0.0007668712 0.0038343558
## 5 760 2 0.0015337423 0.0053680982
## 6 762 1 0.0007668712 0.0061349693
## 7 773 1 0.0007668712 0.0069018405
## 8 774 1 0.0007668712 0.0076687117
## 9 775 1 0.0007668712 0.0084355828
## 10 776 1 0.0007668712 0.0092024540
## 11 780 1 0.0007668712 0.0099693252
## 12 796 1 0.0007668712 0.0107361963
## 13 798 1 0.0007668712 0.0115030675
## 14 800 2 0.0015337423 0.0130368098
## 15 802 1 0.0007668712 0.0138036810
## 16 803 1 0.0007668712 0.0145705521
## 17 804 1 0.0007668712 0.0153374233
## 18 806 2 0.0015337423 0.0168711656
## 19 808 1 0.0007668712 0.0176380368
## 20 810 2 0.0015337423 0.0191717791
## 21 811 1 0.0007668712 0.0199386503
## 22 812 1 0.0007668712 0.0207055215
## 23 814 1 0.0007668712 0.0214723926
## 24 820 1 0.0007668712 0.0222392638
## 25 825 2 0.0015337423 0.0237730061
## 26 827 1 0.0007668712 0.0245398773
## 27 837 1 0.0007668712 0.0253067485
## 28 840 1 0.0007668712 0.0260736196
## 29 842 1 0.0007668712 0.0268404908
## 30 843 2 0.0015337423 0.0283742331
## 31 845 1 0.0007668712 0.0291411043
## 32 847 1 0.0007668712 0.0299079755
## 33 849 2 0.0015337423 0.0314417178
## 34 850 2 0.0015337423 0.0329754601
## 35 851 2 0.0015337423 0.0345092025
## 36 852 1 0.0007668712 0.0352760736
## 37 853 2 0.0015337423 0.0368098160
## 38 854 1 0.0007668712 0.0375766871
## 39 855 1 0.0007668712 0.0383435583
## 40 857 1 0.0007668712 0.0391104294
## 41 859 2 0.0015337423 0.0406441718
## 42 860 1 0.0007668712 0.0414110429
## 43 862 1 0.0007668712 0.0421779141
## 44 863 1 0.0007668712 0.0429447853
## 45 864 1 0.0007668712 0.0437116564
## 46 866 2 0.0015337423 0.0452453988
## 47 867 1 0.0007668712 0.0460122699
## 48 868 1 0.0007668712 0.0467791411
## 49 870 3 0.0023006135 0.0490797546
## 50 871 1 0.0007668712 0.0498466258
## 51 872 3 0.0023006135 0.0521472393
## 52 873 1 0.0007668712 0.0529141104
## 53 874 1 0.0007668712 0.0536809816
## 54 875 3 0.0023006135 0.0559815951
## 55 876 1 0.0007668712 0.0567484663
## 56 877 1 0.0007668712 0.0575153374
## 57 880 1 0.0007668712 0.0582822086
## 58 881 1 0.0007668712 0.0590490798
## 59 883 1 0.0007668712 0.0598159509
## 60 884 1 0.0007668712 0.0605828221
## 61 885 2 0.0015337423 0.0621165644
## 62 886 4 0.0030674847 0.0651840491
## 63 887 2 0.0015337423 0.0667177914
## 64 890 4 0.0030674847 0.0697852761
## 65 892 1 0.0007668712 0.0705521472
## 66 894 2 0.0015337423 0.0720858896
## 67 895 1 0.0007668712 0.0728527607
## 68 896 1 0.0007668712 0.0736196319
## 69 897 2 0.0015337423 0.0751533742
## 70 899 5 0.0038343558 0.0789877301
## 71 900 3 0.0023006135 0.0812883436
## 72 901 1 0.0007668712 0.0820552147
## 73 902 2 0.0015337423 0.0835889571
## 74 903 1 0.0007668712 0.0843558282
## 75 905 1 0.0007668712 0.0851226994
## 76 906 2 0.0015337423 0.0866564417
## 77 907 2 0.0015337423 0.0881901840
## 78 909 3 0.0023006135 0.0904907975
## 79 910 8 0.0061349693 0.0966257669
## 80 911 1 0.0007668712 0.0973926380
## 81 913 3 0.0023006135 0.0996932515
## 82 914 3 0.0023006135 0.1019938650
## 83 916 1 0.0007668712 0.1027607362
## 84 917 4 0.0030674847 0.1058282209
## 85 918 2 0.0015337423 0.1073619632
## 86 919 2 0.0015337423 0.1088957055
## 87 920 2 0.0015337423 0.1104294479
## 88 921 1 0.0007668712 0.1111963190
## 89 922 1 0.0007668712 0.1119631902
## 90 923 3 0.0023006135 0.1142638037
## 91 924 1 0.0007668712 0.1150306748
## 92 925 2 0.0015337423 0.1165644172
## 93 926 2 0.0015337423 0.1180981595
## 94 927 2 0.0015337423 0.1196319018
## 95 928 4 0.0030674847 0.1226993865
## 96 930 12 0.0092024540 0.1319018405
## 97 931 1 0.0007668712 0.1326687117
## 98 932 2 0.0015337423 0.1342024540
## 99 933 2 0.0015337423 0.1357361963
## 100 934 3 0.0023006135 0.1380368098
## 101 935 2 0.0015337423 0.1395705521
## 102 936 2 0.0015337423 0.1411042945
## 103 937 4 0.0030674847 0.1441717791
## 104 938 2 0.0015337423 0.1457055215
## 105 939 1 0.0007668712 0.1464723926
## 106 940 4 0.0030674847 0.1495398773
## 107 941 5 0.0038343558 0.1533742331
## 108 942 3 0.0023006135 0.1556748466
## 109 943 3 0.0023006135 0.1579754601
## 110 944 2 0.0015337423 0.1595092025
## 111 945 3 0.0023006135 0.1618098160
## 112 946 6 0.0046012270 0.1664110429
## 113 947 3 0.0023006135 0.1687116564
## 114 948 6 0.0046012270 0.1733128834
## 115 949 3 0.0023006135 0.1756134969
## 116 950 8 0.0061349693 0.1817484663
## 117 951 5 0.0038343558 0.1855828221
## 118 952 1 0.0007668712 0.1863496933
## 119 953 2 0.0015337423 0.1878834356
## 120 954 3 0.0023006135 0.1901840491
## 121 955 1 0.0007668712 0.1909509202
## 122 957 1 0.0007668712 0.1917177914
## 123 958 5 0.0038343558 0.1955521472
## 124 959 3 0.0023006135 0.1978527607
## 125 960 3 0.0023006135 0.2001533742
## 126 961 3 0.0023006135 0.2024539877
## 127 962 6 0.0046012270 0.2070552147
## 128 963 4 0.0030674847 0.2101226994
## 129 964 3 0.0023006135 0.2124233129
## 130 965 7 0.0053680982 0.2177914110
## 131 966 4 0.0030674847 0.2208588957
## 132 967 5 0.0038343558 0.2246932515
## 133 968 5 0.0038343558 0.2285276074
## 134 969 4 0.0030674847 0.2315950920
## 135 970 20 0.0153374233 0.2469325153
## 136 971 2 0.0015337423 0.2484662577
## 137 972 1 0.0007668712 0.2492331288
## 138 973 2 0.0015337423 0.2507668712
## 139 974 3 0.0023006135 0.2530674847
## 140 975 3 0.0023006135 0.2553680982
## 141 976 5 0.0038343558 0.2592024540
## 142 977 1 0.0007668712 0.2599693252
## 143 978 3 0.0023006135 0.2622699387
## 144 979 2 0.0015337423 0.2638036810
## 145 980 3 0.0023006135 0.2661042945
## 146 981 4 0.0030674847 0.2691717791
## 147 982 3 0.0023006135 0.2714723926
## 148 983 3 0.0023006135 0.2737730061
## 149 984 5 0.0038343558 0.2776073620
## 150 985 5 0.0038343558 0.2814417178
## 151 986 3 0.0023006135 0.2837423313
## 152 987 2 0.0015337423 0.2852760736
## 153 988 3 0.0023006135 0.2875766871
## 154 989 3 0.0023006135 0.2898773006
## 155 990 19 0.0145705521 0.3044478528
## 156 991 2 0.0015337423 0.3059815951
## 157 992 2 0.0015337423 0.3075153374
## 158 993 3 0.0023006135 0.3098159509
## 159 994 2 0.0015337423 0.3113496933
## 160 995 6 0.0046012270 0.3159509202
## 161 996 6 0.0046012270 0.3205521472
## 162 997 4 0.0030674847 0.3236196319
## 163 998 3 0.0023006135 0.3259202454
## 164 999 5 0.0038343558 0.3297546012
## 165 1000 8 0.0061349693 0.3358895706
## 166 1001 6 0.0046012270 0.3404907975
## 167 1002 3 0.0023006135 0.3427914110
## 168 1003 3 0.0023006135 0.3450920245
## 169 1004 7 0.0053680982 0.3504601227
## 170 1005 10 0.0076687117 0.3581288344
## 171 1006 4 0.0030674847 0.3611963190
## 172 1007 3 0.0023006135 0.3634969325
## 173 1008 5 0.0038343558 0.3673312883
## 174 1009 8 0.0061349693 0.3734662577
## 175 1010 26 0.0199386503 0.3934049080
## 176 1011 3 0.0023006135 0.3957055215
## 177 1012 2 0.0015337423 0.3972392638
## 178 1013 2 0.0015337423 0.3987730061
## 179 1014 6 0.0046012270 0.4033742331
## 180 1015 2 0.0015337423 0.4049079755
## 181 1016 6 0.0046012270 0.4095092025
## 182 1017 1 0.0007668712 0.4102760736
## 183 1018 4 0.0030674847 0.4133435583
## 184 1019 1 0.0007668712 0.4141104294
## 185 1020 5 0.0038343558 0.4179447853
## 186 1021 5 0.0038343558 0.4217791411
## 187 1022 2 0.0015337423 0.4233128834
## 188 1023 1 0.0007668712 0.4240797546
## 189 1024 4 0.0030674847 0.4271472393
## 190 1025 8 0.0061349693 0.4332822086
## 191 1026 3 0.0023006135 0.4355828221
## 192 1027 2 0.0015337423 0.4371165644
## 193 1028 3 0.0023006135 0.4394171779
## 194 1029 10 0.0076687117 0.4470858896
## 195 1030 20 0.0153374233 0.4624233129
## 196 1031 10 0.0076687117 0.4700920245
## 197 1032 4 0.0030674847 0.4731595092
## 198 1033 6 0.0046012270 0.4777607362
## 199 1034 3 0.0023006135 0.4800613497
## 200 1035 8 0.0061349693 0.4861963190
## 201 1036 2 0.0015337423 0.4877300613
## 202 1037 6 0.0046012270 0.4923312883
## 203 1038 7 0.0053680982 0.4976993865
## 204 1039 3 0.0023006135 0.5000000000
## 205 1040 4 0.0030674847 0.5030674847
## 206 1041 2 0.0015337423 0.5046012270
## 207 1042 3 0.0023006135 0.5069018405
## 208 1043 4 0.0030674847 0.5099693252
## 209 1044 6 0.0046012270 0.5145705521
## 210 1045 3 0.0023006135 0.5168711656
## 211 1046 3 0.0023006135 0.5191717791
## 212 1047 4 0.0030674847 0.5222392638
## 213 1048 9 0.0069018405 0.5291411043
## 214 1049 6 0.0046012270 0.5337423313
## 215 1050 26 0.0199386503 0.5536809816
## 216 1051 5 0.0038343558 0.5575153374
## 217 1052 3 0.0023006135 0.5598159509
## 218 1053 5 0.0038343558 0.5636503067
## 219 1054 6 0.0046012270 0.5682515337
## 220 1055 8 0.0061349693 0.5743865031
## 221 1056 6 0.0046012270 0.5789877301
## 222 1057 3 0.0023006135 0.5812883436
## 223 1058 2 0.0015337423 0.5828220859
## 224 1059 4 0.0030674847 0.5858895706
## 225 1060 2 0.0015337423 0.5874233129
## 226 1061 2 0.0015337423 0.5889570552
## 227 1062 2 0.0015337423 0.5904907975
## 228 1063 3 0.0023006135 0.5927914110
## 229 1064 6 0.0046012270 0.5973926380
## 230 1065 5 0.0038343558 0.6012269939
## 231 1066 4 0.0030674847 0.6042944785
## 232 1067 3 0.0023006135 0.6065950920
## 233 1068 1 0.0007668712 0.6073619632
## 234 1069 2 0.0015337423 0.6088957055
## 235 1070 10 0.0076687117 0.6165644172
## 236 1071 2 0.0015337423 0.6180981595
## 237 1072 2 0.0015337423 0.6196319018
## 238 1073 4 0.0030674847 0.6226993865
## 239 1074 6 0.0046012270 0.6273006135
## 240 1075 4 0.0030674847 0.6303680982
## 241 1076 4 0.0030674847 0.6334355828
## 242 1077 3 0.0023006135 0.6357361963
## 243 1078 3 0.0023006135 0.6380368098
## 244 1079 4 0.0030674847 0.6411042945
## 245 1080 3 0.0023006135 0.6434049080
## 246 1081 6 0.0046012270 0.6480061350
## 247 1082 4 0.0030674847 0.6510736196
## 248 1083 3 0.0023006135 0.6533742331
## 249 1085 5 0.0038343558 0.6572085890
## 250 1086 6 0.0046012270 0.6618098160
## 251 1087 1 0.0007668712 0.6625766871
## 252 1088 4 0.0030674847 0.6656441718
## 253 1089 6 0.0046012270 0.6702453988
## 254 1090 9 0.0069018405 0.6771472393
## 255 1091 2 0.0015337423 0.6786809816
## 256 1092 2 0.0015337423 0.6802147239
## 257 1093 1 0.0007668712 0.6809815951
## 258 1094 3 0.0023006135 0.6832822086
## 259 1095 2 0.0015337423 0.6848159509
## 260 1096 2 0.0015337423 0.6863496933
## 261 1097 2 0.0015337423 0.6878834356
## 262 1098 3 0.0023006135 0.6901840491
## 263 1099 2 0.0015337423 0.6917177914
## 264 1100 3 0.0023006135 0.6940184049
## 265 1101 4 0.0030674847 0.6970858896
## 266 1102 2 0.0015337423 0.6986196319
## 267 1103 3 0.0023006135 0.7009202454
## 268 1104 4 0.0030674847 0.7039877301
## 269 1105 13 0.0099693252 0.7139570552
## 270 1106 4 0.0030674847 0.7170245399
## 271 1107 2 0.0015337423 0.7185582822
## 272 1108 2 0.0015337423 0.7200920245
## 273 1109 6 0.0046012270 0.7246932515
## 274 1110 11 0.0084355828 0.7331288344
## 275 1111 1 0.0007668712 0.7338957055
## 276 1112 3 0.0023006135 0.7361963190
## 277 1113 1 0.0007668712 0.7369631902
## 278 1114 2 0.0015337423 0.7384969325
## 279 1115 2 0.0015337423 0.7400306748
## 280 1116 6 0.0046012270 0.7446319018
## 281 1117 2 0.0015337423 0.7461656442
## 282 1118 2 0.0015337423 0.7476993865
## 283 1120 3 0.0023006135 0.7500000000
## 284 1121 1 0.0007668712 0.7507668712
## 285 1122 3 0.0023006135 0.7530674847
## 286 1123 1 0.0007668712 0.7538343558
## 287 1124 1 0.0007668712 0.7546012270
## 288 1125 12 0.0092024540 0.7638036810
## 289 1126 1 0.0007668712 0.7645705521
## 290 1127 4 0.0030674847 0.7676380368
## 291 1128 4 0.0030674847 0.7707055215
## 292 1129 1 0.0007668712 0.7714723926
## 293 1130 2 0.0015337423 0.7730061350
## 294 1131 2 0.0015337423 0.7745398773
## 295 1132 1 0.0007668712 0.7753067485
## 296 1133 4 0.0030674847 0.7783742331
## 297 1134 2 0.0015337423 0.7799079755
## 298 1135 3 0.0023006135 0.7822085890
## 299 1137 3 0.0023006135 0.7845092025
## 300 1138 1 0.0007668712 0.7852760736
## 301 1139 2 0.0015337423 0.7868098160
## 302 1140 1 0.0007668712 0.7875766871
## 303 1141 2 0.0015337423 0.7891104294
## 304 1142 1 0.0007668712 0.7898773006
## 305 1143 1 0.0007668712 0.7906441718
## 306 1144 4 0.0030674847 0.7937116564
## 307 1145 6 0.0046012270 0.7983128834
## 308 1146 3 0.0023006135 0.8006134969
## 309 1147 3 0.0023006135 0.8029141104
## 310 1148 2 0.0015337423 0.8044478528
## 311 1149 1 0.0007668712 0.8052147239
## 312 1150 1 0.0007668712 0.8059815951
## 313 1151 1 0.0007668712 0.8067484663
## 314 1152 5 0.0038343558 0.8105828221
## 315 1153 2 0.0015337423 0.8121165644
## 316 1155 3 0.0023006135 0.8144171779
## 317 1156 1 0.0007668712 0.8151840491
## 318 1157 1 0.0007668712 0.8159509202
## 319 1158 1 0.0007668712 0.8167177914
## 320 1159 2 0.0015337423 0.8182515337
## 321 1160 1 0.0007668712 0.8190184049
## 322 1161 2 0.0015337423 0.8205521472
## 323 1162 5 0.0038343558 0.8243865031
## 324 1164 2 0.0015337423 0.8259202454
## 325 1165 6 0.0046012270 0.8305214724
## 326 1166 2 0.0015337423 0.8320552147
## 327 1168 2 0.0015337423 0.8335889571
## 328 1169 1 0.0007668712 0.8343558282
## 329 1170 3 0.0023006135 0.8366564417
## 330 1171 1 0.0007668712 0.8374233129
## 331 1175 3 0.0023006135 0.8397239264
## 332 1176 2 0.0015337423 0.8412576687
## 333 1177 2 0.0015337423 0.8427914110
## 334 1178 2 0.0015337423 0.8443251534
## 335 1179 1 0.0007668712 0.8450920245
## 336 1180 1 0.0007668712 0.8458588957
## 337 1181 2 0.0015337423 0.8473926380
## 338 1182 1 0.0007668712 0.8481595092
## 339 1183 3 0.0023006135 0.8504601227
## 340 1184 2 0.0015337423 0.8519938650
## 341 1185 5 0.0038343558 0.8558282209
## 342 1186 1 0.0007668712 0.8565950920
## 343 1187 1 0.0007668712 0.8573619632
## 344 1188 3 0.0023006135 0.8596625767
## 345 1189 1 0.0007668712 0.8604294479
## 346 1191 2 0.0015337423 0.8619631902
## 347 1193 2 0.0015337423 0.8634969325
## 348 1194 3 0.0023006135 0.8657975460
## 349 1195 2 0.0015337423 0.8673312883
## 350 1196 1 0.0007668712 0.8680981595
## 351 1198 2 0.0015337423 0.8696319018
## 352 1200 2 0.0015337423 0.8711656442
## 353 1205 2 0.0015337423 0.8726993865
## 354 1206 1 0.0007668712 0.8734662577
## 355 1207 3 0.0023006135 0.8757668712
## 356 1209 1 0.0007668712 0.8765337423
## 357 1210 1 0.0007668712 0.8773006135
## 358 1211 2 0.0015337423 0.8788343558
## 359 1212 1 0.0007668712 0.8796012270
## 360 1213 4 0.0030674847 0.8826687117
## 361 1215 3 0.0023006135 0.8849693252
## 362 1217 2 0.0015337423 0.8865030675
## 363 1218 1 0.0007668712 0.8872699387
## 364 1219 1 0.0007668712 0.8880368098
## 365 1220 1 0.0007668712 0.8888036810
## 366 1221 2 0.0015337423 0.8903374233
## 367 1223 2 0.0015337423 0.8918711656
## 368 1225 1 0.0007668712 0.8926380368
## 369 1226 2 0.0015337423 0.8941717791
## 370 1227 1 0.0007668712 0.8949386503
## 371 1228 1 0.0007668712 0.8957055215
## 372 1229 1 0.0007668712 0.8964723926
## 373 1230 2 0.0015337423 0.8980061350
## 374 1231 1 0.0007668712 0.8987730061
## 375 1234 2 0.0015337423 0.9003067485
## 376 1235 1 0.0007668712 0.9010736196
## 377 1239 3 0.0023006135 0.9033742331
## 378 1240 3 0.0023006135 0.9056748466
## 379 1241 1 0.0007668712 0.9064417178
## 380 1243 2 0.0015337423 0.9079754601
## 381 1244 3 0.0023006135 0.9102760736
## 382 1246 1 0.0007668712 0.9110429448
## 383 1247 1 0.0007668712 0.9118098160
## 384 1248 1 0.0007668712 0.9125766871
## 385 1249 1 0.0007668712 0.9133435583
## 386 1252 1 0.0007668712 0.9141104294
## 387 1253 2 0.0015337423 0.9156441718
## 388 1254 3 0.0023006135 0.9179447853
## 389 1259 1 0.0007668712 0.9187116564
## 390 1261 2 0.0015337423 0.9202453988
## 391 1263 1 0.0007668712 0.9210122699
## 392 1266 1 0.0007668712 0.9217791411
## 393 1271 1 0.0007668712 0.9225460123
## 394 1272 1 0.0007668712 0.9233128834
## 395 1273 1 0.0007668712 0.9240797546
## 396 1274 1 0.0007668712 0.9248466258
## 397 1275 1 0.0007668712 0.9256134969
## 398 1280 1 0.0007668712 0.9263803681
## 399 1281 1 0.0007668712 0.9271472393
## 400 1283 1 0.0007668712 0.9279141104
## 401 1286 1 0.0007668712 0.9286809816
## 402 1287 1 0.0007668712 0.9294478528
## 403 1290 1 0.0007668712 0.9302147239
## 404 1292 2 0.0015337423 0.9317484663
## 405 1296 3 0.0023006135 0.9340490798
## 406 1297 1 0.0007668712 0.9348159509
## 407 1300 3 0.0023006135 0.9371165644
## 408 1308 1 0.0007668712 0.9378834356
## 409 1309 1 0.0007668712 0.9386503067
## 410 1311 1 0.0007668712 0.9394171779
## 411 1313 3 0.0023006135 0.9417177914
## 412 1315 1 0.0007668712 0.9424846626
## 413 1316 1 0.0007668712 0.9432515337
## 414 1317 1 0.0007668712 0.9440184049
## 415 1323 2 0.0015337423 0.9455521472
## 416 1326 2 0.0015337423 0.9470858896
## 417 1328 1 0.0007668712 0.9478527607
## 418 1330 3 0.0023006135 0.9501533742
## 419 1332 1 0.0007668712 0.9509202454
## 420 1333 2 0.0015337423 0.9524539877
## 421 1337 2 0.0015337423 0.9539877301
## 422 1342 1 0.0007668712 0.9547546012
## 423 1343 1 0.0007668712 0.9555214724
## 424 1344 1 0.0007668712 0.9562883436
## 425 1346 1 0.0007668712 0.9570552147
## 426 1349 1 0.0007668712 0.9578220859
## 427 1354 1 0.0007668712 0.9585889571
## 428 1357 1 0.0007668712 0.9593558282
## 429 1360 1 0.0007668712 0.9601226994
## 430 1366 1 0.0007668712 0.9608895706
## 431 1369 2 0.0015337423 0.9624233129
## 432 1372 1 0.0007668712 0.9631901840
## 433 1373 1 0.0007668712 0.9639570552
## 434 1375 1 0.0007668712 0.9647239264
## 435 1379 1 0.0007668712 0.9654907975
## 436 1380 3 0.0023006135 0.9677914110
## 437 1382 1 0.0007668712 0.9685582822
## 438 1383 1 0.0007668712 0.9693251534
## 439 1388 1 0.0007668712 0.9700920245
## 440 1390 1 0.0007668712 0.9708588957
## 441 1393 1 0.0007668712 0.9716257669
## 442 1395 1 0.0007668712 0.9723926380
## 443 1398 1 0.0007668712 0.9731595092
## 444 1400 1 0.0007668712 0.9739263804
## 445 1403 1 0.0007668712 0.9746932515
## 446 1408 1 0.0007668712 0.9754601227
## 447 1414 1 0.0007668712 0.9762269939
## 448 1419 1 0.0007668712 0.9769938650
## 449 1420 2 0.0015337423 0.9785276074
## 450 1422 2 0.0015337423 0.9800613497
## 451 1423 1 0.0007668712 0.9808282209
## 452 1433 1 0.0007668712 0.9815950920
## 453 1435 1 0.0007668712 0.9823619632
## 454 1436 1 0.0007668712 0.9831288344
## 455 1439 2 0.0015337423 0.9846625767
## 456 1444 1 0.0007668712 0.9854294479
## 457 1450 1 0.0007668712 0.9861963190
## 458 1452 2 0.0015337423 0.9877300613
## 459 1454 2 0.0015337423 0.9892638037
## 460 1460 1 0.0007668712 0.9900306748
## 461 1461 1 0.0007668712 0.9907975460
## 462 1465 1 0.0007668712 0.9915644172
## 463 1470 1 0.0007668712 0.9923312883
## 464 1475 1 0.0007668712 0.9930981595
## 465 1478 1 0.0007668712 0.9938650307
## 466 1481 1 0.0007668712 0.9946319018
## 467 1491 1 0.0007668712 0.9953987730
## 468 1493 1 0.0007668712 0.9961656442
## 469 1500 2 0.0015337423 0.9976993865
## 470 1501 1 0.0007668712 0.9984662577
## 471 1505 1 0.0007668712 0.9992331288
## 472 1545 1 0.0007668712 1.0000000000
#Code
#To select a specific row of this table (e.g., a value of 1400), we can use the following
myTable[myTable$X == '1400', c('X', 'Freq', 'Prop', 'CumProp')]
## X Freq Prop CumProp
## 444 1400 1 0.0007668712 0.9739264
#To select the row for a value of 800, we can use the following
myTable[myTable$X == '800', c('X', 'Freq', 'Prop', 'CumProp')]
## X Freq Prop CumProp
## 14 800 2 0.001533742 0.01303681
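The table lookups above read cumulative proportions, which count schools at or below a score. As a cross-check, the tail proportions can also be computed directly from the raw scores. The sketch below is an illustration only: `empirical_tails` is a hypothetical helper that is not part of the original analysis, and it is shown on a toy vector rather than the real data.

```r
# Hypothetical helper: empirical tail proportions for a numeric vector.
empirical_tails <- function(x, lower, upper) {
  x <- x[!is.na(x)]                    # drop missing values before counting
  c(
    p_below       = mean(x <  lower),  # strictly less than `lower`
    p_at_or_below = mean(x <= lower),  # what a cumulative proportion reports
    p_above       = mean(x >  upper)   # strictly greater than `upper`
  )
}

# Toy scores for illustration only (not the College Scorecard data)
toy_sat <- c(720, 800, 800, 950, 1040, 1100, 1400, 1450, NA)
tails <- empirical_tails(toy_sat, lower = 800, upper = 1400)
tails
```

With the real data, the call would presumably be `empirical_tails(alotofdata$SAT_AVG, lower = 800, upper = 1400)`. Reporting both `p_below` and `p_at_or_below` makes it explicit whether schools at exactly 800 are counted.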
- Imagine that the distribution of average SAT scores was perfectly normal. Answer both parts of Question 15 again using the observed mean and standard deviation as your parameters.
Assuming the distribution is perfectly normal…
The probability that the average SAT score for a school is greater than 1400 is 0.52871% (1 - 0.9947129).
The probability that the average SAT score for a school is less than 800 is 2.603005%.
# Code
# SUMMARY
psych::describe(alotofdata$SAT_AVG)
## vars n mean sd median trimmed mad min max range skew
## X1 1 1304 1059.07 133.36 1039.5 1048.34 104.52 720 1545 825 0.82
## kurtosis se
## X1 1.11 3.69
# Percentile for a score of 1400, assuming the data are normally distributed
# pnorm(z) gives P(X < 1400), so P(X > 1400) = 1 - pnorm(z)
a <- 1400
s <- 133.36
xbar <- 1059.07
z <- (a-xbar)/s
z
## [1] 2.556464
pnorm(z)
## [1] 0.9947129
# Percentile for a score of 800, assuming the data are normally distributed
# pnorm(z) gives P(X < 800) directly
a <- 800
s <- 133.36
xbar <- 1059.07
z <- (a-xbar)/s
z
## [1] -1.942636
pnorm(z)
## [1] 0.02603005
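The same two normal-model probabilities can be obtained without computing z by hand, since `pnorm()` accepts the mean and standard deviation directly and `lower.tail = FALSE` returns the upper tail. This is a minimal sketch that reuses the observed mean and SD reported by `psych::describe()` above; the names `p_above_1400` and `p_below_800` are introduced here for illustration.

```r
# Observed mean and SD of SAT_AVG, taken from the psych::describe() output
xbar <- 1059.07
s    <- 133.36

# P(X > 1400): upper tail, so no manual 1 - pnorm(z) step is needed
p_above_1400 <- pnorm(1400, mean = xbar, sd = s, lower.tail = FALSE)

# P(X < 800): lower tail, which is the default
p_below_800 <- pnorm(800, mean = xbar, sd = s)

round(c(p_above_1400 = p_above_1400, p_below_800 = p_below_800), 5)
```

These values should agree with the 0.52871% and 2.603005% reported above, because standardizing first and calling `pnorm(z)` is equivalent to passing `mean` and `sd` to `pnorm()`.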