Friday, January 27, by the start of class time (2:00pm).
For this homework, I am requiring you to use R markdown, and place your answers directly in this document as code chunks. You can upload your .Rmd file as your solution. This is probably the easiest approach.
If you want, you can upload a ‘knit’ document as either HTML or PDF. However, some students have reported that the LMS does not allow HTML files as uploads. And producing PDF files directly from RStudio also requires that you separately install the LaTeX software on your computer. I can attempt to help you with the installation process if you would like to go that route, but no guarantees.
Download the file NumberGameDataCombined2023.csv. This
file contains the complete dataset of student responses to the number
guessing game (the topic of HW 01).
Read the file into a tibble in R, using read_csv() (the
tidyverse replacement for the built-in function
read.csv).
How many rows, and how many columns, does this tibble have? Use R code to answer the question, rather than relying on your eyeballs.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
filename = "NumberGameDataCombined2023.csv"
data <- read_csv(filename)
## Rows: 70 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Name
## dbl (3): Year, X, Y
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
rows <- nrow(data)
cols <- ncol(data)
cat("There are", rows, "rows and", cols, "columns\n")
## There are 70 rows and 4 columns
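`dim()` returns both counts in one call, as the vector c(rows, columns), and it works the same way on tibbles and base data frames. A minimal sketch on a small hypothetical data frame (`toy`, made-up values), since the CSV itself is not reproduced here:

```r
# Hypothetical stand-in for the real data: same column names, made-up values
toy <- data.frame(
  Name = c("A", "B", "C"),
  Year = c(2022, 2023, 2023),
  X    = c(10, 250, 42),
  Y    = c(5, 80, 17)
)

dims <- dim(toy)  # c(number of rows, number of columns)
cat("There are", dims[1], "rows and", dims[2], "columns\n")
```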
Generate a graph of the data in this dataframe, using
ggplot. Your graph should be a ‘scatterplot’, ie, showing
each \(x,y\) pair as a separate point
in the plot. In addition your graph should have the following
features:
* The x and y axes should be labeled ‘X value’ and ‘Y value’, respectively
* The x and y axes should be logarithmic (base 10)
* The color of each plot marker should be based on the value in the year column of the tibble, so that, for example, all the datapoints from 2023 are shown using the same color
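# Map the columns directly inside aes(); factor(Year) gives each year one discrete color
fig <- ggplot(data, aes(x = X, y = Y, color = factor(Year))) +
  geom_point() +
  scale_x_log10() +  # base-10 logarithmic x axis
  scale_y_log10() +  # base-10 logarithmic y axis
  labs(x = "X value", y = "Y value", color = "Year")
print(fig)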
Compute the mean, and standard deviation, for the x and y values in the dataset (ie, report the mean of \(x\), SD of \(x\), mean of \(y\), and SD of \(y\))
x_mean = mean(data$X)
y_mean = mean(data$Y)
stan_dev_x = sd(data$X)
stan_dev_y = sd(data$Y)
cat("The mean and standard deviation for all X values is:", x_mean, "and", stan_dev_x, "respectively",
"\nThe mean and standard deviation for all Y values is:", y_mean, "and", stan_dev_y, "respectively")
## The mean and standard deviation for all X values is: 331.4357 and 2542.935 respectively
## The mean and standard deviation for all Y values is: 186.6943 and 870.3445 respectively
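The four statistics can also be computed in a single call by applying `mean()` and `sd()` to each column. A minimal sketch with base R's `sapply()` on hypothetical toy data (`toy`, made-up values) with the same `X` and `Y` columns:

```r
# Hypothetical toy data with the same X and Y columns as the real dataset
toy <- data.frame(X = c(1, 2, 3, 10), Y = c(4, 4, 6, 6))

# sapply() calls the function once per column; each call returns c(mean, sd),
# so the result is a 2 x 2 matrix with rows "mean"/"sd" and columns "X"/"Y"
col_stats <- sapply(toy[c("X", "Y")], function(v) c(mean = mean(v), sd = sd(v)))
print(col_stats)
```

On the real data, `sapply(data[c("X", "Y")], ...)` would return the same four numbers as the separate `mean()` and `sd()` calls.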
Let’s crown a grand champion for the number guessing game. Out of the entire dataset, who wins? (ie, who is closest to the average of the entire dataset?)
To make things slightly more interesting, let’s use a different definition of distance. Previously we used Euclidean distance:
\[ \textrm{distance}(x_1, y_1, x_2, y_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]
This time, I want you to use the so-called ‘Manhattan distance’:
\[ \textrm{distance}(x_1, y_1, x_2, y_2) = | x_2 - x_1 | + | y_2 - y_1 | \]
where \(|\cdot|\) refers to the absolute value.
(It’s called Manhattan distance, because it measures the number of city blocks you would need to traverse to get from one point to another on a grid.)
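A quick worked example shows how the two definitions differ. For the points \((1, 2)\) and \((4, 6)\), the coordinate differences are 3 and 4, so the Euclidean distance is \(\sqrt{3^2 + 4^2} = 5\) while the Manhattan distance is \(3 + 4 = 7\). A minimal sketch in R (the helper names are illustrative):

```r
# Illustrative helpers; both use vectorized arithmetic, so they also accept vectors
euclidean_dist <- function(x1, y1, x2, y2) sqrt((x2 - x1)^2 + (y2 - y1)^2)
manhattan_dist <- function(x1, y1, x2, y2) abs(x2 - x1) + abs(y2 - y1)

euclidean_dist(1, 2, 4, 6)  # sqrt(3^2 + 4^2) = 5
manhattan_dist(1, 2, 4, 6)  # |3| + |4| = 7
```

The Manhattan distance is never smaller than the Euclidean one, because going along the grid can only be as short as the straight line, never shorter.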
# Manhattan distance between the points (x1, y1) and (x2, y2)
manhattan_dist <- function(x1, y1, x2, y2) {
  abs(x2 - x1) + abs(y2 - y1)
}
dists <- numeric(nrow(data))  # pre-allocated container, one distance per student
for (i in seq_len(nrow(data))) {  # loop over students: distance from each guess to the mean point
  dists[i] <- manhattan_dist(data$X[i], data$Y[i], x_mean, y_mean)
}
name_and_guess <- data.frame(Name = data$Name, Dist = dists)  # names with their distances
# which.min() returns the position of the smallest distance. min(name_and_guess$Name)
# does not work here: it compares the names themselves and returns the alphabetically first one.
winner_id <- which.min(dists)
winner <- name_and_guess$Name[winner_id]
cat("The winner is:", winner, "\n")
## The winner is: Gwyn
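Because `abs()`, `-` and `+` are vectorized in R, the loop can also be replaced by a single expression over whole columns. A minimal sketch on hypothetical toy data (`toy`, made-up names and values), not the class dataset:

```r
# Hypothetical toy data standing in for the real guesses
toy <- data.frame(Name = c("Ann", "Bo", "Cy"),
                  X = c(10, 300, 50),
                  Y = c(20, 5, 60))

# Manhattan distance from every guess to the (mean X, mean Y) point, no loop needed
target_x <- mean(toy$X)
target_y <- mean(toy$Y)
toy$Dist <- abs(toy$X - target_x) + abs(toy$Y - target_y)

# which.min() gives the row with the smallest distance
winner <- toy$Name[which.min(toy$Dist)]
cat("The winner is:", winner, "\n")  # prints "The winner is: Cy"
```

Here the mean point is (120, 28.33), and Cy's distance (about 101.7) is smaller than Ann's (about 118.3) and Bo's (about 203.3).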
Generate a vector of 1000 random values, drawn from a Gaussian distribution (aka normal distribution) with a mean and standard deviation equal to the mean and standard deviation of the \(x\) values in the student dataset (determined in Problem 3)
set.seed(42)  # make the random draws reproducible
# rnorm() samples from a normal (Gaussian) distribution. Note that runif() would instead
# sample uniformly from [0, 1]; scaling that gives values spread evenly over
# [mean, mean + SD], not a bell curve.
random_vector <- stan_dev_x * rnorm(1000) + x_mean
# multiply standard-normal draws by the SD, then add the mean
# (equivalent to rnorm(1000, mean = x_mean, sd = stan_dev_x))
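To see why the choice of sampling function matters, the sketch below draws 1000 values both ways using the mean and SD of \(x\) reported in Problem 3 (hard-coded as `m` and `s` so the block runs on its own). The uniform version can never leave the interval \([m, m + s]\), while the normal version is symmetric about \(m\) and, with an SD this large, produces many negative values.

```r
set.seed(42)
m <- 331.4357  # mean of x reported in Problem 3
s <- 2542.935  # SD of x reported in Problem 3

normal_draws  <- rnorm(1000, mean = m, sd = s)  # Gaussian: symmetric about m, unbounded
uniform_draws <- s * runif(1000) + m            # uniform on [m, m + s], not Gaussian

print(range(uniform_draws))    # both ends lie inside [m, m + s]
print(mean(normal_draws < 0))  # share of the normal draws that are negative
```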
Create a plot (using ggplot) that shows a histogram of the values in this vector.
# ggplot2 is already attached by library(tidyverse)
fig2 <- ggplot(tibble(value = random_vector), aes(x = value)) +
  geom_histogram(bins = 40, colour = "black") +
  labs(x = "Simulated x value", y = "Count")
print(fig2)
# Only the single vector of simulated x values is needed; a histogram shows one variable.
A normal distribution is not really a good model for the data in this dataset. Briefly discuss why it is not a great model in this case. (Your answer should be ~1 paragraph.)
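A normal distribution is not a good model for the students' x values because they appear to be strongly right-skewed rather than bell-shaped. The mean of x is about 331 while the SD is about 2543, roughly 7.7 times larger; for positive data, an SD that large relative to the mean means most guesses are modest and a few are very large, which is also why the scatterplot is drawn with logarithmic axes. A normal distribution, by contrast, is symmetric and unbounded, so with this mean and SD it puts about 45% of its probability below zero, whereas any value that can be plotted on a log scale must be positive. A right-skewed model, such as a log-normal distribution (i.e., a normal model for log(x)), would likely fit better.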
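The mismatch can be quantified with `pnorm()`, which returns \(P(X \le q)\) for a normal distribution. A minimal sketch, hard-coding the mean and SD of \(x\) reported in Problem 3:

```r
m <- 331.4357  # mean of x reported in Problem 3
s <- 2542.935  # SD of x reported in Problem 3

# Probability that a normal with this mean and SD produces a value below zero
p_negative <- pnorm(0, mean = m, sd = s)
cat("P(value < 0) under the fitted normal:", round(p_negative, 3), "\n")  # about 0.448
```

Since the mean is only about 0.13 SDs above zero, nearly half of the fitted distribution lies at negative values.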
A short piece of code is shown below. Re-write the code so that it does the same thing, but doesn’t use any for-loops.
# This sets the random number generator so that
# you will get the same values every time you run the code.
set.seed(42)
x <- runif(100) # Generate 100 random values between 0 and 1
cat("The total is:", sum(x < 0.2), "\n") # x < 0.2 is a logical vector; sum() counts the TRUE values
## The total is: 19
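For comparison, the sketch below counts the values below 0.2 both with an explicit loop and with the vectorized `sum(x < 0.2)`. The loop is a generic illustration of the pattern being replaced, not necessarily the original homework code.

```r
set.seed(42)
x <- runif(100)  # 100 random values between 0 and 1

# Loop version: walk through x and increment a counter
count_loop <- 0
for (value in x) {
  if (value < 0.2) count_loop <- count_loop + 1
}

# Vectorized version: TRUE counts as 1 and FALSE as 0 inside sum()
count_vec <- sum(x < 0.2)

print(count_loop == count_vec)  # TRUE
```

`length(x[x < 0.2])` also works, but it builds a subset vector only to measure its length; `sum()` on the logical vector skips that step.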