Due date

Friday, January 27, by the start of class time (2:00pm).

Instructions

For this homework, I am requiring you to use R Markdown and to place your answers directly in this document as code chunks. You can upload your .Rmd file as your solution. This is probably the easiest approach.

If you want, you can instead upload a ‘knit’ document as either HTML or PDF. However, some students have reported that LMS does not allow HTML uploads, and producing PDF files directly from RStudio requires that you separately install LaTeX on your computer. I can try to help with the installation if you would like to go that route, but I cannot guarantee it will work.

Problem 1

Download the file NumberGameDataCombined2023.csv. This file contains the complete dataset of student responses to the number guessing game (the topic of HW 01).

Read the file into a tibble in R, using read_csv() (the tidyverse replacement for the built-in function read.csv).

How many rows and how many columns does this tibble have? Use R code to answer the question, rather than relying on your eyeballs.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
filename = "NumberGameDataCombined2023.csv"

data <- read_csv(filename)
## Rows: 70 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Name
## dbl (3): Year, X, Y
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
n_rows = nrow(data) # renamed from `row`, which masks base::row()
n_cols = ncol(data)

cat("There are", n_rows, "rows and", n_cols, "columns")
## There are 70 rows and 4 columns

Problem 2

Generate a graph of the data in this dataframe, using ggplot. Your graph should be a ‘scatterplot’, i.e., showing each \(x,y\) pair as a separate point in the plot. In addition, your graph should have the following features:

* The x and y axes should be labeled ‘X value’ and ‘Y value’, respectively
* The x and y axes should be logarithmic (base 10)
* The color of each plot marker should be based on the value in the Year column of the tibble, so that, for example, all the datapoints from 2023 are shown using the same color

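fig <- ggplot(data, aes(x = X, y = Y, color = factor(Year))) + # factor() gives each year its own discrete color
  geom_point() +
  scale_x_log10() + scale_y_log10() + # base-10 logarithmic axes
  labs(x = "X value", y = "Y value", color = "Year") # axis labels required by the problem

print(fig)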

Problem 3

Compute the mean and standard deviation for the x and y values in the dataset (i.e., report the mean of \(x\), SD of \(x\), mean of \(y\), and SD of \(y\)).

x_mean = mean(data$X)
y_mean = mean(data$Y)

stan_dev_x = sd(data$X)
stan_dev_y = sd(data$Y)

cat("The mean and standard deviation for all X values is:", x_mean, "and", stan_dev_x, "respectively",
    "\nThe mean and standard deviation for all Y values is:", y_mean, "and", stan_dev_y, "respectively")
## The mean and standard deviation for all X values is: 331.4357 and 2542.935 respectively 
## The mean and standard deviation for all Y values is: 186.6943 and 870.3445 respectively
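
As a side note, the same four statistics can be collected in a single table. The sketch below is only an illustration: `toy` is a hypothetical stand-in for the X and Y columns, not the real dataset, and `stats` is a name introduced here.

```r
# Hypothetical toy data standing in for the X and Y columns of the real dataset
toy <- data.frame(X = c(1, 10, 100), Y = c(2, 20, 200))

# Apply the same summary function to every column;
# the result is a matrix with rows "mean" and "sd" and one column per variable
stats <- sapply(toy, function(v) c(mean = mean(v), sd = sd(v)))

print(stats)
```

With the real tibble, the equivalent call would presumably be `sapply(data[c("X", "Y")], ...)`.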

Problem 4

Let’s crown a grand champion for the number guessing game. Out of the entire dataset, who wins? (i.e., who is closest to the average of the entire dataset?)

To make things slightly more interesting, let’s use a different definition of distance. Previously we used Euclidean distance:

\[ \textrm{distance}(x_1, y_1, x_2, y_2) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} \]

This time, I want you to use the so-called ‘Manhattan distance’:

\[ \textrm{distance}(x_1, y_1, x_2, y_2) = | x_2 - x_1 | + | y_2 - y_1 | \]

where \(|\cdot|\) refers to the absolute value.

(It’s called Manhattan distance, because it measures the number of city blocks you would need to traverse to get from one point to another on a grid.)
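
To make the difference between the two definitions concrete, here is a small worked example. The two points are hypothetical and were chosen only so the arithmetic is easy to check by hand.

```r
# Two hypothetical points, chosen so the arithmetic is easy to verify
x1 <- 0; y1 <- 0
x2 <- 3; y2 <- 4

euclidean <- sqrt((x2 - x1)^2 + (y2 - y1)^2)  # straight-line distance
manhattan <- abs(x2 - x1) + abs(y2 - y1)      # distance travelled along the grid

cat("Euclidean:", euclidean, "Manhattan:", manhattan, "\n")
```

Here the Euclidean distance is 5 (the 3-4-5 right triangle), while the Manhattan distance is 3 + 4 = 7, since you walk three blocks one way and four blocks the other.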

# Manhattan distance between (x1, y1) and (x2, y2); argument order matches the formula above
manhattan_dist <- function(x1, y1, x2, y2) {
  abs(x2 - x1) + abs(y2 - y1)
}

dist <- numeric(nrow(data)) # pre-allocated container for the Manhattan distances

for (i in seq_along(data$Name)) { # loop over students, storing each distance from the dataset mean
  dist[i] <- manhattan_dist(data$X[i], data$Y[i], x_mean, y_mean)
}

# data frame pairing each name with its Manhattan distance
name_and_guess <- data.frame(Name = data$Name, Dist = dist)

winner_id <- which.min(dist) # index of the smallest distance
winner <- name_and_guess$Name[winner_id]

#I tried doing "min(name_and_guess$Name)", but that didn't work again, so I used your solution instead. Any reason why that might be the case?

cat("\n\nThe winner is:", winner, "\n\n")
## 
## 
## The winner is: Gwyn
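
On the question in the code comment above: a likely reason is that `min()` applied to a character vector compares the strings themselves, so it returns the alphabetically first name and never looks at the distances. `which.min()` instead returns the position of the smallest distance, which can then be used to index the names. A minimal sketch, using hypothetical names and distances rather than the real data:

```r
# Hypothetical names and distances, for illustration only
toy <- data.frame(Name = c("Dana", "Ali", "Chris"),
                  Dist = c(12.5, 40.0, 3.2))

alphabetical_first <- min(toy$Name)        # compares strings; ignores Dist entirely
closest <- toy$Name[which.min(toy$Dist)]   # name at the position of the smallest distance

cat("min(Name) gives:", alphabetical_first, "\n")
cat("which.min(Dist) gives:", closest, "\n")
```

Because `abs()` and subtraction are vectorised, the distances themselves could also be computed without a loop, for example `abs(data$X - x_mean) + abs(data$Y - y_mean)`.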

Problem 5

Generate a vector of 1000 random values, drawn from a Gaussian distribution (aka normal distribution) with a mean and standard deviation equal to the mean and standard deviation of the \(x\) values in the student dataset (determined in Problem 3).

set.seed(42)

random_vector <- stan_dev_x * rnorm(1000) + x_mean
#scale standard normal draws by the SD, then add the mean to them
#(runif() draws from a uniform distribution, not a Gaussian one, so rnorm() is needed here)
head(random_vector) #preview the first few values instead of printing all 1000
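
For reference, `rnorm()` draws from a normal distribution, while `runif()` draws uniformly from an interval, so scaling `runif()` output by an SD and adding a mean does not produce Gaussian values. The sketch below illustrates the difference with hypothetical target values (`target_mean` and `target_sd` are made up for the example, not taken from the dataset).

```r
set.seed(1)

# Hypothetical target parameters, chosen only for illustration
target_mean <- 10
target_sd   <- 2

normal_draws  <- rnorm(100000, mean = target_mean, sd = target_sd)
uniform_draws <- target_sd * runif(100000) + target_mean

# Normal draws: sample mean and SD land close to the targets
cat("normal:  mean =", mean(normal_draws), " sd =", sd(normal_draws), "\n")

# Uniform draws: confined to [target_mean, target_mean + target_sd],
# so they never fall below the mean and their SD is much smaller than target_sd
cat("uniform: min =", min(uniform_draws), " max =", max(uniform_draws), "\n")
```

A quick `range()` call on any simulated vector is an easy way to spot this: Gaussian draws should fall on both sides of the mean.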

Problem 6

Create a plot (using ggplot) that shows a histogram of the values in this vector.

# ggplot2 is already attached by library(tidyverse) in Problem 1
# Histogram of the simulated values from Problem 5
fig2 <- ggplot(data.frame(Values = random_vector), aes(x = Values)) +
  geom_histogram(binwidth = 100, colour = "black")

print(fig2)

# was not sure if I should've randomized the y values, so I left it as is

Problem 7

A normal distribution is not really a good model for the data in this dataset. Briefly discuss why it is not a great model in this case. (Your answer should be ~1 paragraph.)

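A normal distribution isn't a good model for this dataset because the data do not follow the symmetric bell curve that a normal distribution describes. The standard deviation of the x values (about 2543) is far larger than their mean (about 331), so a normal distribution with these parameters would place roughly 45% of its probability on negative values. The plot in Problem 2 needed logarithmic axes, which suggests the guesses are positive and span several orders of magnitude. That pattern points to a strongly right-skewed dataset in which a few very large guesses dominate the mean and SD, so a skewed distribution (for example, a log-normal) would likely describe it better.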
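
One quick check of how a normal model behaves with these parameters is to ask how much of its probability falls below zero. The sketch below uses the mean and SD of the x values as reported in Problem 3; the variable names are introduced here for illustration.

```r
# Mean and SD of X, copied from the Problem 3 output
mean_x_reported <- 331.4357
sd_x_reported   <- 2542.935

# Probability that a normal variable with these parameters is negative
p_negative <- pnorm(0, mean = mean_x_reported, sd = sd_x_reported)

cat("Share of the fitted normal distribution below zero:", round(p_negative, 3), "\n")
```

A value this close to one half means the fitted bell curve is centred almost at zero relative to its width, which is hard to reconcile with data that are plotted on logarithmic axes.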

Problem 8

A short piece of code is shown below. Re-write the code so that it does the same thing, but doesn’t use any for-loops.

# This sets the random number generator so that 
# you will get the same values every time you run the code.
set.seed(42) 

x <- runif(100) # Generate 100 random values between 0 and 1


cat("The total is:", length(x[x<0.2]), "\n") #just uses indexing to find values less than 0.2
## The total is: 19
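
The original for-loop is not reproduced in this document, so the sketch below assumes it counted the values below 0.2. Under that assumption, it compares a hypothetical loop version with two loop-free equivalents; `total_loop` and `total_vectorized` are names introduced here.

```r
set.seed(42)
x <- runif(100) # same setup as above

# Hypothetical loop version, assumed to count the values below 0.2
total_loop <- 0
for (value in x) {
  if (value < 0.2) total_loop <- total_loop + 1
}

# Loop-free version: x < 0.2 is a logical vector, and sum() counts TRUE as 1
total_vectorized <- sum(x < 0.2)

# Both loop-free forms agree with the loop
stopifnot(total_loop == total_vectorized)
stopifnot(total_vectorized == length(x[x < 0.2]))
```

If the original loop instead added up the values themselves, the loop-free equivalent would be `sum(x[x < 0.2])`.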