Name: Megan Panier (uni: map2365). Due date: 09.21.2024. File Description: This file was created as a part of my submission for Homework 1 for P8105. Below, you will find my work for questions 1 and 2. Thank you for taking the time to read my submission.
*This dataset has information about Adelie, Gentoo, and Chinstrap penguins from the islands of Torgersen, Biscoe, and Dream. The variables recorded in the dataset are species, island, bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, sex, and year.
*This dataset contains information about 344 penguins across 8 variables.
*The mean flipper length of the penguins in these data, excluding missing values, is 200.9152047 mm.
When we attempted to take the mean of each of these variable types, the output was NA for the character vector and for the factor vector. This is expected, since these are not numeric variables. We were able to take the mean of the random sample; this mean summarizes the sample and estimates the mean of the population it was drawn from (0 for a standard Normal distribution). We were also able to calculate the mean of the logical vector: because the logical vector represents a binary variable, with TRUE treated as 1 and FALSE as 0, its mean is the proportion of “cases” (here, elements greater than 0).
*The mean of the random sample of size 10 from a standard Normal distribution (norm_samp) is 0.0958931.
*The mean of the logical vector indicating whether elements of the sample are greater than 0 is 0.3.
*The mean of the character vector of length 10 is NA.
*The mean of the factor vector of length 10, with 3 different factor “levels” is NA.
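The behavior described above can be reproduced with a short R sketch. The seed, the character values, and the factor level names below are assumptions made for illustration, so the sampled numbers will not match the ones reported in this submission; only the pattern of results (a numeric mean, a proportion, and two NAs with warnings) is the point.

```r
# Hypothetical reconstruction: seed, character values, and factor levels are
# assumptions, so the sampled numbers will differ from those reported above.
set.seed(1)
norm_samp <- rnorm(10)                    # random sample of size 10 from N(0, 1)
vec_logical <- norm_samp > 0              # TRUE where the element is greater than 0
vec_char <- letters[1:10]                 # character vector of length 10
vec_factor <- factor(rep(c("low", "mid", "high"), length.out = 10))  # 3 levels

mean(norm_samp)    # ordinary numeric mean of the sample
mean(vec_logical)  # TRUE counts as 1 and FALSE as 0, so this is a proportion
mean(vec_char)     # NA, with a warning that the argument is not numeric or logical
mean(vec_factor)   # NA, with the same warning
```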
When the logical vector was converted to numeric with as.numeric(), TRUE and FALSE became 1 and 0. This makes sense because a logical vector is binary, with only two possible values, and it is why the mean of the logical vector equals the proportion of “cases” (here, elements greater than 0). When the character vector was converted with as.numeric(), NAs were introduced by coercion. This is expected: the strings are neither numbers nor ordered categories, so R has no numeric value to assign to them, which explains why we were not able to calculate the mean. Finally, when the factor vector was converted to numeric, the three levels were assigned the integer codes 1, 2, and 3. A mean of these codes could be computed, but it would only describe how the observations are distributed among the arbitrary codes rather than any quantity measured in the sample. Because the levels are not numeric and cannot be ranked by order of magnitude, calling mean() on the factor itself returns NA.
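A minimal R sketch of the three conversions discussed above follows. The vectors are hypothetical examples, not the homework's own data; the factor levels are set explicitly so the integer codes are predictable.

```r
# Illustrative vectors (hypothetical values, not the homework's own data)
vec_logical <- c(TRUE, FALSE, FALSE, TRUE)
vec_char <- c("apple", "banana", "cherry")
vec_factor <- factor(c("low", "high", "mid", "low"),
                     levels = c("low", "mid", "high"))

as.numeric(vec_logical)                 # 1 0 0 1
suppressWarnings(as.numeric(vec_char))  # NA NA NA ("NAs introduced by coercion")
as.numeric(vec_factor)                  # 1 3 2 1: positions of each value in the levels
mean(as.numeric(vec_factor))            # computable, but it averages arbitrary codes
```

Reordering the `levels` argument would change the codes, and therefore the "mean," without changing the data, which is one way to see that this number has no substantive meaning.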
usethis::use_github()