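# Start from a clean workspace
rm(list = ls())
# Load the alien population data, reading text columns as factors
aliens <- read.csv("aliens.csv", header = TRUE, stringsAsFactors = TRUE)
library(skimr)
# Load the course helper functions (make.my.sample() is used below)
source("special_functions.R")
# Draw a sample of 30 aliens from the population
my_sample <- make.my.sample(33377932, 30, aliens)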
2+3
## [1] 5
10-6
## [1] 4
10*5
## [1] 50
20/2
## [1] 10
round(24.674, digits = 3)
## [1] 24.674
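Because 24.674 already has three decimal places, rounding to three digits returns it unchanged. A minimal sketch of how the digits argument changes the result, using only base R (the variable name x is just for illustration):

```r
x <- 24.674
round(x, digits = 3)  # already at three decimals, so unchanged: 24.674
round(x, digits = 1)  # one decimal place: 24.7
round(x, digits = 0)  # nearest whole number: 25
```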
There are 10,000 individuals (rows) represented in this dataframe; the first 10 are shown below.
head(aliens, 10)
## ID age color island college income antennae politics anxiety
## 1 1 33 Blue Blick Ganymede 27000 Curly Republicant 46
## 2 2 47 Pink Plume Ganymede 124000 Straight Independone 49
## 3 3 39 Pink Plume Io 43000 Straight Democrulite 51
## 4 4 24 Pink Blick Io 46000 Straight Republicant 45
## 5 5 53 Pink Blick Io 44000 Straight Democrulite 46
## 6 6 36 Blue Blick Europa 28000 Curly Republicant 49
## 7 7 58 Pink Nanspucket Europa 29000 Curly Democrulite 60
## 8 8 25 Pink Blick Io 37000 Straight Republicant 51
## 9 9 38 Pink Nanspucket Europa 35000 Straight Democrulite 52
## 10 10 40 Pink Plume Europa 33000 Straight Independone 48
## depression sociable control memory intelligence time1 time2 time3 food1
## 1 92 108 68 94 119 5.86 4.36 4.11 5
## 2 94 110 72 109 127 5.07 4.35 4.97 8
## 3 119 79 62 83 112 5.66 6.13 6.15 7
## 4 92 117 65 88 115 7.81 8.13 6.12 6
## 5 93 109 56 106 122 5.04 4.55 4.15 8
## 6 98 101 49 103 104 4.81 3.65 5.11 10
## 7 107 89 42 71 86 4.88 4.09 4.35 13
## 8 99 101 55 94 116 4.24 3.80 2.69 7
## 9 83 105 68 102 108 4.69 3.35 3.89 7
## 10 106 93 56 101 104 4.78 3.89 3.04 11
## sleep food2 reasoning_trials
## 1 6.0 9 1
## 2 7.8 11 1
## 3 4.4 9 1
## 4 6.0 9 1
## 5 4.8 9 1
## 6 5.5 7 1
## 7 4.0 8 1
## 8 6.6 9 1
## 9 3.2 6 1
## 10 5.7 7 1
tail(aliens)
## ID age color island college income antennae politics anxiety
## 9995 9995 54 Pink Nanspucket Ganymede 176000 Straight Democrulite 48
## 9996 9996 66 Blue Blick Europa 52000 Straight Republicant 52
## 9997 9997 33 Pink Plume Callisto 89000 Straight Republicant 53
## 9998 9998 60 Pink Nanspucket Callisto 23000 Straight Independone 51
## 9999 9999 51 Blue Blick Europa 37000 Straight Republicant 39
## 10000 10000 24 Pink Nanspucket Io 14000 Straight Democrulite 49
## depression sociable control memory intelligence time1 time2 time3 food1
## 9995 90 115 73 84 115 11.89 11.12 9.91 7
## 9996 110 80 57 91 100 4.79 2.92 2.95 6
## 9997 108 92 74 99 108 3.71 2.88 3.89 11
## 9998 107 89 75 87 102 3.40 3.61 3.76 7
## 9999 92 108 64 106 109 6.58 6.45 5.18 6
## 10000 99 101 69 104 124 5.13 3.32 3.47 8
## sleep food2 reasoning_trials
## 9995 6.2 8 4
## 9996 5.4 7 4
## 9997 5.5 5 2
## 9998 3.9 10 3
## 9999 5.9 9 2
## 10000 6.8 7 1
There are 21 variables represented in this dataframe.
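Counts like these can be read from the dataframe directly instead of being counted from printed output. A minimal sketch on a small stand-in dataframe (toy is hypothetical, built from a few values printed above); with the real data the same calls would be nrow(aliens) and ncol(aliens):

```r
# Hypothetical stand-in for aliens: three rows of four of the columns shown above
toy <- data.frame(
  ID     = 1:3,
  age    = c(33, 47, 39),
  color  = factor(c("Blue", "Pink", "Pink")),
  island = factor(c("Blick", "Plume", "Plume"))
)
nrow(toy)   # number of individuals (rows): 3
ncol(toy)   # number of variables (columns): 4
dim(toy)    # both at once: rows first, then columns
names(toy)  # the variable names
```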
Two numerical variables in this dataframe are time1 and income. Two categorical variables are college and politics. time1 is continuous because it can take any value within a range, including fractions such as 5.86, while income is discrete because it is recorded in fixed whole-number amounts. Both are at least interval-level variables because their values can be ranked and the difference between any two participants' values is meaningful; since each also has a true zero, they can be treated as ratio-level.
class(aliens$age)
## [1] "integer"
class(aliens$time1)
## [1] "numeric"
class(aliens$income)
## [1] "numeric"
class(aliens$college)
## [1] "factor"
class(aliens$politics)
## [1] "factor"
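Rather than checking one column at a time, the class of every column can be listed in a single call. A minimal sketch on a hypothetical three-column stand-in (toy is not the real aliens dataframe; the values are copied from rows printed above):

```r
# Hypothetical stand-in with one integer, one numeric, and one factor column
toy <- data.frame(
  age     = c(33L, 47L, 39L),
  time1   = c(5.86, 5.07, 5.66),
  college = factor(c("Ganymede", "Ganymede", "Io"))
)
sapply(toy, class)       # "integer", "numeric", "factor"
sapply(toy, is.numeric)  # TRUE for age and time1, FALSE for college
levels(toy$college)      # the categories stored in a factor
```

Integer and numeric columns are both numerical variables; factor columns are categorical.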
help(summary)
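The help page describes summary(), a generic function that produces summaries of objects. Its arguments include digits, an integer that controls number formatting, and maxsum, an integer that sets how many factor levels are shown. I do not understand what “S3 method for class” means.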
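On the “S3 method for class” wording: summary() is a generic function, meaning R picks a different version (a method) depending on the class of the object it is given. A minimal sketch with base R objects, offered as an illustration rather than a full account of S3 (the names scores and islands are made up for this example):

```r
scores  <- c(7, 11, 14, 100)                      # class "numeric"
islands <- factor(c("Blick", "Plume", "Blick"))   # class "factor"

summary(scores)   # numeric input: min, quartiles, median, mean, max
summary(islands)  # factor input: a count for each level

# The factor method can also be called by its full name, with the same result
summary.factor(islands)
```

The help page lists one usage line per method, which is why the same function appears several times under different class names.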
Most of the aliens come from the island Blick. The most popular political parties are Republicant and Democrulite. The highest sociability score obtained by an alien is 117. The lowest memory score is 71.
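These figures can be reproduced from the first ten rows printed by head(aliens, 10). A minimal sketch using values copied from that output (first10 is a hypothetical name; for the whole population the same calls would be run on aliens itself and may give different answers):

```r
# Values copied from the first ten rows printed by head(aliens, 10)
first10 <- data.frame(
  island   = c("Blick", "Plume", "Plume", "Blick", "Blick",
               "Blick", "Nanspucket", "Blick", "Nanspucket", "Plume"),
  politics = c("Republicant", "Independone", "Democrulite", "Republicant",
               "Democrulite", "Republicant", "Democrulite", "Republicant",
               "Democrulite", "Independone"),
  sociable = c(108, 110, 79, 117, 109, 101, 89, 101, 105, 93),
  memory   = c(94, 109, 83, 88, 106, 103, 71, 94, 102, 101),
  stringsAsFactors = TRUE
)
sort(table(first10$island), decreasing = TRUE)    # Blick is most common (5 of 10)
sort(table(first10$politics), decreasing = TRUE)  # Republicant and Democrulite tie at 4
max(first10$sociable)  # 117
min(first10$memory)    # 71
```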
library(skimr)
I do not understand what the changes made are.
This code creates a new variable, food.diff, that stores the difference between food1 and food2 for each alien.
aliens$food.diff <- aliens$food1 - aliens$food2
head(aliens)
## ID age color island college income antennae politics anxiety depression
## 1 1 33 Blue Blick Ganymede 27000 Curly Republicant 46 92
## 2 2 47 Pink Plume Ganymede 124000 Straight Independone 49 94
## 3 3 39 Pink Plume Io 43000 Straight Democrulite 51 119
## 4 4 24 Pink Blick Io 46000 Straight Republicant 45 92
## 5 5 53 Pink Blick Io 44000 Straight Democrulite 46 93
## 6 6 36 Blue Blick Europa 28000 Curly Republicant 49 98
## sociable control memory intelligence time1 time2 time3 food1 sleep food2
## 1 108 68 94 119 5.86 4.36 4.11 5 6.0 9
## 2 110 72 109 127 5.07 4.35 4.97 8 7.8 11
## 3 79 62 83 112 5.66 6.13 6.15 7 4.4 9
## 4 117 65 88 115 7.81 8.13 6.12 6 6.0 9
## 5 109 56 106 122 5.04 4.55 4.15 8 4.8 9
## 6 101 49 103 104 4.81 3.65 5.11 10 5.5 7
## reasoning_trials food.diff
## 1 1 -4
## 2 1 -3
## 3 1 -2
## 4 1 -3
## 5 1 -1
## 6 1 3
help(summary)
In this code I found the difference between the variables time1 and time3 and stored it in a new variable, time.diff.
aliens$time.diff <- aliens$time1 - aliens$time3
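Both difference columns rely on R subtracting whole columns row by row. A minimal sketch using values copied from the first six rows of aliens printed above (first6 is a hypothetical name; round() is added here only to keep the decimals tidy):

```r
# Values copied from the first six rows of aliens printed above
first6 <- data.frame(
  food1 = c(5, 8, 7, 6, 8, 10),
  food2 = c(9, 11, 9, 9, 9, 7),
  time1 = c(5.86, 5.07, 5.66, 7.81, 5.04, 4.81),
  time3 = c(4.11, 4.97, 6.15, 6.12, 4.15, 5.11)
)
# Subtraction is vectorised: each row gets its own difference
first6$food.diff <- first6$food1 - first6$food2
first6$time.diff <- round(first6$time1 - first6$time3, 2)
first6$food.diff  # -4 -3 -2 -3 -1  3
first6$time.diff  # 1.75  0.10 -0.49  1.69  0.89 -0.30
```

A negative food.diff means the alien scored higher on food2 than on food1; a positive time.diff means time1 was larger than time3.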
Most aliens come from the islands Blick and Nanspucket. The most popular political party is Democrulite. The highest sociability score is 122. The lowest memory score is 79.
head(my_sample, 30)
## ID age color island college income antennae politics anxiety
## 88 88 27 Pink Blick Europa 43000 Straight Republicant 48
## 249 249 26 Pink Nanspucket Io 136000 Curly Democrulite 48
## 582 582 29 Pink Nanspucket Europa 78000 Straight Democrulite 52
## 600 600 38 Blue Blick Ganymede 84000 Straight Democrulite 47
## 852 852 66 Pink Nanspucket Callisto 39000 Straight Independone 50
## 1017 1017 43 Pink Blick Ganymede 19000 Straight Republicant 48
## 1139 1139 41 Pink Blick Ganymede 13000 Straight Independone 48
## 1390 1390 65 Blue Blick Ganymede 95000 Curly Republicant 45
## 2524 2524 34 Pink Nanspucket Callisto 234000 Straight Independone 54
## 2727 2727 42 Pink Nanspucket Ganymede 20000 Curly Democrulite 47
## 2928 2928 54 Pink Plume Europa 28000 Straight Republicant 53
## 2988 2988 56 Pink Plume Europa 49000 Straight Independone 41
## 3442 3442 53 Pink Nanspucket Europa 34000 Straight Republicant 46
## 3499 3499 12 Pink Blick Io 101000 Straight Republicant 46
## 3884 3884 35 Pink Plume Callisto 99000 Straight Independone 53
## 4037 4037 15 Blue Plume Ganymede 22000 Straight Democrulite 51
## 4097 4097 68 Blue Plume Ganymede 57000 Curly Republicant 57
## 4469 4469 16 Blue Nanspucket Europa 55000 Straight Independone 55
## 4728 4728 38 Pink Nanspucket Callisto 80000 Straight Independone 50
## 5397 5397 57 Pink Plume Callisto 180000 Straight Independone 54
## 5405 5405 58 Pink Plume Callisto 27000 Curly Independone 48
## 5566 5566 46 Pink Nanspucket Ganymede 121000 Curly Republicant 56
## 5661 5661 68 Blue Nanspucket Europa 45000 Straight Democrulite 51
## 6295 6295 33 Pink Nanspucket Callisto 42000 Straight Independone 54
## 6391 6391 58 Pink Plume Europa 17000 Straight Republicant 51
## 7208 7208 55 Pink Plume Ganymede 136000 Straight Independone 38
## 7488 7488 42 Pink Plume Ganymede 39000 Straight Independone 47
## 8019 8019 27 Pink Plume Ganymede 158000 Straight Democrulite 53
## 8225 8225 32 Pink Nanspucket Callisto 96000 Straight Republicant 47
## 9981 9981 40 Pink Plume Io 72000 Straight Republicant 51
## depression sociable control memory intelligence time1 time2 time3 food1
## 88 97 100 64 96 104 7.48 6.95 5.88 8
## 249 81 122 75 84 116 3.47 3.10 3.38 11
## 582 101 98 55 92 100 6.30 5.95 5.98 10
## 600 90 108 56 89 113 5.54 4.19 6.04 8
## 852 102 97 61 81 96 5.89 4.56 5.23 9
## 1017 98 101 57 79 109 8.26 6.33 6.90 12
## 1139 101 98 67 87 115 5.19 4.77 4.64 9
## 1390 91 111 56 80 109 8.95 9.42 9.20 3
## 2524 117 91 52 91 98 7.19 5.48 7.08 11
## 2727 100 100 56 94 116 6.01 6.01 5.92 10
## 2928 109 92 49 98 101 9.25 9.65 7.35 11
## 2988 87 124 68 94 104 4.77 3.35 4.71 10
## 3442 92 111 69 86 100 4.48 4.11 3.15 9
## 3499 96 103 83 95 123 3.46 3.42 2.61 12
## 3884 93 108 63 107 109 2.97 2.00 1.79 11
## 4037 97 101 51 86 111 6.41 6.32 4.75 6
## 4097 116 75 71 95 120 7.03 5.73 7.30 7
## 4469 99 102 57 117 113 6.02 5.18 5.36 6
## 4728 95 103 63 94 103 5.62 5.68 4.47 7
## 5397 109 96 58 75 92 6.89 6.84 5.37 11
## 5405 99 100 45 84 93 6.36 6.62 5.88 10
## 5566 102 98 61 95 118 6.26 4.74 5.05 8
## 5661 93 107 67 89 101 8.23 6.66 8.61 8
## 6295 108 90 58 105 107 6.55 5.58 5.96 6
## 6391 109 86 61 100 105 5.07 3.30 4.10 13
## 7208 79 137 53 105 121 7.40 7.34 6.48 5
## 7488 103 98 73 89 118 4.97 3.65 4.81 12
## 8019 113 80 54 115 126 5.50 4.46 3.71 9
## 8225 101 99 48 100 102 8.63 7.33 7.16 8
## 9981 105 95 41 76 103 7.00 5.55 6.99 6
## sleep food2 reasoning_trials
## 88 7.5 8 2
## 249 6.1 6 1
## 582 5.2 6 6
## 600 7.4 11 3
## 852 4.2 3 4
## 1017 6.5 9 5
## 1139 5.9 7 2
## 1390 5.2 11 1
## 2524 6.3 10 3
## 2727 6.1 10 3
## 2928 5.6 8 3
## 2988 5.1 6 1
## 3442 6.1 9 1
## 3499 5.8 12 1
## 3884 6.9 6 1
## 4037 7.3 8 3
## 4097 5.0 6 1
## 4469 6.5 9 2
## 4728 6.6 7 3
## 5397 6.8 5 7
## 5405 5.6 7 9
## 5566 6.0 8 2
## 5661 7.2 12 1
## 6295 4.3 7 1
## 6391 5.8 8 2
## 7208 6.1 11 3
## 7488 6.7 3 6
## 8019 6.1 11 3
## 8225 5.9 10 2
## 9981 6.9 8 3
One notable difference is that reasoning_trials varies more in my sample than in the first ten rows of the whole population, where every value is 1.
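The sample differs from the first ten rows because it is drawn at random from across the whole population. The internals of make.my.sample() are not shown in this document, so the following is only a sketch of how a reproducible random sample could be drawn in base R; draw_sample() and population are hypothetical names, and this is not the course function:

```r
# Hypothetical stand-in population of 100 IDs; NOT the course's make.my.sample()
population <- data.frame(ID = 1:100)

draw_sample <- function(seed, n, data) {
  set.seed(seed)                        # same seed gives the same rows every time
  rows <- sort(sample(nrow(data), n))   # n distinct row numbers, in ascending order
  data[rows, , drop = FALSE]
}

s1 <- draw_sample(33377932, 30, population)
s2 <- draw_sample(33377932, 30, population)
identical(s1, s2)  # TRUE: the draw is reproducible
nrow(s1)           # 30
```

Fixing the seed is what lets the same sample be drawn again each time the file is knitted.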
library(skimr)
Most aliens come from the island Blick. The most popular political party is Republicant. The highest sociability score is 122. The lowest memory score is 76. Nothing in this sample really surprised me because it did not differ much from the size-30 sample.
my_sample_100 <- make.my.sample(33377932, 100, aliens)
head(my_sample_100)
## ID age color island college income antennae politics anxiety
## 74 74 47 Blue Blick Europa 37000 Straight Republicant 45
## 88 88 27 Pink Blick Europa 43000 Straight Republicant 48
## 100 100 19 Blue Nanspucket Callisto 71000 Curly Democrulite 44
## 249 249 26 Pink Nanspucket Io 136000 Curly Democrulite 48
## 339 339 32 Pink Blick Io 34000 Curly Democrulite 45
## 424 424 48 Pink Plume Callisto 29000 Straight Independone 47
## depression sociable control memory intelligence time1 time2 time3 food1
## 74 87 122 74 80 98 9.53 8.67 8.11 10
## 88 97 100 64 96 104 7.48 6.95 5.88 8
## 100 82 117 66 103 108 3.08 1.53 2.57 7
## 249 81 122 75 84 116 3.47 3.10 3.38 11
## 339 99 102 62 111 126 7.52 5.87 6.19 10
## 424 94 106 61 99 105 7.88 6.73 6.59 10
## sleep food2 reasoning_trials food.diff time.diff
## 74 5.4 7 11 3 1.42
## 88 7.5 8 2 0 1.60
## 100 6.0 7 1 0 0.51
## 249 6.1 6 1 5 0.09
## 339 6.9 7 6 3 1.33
## 424 6.9 10 2 0 1.29
tail(my_sample_100)
## ID age color island college income antennae politics anxiety
## 8858 8858 41 Pink Nanspucket Callisto 19000 Straight Democrulite 51
## 8940 8940 62 Blue Plume Callisto 57000 Straight Independone 42
## 9142 9142 19 Blue Nanspucket Io 58000 Straight Republicant 51
## 9391 9391 39 Pink Blick Callisto 59000 Straight Republicant 51
## 9683 9683 51 Blue Blick Europa 68000 Curly Republicant 55
## 9981 9981 40 Pink Plume Io 72000 Straight Republicant 51
## depression sociable control memory intelligence time1 time2 time3 food1
## 8858 107 88 48 97 100 3.44 1.63 1.60 12
## 8940 96 107 49 87 96 7.10 6.49 6.01 7
## 9142 88 107 64 102 122 8.35 6.77 7.72 9
## 9391 102 97 78 100 110 4.18 4.31 2.90 8
## 9683 90 111 62 97 104 3.59 2.90 3.78 9
## 9981 105 95 41 76 103 7.00 5.55 6.99 6
## sleep food2 reasoning_trials food.diff time.diff
## 8858 5.0 7 3 5 1.84
## 8940 5.7 11 3 -4 1.09
## 9142 5.8 10 2 -1 0.63
## 9391 6.5 7 1 1 1.28
## 9683 5.6 8 3 1 -0.19
## 9981 6.9 8 3 -2 0.01
This is a template for doing your homework assignments. It’s an R Markdown document, with the .Rmd extension.
Start by saving it with a new name, using Save As from the File menu above. Please make sure to save it in the special directory that you created for your homework (see my instructions for getting started in R). Please name it with your own name and the number of the homework assignment, followed by the .Rmd extension. For example: JohnDoeHW1.Rmd.
Make sure that you start by filling in the header information at the top of this file (e.g., name, student ID). Each time you do a new assignment, just change the file name and the header info.
Please leave in place the little block of code, above, that starts with ‘r setup’.
When you submit your homework, you will submit two files: this .Rmd file, and an .html file that you will create by *knitting* your .Rmd file. The .html file is a very nice-looking file, viewable in any internet browser (e.g., Chrome, Safari, etc.). It includes everything you’ve written in your homework file, including your code, but also includes the results of any R code that you put in your homework. In other words, by knitting the file, you are running all the code, and the output is included in the .html file along with the code that generated it.
To knit your file, go to the Knit menu above, and click Knit to HTML. The file will show up in the same directory that your .Rmd file is in.
Knitting your file will also provide a preview of what the .html file looks like, in the Viewer pane of RStudio on the right side of the screen. To make sure that you’ve got RStudio set up to give you a preview, click the little settings wheel above, and click Preview in Viewer Pane.
Try knitting this file, right now!
You can re-Knit your file as often as you like, so you can correct your mistakes. It will over-write the old version every time you do it.
For each new section of your document, make a header by typing two hash marks, followed by a space, and then the name of your section; see, for example, the header of this section.
Please make a new section for each question in the homework. That is, one section is called Question 1, the next is called Question 2, etc.
The first section of each homework assignment will be where you’ll do some preliminary things that you’ll need for the assignment, but which aren’t part of the assignment itself. You can call this section Preliminaries.
Within each section, you can write sentences, which you will need to do to answer parts of many of the questions, and you can put in blocks of R code, which will be executed when you knit your file.
There is an R Markdown formatting cheat sheet that I have put on the class Moodle page, in case you want to play with more advanced formatting. Also note that there are many resources on the internet.
Let’s say my question was, ‘What’s the mean of the numbers 7, 11, 14, and 100? What’s the median? Why is the mean higher than the median?’ You could answer this by writing a little code chunk (don’t worry, I don’t expect you to understand the details of this code yet), and then writing the explanatory part. You insert a code chunk by clicking on the drop down menu above with a letter c, and then R. When you knit the file, the code chunk will be executed, and the results will show up in your .html file right after the code.
mean(c(7,11,14,100))
## [1] 33
median(c(7,11,14, 100))
## [1] 12.5
The mean (33) is much higher than the median (12.5) because the mean is strongly influenced by the one extreme value (100), while the median is not.
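To see the effect of the extreme value, the same statistics can be recomputed without it. A minimal sketch in base R (values and trimmed are illustrative names):

```r
values <- c(7, 11, 14, 100)
mean(values)    # 33
median(values)  # 12.5

# Drop the extreme value and compare
trimmed <- values[values != 100]
mean(trimmed)    # about 10.67
median(trimmed)  # 11
```

Removing 100 moves the mean by more than 22 points but the median by only 1.5.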
Very important: You can run the code chunk without Knitting the entire file just by clicking the little ‘play’ symbol (triangle pointing to the right) in the upper right of the code chunk itself. It’s a very good idea to run each of your code chunks before Knitting the file, to check that they work the way you intended.
Let’s say my question was, “What kind of graph should you use to show the relationship between weight and gas mileage in the mtcars data set? Make the graph. Interpret it.” You could answer it like this:
Here we are examining the relationship between two quantitative variables, so we should make a scatterplot. Here is the plot:
data(mtcars)
plot(mtcars$wt, mtcars$mpg)
As weight goes up, gas mileage goes down. The relationship between the two variables appears to be linear.
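The visual impression can be backed up with a number and a fitted line. A minimal sketch using base R functions on the built-in mtcars data; the axis labels and the object name fit are additions for illustration:

```r
data(mtcars)
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
fit <- lm(mpg ~ wt, data = mtcars)  # straight-line fit of mpg on weight
abline(fit)                         # add the fitted line to the scatterplot
cor(mtcars$wt, mtcars$mpg)          # negative: heavier cars get lower mileage
coef(fit)["wt"]                     # slope of the line, also negative
```

A correlation close to -1 and points that scatter evenly around the line are consistent with a negative, roughly linear relationship.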
Again, you can run this little bit of code by clicking the ‘play’ symbol, and you’ll see the plot show up right under the code itself.
If there’s a problem with your code, the file will not Knit properly; you’ll get an error message. First, you should try to fix it. But if you can’t, you still don’t want that to prevent you from doing the rest of the homework. In this case, just put ‘eval = FALSE’ in your code chunk, just like this:
mean(c(7,b,14,100))
This code is ‘broken’ because I included the letter b in the list of values that the mean function was supposed to deal with, and this makes no sense. But because I also included ‘eval = FALSE’, this error won’t stop my file from knitting, because the code in this chunk won’t be run. In a case like this, you should explain in your homework that you had an error in your code, but that you couldn’t figure out how to fix it. (The more you explain to your TA about your attempt to do a problem, the more partial credit you are likely to get.)
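For reference, the chunk header in this case would read {r, eval = FALSE}. When trying to fix a broken chunk, it can also help to capture the error message without stopping the rest of the code. The following sketch uses tryCatch(), a separate base R technique that the assignment does not require; msg is an illustrative name:

```r
# b is not defined anywhere, so evaluating c(7, b, 14, 100) fails
msg <- tryCatch(
  mean(c(7, b, 14, 100)),
  error = function(e) conditionMessage(e)  # return the error text instead of stopping
)
msg  # the error text, which names the missing object b
```

Quoting the captured message in your written explanation shows exactly where the code failed.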
Your TA is available to help you figure out how to format your homework assignments, and so am I. After the first one or two assignments, it will be very easy.