aliens <- read.csv("aliens.csv", header = TRUE, stringsAsFactors = TRUE)
source('special_functions.R')
my_sample <- make.my.sample(34370790, 30, aliens)
240+350
## [1] 590
372817-5
## [1] 372812
10.6*80
## [1] 848
50/4
## [1] 12.5
round(56.781563, digits=3)
## [1] 56.782
round(56.781563, digits=1)
## [1] 56.8
head(aliens)
## ID age color island college income antennae politics anxiety depression
## 1 1 33 Blue Blick Ganymede 27000 Curly Republicant 46 92
## 2 2 47 Pink Plume Ganymede 124000 Straight Independone 49 94
## 3 3 39 Pink Plume Io 43000 Straight Democrulite 51 119
## 4 4 24 Pink Blick Io 46000 Straight Republicant 45 92
## 5 5 53 Pink Blick Io 44000 Straight Democrulite 46 93
## 6 6 36 Blue Blick Europa 28000 Curly Republicant 49 98
## sociable control memory intelligence time1 time2 time3 food1 sleep food2
## 1 108 68 94 119 5.86 4.36 4.11 5 6.0 9
## 2 110 72 109 127 5.07 4.35 4.97 8 7.8 11
## 3 79 62 83 112 5.66 6.13 6.15 7 4.4 9
## 4 117 65 88 115 7.81 8.13 6.12 6 6.0 9
## 5 109 56 106 122 5.04 4.55 4.15 8 4.8 9
## 6 101 49 103 104 4.81 3.65 5.11 10 5.5 7
## reasoning_trials
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
tail(aliens)
## ID age color island college income antennae politics anxiety
## 9995 9995 54 Pink Nanspucket Ganymede 176000 Straight Democrulite 48
## 9996 9996 66 Blue Blick Europa 52000 Straight Republicant 52
## 9997 9997 33 Pink Plume Callisto 89000 Straight Republicant 53
## 9998 9998 60 Pink Nanspucket Callisto 23000 Straight Independone 51
## 9999 9999 51 Blue Blick Europa 37000 Straight Republicant 39
## 10000 10000 24 Pink Nanspucket Io 14000 Straight Democrulite 49
## depression sociable control memory intelligence time1 time2 time3 food1
## 9995 90 115 73 84 115 11.89 11.12 9.91 7
## 9996 110 80 57 91 100 4.79 2.92 2.95 6
## 9997 108 92 74 99 108 3.71 2.88 3.89 11
## 9998 107 89 75 87 102 3.40 3.61 3.76 7
## 9999 92 108 64 106 109 6.58 6.45 5.18 6
## 10000 99 101 69 104 124 5.13 3.32 3.47 8
## sleep food2 reasoning_trials
## 9995 6.2 8 4
## 9996 5.4 7 4
## 9997 5.5 5 2
## 9998 3.9 10 3
## 9999 5.9 9 2
## 10000 6.8 7 1
10000 individuals are represented in this data frame.
There are 21 variables represented in this data frame. I found this by clicking on the aliens data set, which lists the number of variables.
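The same counts can also be read off in code rather than by clicking. This is a minimal sketch on a toy stand-in for the aliens data frame (the name `toy` is hypothetical, and it holds only the age and color of the first three rows shown by head(aliens)); running the same calls on aliens itself should report the counts stated above.

```r
# Toy stand-in for the aliens data frame: first three rows, two columns only
toy <- data.frame(
  age   = c(33, 47, 39),
  color = factor(c("Blue", "Pink", "Pink"))
)

n_individuals <- nrow(toy)  # one row per individual
n_variables   <- ncol(toy)  # one column per variable

n_individuals
n_variables
dim(toy)                    # both at once: rows first, then columns
```

Using nrow() and ncol() keeps the answer reproducible if the data set changes later, for example after a new column is added.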
class(aliens$color)
## [1] "factor"
class(aliens$income)
## [1] "numeric"
class(aliens$time3)
## [1] "numeric"
class(aliens$politics)
## [1] "factor"
The two categorical variables I chose were politics and color. They are categorical because each has a fixed set of groups, and every alien is assigned to one group based on observation. The two numerical variables I chose were time3 and income. Time3 is best regarded as a continuous variable because it can take infinitely many possible values, and as an interval variable because equal differences between its values are measurable and consistent. Income is also best regarded as a continuous variable because it can take on any value within a range, and as a ratio variable because it has a true zero point.
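The class checks above can be run for every column at once, and a factor's groups can be listed directly. This is a sketch on a toy data frame (`toy` is a hypothetical name; the values are the first three rows of color, income, and time3 from head(aliens)), not the full data set.

```r
# Toy stand-in holding three of the variables discussed above
toy <- data.frame(
  color  = factor(c("Blue", "Pink", "Pink")),
  income = c(27000, 124000, 43000),
  time3  = c(4.11, 4.97, 6.15)
)

classes <- sapply(toy, class)  # class of every column in one call
classes

color_groups <- levels(toy$color)  # the fixed set of groups for a categorical variable
color_groups

sapply(toy, is.numeric)        # TRUE for numerical columns, FALSE for factors
```

Note that class() only separates categorical from numerical variables; whether a numerical variable is interval or ratio is a judgment about what it measures, not something R reports.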
summary(aliens)
## ID age color island college
## Min. : 1 Min. :10.00 Blue:3064 Blick :3504 Callisto:2472
## 1st Qu.: 2501 1st Qu.:26.00 Pink:6936 Nanspucket:3032 Europa :2533
## Median : 5000 Median :40.00 Plume :3464 Ganymede:2491
## Mean : 5000 Mean :40.21 Io :2504
## 3rd Qu.: 7500 3rd Qu.:55.00
## Max. :10000 Max. :70.00
## income antennae politics anxiety depression
## Min. : 5000 Curly :2155 Democrulite:3218 Min. :29 Min. : 65
## 1st Qu.: 34000 Straight:7845 Independone:3452 1st Qu.:47 1st Qu.: 93
## Median : 55000 Republicant:3330 Median :50 Median :100
## Mean : 69708 Mean :50 Mean :100
## 3rd Qu.: 90000 3rd Qu.:53 3rd Qu.:107
## Max. :559000 Max. :68 Max. :140
## sociable control memory intelligence
## Min. : 35.00 Min. :21.00 Min. : 52.00 Min. : 78.0
## 1st Qu.: 94.00 1st Qu.:53.00 1st Qu.: 85.00 1st Qu.:101.0
## Median :100.00 Median :60.00 Median : 92.00 Median :109.0
## Mean : 99.99 Mean :60.07 Mean : 92.06 Mean :108.5
## 3rd Qu.:106.00 3rd Qu.:67.00 3rd Qu.: 99.00 3rd Qu.:116.0
## Max. :167.00 Max. :96.00 Max. :129.00 Max. :135.0
## time1 time2 time3 food1
## Min. : 1.740 Min. : 0.570 Min. : 0.280 Min. : 1.000
## 1st Qu.: 5.130 1st Qu.: 4.290 1st Qu.: 4.298 1st Qu.: 7.000
## Median : 6.170 Median : 5.490 Median : 5.490 Median : 9.000
## Mean : 6.615 Mean : 5.867 Mean : 5.867 Mean : 8.665
## 3rd Qu.: 7.620 3rd Qu.: 6.990 3rd Qu.: 6.970 3rd Qu.:10.000
## Max. :24.890 Max. :23.420 Max. :25.300 Max. :17.000
## sleep food2 reasoning_trials
## Min. :2.500 Min. : 0.000 Min. : 1.000
## 1st Qu.:5.400 1st Qu.: 7.000 1st Qu.: 1.000
## Median :6.000 Median : 9.000 Median : 2.000
## Mean :6.012 Mean : 8.674 Mean : 2.958
## 3rd Qu.:6.700 3rd Qu.:10.000 3rd Qu.: 4.000
## Max. :9.500 Max. :16.000 Max. :27.000
The output gives the minimum, 1st quartile, median, mean, 3rd quartile, and maximum for each numeric variable in the aliens data set, and the number of aliens in each group for each categorical variable.
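To see where these numbers come from, the pieces of summary() can be computed separately. This is a sketch on two short vectors (the first six food1 and antennae values from head(aliens)); `five_num`, `avg`, and `counts` are names introduced here for illustration.

```r
# First six food1 values and antennae groups from head(aliens)
food1    <- c(5, 8, 7, 6, 8, 10)
antennae <- factor(c("Curly", "Straight", "Straight",
                     "Straight", "Straight", "Curly"))

five_num <- quantile(food1)  # minimum, quartiles, median, maximum
avg      <- mean(food1)      # the mean reported alongside them
five_num
avg
summary(food1)               # the same six numbers in one call

counts <- table(antennae)    # for a factor, summary() reports counts like these
counts
summary(antennae)
```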
aliens$food.diff <- aliens$food1 - aliens$food2
head(aliens$food1)
## [1] 5 8 7 6 8 10
head(aliens$food2)
## [1] 9 11 9 9 9 7
summary(aliens$food1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 7.000 9.000 8.665 10.000 17.000
summary(aliens$food2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 7.000 9.000 8.674 10.000 16.000
The first line creates food.diff, the difference between food1 and food2. The remaining code shows the first six values and the summary statistics for the variables food1 and food2.
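The new variable itself can be inspected the same way as food1 and food2. A small sketch, using only the six food1 and food2 values printed above in a hypothetical data frame called `toy`:

```r
# First six food1 and food2 values from the head() output above
toy <- data.frame(
  food1 = c(5, 8, 7, 6, 8, 10),
  food2 = c(9, 11, 9, 9, 9, 7)
)

# Same computation as for the aliens data set: food1 minus food2, row by row
toy$food.diff <- toy$food1 - toy$food2

head(toy$food.diff)     # negative values mean food2 was larger than food1
summary(toy$food.diff)
```

Checking head() and summary() on the derived column is a quick way to confirm that it was created and that the subtraction ran in the intended direction.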
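# isolated is not yet a column of aliens, so the right-hand side is NULL (this also drops food.diff)
aliens$food.diff <- aliens$isolated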
head(aliens$isolated)
## NULL
summary(aliens$isolated)
## Length Class Mode
## 0 NULL NULL
The variable I chose was isolated, the opposite of sociable, which is already one of the variables in the data set. Once defined, this variable would give each alien's isolation score and allow us to compare it with the sociability score.
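One way such a variable could be defined is by reverse-scoring sociable. This is only a sketch under an assumption: the formula below (maximum plus minimum minus the score) is a hypothetical choice for illustration, not a definition given in the data set, and `toy` holds just the six sociable values from head(aliens).

```r
# First six sociable scores from head(aliens)
toy <- data.frame(sociable = c(108, 110, 79, 117, 109, 101))

# ASSUMPTION: isolation is sociability reverse-scored on the observed range,
# so the most sociable alien gets the lowest isolation score and vice versa
toy$isolated <- max(toy$sociable) + min(toy$sociable) - toy$sociable

head(toy$isolated)
summary(toy$isolated)

# A reverse-scored variable is perfectly negatively correlated with the original
r <- cor(toy$sociable, toy$isolated)
r
```

Because the new column is assigned to toy$isolated itself, head() and summary() return values rather than NULL.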
summary(my_sample)
## ID age color island college
## Min. : 131 Min. :13.00 Blue: 5 Blick :10 Callisto: 6
## 1st Qu.:3421 1st Qu.:34.25 Pink:25 Nanspucket: 6 Europa :13
## Median :6246 Median :49.00 Plume :14 Ganymede: 7
## Mean :5629 Mean :46.27 Io : 4
## 3rd Qu.:8130 3rd Qu.:58.75
## Max. :9815 Max. :66.00
## income antennae politics anxiety
## Min. : 14000 Curly : 4 Democrulite: 7 Min. :38.00
## 1st Qu.: 42250 Straight:26 Independone:11 1st Qu.:46.25
## Median : 63500 Republicant:12 Median :49.00
## Mean : 72367 Mean :50.23
## 3rd Qu.:106000 3rd Qu.:53.75
## Max. :162000 Max. :67.00
## depression sociable control memory
## Min. : 75.0 Min. : 81.00 Min. :27.00 Min. : 74.00
## 1st Qu.: 98.0 1st Qu.: 91.25 1st Qu.:54.50 1st Qu.: 83.25
## Median :102.0 Median : 97.50 Median :60.00 Median : 88.00
## Mean :102.2 Mean : 97.57 Mean :60.07 Mean : 90.30
## 3rd Qu.:108.8 3rd Qu.:101.75 3rd Qu.:66.00 3rd Qu.: 96.00
## Max. :120.0 Max. :125.00 Max. :74.00 Max. :119.00
## intelligence time1 time2 time3
## Min. : 91.0 Min. : 3.720 Min. :2.120 Min. : 2.940
## 1st Qu.: 96.5 1st Qu.: 5.400 1st Qu.:4.603 1st Qu.: 4.695
## Median :103.0 Median : 6.765 Median :5.720 Median : 5.520
## Mean :105.7 Mean : 6.842 Mean :5.950 Mean : 6.146
## 3rd Qu.:114.0 3rd Qu.: 8.127 3rd Qu.:7.548 3rd Qu.: 7.617
## Max. :131.0 Max. :11.430 Max. :9.620 Max. :10.890
## food1 sleep food2 reasoning_trials
## Min. : 5.000 Min. :4.300 Min. : 7.000 Min. : 1.000
## 1st Qu.: 7.250 1st Qu.:5.600 1st Qu.: 8.000 1st Qu.: 1.000
## Median : 9.000 Median :6.200 Median : 9.000 Median : 2.500
## Mean : 9.033 Mean :6.317 Mean : 9.333 Mean : 3.533
## 3rd Qu.:10.000 3rd Qu.:6.975 3rd Qu.:10.000 3rd Qu.: 5.000
## Max. :13.000 Max. :9.400 Max. :13.000 Max. :11.000
my_sample_2 <- make.my.sample(34370791, 30, aliens)
summary(my_sample_2)
## ID age color island college
## Min. : 64 Min. :15.00 Blue:11 Blick :13 Callisto:8
## 1st Qu.:1954 1st Qu.:27.25 Pink:19 Nanspucket: 9 Europa :8
## Median :4186 Median :41.00 Plume : 8 Ganymede:8
## Mean :4411 Mean :42.37 Io :6
## 3rd Qu.:6677 3rd Qu.:58.75
## Max. :9844 Max. :69.00
## income antennae politics anxiety
## Min. : 15000 Curly : 6 Democrulite:13 Min. :40.00
## 1st Qu.: 28500 Straight:24 Independone: 5 1st Qu.:45.25
## Median : 49500 Republicant:12 Median :49.00
## Mean : 79267 Mean :49.43
## 3rd Qu.:104750 3rd Qu.:51.75
## Max. :266000 Max. :60.00
## depression sociable control memory
## Min. : 77.0 Min. : 67.0 Min. :42.0 Min. : 74.00
## 1st Qu.: 95.0 1st Qu.: 94.0 1st Qu.:55.5 1st Qu.: 92.25
## Median :102.0 Median : 98.0 Median :62.0 Median : 95.50
## Mean :100.3 Mean : 98.2 Mean :61.7 Mean : 95.83
## 3rd Qu.:106.0 3rd Qu.:104.8 3rd Qu.:67.0 3rd Qu.:103.75
## Max. :122.0 Max. :130.0 Max. :88.0 Max. :112.00
## intelligence time1 time2 time3
## Min. : 94.0 Min. : 3.130 Min. : 2.760 Min. : 1.170
## 1st Qu.:100.2 1st Qu.: 5.070 1st Qu.: 4.465 1st Qu.: 4.082
## Median :110.0 Median : 5.990 Median : 5.215 Median : 5.055
## Mean :110.3 Mean : 6.377 Mean : 5.827 Mean : 5.619
## 3rd Qu.:116.8 3rd Qu.: 7.815 3rd Qu.: 7.325 3rd Qu.: 7.140
## Max. :128.0 Max. :10.300 Max. :10.310 Max. :10.420
## food1 sleep food2 reasoning_trials
## Min. : 3.00 Min. :4.000 Min. : 4.0 Min. : 1.0
## 1st Qu.: 6.25 1st Qu.:5.400 1st Qu.: 7.0 1st Qu.: 1.0
## Median : 8.00 Median :6.000 Median : 8.0 Median : 2.0
## Mean : 7.70 Mean :5.937 Mean : 8.1 Mean : 2.9
## 3rd Qu.: 9.00 3rd Qu.:6.525 3rd Qu.:10.0 3rd Qu.: 4.0
## Max. :11.00 Max. :7.400 Max. :12.0 Max. :10.0
The first sample and the second sample had values that were fairly close to each other, but both differed from the population. The first sample captures the truth about the population better than the second on some variables: for example, mean income was 72367 in the first sample and 79267 in the second, compared with 69708 in the population. Many of the second sample's values were also lower than the first's. I am not surprised by the discrepancy between the population and the two samples because the aliens data set has 10000 observations of 21 variables, while each sample has only 30 observations of the same 21 variables. The discrepancy between the first and second sample can also be explained by the addition of 1 to my student ID number in question 10, which produced a different draw of 30 aliens.
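The effect of sample size and of changing the seed can be illustrated with a short simulation. This is a sketch under stated assumptions: the population is simulated (it is not the aliens data), `draw_mean` is a hypothetical helper built on base R's set.seed() and sample(), and it is not the make.my.sample function from special_functions.R, whose internals are not shown here.

```r
# ASSUMPTION: a simulated population of 10000 scores, not the aliens data
set.seed(1)
population <- rnorm(10000, mean = 50, sd = 5)

# Hypothetical helper: draw n values without replacement using a given seed
# and return the sample mean
draw_mean <- function(seed, n, x) {
  set.seed(seed)
  mean(sample(x, n))
}

m1    <- draw_mean(101, 30, population)    # first sample of 30
m2    <- draw_mean(102, 30, population)    # seed plus 1: a different 30 values
m_big <- draw_mean(101, 3000, population)  # a much larger sample

c(population = mean(population), sample1 = m1, sample2 = m2, large = m_big)
```

Two small samples drawn with different seeds will generally give different means, and neither has to match the population mean. Larger samples tend to land closer to it, which is why a gap between 30 observations and 10000 is expected.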