Is 4 greater than 3?
4>3
## [1] TRUE
Is 3 or 8 greater than or equal to 3?
c(3,8) >= 3
## [1] TRUE TRUE
Is 3 or 8 less than or equal to 3?
c(3,8) <= 3
## [1] TRUE FALSE
Is 1, 4, or 9 exactly equal to 9?
c(1,4,9) == 9
## [1] FALSE FALSE TRUE
Is 1, 4, or 9 not (exactly) equal to 9?
c(1,4,9) != 9
## [1] TRUE TRUE FALSE
NCbirths <- read.csv("~/Desktop/spring 2020/stats 10/Lab_data_sets/births.csv")
What is the number of babies that weighed more than 100 ounces?
sum(NCbirths$weight > 100)
## [1] 1643
What is the proportion of babies that weighed more than 100 ounces?
mean(NCbirths$weight > 100)
## [1] 0.8247992
What is the proportion of female babies?
mean(NCbirths$Gender == "Female")
## [1] 0.4804217
What is the proportion of babies NOT assigned male?
mean(NCbirths$Gender != "Male")
## [1] 0.4804217
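The counts and proportions above work because R treats TRUE as 1 and FALSE as 0 in arithmetic. A minimal sketch on a hypothetical vector (the values and the names `toy_weight`, `heavy`, `n_heavy`, and `prop_heavy` are made up for illustration and are not from births.csv):

```r
# Hypothetical weights in ounces (illustrative values, not from births.csv)
toy_weight <- c(95, 120, 101, 88, 130)

# The comparison is vectorized: one TRUE/FALSE per element
heavy <- toy_weight > 100
heavy

# TRUE counts as 1 and FALSE as 0 in arithmetic
n_heavy <- sum(heavy)      # number of values above 100
prop_heavy <- mean(heavy)  # share of values above 100
```

Here three of the five values exceed 100, so `sum()` gives the count and `mean()` gives that count divided by the length of the vector.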
fem_weights <- NCbirths$weight[NCbirths$Gender == "Female"]
fem_weights
## [1] 177 144 98 104 123 153 106 125 115 83 130 84 147 106 117 112 115 107
## [19] 105 119 143 119 33 118 134 106 118 130 102 134 116 119 57 118 123 135
## [37] 77 122 117 112 89 122 83 151 125 114 109 93 96 86 137 86 142 136
## [55] 105 98 119 106 109 139 97 85 121 115 91 120 116 122 99 128 114 77
## [73] 115 74 122 117 136 143 92 121 109 130 125 108 132 130 100 114 108 131
## [91] 95 116 118 117 102 108 95 108 120 91 135 95 145 34 84 111 110 131
## [109] 103 113 104 107 36 136 109 135 148 111 104 139 113 145 133 117 142 99
## [127] 105 105 120 125 120 131 124 117 126 126 104 126 121 106 111 128 101 128
## [145] 135 104 116 103 84 136 108 119 106 113 113 131 118 98 114 117 113 126
## [163] 121 120 119 130 111 121 122 99 113 120 77 113 144 128 104 50 69 117
## [181] 80 116 85 111 112 123 124 80 106 143 122 109 73 109 57 102 107 124
## [199] 118 113 104 131 123 101 120 132 122 118 108 123 110 122 112 137 133 111
## [217] 120 120 128 47 107 134 96 127 132 125 87 99 104 137 108 76 103 153
## [235] 125 112 130 104 70 120 108 123 112 72 115 136 110 141 121 93 108 100
## [253] 108 131 161 111 105 107 109 123 112 95 121 114 123 111 124 139 106 109
## [271] 117 104 124 113 117 122 83 124 121 106 123 138 128 116 119 111 96 115
## [289] 117 120 115 106 121 144 124 145 115 145 126 143 140 109 96 120 105 93
## [307] 92 111 134 110 128 128 147 83 123 146 130 112 109 102 102 107 96 81
## [325] 102 108 137 120 117 99 51 146 114 106 110 96 135 122 107 115 104 120
## [343] 107 118 99 115 100 157 126 126 102 136 125 96 77 111 102 98 122 117
## [361] 124 114 111 140 99 120 129 87 106 114 111 147 135 110 131 146 117 116
## [379] 113 104 114 111 115 112 91 103 119 119 132 106 106 124 119 85 115 105
## [397] 17 126 121 116 139 128 104 112 104 116 93 142 118 118 118 123 126 114
## [415] 129 120 105 43 103 116 99 119 131 118 137 104 108 109 89 105 130 119
## [433] 113 114 76 119 133 107 124 136 102 130 94 105 139 109 114 91 121 103
## [451] 120 100 151 132 96 121 142 112 115 120 117 72 130 122 115 91 117 131
## [469] 100 121 118 141 100 115 97 93 97 117 146 112 94 117 120 71 131 89
## [487] 14 118 117 133 114 120 120 133 87 111 91 118 102 110 108 98 108 114
## [505] 100 100 130 157 140 113 121 118 128 82 110 139 137 103 126 114 117 118
## [523] 138 107 106 137 116 107 143 112 133 127 100 108 98 95 110 108 129 120
## [541] 102 109 133 69 96 136 121 59 115 135 105 112 107 123 127 115 113 115
## [559] 117 104 114 101 115 92 134 130 110 104 116 89 118 131 113 110 118 73
## [577] 106 111 91 128 131 142 109 100 105 44 126 140 139 129 119 98 124 109
## [595] 117 112 116 102 114 100 78 134 135 155 134 143 97 143 122 145 76 119
## [613] 107 120 128 131 161 127 28 125 118 146 122 85 111 127 135 118 121 102
## [631] 101 105 133 78 90 102 23 145 143 119 130 118 116 105 141 129 125 52
## [649] 124 105 116 95 107 123 98 139 130 116 119 129 114 110 107 98 124 125
## [667] 146 105 81 105 98 124 132 115 132 123 113 133 131 120 140 110 106 129
## [685] 151 112 110 126 118 100 115 95 109 101 109 128 58 148 120 126 117 129
## [703] 137 105 90 102 119 143 144 119 119 108 120 115 117 112 116 128 127 110
## [721] 57 111 127 106 150 126 111 106 115 116 135 147 25 127 115 78 110 124
## [739] 120 126 113 94 112 104 96 108 108 134 147 121 122 112 122 110 128 100
## [757] 135 94 130 107 111 146 98 134 96 118 108 115 119 139 155 129 128 104
## [775] 105 125 113 147 113 84 106 96 116 108 108 115 105 114 108 93 126 133
## [793] 122 122 118 119 96 99 105 113 113 126 104 144 99 99 127 113 72 141
## [811] 98 135 139 116 112 91 135 118 106 116 121 115 107 111 95 128 109 100
## [829] 106 94 77 125 100 97 129 112 133 103 131 125 124 87 129 88 135 117
## [847] 86 145 114 122 132 103 130 105 125 108 125 119 161 97 143 135 149 68
## [865] 143 87 86 108 111 122 123 133 103 88 94 98 123 109 119 122 102 117
## [883] 118 96 105 114 109 76 120 114 145 105 125 75 120 104 129 117 97 106
## [901] 128 76 119 83 117 107 115 120 95 99 60 140 117 100 138 123 134 124
## [919] 125 144 128 90 113 125 20 118 94 89 121 123 119 109 120 105 88 122
## [937] 92 141 131 107 130 105 104 118 144 110 120 114 105 133 112 139 91 112
## [955] 104 115 129
Create an object with the baby weights from NCbirths
baby_weight <- NCbirths$weight
Create an object with the baby genders from NCbirths
baby_gender <- NCbirths$Gender
Create a logical vector to describe if the gender is female
is_female <- baby_gender =="Female"
Create the vector of weights containing only females
fem_weights <- baby_weight[is_female]
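The same logical-subsetting steps can be tried on a tiny data frame. This is only a sketch: `toy`, `is_female_toy`, and `fem_toy` are hypothetical names, and the values are invented rather than taken from NCbirths.

```r
# Hypothetical miniature data set (not the real NCbirths values)
toy <- data.frame(
  weight = c(110, 98, 125, 104),
  Gender = c("Female", "Male", "Female", "Male")
)

# Logical vector with one entry per row
is_female_toy <- toy$Gender == "Female"

# Square brackets keep only the weights where the logical vector is TRUE
fem_toy <- toy$weight[is_female_toy]

# Building the logical vector inline gives the same result
identical(fem_toy, toy$weight[toy$Gender == "Female"])
```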
Download the data from CCLE and read it into R. When you read in the data, name your object “flint”.
flint <- read.csv("~/Desktop/spring 2020/stats 10/Lab_data_sets/flint.csv")
head(flint)
The EPA states a water source is especially dangerous if the lead level is 15 PPB or greater. What proportion of the locations tested were found to have dangerous lead levels?
mean(flint$Pb >= 15)
## [1] 0.04436229
The proportion of the locations tested found to have dangerous lead levels is 0.04436229.
Report the mean copper level for only test sites in the North region.
mean(flint$Cu[flint$Region == "North"])
## [1] 44.6424
Report the mean copper level for only test sites with dangerous lead levels (at least 15 PPB).
mean(flint$Cu[flint$Pb >= 15])
## [1] 305.8333
Report the mean lead and copper levels.
mean(flint$Cu)
## [1] 54.58102
mean(flint$Pb)
## [1] 3.383272
Create a box plot with a good title for the lead levels.
boxplot(flint$Pb, ylab = "Lead level (PPB)", main = "Lead Levels at Flint Test Sites")
Based on what you see in part (f), does the mean seem to be a good measure of center for the data? Report a more useful statistic for this data.
The mean is not a good measure of center here because the distribution is right-skewed: a few very high lead readings pull the mean above most of the data. The median is a better measure of center.
median(flint$Pb)
## [1] 0
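The gap between the mean and the median in right-skewed data can be illustrated with a small hypothetical vector. The readings below are invented for the sketch and are not from flint.csv.

```r
# Hypothetical lead readings in PPB: mostly low, with one extreme value
toy_pb <- c(0, 0, 0, 1, 2, 3, 100)

toy_mean <- mean(toy_pb)      # pulled upward by the single large reading
toy_median <- median(toy_pb)  # middle value, unaffected by the extreme

toy_mean > toy_median
```

Replacing the 100 with an even larger value would raise the mean further while leaving the median unchanged.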
Construct a scatterplot of Life against Income. Note: Income should be on the horizontal axis. How does income appear to affect life expectancy?
life <- read.table("http://www.stat.ucla.edu/~nchristo/statistics12/countries_life.txt", header = TRUE)
plot(y = life$Life, x = life$Income, xlab = "Income", ylab = "Life Expectancy")
Life expectancy increases as income increases.
Construct the boxplot and histogram of Income. Are there any outliers?
hist(life$Income, xlab = "Income", main = "Histogram of per Capita Income")
boxplot(life$Income, xlab = "Income", main = "Boxplot of per Capita Income")
Yes, there are several outliers.
Split the data set into two parts: One for which the Income is strictly below $1000, and one for which the Income is at least $1000. Come up with your own names for these two objects.
below1000 = life[life$Income < 1000,]
above1000 = life[life$Income >= 1000,]
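A quick way to check a split like this is to confirm that every row lands in exactly one piece, including a row sitting exactly on the boundary. The sketch below uses a hypothetical data frame (`toy_life`, `low`, and `high` are illustrative names, and the values are invented).

```r
# Hypothetical incomes and life expectancies; one row sits exactly at 1000
toy_life <- data.frame(
  Income = c(400, 1000, 2500, 800, 999),
  Life   = c(50, 62, 70, 55, 58)
)

low  <- toy_life[toy_life$Income < 1000, ]   # strictly below 1000
high <- toy_life[toy_life$Income >= 1000, ]  # at least 1000

# The two pieces together should account for every row
nrow(low) + nrow(high) == nrow(toy_life)
```

With `>` in place of `>=`, the row at exactly 1000 would fall into neither piece.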
Use the data for which the Income is below $1000. Plot Life against Income and compute the correlation coefficient. Hint: use the function cor()
plot(below1000$Life~below1000$Income, xlab = "Income", ylab = "Life Expectancy")
cor(x = below1000$Life, y = below1000$Income)
## [1] 0.752886
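As a small aside on cor(), the coefficient does not depend on which variable is passed as x and which as y. A minimal sketch with hypothetical paired values (not the countries_life data):

```r
# Hypothetical paired values (not the countries_life data)
toy_income  <- c(200, 400, 600, 800, 950)
toy_lifeexp <- c(48, 55, 53, 60, 63)

r_xy <- cor(toy_income, toy_lifeexp)
r_yx <- cor(toy_lifeexp, toy_income)

# The coefficient is symmetric in its arguments and lies between -1 and 1
all.equal(r_xy, r_yx)
```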
Compute the summary statistics for lead and zinc using the summary() function.
maas <- read.table("http://www.stat.ucla.edu/~nchristo/statistics12/soil.txt", header = TRUE)
summary(maas$lead)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 37.0 72.5 123.0 153.4 207.0 654.0
summary(maas$zinc)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 113.0 198.0 326.0 469.7 674.5 1839.0
Plot two histograms: one of lead and one of log(lead).
hist(maas$lead)
hist(log(maas$lead))
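One rough way to see why the log scale is often used for right-skewed concentrations is to compare how far the mean sits above the median before and after the transform. This is a sketch on hypothetical values (`toy_conc` is invented and is not the soil data), and the mean-to-median ratio is only an informal indicator of skew.

```r
# Hypothetical right-skewed concentrations (not the soil data)
toy_conc <- c(10, 20, 40, 80, 640)

# Ratio of mean to median on the raw scale and on the log scale
raw_ratio <- mean(toy_conc) / median(toy_conc)
log_ratio <- mean(log(toy_conc)) / median(log(toy_conc))

# The ratio is much closer to 1 after taking logs
c(raw = raw_ratio, log = log_ratio)
```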
Plot log(lead) against log(zinc). What do you observe?
plot(log(lead) ~ log(zinc), data = maas, xlab = "Logarithm of zinc concentration", ylab = "Logarithm of lead concentration")
The scatterplot shows a positive, roughly linear relationship between log(lead) and log(zinc). However, this association alone does not establish a causal relationship.
The level of risk for surface soil based on lead concentration in ppm is given in the table below:
Mean concentration (ppm)    Level of risk
Below 150                   Lead-free
Between 150-400             Lead-safe
Above 400                   Significant environmental lead hazard
Use techniques similar to last lab to give different colors and sizes to the lead concentration at these 155 locations. You do not need to use the maps package to create a map of the area. Just plot the points without a map.
lead_colors <- c("green", "yellow", "red")
lead_levels <- cut(maas$lead, c(0, 150, 400, 1000))
plot(maas$x, maas$y, cex = maas$lead/mean(maas$lead), col = lead_colors[as.numeric(lead_levels)], pch = 19)
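How cut() assigns each concentration to a risk band, and how the band number then selects a color, can be checked on a few hypothetical values. The vector `toy_lead` below is invented for the sketch; the breaks and colors match the code above.

```r
# Hypothetical lead concentrations in ppm, including values on the break points
toy_lead <- c(80, 150, 151, 400, 520)

# By default cut() builds right-closed intervals: (0,150], (150,400], (400,1000]
toy_levels <- cut(toy_lead, c(0, 150, 400, 1000))

# Converting the factor to numbers gives the band index for each value
toy_codes <- as.numeric(toy_levels)

# The band index picks the matching color
toy_cols <- c("green", "yellow", "red")[toy_codes]
toy_cols
```

Note that with the default settings a value of exactly 150 falls in the lowest band. Passing `right = FALSE` to cut() would instead close the intervals on the left, placing 150 in the middle band.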
Plot the data point locations. Use good formatting for the axes and title. Then add the outline of LA County by typing: map("county", "california", add = TRUE)
LA <- read.table("http://www.stat.ucla.edu/~nchristo/statistics12/la_data.txt", header = TRUE)
find.package("maps")
## [1] "/Library/Frameworks/R.framework/Versions/3.6/Resources/library/maps"
library(maps)
plot(x = LA$Longitude, y = LA$Latitude, xlim = c(-120,-117), ylim = c(33,35), ylab = "Latitude", xlab = "Longitude", main = "Schools in LA")
map("county", "California", add = TRUE)
Do you see any relationship between income and school performance? Hint: Plot the variable Schools against the variable Income and describe what you see. Ignore the data points on the plot for which Schools = 0. Use what you learned about subsetting with logical statements to first create the objects you need for the scatter plot. Then, create the scatter plot.
LA.subset <- LA[LA$Schools != 0,]
plot(Schools ~ Income, data = LA.subset)
The variables are moderately associated and the relationship is roughly linear: school performance generally increases as income increases. This association alone does not establish causation.