Notes: setwd("~/Desktop/filename") sets the working directory and read.csv("mydata.csv") loads the data. To run something, put the string/vector inside a code chunk and run it, then type 'response_my_data' (or whatever the object is called) to print the results. kjchoi@bu.edu
2.1
7 a) China had the most internet users. b) The UK had about 50 million internet users in 2010. c) China had about 355 million more internet users than Germany in 2010. d) This graph could be misleading because there is a single outlier, China, while the other countries for the most part appear to have about the same number of internet users in 2010. This should have been a relative frequency graph because China has so many more users than the other countries.
9 a) About 68% of the respondents believe that divorce is morally acceptable. b) About 55.2 million Americans believe that divorce is morally wrong. c) Inferential, because Gallup is studying a small group of adult Americans in order to draw conclusions about the entire population.
11 a) 44% of respondents ages 18-34 said that they would be more likely to buy a product made in America. 31% of respondents ages 35-44 said that they would also be more likely to buy a product made in America. b) The 55+ age group is the most likely to buy products made in America. c) The 18-34 age group is the least likely to buy products made in America. d) The apparent association in the graph is that the older you are, the more likely you are to buy something made in America rather than in other countries.
13 a)
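# Survey counts, in the same order as the labels in groups
my_data <- c(125, 324, 552, 1257, 2518)
groups <- c("never", "rarely", "sometimes", "most of the time", "always")
# Relative frequency = count / total number of responses
rel_freq <- my_data / sum(my_data)
# Relative frequency bar graph
barplot(rel_freq, main = "College Survey", names.arg = groups, col = c("red", "blue", "green", "yellow", "grey"))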
pie(my_data, labels = groups, main = "College Survey")
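The relative frequencies behind the bar graph can also be printed as a table to check them. A minimal sketch, reusing the counts and labels from above; the data frame name freq_table is made up for illustration:

```r
# Counts and labels copied from the chunk above
my_data <- c(125, 324, 552, 1257, 2518)
groups <- c("never", "rarely", "sometimes", "most of the time", "always")

# Relative frequency = count / total; the relative frequencies sum to 1
rel_freq <- my_data / sum(my_data)

# freq_table is a hypothetical name used only for this illustration
freq_table <- data.frame(response = groups,
                         frequency = my_data,
                         relative_frequency = round(rel_freq, 3))
print(freq_table)
```

Checking that the relative_frequency column adds up to about 1 is a quick way to catch a mistyped count.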
15 a)
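# Response counts, in the same order as the labels in groups
my_data <- c(81, 132, 192, 243, 377)
groups <- c("A few times a month or less", "A few times a week", "Up to 1 hour a day", "Never", "More than 1 hour a day")
# Relative frequencies; named rel_freq so the base function table() is not masked
rel_freq <- my_data / sum(my_data)
barplot(rel_freq, main = "Relative Frequencies", names.arg = groups, col = c("red", "blue", "green", "yellow", "grey"))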
my_data <- c(81, 132, 192, 243, 377)
groups <- c("A few times a month or less", "A few times a week", "Up to 1 hour a day", "Never", "More than 1 hour a day")
barplot(my_data, main = "Responses for Internet Usage", names.arg = groups, col = c("grey", "green", "red", "purple", "pink"))
my_data <- c(81, 132, 192, 243, 377)
# Recompute the relative frequencies from this problem's counts before plotting
rel_freq <- my_data / sum(my_data)
barplot(rel_freq, main = "Response for Internet Usage", names.arg = groups, col = c("red", "blue", "green", "yellow", "grey"))
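With category names this long, barplot() may leave out some of the labels when they do not fit under the bars. One possible workaround, sketched here with the same counts, is to draw horizontal bars with a wider left margin; the margin size and the cex.names value are guesses that may need tuning:

```r
my_data <- c(81, 132, 192, 243, 377)
groups <- c("A few times a month or less", "A few times a week",
            "Up to 1 hour a day", "Never", "More than 1 hour a day")
rel_freq <- my_data / sum(my_data)

# Widen the left margin so the long labels fit (the value 14 is a guess)
old_par <- par(mar = c(5, 14, 4, 2))
# horiz = TRUE draws horizontal bars; las = 1 keeps the labels horizontal
bar_mid <- barplot(rel_freq, names.arg = groups, horiz = TRUE, las = 1,
                   cex.names = 0.8, main = "Response for Internet Usage",
                   xlab = "Relative frequency")
# Restore the previous margins
par(old_par)
```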
2.2
9 a) The most frequent outcome of the experiment was 8. b) The least frequent outcome of the experiment was 2. c) We observed a 7 fifteen times. d) About 4 more fives were observed than fours. e) A 7 was observed about 15% of the time. f) The distribution is bell-shaped.
10 a) The most frequent number of cars sold in a week was 4. b) 2 cars were sold 18 times for every 52 weeks (1 year). c) Two cars were sold 17.3% of the time. d) The distribution is skewed to the right.
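The percentage in part c) is the number of weeks from part b) divided by the 52 weeks in the year. A quick arithmetic check (the helper name pct_of_weeks is made up for illustration) shows that 17.3% corresponds to 9 of 52 weeks, while 18 of 52 weeks would be about 34.6%, so the count in b) and the percentage in c) are worth re-checking against the histogram:

```r
weeks_in_year <- 52

# Percentage of the year for a given number of weeks (hypothetical helper)
pct_of_weeks <- function(n_weeks) round(100 * n_weeks / weeks_in_year, 1)

pct_of_weeks(9)   # 17.3
pct_of_weeks(18)  # 34.6
```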
11 a) 200 students were sampled. b) The class width is 10. c) Class 60-69 has a frequency of 2, class 70-79 has a frequency of 3, class 80-89 has a frequency of 13, class 90-99 has a frequency of 42, class 100-109 has a frequency of 58, class 110-119 has a frequency of 40, class 120-129 has a frequency of 31, class 130-139 has a frequency of 8, class 140-149 has a frequency of 2, and class 150-159 has a frequency of 1. d) The class with the highest frequency is 100-109. e) Class 150-159 has the lowest frequency. f) 5.5% of students had an IQ of at least 130 (11 of 200); the other 94.5% scored below 130. g) No students had an IQ of 165.
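The answers to a), b), d), and f) can be recomputed from the frequencies listed in c). A minimal sketch, with variable names chosen for illustration; the share with an IQ of at least 130 comes from the last three classes:

```r
# Lower class limits and frequencies as listed in part c)
class_lower <- seq(60, 150, by = 10)
freq <- c(2, 3, 13, 42, 58, 40, 31, 8, 2, 1)

n_students <- sum(freq)                      # 200 students sampled
class_width <- diff(class_lower)[1]          # 10
modal_class <- class_lower[which.max(freq)]  # 100, i.e. class 100-109

# Students in classes 130-139, 140-149, and 150-159, as a percentage of all students
pct_at_least_130 <- 100 * sum(freq[class_lower >= 130]) / n_students  # 5.5
```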
12 a) The class width is 200. b) The classes are 0-199, 200-399, 400-599, 1000-1199, and 1400-1599. c) The class with the highest frequency is 0-199. d) The distribution is skewed to the right. e) Because this was an observational study, not an experiment, one cannot draw causal conclusions; doing so would confuse correlation with causation.
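With a class width of 200 and whole-number data, each upper class limit is the lower limit plus 199. A small sketch that builds the class labels from the lower limits listed in b); the variable names are illustrative:

```r
class_width <- 200
# Lower limits of the classes listed in part b)
class_lower <- c(0, 200, 400, 1000, 1400)

# Upper limit = lower limit + width - 1, so neighboring classes do not overlap
class_upper <- class_lower + class_width - 1
class_labels <- paste(class_lower, class_upper, sep = "-")
print(class_labels)
```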
13 a) This would probably be skewed to the right, since most household incomes sit toward the left (lower) end, while a smaller number of large incomes stretch out to the right. b) Scores on the SAT probably follow a bell-shaped curve, since there are always some students who don’t do well on the exam, some who do extremely well, and many who score near the average. c) This would probably be skewed to the right, since most families have a small number of members (1-4), while a few outlier families may have many more. d) This would probably be skewed to the left, since most patients with Alzheimer’s are older rather than younger.
14 a) This would probably be skewed to the right because most people consume a small number of drinks, while some people consume a large number of drinks per week. b) This would probably be skewed to the right because more children probably begin in the public school system; as they grow older, some of the original population may transfer to private schools, thus decreasing the size of the population. c) This would probably be skewed to the left because there are likely to be more hearing-aid patients at older ages. d) This would probably be bell-shaped, because there will be some very short and some very tall adult men, but most will be of average height.
15 a) Sepal Length Histogram
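response_my_data <- c(1, 3, 12, 16, 18)
groups <- c("0", "1", "2", "3", "4")
# Histogram of sepal length from R's built-in iris data set
hist(iris$Sepal.Length, main = "Sepal Length Histogram", xlab = "Sepal length")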
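To read the class limits and frequencies off the histogram rather than estimating them from the picture, hist() can return them without drawing anything. A minimal sketch using R's default class breaks for the built-in iris data; the name class_table is made up for illustration:

```r
# plot = FALSE returns the histogram object instead of drawing it
h <- hist(iris$Sepal.Length, plot = FALSE)

# Each class runs from one break to the next. By default a class includes its
# upper break but not its lower one (the first class includes both).
class_table <- data.frame(lower = head(h$breaks, -1),
                          upper = tail(h$breaks, -1),
                          frequency = h$counts)
print(class_table)
```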
16 a)
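response_my_data <- c(16, 11, 9, 7, 2, 3, 0, 1, 0, 1)
groups <- c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10")
# table() tallies how often each value occurs among the ten numbers above
# (for example, 0 and 1 each appear twice); it does not pair them with groups
table(response_my_data)
## response_my_data
##  0  1  2  3  7  9 11 16
##  2  2  1  1  1  1  1  1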
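If the ten numbers in response_my_data are already frequencies, one way to display them as a frequency distribution is to pair each count with its category label instead of calling table(). The sketch below assumes the categories are 0 through 9, which is only a guess, since groups lists eleven labels for ten counts:

```r
response_my_data <- c(16, 11, 9, 7, 2, 3, 0, 1, 0, 1)
# ASSUMPTION: the categories run from 0 to 9; adjust if the problem's categories differ
groups <- as.character(0:9)

# freq_dist is a hypothetical name used only for this illustration
freq_dist <- data.frame(value = groups,
                        frequency = response_my_data,
                        relative_frequency = response_my_data / sum(response_my_data))
print(freq_dist)

# Frequency bar graph with one bar per category
barplot(response_my_data, names.arg = groups,
        main = "Frequency Distribution", xlab = "Value", ylab = "Frequency")
```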