Submit your R Markdown to an appropriate dropbox in D2L.
When submitting your file, include a link to your app created with the learnr package. (File -> New File -> R Markdown -> From Template -> Interactive Tutorial)
Register at https://rstudio.cloud/ and log in. Start a new project and name the workspace appropriately; that is, change “Untitled” to something like “Stat 242”.
If everything is ok, you should see 4 windows on the screen (you may need to resize each of them to get a feel for the layout). Otherwise, contact me or watch this video: https://www.youtube.com/watch?v=SFpzr21Pavg&t=8s
Suppose you are looking at the RStudio windows. Now do the following:
In the template, lines 21-23 form a code chunk, where “r” indicates that the coding language is R (RStudio can also run other languages). The phrase “two-plus-two” is the label of the code chunk; you can rename it to anything relevant to what the code in the chunk does. The option “exercise=TRUE” makes the chunk interactive; that is, a user can modify and run the code within the chunk. In line 22, type the code 2 + 2 or any other syntactically valid R code. Remove everything below line 24. Click “Run Document” again.
Now, go to https://www.shinyapps.io/. Registering with the service takes less than two minutes. Then come back to the output document you saw after clicking “Run Document”. Click the “Eye” icon. You will see your account with shinyapps.io. Choose an appropriate title of at least 4 letters or digits. Click “Publish” to tell the shinyapps server to publish your work online. You can follow the progress in the lower-left window of your RStudio screen. Once your work is successfully published, you will have a link to it.
Submission: Submit your R Markdown code along with the link to an appropriate D2L drop box, such as assignment #1, assignment #2, etc.
Grading: Link = 5 points, Markdown = 5 points
Refer to the data at https://github.com/washingtonpost/data-police-shootings/releases/download/v0.1/fatal-police-shootings-data.csv
Read it using the following R code:
link = "https://github.com/washingtonpost/data-police-shootings/releases/download/v0.1/fatal-police-shootings-data.csv"
D=read.csv(link, header = TRUE)
Answer each of the following questions:
Go to this page https://github.com/washingtonpost/data-police-shootings. Write a paragraph introducing the background of the data. This should include the 5 W’s we introduced in class.
Use the code: D[D$state == "MN", ] to subset the data for Minnesota. Name the resulting data frame “MN”.
Use the code: barplot(sort(table(MN$city), decreasing = TRUE), las = 2) to make a barplot of the shootings in different cities in Minnesota. What are the top 10 cities with the most fatal shootings by a police officer?
Use code similar to that in part (c) to create a barplot for each of the following variables: manner_of_death, armed, gender, race, signs_of_mental_illness, and threat_level. Write a summary of your findings.
Create a histogram and a boxplot for the age variable. How would you examine the graphs?
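The subsetting and plotting steps above can be tried on a small stand-in before running them on the full file. The sketch below assumes a toy data frame `toy` whose rows are made up for illustration (they are not taken from the Washington Post data); only the column names `state`, `city`, and `age` mirror the real file.

```r
# Toy stand-in for the real data; every value below is made up for illustration.
toy <- data.frame(
  state = c("MN", "MN", "MN", "WI", "MN", "CA"),
  city  = c("Minneapolis", "St. Paul", "Minneapolis", "Madison", "Duluth", "Fresno"),
  age   = c(34, 27, 45, 52, 31, 40),
  stringsAsFactors = FALSE
)

# Keep only the Minnesota rows (R needs straight quotes around "MN").
MN_toy <- toy[toy$state == "MN", ]

# Count rows per city and sort from most to fewest.
city_counts <- sort(table(MN_toy$city), decreasing = TRUE)

# After sorting, head() returns the cities with the largest counts.
top_cities <- head(city_counts, 10)
print(top_cities)

# Bar plot of the counts, then a histogram and a boxplot for a numeric column.
barplot(city_counts, las = 2)
hist(MN_toy$age, main = "Age", xlab = "Age (years)")
boxplot(MN_toy$age, horizontal = TRUE, xlab = "Age (years)")
```

Replacing `toy` with `D` should give the corresponding results for the real data.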
A random sample of 200 students is surveyed on a large university campus. They are asked if they use a laptop in class to take notes. The result of the survey is that 70 of the 200 students responded “yes.”
Use the R function “prop.test” to find a 90% confidence interval for the proportion of all students in the university who would take notes using a laptop.
Interpret your result in the context of the problem.
Use the R function “prop.test” to test the hypothesis that the proportion of all students in the university who would take notes using a laptop is below 40%. Define, in the context, the parameter with notation. Specify the null and alternative hypotheses using the notation.
Assuming the significance level is 0.05, interpret your result in the context of the problem.
You can also use an app I wrote (https://sjzhang.shinyapps.io/Statistics/) to check your result. Your result can be slightly different from the app, since they use different methods.
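One possible way to set up the two `prop.test` calls is sketched below. The variable names (`yes`, `n`, `ci_res`, `test_res`) are illustrative choices, and the sketch keeps the function's default continuity correction, which is one reason the output may not match other tools exactly.

```r
# Survey result from the problem statement: 70 "yes" out of 200 students.
yes <- 70
n   <- 200

# 90% confidence interval for the population proportion.
# prop.test() applies a continuity correction by default (correct = TRUE).
ci_res <- prop.test(x = yes, n = n, conf.level = 0.90)
print(ci_res$conf.int)

# One-sided test of H0: p = 0.40 against Ha: p < 0.40.
test_res <- prop.test(x = yes, n = n, p = 0.40, alternative = "less")
print(test_res$p.value)
```

Compare the printed p-value with the 0.05 significance level when writing the interpretation.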
A marketing manager surveyed a random sample of 30 customers regarding their spending on food last month.
Here are the responses (in $) by the 30 customers:
732, 842, 931, 1023, 561, 967, 783, 450, 1247, 380, 440, 691, 985, 1039, 1402, 475, 301, 684, 970, 1035, 880, 620, 940, 1136, 1205, 438, 750, 639, 1058, 771
Find a 95% confidence interval for the mean expense (in $) last month for all customers.
Interpret the result in the context.
Test, at significance level 0.05, whether the mean expense (in $) last month for all customers is greater than $800. Specify the null and alternative hypotheses.
Summarize your results in one or two sentences.
You can also use an app I wrote (https://sjzhang.shinyapps.io/Statistics/) to check your result.
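A minimal sketch of how these steps might look in R, assuming a one-sample t procedure via `t.test` is appropriate here (the handout does not name the function, so that choice is an assumption). The vector name `expense` is illustrative.

```r
# The 30 responses (in $) from the problem statement.
expense <- c(732, 842, 931, 1023, 561, 967, 783, 450, 1247, 380,
             440, 691, 985, 1039, 1402, 475, 301, 684, 970, 1035,
             880, 620, 940, 1136, 1205, 438, 750, 639, 1058, 771)

# 95% confidence interval for the population mean (two-sided by default).
ci_res <- t.test(expense, conf.level = 0.95)
print(ci_res$conf.int)

# One-sided test of H0: mu = 800 against Ha: mu > 800.
test_res <- t.test(expense, mu = 800, alternative = "greater")
print(test_res)
```

A quick `hist(expense)` or `qqnorm(expense)` can help judge whether the t procedure is reasonable for a sample of this size.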
In a survey conducted by the Gallup organization September 6-9, 2012, 1,017 adults were asked “In general, how much trust and confidence do you have in the mass media - such as newspapers, TV, and radio - when it comes to reporting the news fully, accurately, and fairly?” The results are summarized in the provided table.
Response | Count |
---|---|
Great deal of confidence | 81 |
Fair amount of confidence | 325 |
Not very much confidence | 397 |
Not confident at all | 214 |
We are interested in testing whether or not the four responses are equally likely. Conduct an appropriate test using RStudio.
Specify the null hypothesis and the alternative hypothesis.
Use the R function “chisq.test” to do the test.
Interpret your result in the context.
You can check your result with: https://sjzhang.shinyapps.io/Statistics/
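The goodness-of-fit test described above could be set up roughly as follows. The names in `counts` are illustrative labels; when `chisq.test` receives a single vector and no `p` argument, it tests against equal probabilities for all categories.

```r
# Observed counts from the Gallup table (they sum to 1,017).
counts <- c(great_deal = 81, fair_amount = 325,
            not_very_much = 397, not_at_all = 214)

# Chi-square goodness-of-fit test; the default null is equal probabilities.
gof <- chisq.test(counts)
print(gof)

# Expected counts under the null, useful for checking the test's conditions.
print(gof$expected)
```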
A public opinion poll surveyed a simple random sample of 1000 voters. Respondents were classified by gender (male or female) and by voting preference (Republican, Democrat, or Independent). Results are shown in the contingency table below.
Gender | Republican | Democrat | Independent |
---|---|---|---|
Male | 200 | 150 | 50 |
Female | 150 | 300 | 50 |
Is there a gender gap? Do the men’s voting preferences differ significantly from the women’s preferences? Use a 0.05 level of significance. Use RStudio to carry out an appropriate test. Interpret your result.
Specify the null hypothesis and the alternative hypothesis.
Use the R function “chisq.test” to do the test.
Interpret your result in the context.
You can check your result with: https://sjzhang.shinyapps.io/Statistics/
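A sketch of the test of independence, entering the contingency table as a matrix. The object names are illustrative. Note that the counts as printed in the table add up to 900 rather than 1000; the sketch uses them exactly as printed.

```r
# Counts entered row by row, exactly as printed in the table.
votes <- matrix(c(200, 150, 50,
                  150, 300, 50),
                nrow = 2, byrow = TRUE,
                dimnames = list(Gender = c("Male", "Female"),
                                Party  = c("Republican", "Democrat", "Independent")))

# Chi-square test of independence between gender and voting preference.
ind <- chisq.test(votes)
print(ind)

# Expected counts under independence, for checking the test's conditions.
print(ind$expected)
```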
Refer to the “HomesForSale” data from http://www.lock5stat.com/datapage.html. The data can be read remotely by the R code:
where D is the data frame you can work on.
Use RStudio to fit a multiple regression model with Price as the response variable and State, Size, Beds, and Baths as explanatory variables. Which variables are significant? Note: R will create indicator/dummy variables automatically for an explanatory variable that is categorical. If any of the dummy variables is significant, we treat the original categorical variable as significant.
Remove non-significant variables if any and refit your model.
Write a separate model for each of the 4 states.
Write one or two sentences to summarize your result.
You can check your result with: https://sjzhang.shinyapps.io/Statistics/
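To illustrate the mechanics only, the sketch below fits the same model formula to simulated data. Everything in `sim` is randomly generated, the state labels "A" to "D" are placeholders, and the coefficients used to generate `Price` are arbitrary, so none of the output says anything about the real HomesForSale data.

```r
# Simulated stand-in data; NOT the HomesForSale data.
set.seed(242)
n <- 120
sim <- data.frame(
  State = factor(rep(c("A", "B", "C", "D"), each = n / 4)),
  Size  = runif(n, 0.8, 4),
  Beds  = sample(2:5, n, replace = TRUE),
  Baths = sample(1:4, n, replace = TRUE)
)
# Arbitrary generating rule, chosen only so the example has some signal.
sim$Price <- 100 + 150 * sim$Size + 80 * (sim$State == "A") + rnorm(n, sd = 50)

# Full model; R expands the factor State into dummy variables automatically.
fit <- lm(Price ~ State + Size + Beds + Baths, data = sim)
print(summary(fit)$coefficients)

# F test for each term as a whole, including the multi-level factor State.
print(drop1(fit, test = "F"))

# Refit after removing a term (Beds is dropped here purely as an example).
fit2 <- update(fit, . ~ . - Beds)
print(coef(fit2))
```

With the real data frame `D`, the same calls apply after replacing `sim` with `D`; the state-specific models follow from setting the relevant dummy variables to 0 or 1 in the fitted equation.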
You need to submit this assignment #5 separately from other assignments to D2L.
Watch the video: https://www.youtube.com/watch?v=zsisAyzLzDs first. You will need to do this part using the Excel method.
Use the Yahoo Finance weekly stock data for Facebook (https://finance.yahoo.com/quote/FB/history?p=FB, the column “Adj Close”, between the two dates 12/31/2017 and 12/04/2021) to do the following:
Calculate the 5-week simple moving averages.
Make a forecast for the next week after 12/04/2021.
Plot the original data and the moving averages on the same graph.
Summarize your findings in a few sentences.
Note: you need to save your data first.
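The assignment asks for the Excel method, so the following R sketch is only a way to cross-check the arithmetic. It assumes a trailing (backward-looking) 5-week average and takes the forecast for the next week to be the average of the last 5 observed values. The `price` vector is a short made-up series, not Facebook data.

```r
# Made-up weekly prices for illustration only.
price <- c(10, 12, 11, 13, 14, 15, 13, 16)
k <- 5  # window length in weeks

# Trailing simple moving average: sides = 1 uses only current and past values,
# so the first k - 1 entries are NA.
sma <- as.numeric(stats::filter(price, rep(1 / k, k), sides = 1))
print(sma)

# Forecast for the next week: mean of the last k observations.
forecast_next <- mean(tail(price, k))
print(forecast_next)

# Original series and moving average on the same graph.
plot(price, type = "b", xlab = "Week", ylab = "Price")
lines(sma, type = "b", col = "red")
legend("topleft", legend = c("Original", "5-week SMA"),
       col = c("black", "red"), lty = 1)
```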
Watch the video: https://www.youtube.com/watch?v=IQizXdouK84 first. You will need to do this part using the Excel method.
Use the same data as in Question (1).
Calculate the smoothed values by the exponential smoothing method. Choose alpha = 0.3 and the initial smoothed value to be the average of the first 6 values.
Calculate the smoothed values by the exponential smoothing method. Choose alpha = 0.8 and the initial smoothed value to be the average of the first 6 values.
Plot values in (a) and (b) along with the original values in one graph.
Summarize your findings in a few sentences.
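As a cross-check on the Excel work, here is a minimal R sketch of simple exponential smoothing. It assumes the recursion S_t = alpha * y_t + (1 - alpha) * S_(t-1), with S_1 set to the average of the first 6 observations; conventions differ between textbooks and tools, so confirm this matches the video before relying on it. The helper `exp_smooth` and the `price` series are hypothetical.

```r
# Hypothetical helper: simple exponential smoothing with a custom start value.
exp_smooth <- function(y, alpha, n_init = 6) {
  stopifnot(length(y) >= max(2, n_init), alpha > 0, alpha <= 1)
  s <- numeric(length(y))
  s[1] <- mean(y[seq_len(n_init)])   # initial smoothed value
  for (t in 2:length(y)) {
    s[t] <- alpha * y[t] + (1 - alpha) * s[t - 1]
  }
  s
}

# Made-up weekly prices for illustration only.
price <- c(10, 12, 11, 13, 14, 15, 13, 16)

s03 <- exp_smooth(price, alpha = 0.3)
s08 <- exp_smooth(price, alpha = 0.8)

# Original values and both smoothed series in one graph.
matplot(cbind(price, s03, s08), type = "l", lty = 1, col = 1:3,
        xlab = "Week", ylab = "Price")
legend("topleft", legend = c("Original", "alpha = 0.3", "alpha = 0.8"),
       col = 1:3, lty = 1)
```

A larger alpha puts more weight on the most recent observation, so the alpha = 0.8 series tracks the original values more closely than the alpha = 0.3 series.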