In lieu of class this week, I put together an assignment meant to test everything we have learned so far, as well as provide some opportunities to apply your programming skills beyond what we have covered. This assessment is optional, but I highly recommend completing it. Take note of any topics you get stuck on or find confusing, and let me know in the feedback form at the end of the assignment. On Friday, 2/19, I will post the “answers” on the dropbox and the LCAM server.
Open RStudio and create a new R script. Name your script “Class5Assessment” and save it somewhere on your computer that you can easily navigate to in the future. I suggest creating a folder on your computer’s desktop called “R Files” or something like that and saving all your work there.
Start your script with a comment containing the date, your name, and a brief description of what the script is.
It is a best practice to put all your import statements at the very top of your script, regardless of where you use them later in the code. In R, “import statements” are the library( ) functions that you use to load the packages. To keep very tidy code, it is also considered a best practice to differentiate the chunk of code containing the import statements from the rest of the script in some way. I do this using comments similar to the ones shown below. Recreate the comments below in your script.
############## Library Functions #####################
######################################################
Next, you will need to install the package that contains the dataset used in this exercise. The package is called survival. In the RStudio console, install the package using the appropriate command/function (see footnote 1). If you get any orange warning messages, don’t worry about them. R is just informing you that some of the functions you are loading in the survival package might be overriding functions in other packages. You are not using those other packages right now, so the warnings don’t matter.
There are many datasets/dataframes in the survival package. Try using the help command ? to get more information about the package. What happens, or what message do you get? Read the message in the console and try the suggested command. What happens then?
R is a free, open-source programming language, meaning nothing is proprietary and anyone (including you!) can create and contribute to R packages. This often creates inconsistencies in the level of documentation available across the different packages. Some people put a lot of time into creating clear, easy-to-read documentation with well laid out instructions and examples. But, as you can see here, many people don’t. The help command couldn’t find any documentation, and the ?? function seems to search all of R’s help pages, which is more than you want to sift through. Luckily, there is a command that can be useful in this situation. The command is library(help = "packagename"). In the console, use this function to find more documentation on the datasets included in the package.
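As a rough illustration of how the library(help = "packagename") command behaves, here is a minimal sketch. The stats package is used only as a stand-in because it ships with R, and the variable name pkg_info is just for illustration; swap in the package you actually want to explore.

```r
# Illustration only: "stats" is a stand-in package that ships with R.
# Replace it with the name of the package you want to explore.
pkg_info <- library(help = "stats")

# The result records which package was looked up. Typing pkg_info on its own
# line in the console displays the package description and index of contents.
pkg_info$name
```

Storing the result in a variable is optional. Running the command by itself in the console is usually all you need.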
Now you can see the full list of dataframes/datasets available. In this assessment you will use the dataframe diabetic. In the console, use the help command to try and get more information about the dataframe. What message do you get?
You probably received an error saying that there is no documentation for diabetic. That could be, but you also skipped a crucial step. Spend a few minutes trying to figure out what went wrong, and see if you can fix it. This is a great opportunity to challenge yourself to apply what you know to a troubleshooting situation in coding.
If you couldn’t seem to find the solution, don’t worry. And if you figured it out, it’s best to double-check. Getting this step right is important for the rest of the assessment, so find or double-check your answer in footnote 2.
Think about what you have done so far in the console vs. in your script. Think about why it is better to run some lines of code in the console as opposed to adding them directly into your script. What would happen if you had added all the code to install and view the package documentation to your script as opposed to running it in the console? What would happen if you tried to run that script later?
Now, use the help command to get more information about the dataframe diabetic. Specifically, look in the format section for a list and description of the different columns of the data frame. Below this section, there is a details section that outlines some background information on the data itself. Read this as well.
A better way to get a sense of the data is to view it. In the console, use the View( ) function to view the diabetic dataframe. Functions are case sensitive, so make sure to pay attention to how the function names are spelled. Try view(). Does it work?
Focus specifically on the age and risk columns. Refer to the information about the dataframe available with the help function. What does the risk column represent?
It might be interesting to know how many times a xenon laser was used versus an argon laser. The table( ) function is useful when counting repeated appearances of something within a vector. In your script, and beneath the import statements, create a “section header” using a comment to describe the dataframe you are working with. Then, isolate the laser column and use the table( ) function to count how many times each laser was used. Remember to highlight the line and click “run” to run code within a script.
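If you are unsure how isolating a column and counting values fit together, the sketch below shows the general pattern on a made-up data frame. The data frame pets and its columns are hypothetical and have nothing to do with the assessment dataset, so you will still need to adapt the idea to the laser column yourself.

```r
# Hypothetical toy data frame, unrelated to the assessment dataset
pets <- data.frame(
  owner = c("Ana", "Ben", "Cy", "Dee", "Eli", "Fay"),
  animal = c("cat", "dog", "cat", "cat", "dog", "fish")
)

# Isolate a single column with the $ operator
animal_column <- pets$animal

# Count how many times each value appears in that column
animal_counts <- table(animal_column)
animal_counts
```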
Next, isolate the age column and use the hist( ) function to create a histogram of the study participants’ ages. Then, do the same with the risk column.
The main title and axis titles of the plots are kinda ugly. Refer to the class 5 plotting exercise available on the dropbox/LCAM server and add better titles and axis labels to the histograms.
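Add the breaks = n argument, where n is the desired number of histogram cells, to set the number of cells in the age histogram to 25. Note that when breaks is a single number, R treats it as a suggestion and may choose a nearby number of cells so that the break points fall on round values. Don’t forget to separate all arguments in the hist( ) function with commas.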
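Here is a minimal sketch of how titles, axis labels, and the breaks argument combine in one hist( ) call. It uses the built-in faithful dataset as stand-in data, and the titles and variable names are made up for illustration, so treat it as a pattern to adapt rather than the answer for the age histogram.

```r
# Stand-in data: eruption waiting times from the built-in faithful dataset
waiting_times <- faithful$waiting

# breaks = 10 asks for roughly 10 cells; R treats a single number as a suggestion
waiting_hist <- hist(
  waiting_times,
  breaks = 10,
  main = "Waiting Time Between Eruptions",
  xlab = "Waiting time (minutes)",
  ylab = "Number of eruptions"
)

# The returned object stores the break points R actually used,
# so this gives the number of cells that were drawn
length(waiting_hist$breaks) - 1
```

Checking the number of cells this way is a quick way to see whether R followed your suggestion exactly.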
Change the fill and border color of your age histogram with the arguments col = "color" and border = "color". Refer to the class 5 assignment or try googling “colors in base R” to see if you can find a list of colors available (see footnote 3). 90% of programming ends up being learning how to google the right questions and how to interpret the answers. A lot of times, people feel overwhelmed trying to google a programming question because the answers make no sense. The internet is essentially just a giant public programming forum, and everyone has a different level of programming experience. If a webpage doesn’t make any sense, it is probably meant for someone with more background than you. Try another result until you find one with more explanation that you can understand.
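Besides googling, base R can list its own color names with the colors( ) function. The sketch below is one possible way to browse them and try a pair out. The histogram uses the built-in faithful dataset as stand-in data, and the color choices and variable names are arbitrary.

```r
# colors() returns the names of all colors built into base R
all_colors <- colors()
head(all_colors)

# Narrow the list to names containing "blue" to find candidates
blue_colors <- grep("blue", all_colors, value = TRUE)
head(blue_colors)

# Try two of the names on a histogram of stand-in data
hist(faithful$eruptions, col = "steelblue", border = "white",
     main = "Eruption Durations", xlab = "Duration (minutes)")
```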
For an extra challenge, refer to the Class 5 assignment and use the wesanderson package to assign different colors to the histograms. Remember to include your import statement at the top of the script with the other library() functions. What happens when you set n equal to 5?
Lastly, use the plot() function to plot different variables against each other. Is any of the information gained from the plots useful? This is an example of how the plot() function is nice for just quickly looking at things so you get a better sense of where to focus your time.
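As a rough sketch of what plotting variables against each other can look like, the example below uses the built-in mtcars dataset as stand-in data. The column choices, labels, and the name car_subset are only for illustration. Which columns of diabetic are worth comparing is left for you to explore.

```r
# Stand-in data: the built-in mtcars dataset
# Plot one variable against another: weight on the x-axis, fuel economy on the y-axis
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel Economy vs. Weight")

# Passing a data frame with several numeric columns draws every pairwise scatterplot
car_subset <- mtcars[, c("mpg", "wt", "hp")]
plot(car_subset)
```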
Please take the time to fill out this form: [https://forms.office.com/Pages/ResponsePage.aspx?id=H9sOck5cQ0CBQSFKY6fq1VywYto3iYJNroYc2G03X_VUMThIQVFKTzJKSVpHNEw3R05CMkVGRFpQMC4u] It just asks if there are any steps of the assessment that you were confused about and if there is any other feedback you have.
1. install.packages("PackageName")
2. To view information about the contents of a package, the package needs to be loaded first. In your script, load the survival package in the section of your code reserved for import statements (the “Library Functions” comment block you created earlier). Next, highlight the line of code and hit “run” at the top right corner of the scripting window.
3. https://www.r-graph-gallery.com/42-colors-names.html This page has a graphic containing the names of colors available in base R.