Goals
This week my goal was to finish all of the coding tutorials, and I have done it!! I found the last tutorials so much more engaging because I understood the link between the content and the reproducibility task.
Successes
I started work on the assignment this week, focusing on reproducing the descriptive statistics (means and standard deviations) in the first and second tables. I did this using “summarise(mean(variable), sd(variable))”. All descriptives for tables 1 and 2 matched the paper except the SD for assumed immunity in those who think they have had COVID. Whilst it was very exciting to find something, the difference was only 0.01. I also found a discrepancy in the variable for correctly identifying symptoms of COVID. This discrepancy was larger and is potentially explained in the paper (see the coding questions at the end). I also discovered that there are 1072 not applicable data points. Luckily, these were only in the demographic descriptives, which are not important to the overall results and conclusions of the study; however, it will be important to remember them at the end when we each choose something to investigate.
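A minimal sketch of the summarise approach described above. The data and the column names (`had_covid`, `immunity`) are made-up placeholders, not the study's real variables; `na.rm = TRUE` is included on the assumption that the real columns contain missing values.

```r
library(dplyr)

# Hypothetical toy data standing in for the real survey file
survey <- tibble(
  had_covid = c("yes", "yes", "no", "no", "no"),
  immunity  = c(4, 2, 1, 3, 2)
)

# Mean and SD of one variable, split by group and rounded to 2 decimals
descriptives <- survey %>%
  group_by(had_covid) %>%
  summarise(
    mean_immunity = round(mean(immunity, na.rm = TRUE), 2),
    sd_immunity   = round(sd(immunity, na.rm = TRUE), 2)
  )

print(descriptives)
```

Rounding inside `summarise()` makes it easier to compare the output directly with the two-decimal values printed in a table.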
In the coding tutorials I:
- learnt to mutate - for when variables have different scales (for example, to create a ranking variable)
- learnt to bind - this makes coding much quicker, because you don’t have to keep retyping the same thing
- learnt to pivot - move wide data to long data (pivot_longer) and long data to wide data (pivot_wider)
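The three skills in the list above can be sketched on a small made-up dataset. Everything here is illustrative: the tibble and its column names are hypothetical, and "bind" is assumed to mean `bind_rows()` (it could equally refer to `bind_cols()`).

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per participant, one column per time point
scores_wide <- tibble(
  id    = c(1, 2, 3),
  time1 = c(10, 20, 30),
  time2 = c(15, 25, 20)
)

# mutate(): add a ranking variable based on time1 (1 = lowest score)
ranked <- scores_wide %>%
  mutate(rank_time1 = min_rank(time1))

# pivot_longer(): wide -> long, one row per participant per time point
scores_long <- scores_wide %>%
  pivot_longer(cols = c(time1, time2), names_to = "time", values_to = "score")

# pivot_wider(): long -> wide again
scores_back <- scores_long %>%
  pivot_wider(names_from = time, values_from = score)

# bind_rows(): stack two tibbles that share the same columns
extra <- tibble(id = 4, time1 = 40, time2 = 35)
all_scores <- bind_rows(scores_wide, extra)
```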
The biggest success was getting to the end of the coding tutorials with the correct output!!
Challenges
I found it difficult to get the descriptives that I reproduced for each individual variable into a single tibble. When I put them together to create one long tibble, I got lots of error messages. Bart found out how to do this and helped me realise that my errors were in the mutate function. I am also challenged by the graph in our article, as the percentages it reports are not written in a table or in descriptive paragraphs. I will have to do some trial and error to figure out where these percentages come from…
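One possible way to get descriptives for several variables into a single long tibble is to pivot first and summarise second. This is only a sketch under assumed names (`worry`, `perceived_risk` are hypothetical columns) and is not necessarily the approach Bart used.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data with one missing value
survey <- tibble(
  worry          = c(3, 4, 5, NA),
  perceived_risk = c(1, 2, 2, 3)
)

# Stack all variables into name/value pairs, then summarise each variable
descriptives_long <- survey %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  group_by(variable) %>%
  summarise(
    mean      = mean(value, na.rm = TRUE),
    sd        = sd(value, na.rm = TRUE),
    n_missing = sum(is.na(value))
  )

print(descriptives_long)
```

Because every variable passes through the same `summarise()` call, the result is one row per variable with matching columns, so nothing has to be stitched together afterwards.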
What Next?
Next I will continue to use my coding skills to progress on the assignment. I will now work on getting the percentages reported in the graph through some trial-and-error mutating of the descriptives given in the tables. Then I will get the percentages for the data that was wrong, and then I will put this reproduced data into a graph, with constant help from, and back-and-forth questions with, my group members.
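A rough sketch of how percentages within each group might be computed from raw responses. The data and column names (`had_covid`, `left_home`) are invented for illustration, and whether the graph in the paper uses within-group percentages like these is an assumption that still needs checking.

```r
library(dplyr)

# Hypothetical toy responses
responses <- tibble(
  had_covid = c("yes", "yes", "yes", "yes", "no", "no", "no", "no", "no"),
  left_home = c("yes", "no", "yes", "yes", "no", "no", "yes", "no", "no")
)

# Count each answer per group, then convert counts to within-group percentages
percentages <- responses %>%
  count(had_covid, left_home) %>%
  group_by(had_covid) %>%
  mutate(percent = round(n / sum(n) * 100, 1)) %>%
  ungroup()

print(percentages)
```

Changing the `group_by()` variable (or removing it) changes the denominator, which is one thing to vary when trying to match the percentages shown in a published graph.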
CODING QUESTIONS FROM GROUP 4:
- We have a graph in our article (Smith - The impact of believing you have had COVID-19 on self-reported behavior). We are a bit confused by the scale of this graph, as the percentages it reports aren’t written in the paper as descriptives. Do we just need to do some trial and error/guesswork to try and get percentages that ‘look similar’ to those in the graph?
- We got different results for the descriptives in table 3 under those that did not correctly identify cough and fever. We got no = 1729 and yes = 788, which are not the results in the paper. They make the disclaimer that those who answered “I don’t know” were coded as incorrect. Because our values are higher than those in their table, perhaps they didn’t end up including those people in their analysis. Are we correct to say this is an error in reproducibility, as they didn’t follow their stated analysis? And should we create our graph/table with the correct values that we found?
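To check which coding rule gives counts closer to the paper, the two options in the second question could be compared side by side. This is a sketch on made-up answers; the column name `identified` and the response labels are assumptions, not the study's actual coding.

```r
library(dplyr)

# Hypothetical toy answers to a symptom-identification question
answers <- tibble(
  identified = c("yes", "no", "I don't know", "yes", "I don't know", "no")
)

# Option A: follow the stated rule, so "I don't know" counts as incorrect
coded_incorrect <- answers %>%
  mutate(correct = if_else(identified == "yes", "yes", "no")) %>%
  count(correct)

# Option B: drop "I don't know" responses before counting
excluded <- answers %>%
  filter(identified != "I don't know") %>%
  count(identified)

print(coded_incorrect)
print(excluded)
```

If the paper's counts match Option B rather than Option A, that would be consistent with the "I don't know" responses having been left out of the table.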