class: center, middle, inverse, title-slide

# R
## What is it and why should you care?
### Steve Crawshaw
### 2020-12-25

---

# What is R?

+ R is a programming language designed for statistics and data science
+ It is usually run in a program called RStudio. <https://www.rstudio.com/>

<img src="https://www.r-project.org/logo/Rlogo.png" width="200" height="200" />

---

# What can it do?

--

## Data Science

--

+ Data Analysis

--

+ Data Processing

--

+ Modelling and machine learning

--

+ Visualisations

--

+ Reporting

--

+ Web apps (with Shiny)

--

+ Mapping - like GIS

---

# Why Use R?

--

## It's Open Source

--

+ It's free!

--

+ It's easier to share your data & code

--

+ Innovations spread quickly

--

+ *You* can contribute!

---

# Why Use R?

--

## It's powerful & flexible

--

+ You can use R for more than data analysis, including:
  + creating websites
  + creating documents that reproduce your analyses
  + slideshows (including *this one*)
  + books

--

+ In R, it is never *if* but *how*

---

# Why Use R?

--

## Reduce Errors, Enhance Reproducibility & Transparency

+ Generate publication-quality figures & tables
+ Create detailed and fully documented scripts showing every step between raw data & a statistic
+ You can use R to report your analyses, reducing the all-too-common errors in reported statistics (see [Nuijten et al.](https://link.springer.com/article/10.3758/s13428-015-0664-2))

---

# Why Use R?

--

## It's Efficient

+ Once you get used to it, R saves you time in the long run
+ Scripts make re-using past work, or using others' work as a starting point, much easier
+ Typing scripts is much faster than clicking through menus, *especially* after you get the hang of keyboard shortcuts
+ It runs faster and is less bloated than GUI statistics software (e.g., SPSS)

---

# What are the alternatives?

--

## Excel

+ Does not separate data from code
+ Limited to about 1 million rows per sheet
+ Slow
+ Very prone to human error
+ See <https://blogs.oracle.com/smb/10-of-the-costliest-spreadsheet-boo-boos-in-history>

--

## Python

--

+ General purpose language
+ Used for data science
+ R has more libraries (packages) for specialised fields like air quality (!)

---

# Alternatives

## Business Intelligence Products

--

+ Power BI
  + Microsoft - O365
  + BCC uses it
  + Modelling and ML capabilities are limited

--

+ Tableau
  + Expensive
  + Similar limitations on modelling and ML

---

# But:

## R is difficult to learn!

<img src="https://bcullen.rbind.io/post/2020-10-19-teaching-an-r-bootcamp-remotely/r_rollercoaster.png" height="424" width="720" />

---

# Examples of using R in BCC

--

+ **Using the Open Data API to publish ratified data**
  + *Extract* - connect to corporate database and National AQ site
  + *Transform* - reshape and process data
  + *Load* - POST data to remote server

--

+ **<a href="S://SUSTAIN//EnvQual//Air_Quality//Projects//R%20Projects//airquality_GIT//covid-19-air_quality_Bristol_REPORT.html" target="_blank">Regular updates to AQ Board on COVID and air quality</a>**
  + Uses machine learning to "de-weather" air quality measurements
  + Analysis is updated with new data automatically every month
  + Graphics saved as files to drop into PPT

--

+ **<a href="S://SUSTAIN//EnvQual//Air_Quality//Projects//R%20Projects//tubes//asr_tubes_tables.html" target="_blank">Annual reporting of air quality to Defra (QA process)</a>**
  + Streamlines statutory workflow and adds QA steps
  + Carries out multiple adjustment calculations and derives KPIs
  + Generates pre-formatted tables to drop into Defra report

--

+ **<a href="S://SUSTAIN//EnvQual//Air_Quality//Projects//R%20Projects//dashboard_demo//short_report_ods.html" target="_blank">Short report on air quality for public</a>**
  + Summarises AQ data in short format
  + Compelling and interactive graphics with HTML and JavaScript

---

# Observations and next steps

--

+ Data science is transforming business

--

+ Will this be true for local authorities?

--

+ What insights might be possible with your / our data?

--

+ Learning about ML and statistical modelling (thanks!)

---

# **Questions?**