class: center, middle, inverse, title-slide

# Data Visualisation - Introduction
### Eugene
### February 03 2020

---
class: center, inverse

# Course Contents

---

1. System Configuration - installing software
2. Using RStudio
3. Introduction to R
4. Getting and Cleaning Data
5. Exploratory Analysis - making rough plots
6. Different Types of Plots
7. Playing with Aesthetics
8. Using Plotting Themes
9. Advanced Topics - Maps, Networks

---

## Why We're Here

- Alternative to Excel
- Enables Reproducible Research
- Can Make Lots of Plots Quickly
- Good for Exploratory Analysis
- Publication-Ready Figures

---

## And... a gateway to so much more

- data capture
- statistical analysis
- machine learning
- artificial intelligence
- writing your thesis
- writing a blog

---

## Not Why We're Here

- Won't discuss choices for data presentation
- Nor good practices in visualisations
    - but these are sort of in the background
- This isn't a machine learning course
    - but lots of the techniques we'll use are relevant
- So this course is about skills development; how you use these skills is up to you.
---

## We said we wouldn't discuss this... but

- Graphics are important, overlooked, and inconsistent
- Need to tell a story
- Can be misleading, almost always by accident
- Choice of colours - we'll spend some time on this
- Choice of fonts
- Keep it simple - reduce the amount of ink
- Increasing number of options for showcasing your data

---

<img src="01_introduction_files/figure-html/bar_plot-1.png" style="display: block; margin: auto;" />

---

<img src="01_introduction_files/figure-html/line_plot-1.png" style="display: block; margin: auto;" />

---

<img src="01_introduction_files/figure-html/climate_plot-1.png" style="display: block; margin: auto;" />

---
class: center, inverse

# Let's Begin

---

## Install R and RStudio

---

### *R*

- Go to [*CRAN*](https://cran.r-project.org/)

### *RStudio*

- this is the IDE we will use (and pretty much everyone else uses)
- R is the engine, RStudio is the cockpit
- download from [*RStudio*](https://rstudio.com/products/rstudio/download/)

---

## Using RStudio

- toolbar across the top
    - I don't use this very much
- set of quick links below that
    - top left (green plus sign) is about the only one I use
- 4 Panes
    - top left for files or looking at data
    - bottom left for the console
    - top right for *Environment* - tells what variables are stored
    - bottom right for plots and help

---

<img src="images/rstudio.PNG" height="600px" width="800px" align="center"/>

---

- usual workflow is:
    - try commands out at the console (bottom left)
    - when that works, store them in a file (top left)
    - when a sequence of commands works, put them into a document (also top left)

---

## Extending R

- installing R just gives you *base* R
- beauty of this tool lies with *packages*
- we'll look at installing these from three sources:
    - CRAN
    - Bioconductor
    - GitHub

---

- [CRAN](https://cran.r-project.org/)
    - for example, at the console type `install.packages("tidyverse")`
    - this installs the tidyverse package (or rather, family of packages)
    - over 20k packages on CRAN (see
      list [here](http://cran.nexr.com/web/packages/available_packages_by_name.html))
    - sometimes esoteric ([engsoccerdata](http://cran.nexr.com/web/packages/engsoccerdata/index.html))
    - sometimes cutting edge ([deep learning](http://cran.nexr.com/web/packages/keras/index.html))
    - each package heavily curated and maintained

---

- [Bioconductor](https://www.bioconductor.org)
    - set of bioinformatics packages (lots of genomics)
    - start with `install.packages("BiocManager")`
    - then `BiocManager::install("some_genomics_package")` to install a package
    - list of packages [here](http://bioconductor.org/packages/release/BiocViews.html)
    - about 3,000 packages, including genome builds

---

- GitHub
    - packages in development
    - start with `install.packages("devtools")`
    - then `devtools::install_github("developer_name/package_name")`
    - almost 80k packages [here](http://rpkg.gepuro.net/)
    - the package *githubinstall* is useful to search these

---
background-position: center
background-size: contain
class: center, inverse

# Resources

---

- books
    - *recommended text* **Data Visualization** by Kieran Healy (ISBN = 978-0691181622). ~€25.
      Also online at [https://socviz.co/index.html](https://socviz.co/index.html)
    - [Hadley's book, R for Data Science](https://r4ds.had.co.nz/)
    - [Data Visualization by Wilke](https://serialmentor.com/dataviz/); lots of his actual code is on GitHub at [https://github.com/clauswilke/practical_ggplot2](https://github.com/clauswilke/practical_ggplot2)

<br/> <br/> <br/>
<img src="images/hadley.jpg" height="100px" width="100px" align="right"/>

---

- websites
    - Karl Broman (https://www.biostat.wisc.edu/~kbroman/), and particularly [this presentation](https://www.biostat.wisc.edu/~kbroman/presentations/graphs_MDPhD2014.pdf)
    - course by Boehmke on GitHub [github.com/uc-r/Intro-R](https://github.com/uc-r/Intro-R)
    - the good people at RStudio have lots of help at [resources.rstudio.com/](https://resources.rstudio.com/)
    - [Cedric](https://cedricscherer.netlify.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/).

<br/> <br/> <br/>
<img src="https://github.com/yihui/xaringan/releases/download/v0.0.2/karl-moustache.jpg" height="80px" width="100px" align="right"/>

---

- Blogs and Podcasts
    - [www.simplystatistics.org](https://simplystatistics.org)
    - [varianceexplained.org](http://varianceexplained.org/)
    - [Not So Standard Deviations](http://nssdeviations.com/)

<br/> <br/> <br/> <br/> <br/> <br/> <br/>
<img src="images/hillary.jpeg" height="100px" width="100px" align="right"/>

---

- Online Courses
    - Coursera: [Data Science from Johns Hopkins](https://www.coursera.org/specializations/jhu-data-science).
      The course notes are on [GitHub](http://datasciencespecialization.github.io/)
    - edx.org [course from Irizarry](https://www.edx.org/course/data-science-visualization)
    - [DataCamp](https://www.datacamp.com)

<br/> <br/> <br/> <br/> <br/> <br/>
<img src="images/rafael.jpg" height="100px" width="100px" align="right"/>

---

- Miscellaneous
    - [Dublin R MeetUp](https://www.meetup.com/DublinR/)
    - [RWeekly.org](https://rweekly.org), a round-up of events in the world of R
    - [#TidyTuesday](https://twitter.com/search?q=%23TidyTuesday&src=typeahead_click) on Twitter
    - [R Cheatsheets](https://rstudio.com/resources/cheatsheets/)
    - if you get stuck, Google is your friend. It often sends you to stackoverflow.com or stackexchange.com

---

- Some stuff about graphics in general
    - [again, from Irizarry](http://genomicsclass.github.io/book/pages/plots_to_avoid.html)
    - [hit parade of graphs in R](https://www.r-graph-gallery.com/index.html)
    - [Cedric Scherer again](https://cedricscherer.netlify.com/)
    - some stuff from [Christian Burkhart](https://ggplot2tor.com/make_any_plot_look_better/make_any_plot_look_better/)
    - and from [Laura Ellis](https://www.littlemissdata.com/)
    - and from [Peter Aldhous](http://paldhous.github.io/ucb/2016/dataviz/)
    - [colours in R](https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/colorPaletteCheatsheet.pdf)
    - cool book on good graphics from [Stephen Few](https://nces.ed.gov/programs/slds/pdf/08_F_06.pdf)
    - [The Glamour of Graphics](https://www.williamrchase.com/slides/assets/player/KeynoteDHTMLPlayer.html#0) talk from last month's RStudio Conference

---

## What You Have to Do

- Do a DataCamp course every week
- Produce four data graphics
    - the first three of these will be specified
    - the last one will be your chance to express yourself
- Do one #TidyTuesday challenge

---

## Course Communications

- Moodle, of course
- Twitter via #data_vis_2020
    - I'll be doing lots of retweets here
    - but also for general communications
- email at eugene.hickey@tudublin.ie
- you'll need to create an account
  at datacamp.com and send me the email you'll use for this account
    - I can check your progress on DataCamp courses

---

# Some Example Images

---

```
## OGR data source with driver: ESRI Shapefile
## Source: "/home/eugene/Desktop/Coursera/Maps/Africa", layer: "Africa"
## with 762 features
## It has 3 fields
```

```
## Response [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5918692/bin/NIHMS958804-supplement-Supplementary_Table.xlsx]
##   Date: 2020-02-02 20:27
##   Status: 200
##   Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
##   Size: 1.27 MB
## <ON DISK>  /tmp/RtmpNG1kEk/file4a3e3d5e3481.xlsx
```

---

<img src="01_introduction_files/figure-html/tidytuesday_rollercoaster-1.png" style="display: block; margin: auto;" />

---

<img src="01_introduction_files/figure-html/autism_plot-1.png" style="display: block; margin: auto;" />

---

<img src="01_introduction_files/figure-html/tb_plot-1.gif" style="display: block; margin: auto;" />

---

<img src="01_introduction_files/figure-html/knotweed-1.png" style="display: block; margin: auto;" />

---

<img src="01_introduction_files/figure-html/sz_brain-1.png" style="display: block; margin: auto;" />

---

<img src="01_introduction_files/figure-html/tweets-1.png" style="display: block; margin: auto;" />
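
---

## Recap: Installing Packages

The three installation routes from the *Extending R* slides, gathered into one script to run at the RStudio console. This is a minimal sketch, not the only way to do it: *Biobase* and *tidyverse/ggplot2* are example choices that don't appear in the slides, picked only so that every line runs as typed. Swap in whatever packages you actually need.

```r
# CRAN: the default source for install.packages()
install.packages("tidyverse")

# Bioconductor: get the manager from CRAN, then install through it
install.packages("BiocManager")
BiocManager::install("Biobase")  # example package (assumption, not from the slides)

# GitHub: devtools installs straight from a repository,
# given as "developer_name/package_name"
install.packages("devtools")
devtools::install_github("tidyverse/ggplot2")  # example repository (assumption)

# Installing is done once per machine; loading is done in every new session
library(tidyverse)
```

- the `BiocManager::` and `devtools::` prefixes let you call a function without loading its package first
- all of these need an internet connection, and the bigger installs can take a few minutes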