class: center, middle, inverse, title-slide

# Toolbox Approach to Ecosystems Research
## Data, software and computers
### 2017-02-26

---

# Toolbox Approach - key aspects

[SOKI.aq/display/Data](http://soki.aq/display/Data/Data+Processing+Toolbox)

- **R** is our core project-level tool
- automation: reproducible outputs and version control
- common data library: many complex and varied data sets, some large, regularly updated
- software tools: reusable modules, built on standard tools and libraries
- the research cloud: computers for individuals and sub-groups
- community outreach and education with the **AAD**, **ACE CRC**, **CMAR** via the **Research Bazaar** and **Software Carpentry**. Involves local and international community groups: *Atlantis User Community*, *R-Ladies*, *Gentle-R*, *Data Tas*, *Hobart R Users Group*, *Data Science Hobart*, *R-Spatial.org*.

---

# Common data library

- maps, polygons, lines, images, topography, etc.
- time series remote sensing
- model output, 2D, 3D, 4D

## **daily** synchronization tools keep data up to date

1. raw file collection is available
2. build index to the files needed
3. build function to read a file as/when needed
4. insert this into workflow
5. share the function/workflow as wiki, code, package, publication

---

# Common software library

- core is the R-spatial family
- central package **raadtools** for the data library
- systems package **raadsync** to build the data library
- family of packages as solutions

---

# Computing in the cloud

- bridges the gap between desktop and clusters
- enhances visualization and publishing and the interactive requirements of model development
- we deploy toolbox components using facilities we all have access to
- it's being made easier as we speak

---

# Software development

Modularity, orthogonal software components

Development space is very active and diverse.

- https://cran.ms.unimelb.edu.au/web/checks/check_results_mdsumner_at_gmail.com.html
- https://github.com/mdsumner
- https://github.com/raymondben
- https://github.com/RTreb
- https://github.com/SWotherspoon
- https://github.com/AustralianAntarcticDivision
- https://github.com/AustralianAntarcticDataCentre
- https://github.com/Trackage
- https://github.com/r-spatial

---

# Software and data management themes

* follow [*tidy data principles*](http://tidyverse.org), aligned with modern **R** standards
* **R for Data Science**, Grolemund & Wickham, http://r4ds.had.co.nz/
* common tooling to access data, manipulate and merge data streams, analysis, reporting
* access to simple data streams and very complex data streams with a compatible approach
* leverage interactive and online tools where valuable, integrate with automated workflows

---

# Examples

* **daily extractions for K-Axis** from the data library with **raadtools** were sent to the voyage, adapting to specific requirements day to day
* **Habitat Assessment** planning, documentation, data summary and analyses, outputs and report generation, via *Github*, [SOKI.aq](http://soki.aq/display/Data/Habitat+Assessment), and *Authorea*
* **Atlantis Spatial Models** collected by the community [within a package](http://australianantarcticdivision.github.io/rbgm/articles/BGM_examplefiles.html), enabling collaboration

---

# Current topics

- **topology** in *spatially-explicit models* is relevant to many groups, including CCAMLR, Atlantis, SOOS and ROMS
- **curvilinear structured grids** are relevant to ocean colour processing, ROMS, CMIP5/6 data access
- **networks**: we have emerging tools for networks, and there's a push from the *tidyverse* to build a strong R framework for network, graph and relational data
- **temporal data collection** needs attention (voyage data, animal tracking data, interactive web tools)
- improved support between AAD and ACE compute needs with TPAC
- new model for **raadtools**, easier to set up
- improving and extending Software Carpentry materials, integration with Ecosystems and QMS
- reproducibility, interactive model creation, interactive visualizations, integrated web publishing
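
---

# Common data library - a sketch

The numbered steps on the *Common data library* slide (file collection, index, read function, workflow) could look roughly like the following in base **R**. This is a minimal sketch under stated assumptions, not the **raadtools** or **raadsync** API: the toy CSV files, the `sst_YYYYMMDD.csv` naming scheme, and the functions `build_index()` and `read_day()` are all hypothetical and exist only for illustration.

```r
# Step 1 (assumed): a raw file collection is available.
# Here we fake one with three small daily CSV files in a temporary directory.
data_dir <- file.path(tempdir(), "toy_data_library")
dir.create(data_dir, showWarnings = FALSE)
dates <- as.Date("2017-02-24") + 0:2
for (i in seq_along(dates)) {
  toy <- data.frame(lon = c(60, 70, 80),
                    lat = c(-65, -60, -55),
                    value = i * c(1, 2, 3))
  file_name <- sprintf("sst_%s.csv", format(dates[i], "%Y%m%d"))
  write.csv(toy, file.path(data_dir, file_name), row.names = FALSE)
}

# Step 2: build an index of the files, parsing the date from each file name.
# (Hypothetical helper; the naming scheme is an assumption.)
build_index <- function(dir) {
  files <- list.files(dir, pattern = "^sst_[0-9]{8}\\.csv$", full.names = TRUE)
  date_string <- sub("^sst_([0-9]{8})\\.csv$", "\\1", basename(files))
  index <- data.frame(date = as.Date(date_string, format = "%Y%m%d"),
                      file = files,
                      stringsAsFactors = FALSE)
  index[order(index$date), ]
}

# Step 3: read one file only when it is asked for,
# choosing the file whose date is nearest to the request.
read_day <- function(date, index) {
  i <- which.min(abs(as.numeric(index$date - as.Date(date))))
  out <- read.csv(index$file[i])
  out$date <- index$date[i]
  out
}

# Step 4: use the reader inside a workflow, here a mean value per day.
index <- build_index(data_dir)
daily_mean <- vapply(as.character(index$date),
                     function(d) mean(read_day(d, index)$value),
                     numeric(1))
print(daily_mean)
```

Keeping the index separate from the reader means the file collection can be refreshed daily without touching analysis code; only `build_index()` needs to be re-run.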