Module 18: EUBON Data Mobilisation Examples: Packaging DINA

Background   

Elton Room, 14:00 - 16:00

Module 18: Data Mobilisation
Examples: DINA, Test Sites, CS
Gateway, IPT, taxonomic backbones
[WP1,5]

Packaging a data mobilization platform for the web

What is the DINA system?

The DINA system is a modular open source data mobilization platform for the web, currently in alpha, built using OSS tools such as Git, Jenkins (CI), Maven, JMeter and Bitnami:

Backend / Data: REST-style APIs
- implemented in Java, Python
Frontend / UI: Web UIs with uniform look
- Bootstrap for shared style across devices
Import / Export: flexible data migration workflows for data wranglers
- using open source toolchains such as OpenRefine, Python, R

System-level meta components

The Three Musketeers in WP 1.4: JACQ, Pluto-F and DINA-Web

JACQ - a virtual herbarium providing a web-oriented management system for botanical collections
- based on the Yii framework (PHP)
Pluto-F taxonomy module
- based on Django REST framework for the backend and the Ember.js framework for the frontend
DINA-Web - an open source web-centric collections management system based on the Specify data model
- modular implementation, using primarily Java 2 EE stack

How to simplify deployments?

Packaging components into a turn-key system matters because complicated installation procedures and configurations raise the barrier to entry for new users. Modules in the DINA system have vastly different system requirements, for example:

some require a Java 2 Enterprise Edition friendly technology stack
others require Python / Django + Ember
others require the Yii Framework etc

The environment running the services needs to satisfy all system dependencies at various levels of the stack and must be able to be turned into a turn-key appliance. This is a system integration task.
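As a rough illustration of this integration task, the sketch below takes the union of the stack requirements of several modules and reports what a host still lacks. The module-to-stack mapping is an assumption made up for this example from the bullets above; it is not an actual DINA dependency list.

```python
# Hypothetical mapping from modules to the stacks they need.
# Stack names follow the slides; the exact groupings are assumptions.
MODULE_REQUIREMENTS = {
    "DINA-Web": {"Java EE"},
    "Pluto-F": {"Python", "Django", "Ember"},
    "JACQ": {"PHP", "Yii"},
}


def combined_requirements(modules):
    """Return the union of stack requirements for the given module names."""
    required = set()
    for name in modules:
        required |= MODULE_REQUIREMENTS[name]
    return required


def missing_requirements(modules, installed):
    """Return the stacks the host still lacks, sorted for stable output."""
    return sorted(combined_requirements(modules) - set(installed))


if __name__ == "__main__":
    # A host that only provides Python and PHP so far.
    print(missing_requirements(MODULE_REQUIREMENTS, {"Python", "PHP"}))
```

A single environment hosting all three modules has to satisfy the whole union, which is why the packaging is treated as one integration problem rather than three separate installs.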

Packaging the DINA system

Packaging the DINA system aims at assembling all DINA modules into one system. We have made significant progress towards these goals:

higher-level deployment packages
enable turn-key deployments on open source platforms
simplify integration projects into existing systems
support virtualization and generation of virtual appliances

We capture configuration details and automate all settings in a reproducible way, using simple and understandable text files. This is similar to how we work with source code for lower-level software components.
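A minimal sketch of the idea, assuming an INI-style settings file: the text is parsed into plain data and fingerprinted in a canonical order, so the same text always yields the same settings and the same hash. The section names, keys and values are hypothetical and do not come from a DINA configuration.

```python
import configparser
import hashlib

# Hypothetical settings file; section and key names are illustrative only.
SETTINGS_TEXT = """
[database]
host = localhost
port = 3306

[web]
port = 8080
"""


def load_settings(text):
    """Parse INI-style text into a plain nested dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def fingerprint(settings):
    """Hash the settings in canonical order, so equal settings give equal hashes."""
    canonical = "\n".join(
        f"{section}.{key}={value}"
        for section in sorted(settings)
        for key, value in sorted(settings[section].items())
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    settings = load_settings(SETTINGS_TEXT)
    print(settings["web"]["port"], fingerprint(settings)[:12])
```

Because the file is plain text, it can be versioned and reviewed with the same tools as source code, and the fingerprint gives a cheap check that two deployments were built from the same settings.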

Creating reproducible system integration projects

Tools like Vagrant, Docker and Ansible allow DevOps users to create servers quickly in the form of a virtual machine or virtual disk image that can be deployed using hypervisors like VirtualBox or VMware on the local network (intranet or extranet) or even directly to the cloud.

A "system integration project" captures all required integration details for every component: the complete installation steps, along with the steps needed to get the required dependencies, data and modules. With these details captured, it becomes possible to bring the components closer together and, for example, run them on the same platform in a reproducible and deterministic way.
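One way to picture such a project is as an ordered plan of named steps, where each step checks that its prerequisites have already run. The sketch below is an assumption-laden toy: the step names and the in-memory state stand in for real provisioning work done by tools such as Vagrant or Ansible.

```python
def install_dependencies(state):
    """Stand-in for installing system-level dependencies."""
    state["dependencies"] = True


def load_data(state):
    """Stand-in for loading datasets; needs dependencies first."""
    if not state.get("dependencies"):
        raise RuntimeError("dependencies must be installed before loading data")
    state["data"] = True


def deploy_modules(state):
    """Stand-in for deploying application modules; needs data first."""
    if not state.get("data"):
        raise RuntimeError("data must be loaded before deploying modules")
    state["modules"] = True


# Hypothetical plan: the fixed order is what makes every run identical.
PLAN = [
    ("install dependencies", install_dependencies),
    ("load data", load_data),
    ("deploy modules", deploy_modules),
]


def run_plan(plan):
    """Run each step in order and return the final state and the step log."""
    state, log = {}, []
    for name, step in plan:
        step(state)
        log.append(name)
    return state, log


if __name__ == "__main__":
    final_state, steps_run = run_plan(PLAN)
    print(steps_run)
```

Running the plan out of order fails loudly instead of producing a half-configured system, which is the property a reproducible integration project is after.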

Reference implementations are online

Links to sources and packages / integration projects at GitHub and SourceForge:

JACQ http://sourceforge.net/projects/jacq/ with instructions
Pluto-F Taxonomy Module https://github.com/DINA-Web/dw-taxonomy
DINA-Web https://www.github.com/DINA-Web/server-vm

Live URLs for reference implementations:

JACQ https://herbarium.botanik.univie.ac.at/herbarium-wu/login.php
Pluto-F Taxonomy Module https://taxonomy.plutof.ut.ee
DINA-Web https://dina-web.net

Future - DINA "black box" deployment

Requiring a lot from the hardware is often an easy way out but can encourage throwing money at problems rather than avoiding them altogether or providing the proper engineering.

Challenge Explanation

The challenge: Supporting a "least common denominator hardware platform" - 35 USD Raspberry Pi 2 hardware with Debian … providing a portable, reproducible packaging that can deploy and run the DINA system with JACQ/DINA-Web/Pluto-F on a low-end, cheap, standardized server. The aim is to provide a common frame of reference for testing and comparison, to ensure good performance and cost-effectiveness for larger deployments, and to avoid unnecessary bloat.

Details and background to packagings

Details: DINA-Web packaging

An alpha version of the DINA-Web system has been packaged as an integration project at https://www.github.com/DINA-Web/server-vm.

By retrieving this project and running it using the Vagrant toolset, the DINA-Web system will be created from scratch with all dependencies and established as a virtual machine running under VirtualBox.

The only requirement for doing this on an existing system is to have git, Vagrant and VirtualBox available (all freely available open source tools). The integration project will pull in relevant modules and datasets.
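The workflow described above can be sketched as a prerequisite check followed by two commands. This is a dry-run sketch only: it assumes the tools are installed under their usual executable names (`VBoxManage` is VirtualBox's command-line tool) and that the project's Vagrantfile sits at the repository root; the target directory name is a hypothetical choice. Nothing is executed.

```python
import shutil

# Executable names as they are usually installed (an assumption for this sketch).
REQUIRED_TOOLS = ("git", "vagrant", "VBoxManage")
REPO_URL = "https://www.github.com/DINA-Web/server-vm"


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the required tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def planned_commands(repo_url=REPO_URL, target_dir="server-vm"):
    """Return (argv, working_directory) pairs; the commands are not run here."""
    return [
        (["git", "clone", repo_url, target_dir], None),
        (["vagrant", "up"], target_dir),
    ]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    for argv, cwd in planned_commands():
        print(" ".join(argv), f"(in {cwd})" if cwd else "")
```

Keeping the commands as data makes it easy to review them first and only then hand them to something like `subprocess.run`.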

Modules in DINA-Web

With regard to the modules, the alpha version of DINA-Web consists of several web application modules: an inventory client, a central loan form for loan requests, the DNA-Key application, Naturarv etc. These modules are available here: https://github.com/DINA-Web/modules

With regard to the datasets, the alpha version of DINA-Web makes use of a CC0-licensed dataset available from here: https://github.com/DINA-Web/datasets

For the collections management part it is possible to use Specify 6 thick client or Specify 7 as front-end, since the Specify data model is used in the backend. This software is available as a virtual appliance from here

Details: JACQ package

The JACQ system has been packaged in the Open Virtualization Format (OVF), which runs in both VirtualBox and VMware. It can be downloaded as a 64-bit Debian package (~ 580 MB) or as a 32-bit variant of approximately the same size.

Sources and instructions related to deploying and using JACQ can be found here:

Details: Pluto-F Taxonomy Module package

The Pluto-F taxonomy module is an application for working with taxon names (adding, editing, linking) and multiple classifications (adding, editing, cloning). It provides a RESTful API written with Django REST Framework and uses PostgreSQL as its underlying DBMS.
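To make "multiple classifications" and "cloning" concrete, here is a small in-memory sketch. It is not the Pluto-F data model or API; the class, its methods and the example taxa are assumptions chosen only to show how a cloned classification can diverge from its original.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Classification:
    """Hypothetical classification: each taxon name points to its parent."""

    name: str
    # Maps a taxon name to its parent taxon name (None for a root taxon).
    parents: dict = field(default_factory=dict)

    def add_taxon(self, taxon, parent=None):
        """Add a taxon, linking it to an existing parent if one is given."""
        if parent is not None and parent not in self.parents:
            raise KeyError(f"unknown parent taxon: {parent}")
        self.parents[taxon] = parent

    def lineage(self, taxon):
        """Return the path from the root down to the given taxon."""
        path = []
        while taxon is not None:
            path.append(taxon)
            taxon = self.parents[taxon]
        return path[::-1]

    def clone(self, new_name):
        """Return an independent copy that can be edited separately."""
        return Classification(new_name, copy.deepcopy(self.parents))


if __name__ == "__main__":
    original = Classification("working classification")
    original.add_taxon("Plantae")
    original.add_taxon("Rosaceae", parent="Plantae")
    variant = original.clone("alternative classification")
    variant.add_taxon("Rosa", parent="Rosaceae")
    print(variant.lineage("Rosa"))
```

Edits to the clone leave the original untouched, which is the behaviour one would expect when maintaining several classifications side by side.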

A reference implementation is running online in a demo environment and can be used for testing at: https://taxonomy.plutof.ut.ee/ (login required)

Access to the taxonomy module source code repository in GitHub (https://github.com/TU-NHM) can be provided to interested parties.