Whatever new Research, Science and Innovation infrastructure emerges, collaboration needs to extend well beyond what “Science New Zealand” currently provides. Current structures divide up the public good scientific enterprise in ways that overly reflect the industry sector interests that applied when Crown Research Institutes were set up in 1992. Industry sector interests would now divide up differently, with greater changes in prospect in the next decade and beyond. The division of work by industry sector has at the same time fragmented the skill base, creating obstacles to shared learning, to transfer of skills, to cooperation, and to sharing the benefits of lessons learned and experience gained.
At the Mount Albert Research Centre, where I was at the time, we lost the benefits of a shared library. For my area of work, we had to rely on our personal libraries.1 It would have made sense to negotiate access to University library facilities. Also unfortunate was that what had been a common tearoom ceased to be a meeting place for scientists at the centre. There may well have been gains from the freedom that individual CRIs gained to pursue their own paths. Important international links have been established. Joint appointments have worked to create channels of communication with universities.
A 2018 seminar on “retooling primary health care for the 21st Century” impressed me with the detail of the scrutiny now being given to the effects of the organisation of patient care services on achieving good patient outcomes, and on costs.2 A comparable level of scrutiny, albeit addressing very different issues, should be paid to the processes by which research organisations (the organisations, not the scientists who work for them) organise their work for scientific outcomes. Measures must be more incisive and focused on the public interest than standard forms of feedback from commercial or government customers, and more reliable than common measures of research quality.3
Other needed measures, with progress regularly reported, include:
An over-riding concern is that all research, both in science organisations and in Government, should be seen and reported as part of an ongoing historical process. More than specific scientific results are at stake: all worthwhile research adds to an ongoing body of data, of resources, and of scientific understanding. Data and research results are taonga that future generations should be able to use with reasonable confidence as points of departure for their own research.
Areas of expertise for which there is a demand across several CRIs would in many cases be better shared within a dedicated agency, where a strong focus can be placed on maintaining a critical mass – for recruiting, for mentoring, for skill development, and for provision of advice. It is in any case wasteful for multiple agencies to duplicate the effort and inputs required to develop resources and skills that would be more effectively, and to a higher standard, developed jointly.
Projects that proceed in a stop-start manner will often need to rely on consultants who are brought in for the duration of the project. This increases the risk that skills and organisational learning will have dissipated when and if closely related work is taken up at some later time. I judge this to be a particular issue in biosecurity. As with Covid-19 and other human pathogens, it is important to move beyond ad hoc responses to issues as they arise, to preparing for new challenges that will emerge in the future. See Note 1.
Biosecurity shares challenges with work on human pathogens: maintaining records, contact tracing (e.g., for farm animals), and modeling the spread of disease. Connections with human health issues surely warrant careful attention, especially in the case of animal health issues such as those created by Mycoplasma bovis.
Features of the changing scene are new data sources, often giving very large datasets, automated “machine learning” type mechanisms for processing data, and advances in the tools available for statistical modeling. It has, unfortunately, become easier than ever to identify spurious as well as real associations, deceiving untrained human intuition in ways such as are documented in Kahneman (2011).4 Experts in a specialist area may, because of gaps in their understanding, make serious mistakes of judgment. Sally Clark’s wrongful 1999 conviction in the UK for the murder of her two children, overturned only after her life had been ruined by three years in prison, is an extreme example of what can go wrong.5
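As a minimal illustration of how easily spurious associations arise (a hypothetical sketch in Python, not drawn from any real dataset), screening many unrelated variables against an outcome turns up “significant” correlations at roughly the nominal error rate:

```python
# A hedged sketch: with enough unrelated variables, chance alone
# produces "significant" correlations at roughly the nominal 5% rate.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2021)
n_obs, n_predictors = 50, 200

outcome = rng.normal(size=n_obs)                      # pure noise
predictors = rng.normal(size=(n_predictors, n_obs))   # more pure noise

p_values = [pearsonr(x, outcome)[1] for x in predictors]
spurious = sum(p < 0.05 for p in p_values)
print(f"{spurious} of {n_predictors} unrelated predictors show "
      f"'significant' (p < 0.05) correlations with the outcome")
```

With hundreds or thousands of candidate variables, a scatter of apparently convincing associations is guaranteed, whether or not any real signal is present.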
Managers need to understand these issues well enough to ensure that, as need may arise, they seek comment and/or help from suitably skilled advisers. Attention to these issues becomes more important than ever as tasks that were previously handled manually are automated.
Since the latter part of the last century, there has been a huge expansion, in many research areas, in the databases that are available to researchers, making possible advances that could not otherwise have been achieved. Areas where this is particularly obvious include climate science, earthquake science, geology more generally, and molecular biology. In molecular biology, such databases have had a pivotal role in the technology that allowed a much more effective response to the Covid-19 pandemic than would have been possible two decades earlier. The 2022 annual Nucleic Acids Research database issue (Rigden and Fernández 2022) notes 1645 database entries in its online collection. Seven of the 89 new databases listed related to Covid-19 and the Covid-19 virus.
Areas where the gains that stand to be made are less obvious have been slower to avail themselves of the new opportunities. Making data publicly available places a strong discipline on those who create the data, and helps ensure appropriate documentation. It opens the data up to wider use: PhD students and post-docs can apply methodology that they may be in the process of developing, and can look for features that may have been missed or misrepresented in the published analysis, or that earlier modeling software was unable to handle well. It increases the chances that data will be preserved in a form that can be used by posterity. See further, Note 2.
Data that is in the public domain will from time to time attract the attention of “citizen science” data analysts. Depending on the nature of the data and on the background information needed to use it effectively, it may find its way into the hands of analysts who have the skills needed to do a really effective analysis job. Another model that can work well in some contexts comes from the Kaggle7 organisation’s success in making predictive modeling, with data provided publicly, a commercial enterprise.
Issues that relate to the use of commercial databases, with claims of commercial sensitivity used as a reason for not making data available for any external check, came into strong focus when a May 2020 Lancet article appeared in which the results were stated to be based on data from 700 hospitals across six continents that had been provided by the healthcare company Surgisphere. The article quickly attracted a letter from 100 scientists worldwide that raised a number of methodological, data integrity, and ethical concerns. The article was retracted two weeks after publication. Note 3 has further details.
Where possible, all relevant data that bear on an issue under investigation should be brought together, from international as well as from New Zealand sources. In clinical medicine, the use of a “meta-analysis” to bring together evidence from multiple studies has become common practice. These can be much more effective if the data on which the studies were based is available.
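To make the mechanics concrete, the following is a minimal sketch of the core fixed-effect calculation, using inverse-variance weighting; the study estimates and standard errors shown are hypothetical:

```python
# A hedged sketch of a fixed-effect meta-analysis: study estimates are
# pooled using inverse-variance weights. The numbers are hypothetical.
import numpy as np

estimates = np.array([0.42, 0.10, 0.35, 0.27])   # effect estimates from four studies
std_errors = np.array([0.20, 0.15, 0.25, 0.10])  # their standard errors

weights = 1 / std_errors**2
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"Pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")
print(f"95% CI: ({pooled - 1.96 * pooled_se:.3f}, {pooled + 1.96 * pooled_se:.3f})")
```

In practice a random-effects model and checks for between-study heterogeneity would normally be added; the point is simply that pooling of this kind depends on access to comparable summaries from each study or, better still, to the underlying data.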
Reproducible reporting, in which all code used to handle analyses and to produce figures and graphs is made available so that others can check, repeat, or update what has been done, is a harder ask, but should be mandated, albeit with access limited where there are over-riding confidentiality concerns. The benefits are large: updates to reports or papers are straightforward, with minimal risk that new errors will be introduced, and errors or omissions are more likely to be identified. See Note 4.
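As a minimal sketch of what this can look like in practice (the file, column, and directory names are hypothetical), a single script kept under version control regenerates every table and figure directly from the archived data:

```python
# A hedged sketch: one script rebuilds all report outputs from the raw
# data, so the report can be regenerated exactly. Names are hypothetical.
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt

Path("output").mkdir(exist_ok=True)
data = pd.read_csv("data/trial_results.csv")   # archived, versioned data

# Summary table used in the report.
summary = data.groupby("treatment")["yield"].agg(["mean", "std", "count"])
summary.to_csv("output/summary_table.csv")

# Figure used in the report.
fig, ax = plt.subplots()
data.boxplot(column="yield", by="treatment", ax=ax)
ax.set_ylabel("Yield")
fig.savefig("output/yield_by_treatment.png", dpi=300)
```

Rerunning the one script after a data correction, or after a further season of results is added, updates every output with no hand editing.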
An obituary for glaciologist Trevor Chinn (d. 20 December 2018) drew attention to his meticulous work in recording and keeping together data on Southern Alpine glaciers that would otherwise have been lost in the course of successive public sector restructures.8 In other cases, without a Trevor Chinn to keep the data intact, the restructuring and downsizing of government agencies has led to serious losses of historical data, compromising current and future work to which that data would have added important insight. Paper resources have often been “recycled.” It is ironic that, while the National Library has responsibility for “collecting, preserving, and protecting documents” relating to New Zealand, and for making them available, there is no body that has any comparable responsibility for maintaining collections of historical scientific data. Note also the work that the Department of Statistics does in making its data publicly available.
In GNS and in NIWA, there has been good work in setting up databases, with much of the data publicly available. Why does this not happen more widely in science? It has to be emphasized that databases have to be maintained in ways that ensure continuing access as technology and the demands on them change.
The needed resource(s) would be best shared across science agencies, giving at least limited protection against the losses of data and historical records that have in the past accompanied restructuring of public sector agencies. Genomics for Aotearoa New Zealand (GFANZ)9, set up to facilitate the sharing of genomics data between New Zealand researchers, may be a useful model for what is needed more widely.
Biological scientists are among those who will commonly not be comfortable moving data from the Excel spreadsheets to which they are accustomed into the style of database needed for long-term storage (though this is changing with a new generation of graduates). Specialists may be required who can take over the work, or at least help, as sketched below.
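As a minimal sketch of the kind of help involved (the file, sheet, and table names are hypothetical), moving a spreadsheet into a simple relational store that can be documented and queried is often a modest scripting task:

```python
# A hedged sketch: read a worksheet, tidy the column names, and store the
# result in an SQLite database. File and table names are hypothetical.
import sqlite3
import pandas as pd

# Reading .xlsx files requires the openpyxl package.
trials = pd.read_excel("field_trials.xlsx", sheet_name="2023_season")

# Make column names usable as database identifiers.
trials.columns = [c.strip().lower().replace(" ", "_") for c in trials.columns]

with sqlite3.connect("field_trials.sqlite") as con:
    trials.to_sql("trials_2023", con, if_exists="replace", index=False)
    n = con.execute("SELECT COUNT(*) FROM trials_2023").fetchone()[0]
    print(f"{n} rows written to table trials_2023")
```

The harder task, and the one that most needs specialist input, is settling on table structures, names, units, and metadata that will keep the data interpretable decades later.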
Reports internationally on the extent to which the majority (in some cases, the great majority) of published experimental studies have proved irreproducible make a clear case for paying much closer attention than in the recent past to the dependability of published work. Published work that is not replicable wastes the time and resources of those who try to build on it. It is then a serious concern that studies of published results from laboratory experiments have shown reproducibility rates that have commonly been at best around 40%, and at worst as low as 12%. Areas covered include pre-clinical medicine, psychology, and laboratory economics. See Note 6. A recently concluded study that attempted to replicate 193 experiments from the 53 “most impactful” cancer biology studies from 2010–2012 was able to replicate only 50 experiments, from just 23 of the 53 papers. Note 7 gives summary details.
The inevitable gap between results from work that has sufficient credibility to warrant further investigation, and well-established results such as those that underpin approvals for the use of vaccines, is not an adequate excuse for very low rates of reproducibility of published work.
Thus, in laboratory studies, refereeing processes have in the recent past done little to ensure scientific credibility. Studies that “simply” try to reproduce the results of others have, in many areas, not been considered for publication. This needs to change. Insistence on independent replication of laboratory results should be standard practice. P-values or other statistical measures are important as adjuncts to independent replication, but are not a substitute. Replication places a focus back on all aspects of the experimental process – experimental design, experimental procedure, and the quality of statistical analysis – in ways that no other mechanism can.
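A small simulation (a hypothetical sketch, with the effect size, sample size, and share of real effects chosen purely for illustration) makes the point: among initial findings that reach p < 0.05 in small experiments, only a modest fraction survive an independent replication:

```python
# A hedged sketch: how often does an initial "significant" result survive
# an independent replication? All settings here are hypothetical.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
n_studies, n_per_group = 2000, 15    # many small, underpowered experiments
true_effect, share_real = 0.5, 0.2   # only 20% of tested effects are real

def significant(effect):
    a = rng.normal(0.0, 1.0, n_per_group)
    b = rng.normal(effect, 1.0, n_per_group)
    return ttest_ind(a, b).pvalue < 0.05

attempted = replicated = 0
for _ in range(n_studies):
    effect = true_effect if rng.random() < share_real else 0.0
    if significant(effect):                # "significant" original finding
        attempted += 1
        replicated += significant(effect)  # independent replication
print(f"{replicated} of {attempted} 'significant' findings replicated")
```

The particular numbers depend entirely on the assumptions made; the point is that a single “significant” p-value from a small study says little about what an independent replication will show.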
In a paper entitled “Cargo-cult statistics and scientific crisis,” Stark and Saltelli (2018) comment:
Statistics was developed to root out error, appraise evidence, quantify uncertainty, and generally to keep us from fooling ourselves. Increasingly often, it is used instead to aid and abet weak science.
There is no lack of work that melds effective use of statistical methodology with strong science. That melding should be the standard for all areas of statistical application. The challenge is to ensure that statistical analysis gives insightful and defensible results, however the contexts for that challenge may change and widen.
The poor quality of experimental design and of statistical analysis in much published work is addressed in scathing terms in Collins and Tabak (2014):
Factors include poor training of researchers in experimental design; increased emphasis on making provocative statements rather than presenting technical details; and publications that do not report basic elements of experimental design. Crucial experimental design elements that are all too frequently ignored include blinding, randomization, replication, sample-size calculation and the effect of sex differences. Exacerbating this situation are the policies and attitudes of funding agencies, academic centres and scientific publishers. . . .
[Following a distinguished career in medicine, Francis Collins became Director of the US National Institutes of Health in 2009, a post from which he retired at the end of 2021.]
C. G. Begley (2013), commenting on the report in C. Begley and Ellis (2012) that Amgen scientists had been able to replicate only 6 of 53 ‘landmark’ cancer studies, identified very similar issues (“Six red flags”):
What is also remarkable is that many of these flaws were identified and expunged from clinical studies decades ago. In such studies it is now the gold standard to blind investigators, include concurrent controls, rigorously apply statistical tests and analyse all patients — we cannot exclude patients because we do not like their outcomes.
There should be regular reviews of published work and associated online documentation, focusing on issues such as those identified in Collins and Tabak (2014) and C. G. Begley (2013). I am not aware of any overview of the statistical content of published work in New Zealand biological journals, comparable to Maindonald and Cox (1984), that has appeared since that paper. Reviews of this general type do appear in the international medical literature from time to time. See, e.g., Parsons et al. (2012), and papers cited there.
As Collins and Tabak (2014) argue, funding agencies have an important role to play in making scientific processes more scientifically credible. Greater use of expert statistical advice would help, both in study design and in statistical analysis.
Munafò et al. (2017) is a manifesto for change.10 Proposals are wide-ranging in their implications. There should be pre-registration of study design, primary outcome, and analysis plan. Methodological training and support should be a strong focus. Team collaboration should be encouraged. Reporting guidelines have an obvious role, but will not on their own be enough to address reporting biases. Review processes can and should be extended to include public forms of both pre- and post-publication evaluation and review. Reward structures that look broadly at researchers’ output and focus less on apparent novelty would change researcher behaviour in positive ways. See also the wide-ranging critique in Ritchie (2020).
There is, in at least some areas of public health research, an over-confidence in what can be done using regression approaches on data from observational (“population based”) studies. Claims that are based on such modeling require rigorous critique. The use of new tools for collecting data, in public health and in other areas of society and government, combined with the use of machine learning tools to automate attempts to extract meaning from the data, will open new opportunities for over-interpretation and/or misinterpretation. See Note 8.
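A small simulation (with entirely hypothetical variables, constructed only to show the mechanism) illustrates how an unmeasured confounder can generate a convincing-looking regression “effect” where no causal effect exists:

```python
# A hedged sketch: an unmeasured confounder creates a spurious "effect"
# of an exposure on an outcome. All variables are hypothetical.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 5000
confounder = rng.normal(size=n)                # e.g. an unrecorded social factor
exposure = confounder + rng.normal(size=n)     # exposure driven by the confounder
outcome = 2 * confounder + rng.normal(size=n)  # outcome driven by the confounder only

# Naive model that omits the confounder: the exposure looks strongly "causal".
naive = sm.OLS(outcome, sm.add_constant(exposure)).fit()
# Model that adjusts for the confounder: the apparent effect vanishes.
adjusted = sm.OLS(outcome, sm.add_constant(np.column_stack([exposure, confounder]))).fit()

print("Naive exposure coefficient:   ", round(naive.params[1], 2))
print("Adjusted exposure coefficient:", round(adjusted.params[1], 2))
```

In real observational studies the confounder is often not in the dataset at all, so no amount of adjustment can rescue the analysis; this is one reason why claims based solely on such modeling require rigorous critique.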
It is ironic that both climate science and the science that underpins the safety and effectiveness of Covid-19 vaccines – areas where widely drawn scrutiny and critique have ensured that standards are high – have been prime targets of sustained attempts to undermine credibility. In climate science especially, work almost inevitably requires co-operation between individuals with different areas of technical expertise who can be expected to look carefully over each other’s work. The safety and effectiveness of Covid-19 vaccines have had extraordinarily high levels of testing and checking, both in clinical trials and in day-to-day use.
Cases where drug manufacturers have been able to manipulate their way around US Food and Drug Administration requirements destroy trust in pharmaceutical approval processes more generally. There was a huge increase in drug overdose deaths in the United States, from 6.1 to 21.6 per 100,000, between 1999 and 2019.11 A major factor was the increased use of prescription opioids. Purdue Pharma stands out for its aggressive marketing of oxycodone, sold under the brand name OxyContin, arguing that concerns over addiction and other dangers from the drug were overblown (Kolodny 2020).
The term “artificial intelligence,” used as a catch-all for a variety of types and applications of algorithmically based automation, is unfortunate. Systems so described suffer from the same potential as human minds to make mistakes; the mistakes reflect, however indirectly, flaws that result from faulty coding and/or issues with the data used. Fry (2018) is a fascinating overview, accessible to the lay reader, with a huge range of what often read like well-told detective stories to illuminate the exposition. Fry comments that:
. . . the hype over AI is a distraction from much more pressing concerns and — I think — much more interesting stories
Currently in view are many different forms of “narrow AI.” The term “machine learning” makes good sense for systems that allow machines to act autonomously – here note robotics systems, and automated guidance systems such as are in use for aircraft and for self-driving cars. These take and use feedback directly from the environment. System failures, albeit with potentially catastrophic results, are directly obvious. As an example of the very serious consequences that may follow, see Note 9.
Contrast such autonomous systems with systems that, relying on data supplied to them, or on data that have been extracted from administrative records, are designed to assist or direct decision-making.12 Great care is needed to ensure that automated systems do not offer much increased scope for missing or ignoring issues with the available data, for allowing faulty analysis to go unrecognized, and for misinterpretation of analysis results. Without access to the data, and without clear explanation of the criteria used, there is no way to expose unfairness that is built into the data used, or manipulation of the data, or mistakes. Failures of human intuition, such as are documented in Kahneman (2011), can readily find their way into automated systems.
In the colourful language of the title of O’Neil (2016), automated systems readily become “Weapons of math destruction.” O’Neil documents issues for systems that control the deployment of police resources, or that determine hiring and firing decisions, or that may be used to drive public health decision-making.
The article Lazer et al. (2014) is an interesting commentary on Google’s attempts, over 2008–2013, to use their own algorithms with data collected from the web to predict flu outbreaks. The authors argue that while a statistically informed use of data from the web can usefully supplement other sources of insight, it cannot be an effective replacement for the use of data sources that more directly indicate flu incidence.
Most ‘AI’ practitioners have to date come from a training in computer science, with limited exposure to the statistical issues that arise in the collection, analysis, and use of data to extract meaningful information, e.g., for setting policy. It may be hoped that the new “data science” courses that are becoming common in statistics and computer science departments will incorporate substantial practically oriented statistical theory and analysis components. See Note 10.
In the attempt to extract meaning from available data, subtleties are readily overlooked. As argued at length in Barrowman (2018), in an article that makes a number of important points, no data is ever totally “raw.” The processes that generated it, and the wider context of scientific understanding, have large implications for the conclusions that it can be used to draw.
In public policy, the available data is, much more commonly than officials recognize, not an adequate substitute for the data that is needed for sound judgment. This point has relevance to measures used to assess the worth of scientific research.13
In assessing the effectiveness of cancer treatment, it is clearly simplistic to assume a direct link from number of apparent cancers found to lives saved. The increased chance of detecting a life-threatening cancer must be set against the increased risk of serious side effects or hastened death from detecting and treating an apparently cancerous tumor that will never cause serious harm.14 As tests are developed that have an increasing ability to detect apparent “cancers,” these considerations can only increase in importance. It becomes increasingly important to educate the public to understand the trade-offs.
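A simple calculation, in the spirit of the ‘fact boxes’ referenced in the footnote but using entirely hypothetical numbers, shows how benefits and harms can be laid out on the same per-1000 scale:

```python
# A hedged sketch with purely hypothetical numbers, in the style of a
# screening "fact box": outcomes per 1000 people screened over ten years.
screened = 1000
deaths_without_screening = 7   # hypothetical
deaths_with_screening = 6      # hypothetical
overdiagnoses = 15             # cancers treated that would never have caused harm (hypothetical)
false_alarms = 100             # follow-up investigations with no cancer found (hypothetical)

deaths_averted = deaths_without_screening - deaths_with_screening
print(f"Per {screened} screened: {deaths_averted} death(s) averted,")
print(f"  {overdiagnoses} overdiagnosed and treated unnecessarily,")
print(f"  {false_alarms} false alarms")
print(f"Overdiagnoses per death averted: {overdiagnoses / deaths_averted:.0f}")
```

Presenting benefits and harms on a common per-1000 scale is one way of making such trade-offs accessible to patients and the public.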
Some patient organisations, and some medical specialists, have seemed unwilling to argue the case in terms of balance of risk. In addition to the references just noted, see Note 11. The issues are important, both for effective use of public resources, and for avoiding tests and treatments that are on balance likely to harm patients.
It is strongly in the public interest that scientists have reasonable freedom for responsible expression of their views on issues of public concern. In an informal 2015 survey, 151 CRI scientists (out of 384 who responded) answered yes to the question “Have you ever been prevented from making a public comment on a controversial issue by your management’s policy, or by fear of losing research funding?” The response of the responsible Minister, Hon Steven Joyce, was an evasion, in effect arguing that as this was not a scientific survey of all CRI scientists (to this extent, true), its evidence of widespread concern could be ignored. Equally disturbing was the reaction of the NIWA management, suggesting a determination to brush the concerns raised under the carpet.15
A situation where commercial organisations can use the threat of loss of commercial contracts to prevent public comment from those who are best qualified to give it brings serious risk to the body politic. It need not be a contract with the individual expert involved, merely one involving the individual’s organisation.
This issue is crucially important in areas where current manufacturing practices create huge environmental, public health, and other such concerns. Consider, among others: fossil fuel prospecting and use, wider environmental impacts, the use and disposal of plastics, and processed food manufacture.
Issues of this type, as they relate to the UK government’s handling of the Covid-19 epidemic, are documented in astonishingly forthright comments in Abbasi (2020).16
Whatever is done, it is important that a repeat of the several years of chaos that preceded and followed the breakup of the DSIR in 1992 is avoided. If some version of the present CRI structure is retained, the RSIs that result should operate within a more cohesive and cooperative organisational structure than Science New Zealand provides. Data analysis, database design and deployment, and information technology and IT security are areas where there should be wide sharing of skills between the new entities. Links into relevant University departments would have the potential to benefit both parties.
Te Pūnaha Matatini rose to the modeling challenge of Covid-19 very effectively – notwithstanding the detailed criticisms that might be made of the models used. Its work might be expanded, or another body set up, with a brief to work with funding agencies to examine critically how research proposals match up against standards such as those set out in Collins and Tabak (2014) and Munafò et al. (2017). Its brief would extend to reviews of published studies and of reports arising from work undertaken by New Zealand agencies, with a role also in ensuring that scientists in these agencies have access to high quality advisory services.
Funding needs to put a high priority on continuity of work, especially where environmental and biosecurity issues are concerned. Attention to the interests of hapū and iwi as kaitiaki, in research planning and execution as well as in steps that may follow, will help ensure that gains made are not lost.
Funding agencies have an important role to play in insisting that, in experimental work, independent replication becomes standard practice. Standards for the use of observational (“population based”) data to demonstrate claimed causative effects need to be tightened. There should be regular reviews, such as Maindonald and Cox (1984), of published experimental and population-based studies that appear in Royal Society journals.
Following a first in Mathematics at what was then Auckland University College, and several teaching and lecturing positions, John Maindonald worked with other researchers, for the major part of his career, as a quantitative problem solver. He has held positions at Victoria University of Wellington, in DSIR, in HortResearch, and at The Australian National University (ANU).
Between 1983 and 1996, and occasionally after moving to Australia in 1996, he reviewed the statistical content of numerous papers that appeared in DSIR (later, Royal Society) journals, notably the New Zealand Journal of Agricultural Research and the New Zealand Journal of Crop and Horticultural Research.
The move to Australia opened up new and interesting vistas: 20 months of work in the Centre for Clinical Epidemiology and Biostatistics at the University of Newcastle, then work in the ANU Statistical Consulting Unit that gave contacts widely across the university. He joined the newly formed ANU Centre for Bioinformation Science in 2001. Following formal retirement in 2005, he was until 2021 a visiting fellow in the ANU Mathematical Sciences Institute. Between 2003 and 2015, he fronted a total of 35 short courses (most one week or less) that demonstrated the use of the open source R system for a wide range of data analysis and related purposes. These were conducted at the request of, or under the auspices of, a variety of Australian and other academic, research, and government organisations.
He is the author of a book on Statistical Computation, and the senior author of “Data Analysis and Graphics Using R — An Example-Based Approach” (Maindonald and Braun 2010), which has sold more than 11,000 copies over its three editions. A new text, derived from the fourth edition, is in the late stages of preparation.
Upon returning to New Zealand in 2015 he was persuaded to become involved in three projects with the Plant and Food CRI, one of them the Plant Biosecurity project “Developing pre-approved and standardised quarantine treatments for fruit flies.”
See http://www.statsresearch.co.nz/john_maindonald.htm for a more detailed curriculum vitae.
Commentary in Kenna and Berche (2010) is pertinent, and for work in statistics, Kenna and Berche (2012).↩︎
www.victoria.ac.nz/events/2018/04/health-care-home-retooling-primary-health-care-for-the-21st-century↩︎
On measures of research quality, see Kenna, Mryglod, and Berche (2017).↩︎
See also Smith (2014). This entertainingly written book comments on examples, from published papers and from the media, of common types of data misinterpretation.↩︎
“Expert” evidence from a pediatrician who apparently had not seriously considered that “cot death” risk was likely to vary between families would appear to have been a major contributor to the guilty verdict. Why did the legal experts involved, including the judge, not pick up on this point? See https://en.wikipedia.org/wiki/Sally_Clark↩︎
Links for NZ and international Cochrane websites are http://nz.cochrane.org/ and http://www.cochrane.org/↩︎
https://www.stuff.co.nz/environment/109932454/life-story-trevor-chinn-the-man-who-saved-glaciology↩︎
See also Higginson and Munafò (2016) on the effects of reward structures.↩︎
https://www.cdc.gov/nchs/data/databriefs/db394-tables-508.pdf#page=1↩︎
This use of machine learning algorithms is a form of “regression,” with the same scope for mistakes in use and in interpretation of output. Where the interest is accurate prediction, it may be termed “predictive modeling.”↩︎
See, e.g., Glauser (2018), commenting on Welch et al. (2018).↩︎
https://sciblogs.co.nz/infectious-thoughts/2015/08/28/niwa-in-astonishing-attack-on-scientist-association/↩︎
https://www.health.govt.nz/publication/nzdep2013-index-deprivation↩︎
Boeing faced fraud charges, and paid more than $2.5 billion in penalties and compensation. Serious failures of regulatory oversight were identified. Issues arose from the use of an add-on software system to adapt, for use with a larger engine that was placed higher up and further forward, software that had been designed and tested for earlier 737 models.↩︎
https://www.harding-center.mpg.de/en/box/magazin1/9433-fact-boxes↩︎
A more complete CV can be found at http://www.statsresearch.co.nz/john_maindonald.htm↩︎