Initial EDA

Introduction

This dataset was provided by Dr. Matthew Rees (CSIRO). It is only part of the full dataset and contains data collected from several projects.

OVERVIEW: This document describes data collected as part of the CSIRO / Grains Research and Development Corporation (GRDC) house mouse monitoring projects (from 2013). The purpose of these projects is to predict house mouse abundance across grain-growing regions and so provide early warning of potential mouse plagues. The data were collected by CSIRO and subcontractors, and CSIRO has collated and cleaned them into this joint dataset. House mice are surveyed in crops using three methods: live-traps, active burrow counts and chewcards. Live-trapping currently occurs regularly at only three properties (two paddocks are monitored simultaneously at each property). Rapid-assessment surveys (active burrow counts and chewcards) are also conducted simultaneously in each paddock over one night.

Overview

data_source region farmer site subsite longitude latitude session_start_date session_end_date session_length_days year_adj month season crop_type crop_stage date survey_night number_functional_traps number_mice_caught n_males_site n_females_site number_unique_individuals_session number_functional_traps_session crop_category crop_age state year month_year season_year_adj
ecology central west andrew field anfield 1 anfield 1 148.1271 -33.1853 19/1/2021 23/1/2021 5 2021 NA Summer wheat stubble 19/1/2021 1 63 17 42 30 72 307 wheat fallow_stubble NSW 2021 1_2021 Summer-2021
ecology central west andrew field anfield 1 anfield 1 148.1271 -33.1853 19/1/2021 23/1/2021 5 2021 NA Summer wheat stubble 20/1/2021 2 62 17 42 30 72 307 wheat fallow_stubble NSW 2021 1_2021 Summer-2021
ecology central west andrew field anfield 1 anfield 1 148.1271 -33.1853 19/1/2021 23/1/2021 5 2021 NA Summer wheat stubble 21/1/2021 3 59 23 42 30 72 307 wheat fallow_stubble NSW 2021 1_2021 Summer-2021
ecology central west andrew field anfield 1 anfield 1 148.1271 -33.1853 19/1/2021 23/1/2021 5 2021 NA Summer wheat stubble 22/1/2021 4 60 29 42 30 72 307 wheat fallow_stubble NSW 2021 1_2021 Summer-2021
ecology central west andrew field anfield 1 anfield 1 148.1271 -33.1853 19/1/2021 23/1/2021 5 2021 NA Summer wheat stubble 23/1/2021 5 63 25 42 30 72 307 wheat fallow_stubble NSW 2021 1_2021 Summer-2021

Data summary
Name data_live_trapping
Number of rows 1900
Number of columns 29
_______________________
Column type frequency:
character 16
numeric 13
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
data_source 0 1.00 5 10 0 5 0
region 0 1.00 9 15 0 5 0
farmer 0 1.00 9 15 0 21 0
site 0 1.00 3 18 0 46 0
subsite 0 1.00 3 18 0 64 0
session_start_date 0 1.00 8 10 0 158 0
session_end_date 0 1.00 8 10 0 164 0
season 0 1.00 6 6 0 4 0
crop_type 15 0.99 4 14 0 18 0
crop_stage 622 0.67 7 13 0 6 0
date 14 0.99 8 10 0 475 0
crop_category 18 0.99 5 13 0 6 0
crop_age 431 0.77 3 14 0 3 0
state 0 1.00 2 3 0 3 0
month_year 0 1.00 6 7 0 74 0
season_year_adj 0 1.00 11 11 0 47 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
longitude 0 1.00 142.71 4.15 138.43 138.59 142.02 147.86 148.41 ▇▅▁▁▇
latitude 0 1.00 -33.77 1.34 -35.14 -34.41 -34.38 -33.15 -30.82 ▇▁▂▁▁
session_length_days 32 0.98 3.83 1.00 1.00 3.00 3.00 5.00 7.00 ▁▇▃▅▁
year_adj 0 1.00 2020.82 3.53 2012.00 2019.00 2022.00 2023.00 2026.00 ▂▂▅▇▅
month 951 0.50 6.22 2.70 2.00 3.00 6.00 9.00 11.00 ▇▂▇▆▃
survey_night 0 1.00 2.40 1.21 1.00 1.00 2.00 3.00 7.00 ▇▃▂▁▁
number_functional_traps 0 1.00 44.14 19.59 0.00 34.00 36.00 63.00 64.00 ▁▃▅▁▇
number_mice_caught 0 1.00 6.98 12.92 0.00 0.00 1.00 7.00 89.00 ▇▁▁▁▁
n_males_site 0 1.00 10.79 18.21 0.00 0.00 2.00 13.00 122.00 ▇▁▁▁▁
n_females_site 0 1.00 9.21 17.54 0.00 0.00 2.00 9.00 120.00 ▇▁▁▁▁
number_unique_individuals_session 0 1.00 19.99 34.93 0.00 0.00 3.00 23.00 242.00 ▇▁▁▁▁
number_functional_traps_session 0 1.00 183.44 113.65 29.00 104.00 108.00 310.00 448.00 ▇▁▂▅▁
year 0 1.00 2020.80 3.53 2012.00 2019.00 2022.00 2023.00 2026.00 ▂▂▅▇▅
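In the skim tables, complete_rate is the share of non-missing values, 1 − n_missing / number of rows. The minimal Python sketch below uses the n_missing counts shown above and the 1,900 rows of data_live_trapping to reproduce the reported rates. It also flags the least complete variables. The 0.9 threshold and the helper names are arbitrary choices for illustration only.

```python
N_ROWS = 1900  # "Number of rows" in the skim summary

# n_missing copied from the skim tables (only variables with missing values)
N_MISSING = {
    "crop_type": 15,
    "crop_stage": 622,
    "date": 14,
    "crop_category": 18,
    "crop_age": 431,
    "session_length_days": 32,
    "month": 951,
}


def complete_rate(n_missing: int, n_rows: int = N_ROWS) -> float:
    """Share of non-missing values, as reported in the complete_rate column."""
    return 1 - n_missing / n_rows


def poorly_covered(threshold: float = 0.9) -> list:
    """Variables whose complete_rate is below an arbitrary threshold, worst first."""
    rates = {name: complete_rate(n) for name, n in N_MISSING.items()}
    return sorted((name for name, r in rates.items() if r < threshold), key=rates.get)


for name, n in N_MISSING.items():
    print(f"{name:20s} {complete_rate(n):.2f}")
print(poorly_covered())
```

With these counts, month, crop_stage and crop_age are the only variables below 90% completeness. Month is missing for about half of the rows.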

Data Description

Extracted from the data description provided by Dr. Matthew Rees.
Variable Type / Unique Values Description Values Source
data_source ecology, monitoring, nsw dpi, dpird, csiro [Variable not yet documented] [See data] [See processing functions]
region central west, coonamble, narromine, adelaide plains, northern mallee Broad geographic region within state where cropping systems are similar and mouse populations are assumed to follow similar trajectories (after accounting for site-level variables) Various regions (e.g., Yorke Mid North, Wimmera, Coleambally) Field data, condensed and standardized using data_fix_regions_add_state()
farmer andrew field, george somers, richard rice, ross armstrong, fergus lefebvre, blake hodgson, richard quigley, matt farley, nat wilson, andrew barr, brett tucker, adrian mccabe, brett davies, richard fabry, paul lush, brad griff, sam irish, jake harkness, richard konzag, kelvin tiller, jim wakefield [Variable not yet documented] [See data] [See processing functions]
site anfield 1, anfield 2, bap east, bap west, big red, daisy, dam green, george, george rail, george shed, henry, rice north, rice south, ross, ross armstrong gr2, nardoo crop, nardoo pasture, rosedale crop, rosedale pasture, enmore crop, enmore pasture, paper road crop, paper road pasture, tottenham crop, tottenham pasture, barr no sheep, barr sheep, brett tucker, cemetery, clary, davies tg, fabry tg, fidge, finch 10, griff tg, irish tg, jharkness, mccabe no sheep, mccabe sheep, paul lush m, rk home, rk murphy, stockyard, tiller tg, vee, jim wakefield Farm or monitoring site name Site-specific identifiers Field data from database
subsite anfield 1, anfield 2, bap east, bap west, big red, daisy, dam green, george, george rail, george shed, henry, rice north, rice south, ross, gr2 tg1 ab, gr2 tg2 ab, nardoo crop, nardoo pasture, rosedale crop, rosedale pasture, enmore crop, enmore pasture, paper road crop, paper road pasture, tottenham crop, tottenham pasture, barr no sheep, barr sheep, brett tucker south, bthb fl, bthb tg, plhb, tuckeast, tuckeastfl, tuckwest, cemetery, clary, davies tg, fabry tg, fidge, finch 10, griff tg, irish tg, jharkness, mccabe no sheep, mccabe sheep, jla tg, jlaf1scrub, jlb tg, jlbf2crop, rk home, rk murphy fl, rk murphy tg, stockyard, tiller tg, vee, jw1stubfence, jw1stubpad, jw2crop, jw2edge, jwa tgcrop, jwaf1crop, jwaf2scrub, jwb tgcrop Specific paddock within the farm Subsite-specific identifiers Field data from database
longitude Numerical Variable Longitude coordinate of monitoring location Decimal degrees (EPSG:4326) Field data from GPS coordinates
latitude Numerical Variable Latitude coordinate of monitoring location Decimal degrees (EPSG:4326) Field data from GPS coordinates
session_start_date character Start date of survey session YYYY-MM-DD format (stored as D/M/YYYY text in this extract, e.g., 19/1/2021) Field data (dateset or first capture date)
session_end_date character End date of survey session YYYY-MM-DD format (stored as D/M/YYYY text in this extract, e.g., 23/1/2021) Field data (daterecovered or last capture date)
session_length_days Numerical Variable Duration of survey session in days Typically 1-5 days Calculated as: session_end_date - session_start_date + 1
year_adj Numerical Variable Adjusted year for seasonal analyses (December → next year) Four-digit year Calculated: if month==12 then year+1 else year
month Numerical Variable Month of session start 1-12 Extracted from session_start_date
season Summer, Winter, Spring, Autumn Austral season based on session midpoint Summer (Dec-Feb), Autumn (Mar-May), Winter (Jun-Aug), Spring (Sep-Nov) Calculated from session midpoint date
crop_type wheat, canola, NA, barley, fallow, cereal unknown, chick peas, pasture, oats, chickpea, cereal, faber beans, fence line, lentils, peas, bean unknown, lupins, vetch, unknown Specific crop type at monitoring location Various crop names (lowercase) Field data (cropname), standardized to lowercase
crop_stage stubble, seedling, NA, ripening/ripe, seeding, tillering, flowering Growth stage of crop at time of monitoring e.g., tillering, seedling, flowering, mature, stubble, fallow Field data (cropstageold), standardized to lowercase
date character [Variable not yet documented] [See data] [See processing functions]
survey_night Numerical Variable [Variable not yet documented] [See data] [See processing functions]
number_functional_traps Numerical Variable Number of deployed traps which were available to mice over the night [See data] [See processing functions]
number_mice_caught Numerical Variable Total number of mouse captures across all traps over the night [See data] [See processing functions]
n_males_site Numerical Variable Count of male mice across the whole session (usually 3-5 days) [See data] [See processing functions]
n_females_site Numerical Variable Count of female mice across the whole session (usually 3-5 days) [See data] [See processing functions]
number_unique_individuals_session Numerical Variable Count of known unique individuals across the whole session (recaptures of the same individual are not counted) [See data] [See processing functions]
number_functional_traps_session Numerical Variable Sum of functional traps across all nights in the trapping session [See data] [See processing functions]
crop_category wheat, oilseeds, NA, coarse_grains, pulses, grass, fence Broad crop category for analysis wheat, coarse_grains, pulses, oilseeds, fallow_stubble, fence, grass Derived from crop_type and crop_stage using pattern matching
crop_age fallow_stubble, young, NA, old Simplified crop age classification young, old, NA (for non-crop habitats) Derived from crop_stage patterns; NA for fence/grass/fallow_stubble
state NSW, SA, VIC Australian state where monitoring occurred NSW, VIC, SA, QLD, WA Derived from spatial join with Australian state boundaries
year Numerical Variable Calendar year of session start Four-digit year Extracted from session_start_date
month_year character Combined month and year of session start Format: ‘M_YYYY’ (e.g., ‘1_2013’, ‘12_2014’) Extracted from session_start_date
season_year_adj character Combined season and adjusted year (ordered factor) Format: ‘Season-YYYY’ (e.g., ‘Summer-2013’, ‘Autumn-2014’) Calculated from session midpoint; December assigned to following year
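The derived date fields in the dictionary follow simple rules. The Python sketch below applies those rules as stated and assumes dates are stored as D/M/YYYY strings, as in the data preview. The helper names (parse_dmy, derive_session_fields) are hypothetical, and this is not the project's own processing code. The dictionary does not say how a midpoint that falls between two days is handled. Rounding it down to a whole day is an assumption made here.

```python
from datetime import date, datetime, timedelta

# Austral seasons as listed in the data dictionary:
# Summer (Dec-Feb), Autumn (Mar-May), Winter (Jun-Aug), Spring (Sep-Nov)
AUSTRAL_SEASON = {
    12: "Summer", 1: "Summer", 2: "Summer",
    3: "Autumn", 4: "Autumn", 5: "Autumn",
    6: "Winter", 7: "Winter", 8: "Winter",
    9: "Spring", 10: "Spring", 11: "Spring",
}


def parse_dmy(text: str) -> date:
    """Parse a D/M/YYYY string such as '19/1/2021' (format seen in the data preview)."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def derive_session_fields(start_text: str, end_text: str) -> dict:
    """Hypothetical helper applying the dictionary's rules to one session."""
    start, end = parse_dmy(start_text), parse_dmy(end_text)
    # Session midpoint rounded down to a whole day (ASSUMPTION: rounding not documented)
    midpoint = start + timedelta(days=(end - start).days // 2)
    season = AUSTRAL_SEASON[midpoint.month]
    # A December midpoint counts toward the following year
    season_year = midpoint.year + 1 if midpoint.month == 12 else midpoint.year
    return {
        # session_end_date - session_start_date + 1 (inclusive day count)
        "session_length_days": (end - start).days + 1,
        "year": start.year,
        "month": start.month,
        # December session starts are assigned to the following year
        "year_adj": start.year + 1 if start.month == 12 else start.year,
        "season": season,
        "month_year": f"{start.month}_{start.year}",
        "season_year_adj": f"{season}-{season_year}",
    }


print(derive_session_fields("19/1/2021", "23/1/2021"))
print(derive_session_fields("28/12/2020", "1/1/2021")["season_year_adj"])
```

For the first preview session (19/1/2021 to 23/1/2021), the sketch gives a 5-day session labelled Summer-2021, which matches the preview row. A session spanning the new year is assigned to the following year's summer.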

Missing Values

Data Collection Sites

Dynamic map of the collection sites

Static map of the collection sites

Temporal Distribution of Mice

Distribution of House Mice by crop type
Crop Type Total Sites Percentage (%)
wheat 184 53.64
fence line 44 12.83
pasture 27 7.87
canola 25 7.29
barley 21 6.12
lentils 17 4.96
cereal unknown 9 2.62
peas 4 1.17
chick peas 3 0.87
faber beans 3 0.87
bean unknown 1 0.29
cereal 1 0.29
unknown 1 0.29
chickpea 0 0.00
fallow 0 0.00
lupins 0 0.00
oats 0 0.00
vetch 0 0.00
NA 3 0.87
Total 343 100.00
Percentages are based on the total number of occupied sites.
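Each percentage is a count divided by the total number of occupied sites (343), times 100. The following minimal Python sketch uses the crop-type counts from the table above, omitting the zero rows. The helper name is hypothetical.

```python
# Occupied-site counts by crop type, copied from the table above (zero rows omitted)
OCCUPIED_BY_CROP = {
    "wheat": 184, "fence line": 44, "pasture": 27, "canola": 25, "barley": 21,
    "lentils": 17, "cereal unknown": 9, "peas": 4, "chick peas": 3,
    "faber beans": 3, "bean unknown": 1, "cereal": 1, "unknown": 1, "NA": 3,
}


def percent_of_total(counts: dict) -> dict:
    """Each count as a percentage of the column total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {key: round(100 * n / total, 2) for key, n in counts.items()}


shares = percent_of_total(OCCUPIED_BY_CROP)
print(sum(OCCUPIED_BY_CROP.values()), shares["wheat"], shares["fence line"])
```

Each share is rounded to two decimals independently, so the rounded values add up to 99.98 rather than exactly 100.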
Distribution of House Mice by crop stage
Crop Stage Total Sites Percentage (%)
stubble 136 39.65
seedling 35 10.20
ripening/ripe 34 9.91
flowering 33 9.62
tillering 18 5.25
seeding 13 3.79
NA 74 21.57
Total 343 100.00
Percentages are based on the total number of occupied sites.
Crop type × crop stage (count and row %)
crop_type NA flowering ripening/ripe seedling stubble tillering seeding
barley 0 (0%) 1 (4.76%) 3 (14.29%) 1 (4.76%) 15 (71.43%) 1 (4.76%) 0 (0%)
bean unknown 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (100%) 0 (0%) 0 (0%)
canola 0 (0%) 7 (28%) 2 (8%) 7 (28%) 7 (28%) 0 (0%) 2 (8%)
cereal 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (100%) 0 (0%) 0 (0%)
cereal unknown 0 (0%) 0 (0%) 0 (0%) 2 (22.22%) 0 (0%) 7 (77.78%) 0 (0%)
chick peas 0 (0%) 0 (0%) 1 (33.33%) 1 (33.33%) 0 (0%) 1 (33.33%) 0 (0%)
chickpea 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%)
faber beans 1 (33.33%) 1 (33.33%) 1 (33.33%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
fallow 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%)
fence line 42 (95.45%) 0 (0%) 0 (0%) 2 (4.55%) 0 (0%) 0 (0%) 0 (0%)
lentils 0 (0%) 3 (17.65%) 0 (0%) 5 (29.41%) 9 (52.94%) 0 (0%) 0 (0%)
lupins 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%)
oats 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%)
pasture 27 (100%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
peas 0 (0%) 2 (50%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2 (50%)
unknown 0 (0%) 0 (0%) 0 (0%) 1 (100%) 0 (0%) 0 (0%) 0 (0%)
vetch 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%) 0 (NaN%)
wheat 1 (0.54%) 19 (10.33%) 27 (14.67%) 16 (8.7%) 103 (55.98%) 9 (4.89%) 9 (4.89%)
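NA 3 (100%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Rows shown as NaN% (chickpea, fallow, lupins, oats, vetch) have no occupied sites, so their row percentages are undefined (division by a row total of zero).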
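Row percentages divide each cell by its row total. The NaN% entries therefore come from crop types with no occupied sites. The small Python sketch below uses three rows copied from the table above, and its helper names are hypothetical. It reproduces the "count (row %)" cells and reports an undefined share as n/a instead of NaN%.

```python
from typing import List, Optional

# Column order of the table above and counts for three of its rows
STAGES = ["NA", "flowering", "ripening/ripe", "seedling", "stubble", "tillering", "seeding"]
CROSSTAB = {
    "wheat": [1, 19, 27, 16, 103, 9, 9],
    "peas": [0, 2, 0, 0, 0, 0, 2],
    "lupins": [0, 0, 0, 0, 0, 0, 0],
}


def row_percentages(row: List[int]) -> List[Optional[float]]:
    """Each cell as a percentage of its row total; None when the row total is zero."""
    total = sum(row)
    if total == 0:
        return [None] * len(row)  # undefined share instead of NaN
    return [round(100 * n / total, 2) for n in row]


def format_row(row: List[int]) -> str:
    """Render cells as 'count (row %)', with 'n/a' for undefined shares."""
    cells = []
    for n, pct in zip(row, row_percentages(row)):
        cells.append(f"{n} (n/a)" if pct is None else f"{n} ({pct:g}%)")
    return " ".join(cells)


for crop, counts in CROSSTAB.items():
    print(crop, format_row(counts))
```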

Research Gap

In Australia, mouse population monitoring is mainly funded by the Grains Research and Development Corporation (GRDC) and conducted and coordinated by the Commonwealth Scientific and Industrial Research Organisation (CSIRO). Their main approach to forecasting mouse plagues relies on advanced monitoring and intensive surveillance. A key milestone for the mouse monitoring system was the introduction of ground monitoring of mouse populations in 2012. Since then, CSIRO has conducted ground monitoring surveys three times per year, in autumn, winter and spring. CSIRO also uses data from the “MouseAlert” system, a free resource that grain producers and farmers can use to report the prevalence of mice in their fields, to gain further insight into mouse plagues. Using data collected through these methods, CSIRO provides early warnings to farmers and other interested parties about the likelihood of mouse plagues and the control measures needed to prevent them (CSIRO 2026; CSIRO 2021).

Since the 1980s, four main strategies have been employed in Australia to control mouse plagues: predicting mouse plagues, informing the responsible parties, controlling mouse populations, and assessing the effectiveness of these control measures. Together these are known as the PICA strategy (Redhead and Singleton 1988). Many efforts have since been made to forecast mouse plagues in Australia, and a number of models have been developed (Cantrill 1992; R. P. Pech et al. 1999; Brown et al. 2024). CSIRO and GRDC have also used statistical methods to predict mouse outbreaks and successfully predicted the 1994 and 2001–2003 mouse plagues (CSIRO 2021).

Currently, the National Mouse Group (previously the National Mouse Management Working Group), coordinated by CSIRO and funded by GRDC, is responsible for forecasting and managing mouse outbreaks in Australia. The group has made several attempts to improve mouse surveillance and management. However, according to Brown et al. (2024), the forecast models introduced by Kenney et al. (2003) and R. P. Pech et al. (1999) remain the foundation of current forecasts. The problem with relying on these models today is that rainfall patterns, crop diversity and farming practices can change over time, which calls the continued validity of the models into question (Singleton et al. 2005).

There have been efforts to improve forecasting methods, such as R. Pech et al. (2015), which introduced a more advanced Bayesian modelling framework. However, these approaches could not fully utilise live-trapping data, because live-trapping data collection only began in 2012 and only a short time series was available (White et al. 2024). There is also room to improve the models by incorporating factors such as soil type and cropping systems. Furthermore, rather than using citizen-reported data solely as a surveillance tool, it would be worthwhile to incorporate such data directly into the modelling process alongside survey data. From a statistical point of view, forecasting performance may be improved further by using modern machine learning and deep learning models that account for the spatio-temporal nature of the data.

Example Models

Questions to answer

  • Do the number of days the survey was conducted, the size of the traps used and the number of traps used have a substantial impact on number_unique_individuals_session?

    • The response variable to be used in the modelling process is number_unique_individuals_session. However, this number does not depend solely on the number of mice in a subsite. It also depends on the effort put into catching them, for example the number of days the survey was conducted, the size of the traps used and the number of traps used. One way to account for these variables is to include them in the model as predictors. Another is to normalise the current response variable, number_unique_individuals_session, by effort to derive a new response variable. Even if these effort-related variables are added as predictors, the response variable may still not correctly represent the total mouse population in the subsite. Therefore, it is necessary to derive a variable that represents mouse density. For example, R. P. Pech et al. (1999) used the fraction of traps that caught mice as an index of true population density.
  • Is there any data from the Western Australian region?
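The effort normalisation discussed above can be sketched in Python using columns already in data_live_trapping. The sketch computes two hypothetical effort-adjusted indices. The first is unique individuals per 100 functional trap-nights (number_unique_individuals_session / number_functional_traps_session). The second is captures per functional trap-night (summed number_mice_caught / summed number_functional_traps). The second index equals the fraction of traps that caught mice, the index used by R. P. Pech et al. (1999), only if each trap holds at most one mouse per night. That is an assumption, not something the data description states. Neither index corrects for trap size, which is not a column in this extract.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TrapNight:
    """One row of data_live_trapping: one survey night at a subsite."""
    number_functional_traps: int
    number_mice_caught: int


def captures_per_trap_night(nights: Sequence[TrapNight]) -> Optional[float]:
    """Total captures divided by total functional trap-nights for a session.

    Equals the fraction of traps that caught mice only under the ASSUMPTION
    that each trap holds at most one mouse per night.
    """
    trap_nights = sum(n.number_functional_traps for n in nights)
    if trap_nights == 0:
        return None  # no functional traps, so the index is undefined
    return sum(n.number_mice_caught for n in nights) / trap_nights


def individuals_per_100_trap_nights(unique_individuals: int,
                                    functional_traps_session: int) -> Optional[float]:
    """Unique individuals caught per 100 functional trap-nights."""
    if functional_traps_session == 0:
        return None
    return 100 * unique_individuals / functional_traps_session


# The five nights of the anfield 1 session shown in the data preview (19/1 to 23/1/2021)
anfield_1 = [TrapNight(63, 17), TrapNight(62, 17), TrapNight(59, 23),
             TrapNight(60, 29), TrapNight(63, 25)]
print(round(captures_per_trap_night(anfield_1), 3))        # → 0.362 (111 / 307)
print(round(individuals_per_100_trap_nights(72, 307), 2))  # → 23.45
```

The nightly functional-trap counts for this session sum to 307, which matches number_functional_traps_session in the preview. Either index could therefore be computed directly from the session-level columns.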

References

Brown, Peter R, Patrick Giraudoux, Jens Jacob, Geoffroy Couval, and Christian Wolff. 2024. “Multi-Stakeholder Working Groups to Improve Rodent Management Outcomes in Agricultural Systems.” International Journal of Pest Management, 1–17.
Cantrill, Steven. 1992. “The Population Dynamics of the House Mouse (Mus Domesticus) in a Dual Crop Agricultural Ecosystem.” PhD thesis, Queensland University of Technology.
CSIRO. 2021. “Tracking Australia’s Mice.” https://www.csiro.au/en/research/animals/pests/Mouse-Census.
———. 2026. “Mouse Monitoring and Surveillance.” https://research.csiro.au/rm/research-projects/mouse-monitoring-surveillance/.
Kenney, Alice J, Charles J Krebs, Stephen Davis, Roger Pech, Greg Mutze, and Grant R Singleton. 2003. “Predicting House Mouse Outbreaks in the Wheat-Growing Areas of South-Eastern Australia.” ACIAR Monograph Series 96: 325–28.
Pech, Roger P, Greg M Hood, Grant R Singleton, Elizabeth Salmon, Robert I Forrester, and Peter R Brown. 1999. “Models for Predicting Plagues of House Mice (Mus Domesticus) in Australia.” Ecologically-Based Management of Rodent Pests. Canberra, Australian Centre for International Agricultural Research, 81–112.
Pech, Roger, Peter Brown, Jennyffer Cruz, Steve Henry, Lyn Hinds, Andrea Byrom, Peter West, and J Farrell. 2015. “Surveillance and Forecasts for Mouse Outbreaks in Australian Cropping Systems.” Invasive Animals Cooperative Research Centre: Canberra, ACT, Australia.
Redhead, T. D., and G. R. Singleton. 1988. “The PICA Strategy for the Prevention of Losses Caused by Plagues of Mus Domesticus in Rural Australia.” EPPO Bulletin 18 (2): 237–48. https://doi.org/10.1111/j.1365-2338.1988.tb00371.x.
Singleton, Grant R, Peter R Brown, Roger P Pech, Jens Jacob, Greg J Mutze, and Charles J Krebs. 2005. “One Hundred Years of Eruptions of House Mice in Australia – a Natural Biological Curio.” Biological Journal of the Linnean Society 84 (3): 617–27.
White, Jennifer, Joanne Taylor, Peter R Brown, Steve Henry, Lucy Carter, Aditi Mankad, Wei-Shan Chang, et al. 2024. “The New South Wales Mouse Plague 2020-2021: A One Health Description.” One Health 18: 100753.