Abstract

Bees are essential to human food production and to the maintenance of natural ecosystems. This paper proposes a method to predict the health level of honeybee colonies using data from internal and external beehive sensors and from in-loco inspections by beekeepers. The data set was built by joining inspection records with internal and external sensor measurements on the date of collection. However, frequent inspections are not feasible because they stress the beehive, especially in periods such as winter, when the colony is more sensitive. As a solution, the beehives' health status was obtained through a partitioning clustering method and then validated against the in-loco inspection data already available. We propose a logistic regression model with an elastic net penalty, which combines the lasso (l1) and ridge (l2) penalties. The result is a model that is more flexible and robust than the usual logistic regression, and a diagnostic tool that can avoid unnecessary inspections and, consequently, reduce stress on the beehives.
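
For reference, the elastic net penalty named in the abstract can be written out explicitly. The form below uses the common glmnet-style parameterization, which is an assumption here because the text does not state which one the authors used. The symbols (n observations, binary response y_i, predictors x_i, coefficients β_0 and β, overall strength λ ≥ 0, mixing weight α in [0, 1]) are introduced only for illustration.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \Big[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \Big]
      + \lambda \left[ \alpha \lVert \beta \rVert_1
        + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right]
    \right\},
\qquad
p_i = \frac{1}{1 + e^{-(\beta_0 + x_i^{\top} \beta)}}
```

Under this parameterization, α = 1 recovers the lasso (l1) penalty and α = 0 recovers the ridge (l2) penalty; intermediate values mix the two.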

Packages

Preprocessing

Description of dataset

The dataset was obtained by joining the internal-sensor, external-sensor, and inspection data with an algorithm written in Python in Google Colab. It was then preprocessed (cleaning, consistency checks, imputation…) in R.
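
The authors' joining algorithm is not shown, so the following is only a minimal sketch of one way such a date-based join could look. It assumes a left join from sensor readings to inspections, which would be consistent with the inspection columns containing many NA values in the summary further down. The function name `left_join_on_date`, the `date` key, and all sample values are hypothetical; only the column names `TurnDay`, `Hive_Temp`, `Queen`, and `Brood` come from the dataset.

```python
from datetime import date

# Hypothetical sample rows: the values are made up for illustration only.
sensor_rows = [
    {"date": date(2020, 1, 1), "TurnDay": "dia", "Hive_Temp": 28.5},
    {"date": date(2020, 1, 1), "TurnDay": "noite", "Hive_Temp": 26.1},
    {"date": date(2020, 1, 2), "TurnDay": "dia", "Hive_Temp": 29.0},
]
inspection_rows = [
    {"date": date(2020, 1, 1), "Queen": 1, "Brood": 1},
]
INSPECTION_FIELDS = ("Queen", "Brood")


def left_join_on_date(sensors, inspections, fields):
    """Attach inspection fields to every sensor row collected on the same date.

    Sensor rows without a matching inspection keep all their own values and
    get None for each inspection field (the analogue of NA in R).
    """
    # Index inspections by date; assumes at most one inspection per date.
    by_date = {row["date"]: row for row in inspections}
    merged = []
    for row in sensors:
        match = by_date.get(row["date"])
        extra = {f: (match[f] if match is not None else None) for f in fields}
        merged.append({**row, **extra})
    return merged


merged = left_join_on_date(sensor_rows, inspection_rows, INSPECTION_FIELDS)
for row in merged:
    print(row)
```

Both readings from the first date pick up the inspection values, while the reading from the second date keeps `None` in the inspection fields. A real pipeline would probably also key on the hive identifier, which this sketch leaves out.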

Some information about the dataset:

Shift (TurnDay)   Count   %
dia (day)         18107   50.5
noite (night)     17744   49.5

##    TurnDay            Brood_Temp     Brood_Humidity    Hive_Temp     
##  Length:35855       Min.   :-3.467   Min.   :22.00   Min.   :-5.744  
##  Class :character   1st Qu.:22.133   1st Qu.:62.00   1st Qu.:21.961  
##  Mode  :character   Median :30.078   Median :67.00   Median :28.544  
##                     Mean   :27.200   Mean   :66.21   Mean   :26.442  
##                     3rd Qu.:33.528   3rd Qu.:71.00   3rd Qu.:33.144  
##                     Max.   :39.950   Max.   :89.00   Max.   :39.928  
##                                                                      
##  Hive_Humidity       Weight        Ext_Temperature     DewPoint      
##  Min.   :19.00   Min.   :  1.034   Min.   :-10.00   Min.   :-10.000  
##  1st Qu.:60.00   1st Qu.: 23.092   1st Qu.:  2.50   1st Qu.:  1.220  
##  Median :66.00   Median : 28.032   Median : 12.80   Median :  7.200  
##  Mean   :65.51   Mean   : 27.980   Mean   : 13.19   Mean   :  8.261  
##  3rd Qu.:72.00   3rd Qu.: 31.715   3rd Qu.: 22.80   3rd Qu.: 17.000  
##  Max.   :93.00   Max.   :129.936   Max.   : 36.00   Max.   : 20.000  
##                                                                      
##  WindDirection     WindSpeed         Brood            Bees      
##  Min.   :  0.0   Min.   : 0.00   Min.   :0.000   Min.   :0.000  
##  1st Qu.:  0.0   1st Qu.: 0.00   1st Qu.:1.000   1st Qu.:1.000  
##  Median : 70.0   Median :15.00   Median :1.000   Median :1.000  
##  Mean   :114.7   Mean   :16.92   Mean   :0.852   Mean   :0.926  
##  3rd Qu.:220.0   3rd Qu.:31.00   3rd Qu.:1.000   3rd Qu.:1.000  
##  Max.   :360.0   Max.   :99.00   Max.   :1.000   Max.   :1.000  
##                                  NA's   :18420   NA's   :18420  
##      Queen            Food         Stressors         Space      
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.000   1st Qu.:0.000  
##  Median :1.000   Median :1.000   Median :0.000   Median :1.000  
##  Mean   :0.912   Mean   :0.958   Mean   :0.441   Mean   :0.724  
##  3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000   3rd Qu.:1.000  
##  Max.   :1.000   Max.   :1.000   Max.   :1.000   Max.   :1.000  
##  NA's   :18420   NA's   :18420   NA's   :18420   NA's   :18420

Clustering Analysis

Principal Component Analysis (PCA)

Projection of high-dimensional data onto two dimensions with t-distributed Stochastic Neighbour Embedding (t-SNE)