At the beginning of the supply management project, Dr. Getu, Dr. Slade and I reached out to Dairy Farmers of Ontario, the Ontario Ministry of Agriculture, Food and Rural Affairs, the Canadian Agricultural Safety Association, the Office of the Chief Coroner for Ontario, the Geospatial Centre at the University of Waterloo, and librarians from UoS, UoT, UoG …. However, the relevant information could not be shared publicly owing to the sensitivity of the data. We were even denied access to the University of Guelph’s Dairy Farmers of Ontario Annual Reports because the documents could not leave the province.
The main obstacle to collecting vital data is that we first need to identify farmers’ names: most obituary search engines require the name of the deceased and rarely offer a search by keyword or occupation. We therefore decided to identify farmers’ names through land ownership records in Ontario over the years. Then, using the proportion of dairy farmers per region in a given year, we would infer the type of farmer. Although this idea implied a probabilistic model, we could not access land ownership data because it is proprietary: Teranet, the company that handles the Land Registry data, was only willing to sell the information at a cost beyond our budget.
Finally, our librarian recommended Ancestry.com for collecting birth and other vital information. However, a closer look at the website revealed that not all records had an occupation field, let alone a dairy or beef farmer specification. Building on the probabilistic approach, we opted to scrape voter rolls, which state the subject’s occupation and would later allow us to filter by voting region. The data spanned 1935 to 1980. The problem with this data is that it specifies only the year of voting, not the year of birth or death. The plan was therefore to scrape voter rolls to generate a list of farmers in Ontario and Quebec, and then to scrape obituaries and death records for the names collected.
In the meantime, “ancestrylibrary.ca” was going through changes and added the option of a family tree search by keyword, which in turn unlocked more and better observations, since these records include birth and death years. This approach seemed more promising because it allowed us to skip the name collection step. We therefore redirected our scraper towards the Ancestry family tree search. Once the data was collected, we aimed to assign each farmer a probability of being a dairy versus a non-dairy farmer.
Overall, we collected 632,432 potential farmers from the Ancestry Tree Search search engine. To remove any ambiguity, we restricted the data to the 14,094 confirmed farmers whose occupation field in the family tree clearly states “farmer”. While some observations mention the type of farmer (hog farmer, dairy farmer, ..), most do not. Therefore, in order to assess the probability of each being a dairy farmer, we had to geocode the place of death of these observations. We process the birth year and death year columns, extract the relevant years, and keep only the observations without ambiguities. This leaves 12,593 viable observations.
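The year-extraction step described above can be sketched as follows. This is a minimal illustration, not the code used for the report: the `raw` data frame, its column names, and the rule that a field with zero or several four-digit years counts as ambiguous are all assumptions.

```r
# Hypothetical raw fields as they might appear in a scraped family tree.
raw <- data.frame(
  birth = c("12 Mar 1850", "abt 1901", "1880 or 1882", ""),
  death = c("1921", "4 Jul 1975", "1950", "1960"),
  stringsAsFactors = FALSE
)

# Return the single four-digit year found in each string, or NA when there
# is none or more than one (treated here as ambiguous; an assumption).
extract_year <- function(x) {
  hits <- regmatches(x, gregexpr("\\b(1[6-9]|20)[0-9]{2}\\b", x, perl = TRUE))
  vapply(hits,
         function(h) if (length(h) == 1) as.integer(h) else NA_integer_,
         integer(1))
}

raw$birth_year <- extract_year(raw$birth)
raw$death_year <- extract_year(raw$death)

# Keep rows with both years present and a non-negative age at death.
keep  <- !is.na(raw$birth_year) & !is.na(raw$death_year) &
  raw$death_year >= raw$birth_year
clean <- raw[keep, ]
clean$age <- clean$death_year - clean$birth_year
```

In this toy input, only the first two rows survive; the third has two candidate birth years and the fourth has none.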
Birth years range from 1609 to 1971, while death years span 1672 to 2021. The average age at death is 71.14794 years.
Of the 12,593 observations, 172 are confirmed dairy farmers, with an average lifespan of 76.02907 years. In what follows, we describe the methodology used to assign dairy probabilities to the remaining observations.
After generating a list of farmers in Ontario and Quebec, we identify dairy land proportions over the years. We reached out to Statistics Canada and collected farm count data by farm type for the 1966, 1971, 1976, 1981, 1986, 1991, 2001, 2006, 2011 and 2016 census years at the census consolidated subdivision (CCS) level. We construct the following data set:
The farm count data contain CCS names from earlier years that were later consolidated into other regions. The regions’ CCS-CD-CAR encoding also changed over the years and did not remain consistent per region. Moreover, region names lack a naming convention that would allow us to merge by name (e.g., saint ~ st; East, est, E. ~ e; unorganised ~ uno, …). Some years include French accent marks while others do not. One might ask why we did not merge the farm count data with a separate census shapefile for each year. We tried this approach; however, we lacked census shapefiles for 1966, 1971 and 1976, and because region names and the encoding structure change from one year to the next, we could not locate the 1966, 1971 and 1976 farm count data at the CCS level.
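To make the naming problem concrete, here is a minimal sketch of the kind of name normalization that would be needed before merging by region name. The function name `normalize_region` and the accent map are hypothetical, and only the saint/st, east/e and unorganised/uno pairs come from the examples above; a real crosswalk would need many more rules.

```r
# Hypothetical helper: reduce a region name to a crude merge key.
normalize_region <- function(x) {
  x <- tolower(x)
  # Strip common French accent marks with an explicit map (extend as needed).
  x <- chartr(
    "\u00e0\u00e2\u00e7\u00e8\u00e9\u00ea\u00eb\u00ee\u00ef\u00f4\u00f9\u00fb\u00fc",
    "aaceeeeiiouuu", x)
  x <- gsub("[.,']", " ", x)                          # punctuation to spaces
  x <- gsub("-", " ", x, fixed = TRUE)                # hyphens to spaces
  x <- gsub("\\b(sainte|saint|ste)\\b", "st", x, perl = TRUE)
  x <- gsub("\\bunorgani[sz]ed\\b", "uno", x, perl = TRUE)
  x <- gsub("\\b(east|est)\\b", "e", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))                        # collapse whitespace
}

normalize_region(c("Saint-Andr\u00e9 Est", "St. Andre E", "Unorganised North"))
```

The first two spellings collapse to the same key, which is the property a name-based merge would rely on. Rules like these do not solve consolidated or recoded regions, which is why name matching alone was not sufficient.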
Thus, given the lack of consistency and structure among the files, the following methodology was developed to assign a binary indicator of being a dairy farmer:
## Reading layer `dairy_farmer_0.5' from data source
## `/Users/fel817/Library/CloudStorage/OneDrive-UniversityofSaskatchewan/Impact of supply management on farmers health/DB_scrapping_farmers_death/2_FarmDistributionCanada/data/dairy_farmer_0.5.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 455 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -90.96695 ymin: 42.78244 xmax: -57.10549 ymax: 62.58286
## Geodetic CRS: NAD83
## Reading layer `non_dairy_farmer_0.2' from data source
## `/Users/fel817/Library/CloudStorage/OneDrive-UniversityofSaskatchewan/Impact of supply management on farmers health/DB_scrapping_farmers_death/2_FarmDistributionCanada/data/non_dairy_farmer_0.2.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 718 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -105.6364 ymin: 41.68144 xmax: -57.10549 ymax: 62.58286
## Geodetic CRS: NAD83
-9 : ambiguous, i.e., the polygon that contains the latitude and longitude of the farmer’s place of death is a 2016 census region into which earlier CCS regions with both dairy and non-dairy farm counts were consolidated.
0 : non dairy region
1 : dairy region
A farmer is then considered a dairy farmer if their place of death and death year fall within a dairy region.
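The coding rule above can be sketched as a simple lookup. This is an illustration under stated assumptions rather than the code behind the report: the `region_codes` and `farmers` tables are made up, the region is assumed to have already been found from the geocoded place of death (the point-in-polygon step is not shown), and matching a death year to the most recent census year at or before it is an assumption.

```r
# Hypothetical lookup: one row per region and census year, already coded as
# 1 (dairy), 0 (non dairy) or -9 (ambiguous).
region_codes <- data.frame(
  region      = c("A", "A", "B", "B", "C"),
  census_year = c(1971, 1981, 1971, 1981, 1971),
  code        = c(1, 1, 0, -9, 0),
  stringsAsFactors = FALSE
)

# Hypothetical farmers with the containing region already identified.
farmers <- data.frame(
  id         = 1:4,
  region     = c("A", "B", "B", "C"),
  death_year = c(1985, 1975, 1990, 1960),
  stringsAsFactors = FALSE
)

# Assumption: use the most recent census year at or before the death year.
lookup_code <- function(region, death_year, codes) {
  rows <- codes[codes$region == region & codes$census_year <= death_year, ]
  if (nrow(rows) == 0) return(NA_real_)
  rows$code[which.max(rows$census_year)]
}

farmers$code <- mapply(lookup_code, farmers$region, farmers$death_year,
                       MoreArgs = list(codes = region_codes),
                       USE.NAMES = FALSE)

# Only code 1 counts as dairy; -9 and 0 do not, and unmatched rows stay NA.
farmers$is_dairy <- farmers$code == 1
```

Keeping -9 as its own value, rather than folding it into 0, makes it possible to drop or separately analyze the ambiguous cases later.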
## Warning: NAs introduced by coercion
Another obituary website that offered relevant search results by keyword (e.g., dairy farmer, beef farmer) and did not require the name of the deceased farmer was https://necrocanada.com/obituaries-2021/?s=dairy+farmer. However, while we were able to search by keyword, the search did not index deaths by place. We process the text file to derive the age, birth year, death year and type of farmer. In doing so, we derive the following dataset.
The average age at death in this data set is 81.85542 years.
## [1] 8004
We derive 8,004 observations.
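As an illustration of the text processing mentioned above, the sketch below tags each obituary with a farmer type by keyword and derives the age from the two years found in the text. The sample strings, the `parse_obit` helper and the "exactly two years" rule are hypothetical; the layout of the real text file may differ.

```r
# Hypothetical obituary snippets; the real text layout may differ.
obits <- c(
  "John Doe (1931 - 2019), a lifelong dairy farmer, passed away peacefully.",
  "Jane Roe, beef farmer, born 1940, died 2021.",
  "A farmer and friend to many, 1925 - 2010."
)

parse_obit <- function(txt) {
  years <- as.integer(
    regmatches(txt, gregexpr("\\b(18|19|20)[0-9]{2}\\b", txt, perl = TRUE))[[1]]
  )
  # Assumption: trust the record only when exactly two years are found.
  ok    <- length(years) == 2
  birth <- if (ok) min(years) else NA_integer_
  death <- if (ok) max(years) else NA_integer_
  # Keyword tagging, mirroring the search terms used on the website.
  type <- if (grepl("dairy farmer", txt, ignore.case = TRUE)) {
    "dairy"
  } else if (grepl("beef farmer", txt, ignore.case = TRUE)) {
    "beef"
  } else {
    "unspecified"
  }
  data.frame(birth_year = birth, death_year = death,
             age = death - birth, type = type, stringsAsFactors = FALSE)
}

parsed <- do.call(rbind, lapply(obits, parse_obit))
```

A binary `dairy` indicator for the regression can then be taken as `parsed$type == "dairy"`.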
## Warning: Removed 7 row(s) containing missing values (geom_path).
The steep dip for farmers born after 1946 could be explained by a censored data problem. Our data include only people who have died; we have not accounted for Canadian farmers who were born after 1946 and are still alive (i.e., aged 75 or younger), so these cohorts are represented only by those who died relatively young.
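The censoring argument can be checked with a small simulation. The sketch below is not based on the scraped data: it assumes, purely for illustration, that every birth cohort has the same lifespan distribution (normal with mean 78 and standard deviation 12) and that only deaths up to 2021 are observable.

```r
set.seed(1)
n <- 20000

# Assumption: identical lifespan distribution for every birth cohort.
birth_year <- sample(1880:1970, n, replace = TRUE)
lifespan   <- pmax(0, round(rnorm(n, mean = 78, sd = 12)))
death_year <- birth_year + lifespan

# Only deaths that have already happened by 2021 can appear in the data.
observed <- death_year <= 2021

# Mean observed age at death by decade of birth.
decade <- 10 * (birth_year[observed] %/% 10)
mean_age_by_cohort <- tapply(lifespan[observed], decade, mean)
round(mean_age_by_cohort, 1)
```

Even though the true lifespan distribution is the same for all cohorts here, the observed mean age at death falls sharply for the most recent cohorts, because only their early deaths have been recorded. A dip of this kind therefore need not reflect any real change in longevity.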
##
## Call:
## lm(formula = dat$age ~ dairy * birth_year, data = dat)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -71.743  -7.004   2.539  10.415  41.451
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)      78.300652   9.267132   8.449   <2e-16 ***
## dairy            -8.988815  21.693760  -0.414    0.679
## birth_year       -0.003578   0.004975  -0.719    0.472
## dairy:birth_year  0.004948   0.011687   0.423    0.672
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.13 on 8000 degrees of freedom
## Multiple R-squared: 9.234e-05, Adjusted R-squared: -0.0002826
## F-statistic: 0.2463 on 3 and 8000 DF, p-value: 0.8641
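A linear regression is not the right model here, and the fit shows it: no coefficient other than the intercept is significant and the R-squared is essentially zero. There is too much noise in the data. Do we need more controls to explain life expectancy?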
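To read the interaction term in the regression output above, the implied dairy versus non-dairy gap in age at death at a given birth year is the `dairy` coefficient plus the `dairy:birth_year` coefficient times the birth year. The short sketch below only redoes that arithmetic with the reported estimates; the function name `dairy_gap` is hypothetical.

```r
# Coefficients copied from the regression output above.
b_dairy       <- -8.988815
b_interaction <-  0.004948

# Implied dairy minus non-dairy difference in age at death, in years.
dairy_gap <- function(birth_year) b_dairy + b_interaction * birth_year

dairy_gap(c(1850, 1900, 1950))
```

The implied gap stays well under one year across these birth years, and since neither coefficient is statistically significant, it cannot be distinguished from zero in this model.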