Goal: In this process there will be three general goals that are accomplished by proofing the site data and attaching “flags” to the data representing any modifications made to or problems with the current file. Goals include:
~/siteDataCleaning/data2beCleaned/[city]siteDefinition.csv
site: The site code uses the convention: [first 4 letter of last name][first 3 letters of first name][2-letter state abbrv.][1 or 2]. For example, Bob Reitsma’s site in Maryland is REITBOBMD1. The 1 or 2 at the end of each site code represents whether a site has moved. For example, if Bob moved his previous house would be REITBOBMD1 and his new house would be REITBOBMD2. If there is a bad site code, include that code in the oldSite field and apply the correct site code in this field.
oldSite: This is ANY previous site code definition. In some cases the old site may include a misspelling. If this is the case, ensure that the misspelled name and correctly spelled name remain in the data file and that both are associated with the current site code.
lastName The last name of the participant. If there are multiple spellings of the last name, include all in the modified file.
firstName: The first name of the participant. If there are multiple spellings of the first name, include all in the cleaned file.
street: The number and street address of the participant associated with a given site.
city: Obvious
state: Two-letter state abbreviation.
zip: The five-digit zip code of the participant
notes: This field is used to provide a brief description of any problems associated with a record.
For each record (i.e., row in the spreadsheet):
Check that site follows the correct naming convention and ensure that all fields are uniquely associated with the appropriate site field. There may be multiple different oldSite’s associated with a single site field – this is acceptable. If no modification of the record is necessary, enter “1” as the flag code.
If you discover a problem with any of these fields, search for a solution to the problem. For example, you may find that there are multiple zip codes associated with a single site. If this is the case, enter the address into Google and see what the zip code should be.
If there was a problem with a record, add a brief note in the notes field describing the nature of the problem.
Save file as a “.csv” (Microsoft will give you a warning message when doing so – ignore the warning)
Goal: To determine a site’s position along the rural-to-urban gradient, it is necessary that the spatial location of the site is provided as accurately as possible. In some cases, latitude and longitudes that we have on record have been found to not be representative of the site’s location. In other cases, we have multiple latitude and longitudes on file and need to choose the one closest to the site. Here, you will compare provided latitude and longitudes (or those that are absent) with the location of sites on Google Maps to determine the best approximation of the spatial location of Nestwatch sites.
~/siteDataCleaning/data2beCleaned/[city]SiteLocation.csv
site: The site code uses the convention: [first 4 letter of last name][first 3 letters of first name][2-letter state abbrv.][1 or 2]. For example, Bob Reitsma’s site in Maryland is REITBOBMD1. The 1 or 2 at the end of each site code represents whether a site has moved. For example, if Bob moved his old house would be REITBOBMD1 and his new house would be REITBOBMD2.
longitude: Ensure that longitude values are provided in decimal degrees
latitude: Ensure that latitude values are provided in decimal degrees
notes: This field is used to provide any data cleaning notes relavent to fixes that were necessary.
For each record (i.e., row in the spreadsheet), check the latitude and longitude of the site. To do so, open the file ~/siteDataCleaning/data2beCleaned/[city]siteDefinition.csv alongside the site location file. Use Google Maps to navigate to the street address of the site. Google’s current geocoding isn’t perfect, but is rather good and will let you know if a site’s address is in the right ballpark. Use the following criteria to flag sites (providing a description of any problems in the notes field):
If the address of a site projected on Google Maps is within 100 m or so of the latitude and longitude on the spatial data spreadsheet, you can assign field code “1” and this record does not require modification.
If a site has multiple latitude and longitudes associated with it and one was deemed “close to” the site in Google Maps, those that are not close to the site can be flagged as “2”.
If a site has only one latitude and longitude on record and it is not projected close to the site address in Google Maps, assign the flag code “3” and do not proceed further with this site. Likewise, if a site has multiple latitude and longitudes on record and none are projected close to the site address in Google Maps, assign the flag code “3” to each of the records.