About the Dataset

We will be studying a forest dataset that was collected by the United States Forest Service. The dataset includes four wilderness areas located in the Roosevelt National Forest of northern Colorado, where each observation is a 30m x 30m patch. Our goal is to predict the forest cover type (the predominant kind of tree cover) of each patch using strictly cartographic variables (instead of remotely sensed data).

Map of Roosevelt National Forest Wilderness Areas


Dataset Variables

The variables in this data set are:

Variable Name                         Description
Elevation                             Elevation in meters
Aspect                                Aspect in degrees azimuth
Slope                                 Slope in degrees
Horizontal Distance To Hydrology      Horizontal distance to nearest surface water feature, in meters
Vertical Distance To Hydrology        Vertical distance to nearest surface water feature, in meters
Horizontal Distance To Roadways       Horizontal distance to nearest roadway, in meters
Hillshade 9am                         Hillshade index at 9am, summer solstice (0 to 255 index)
Hillshade Noon                        Hillshade index at noon, summer solstice (0 to 255 index)
Hillshade 3pm                         Hillshade index at 3pm, summer solstice (0 to 255 index)
Horizontal Distance To Fire Points    Horizontal distance to nearest wildfire ignition point, in meters
Wilderness Area                       Wilderness area designation (4 binary columns, 0 = absence or 1 = presence)
Soil Type                             Soil type designation (40 binary columns, 0 = absence or 1 = presence)
Cover Type                            Forest cover type designation (7 types, integers 1 to 7)
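
Because Wilderness Area and Soil Type are each spread across a group of binary columns, it can be convenient to collapse a group back into a single integer code. The following is a minimal sketch, not the project's actual code. It assumes Kaggle-style column names (`Wilderness_Area1` to `Wilderness_Area4`, `Soil_Type1` to `Soil_Type40`) and that exactly one column per group is set to 1; `decode_one_hot` is a hypothetical helper and the row values are made up.

```python
# Hypothetical helper: recover one integer code from a group of binary columns.
def decode_one_hot(row, prefix, n_columns):
    """Return the 1-based index of the single column in the group that equals 1."""
    active = [i for i in range(1, n_columns + 1) if row[f"{prefix}{i}"] == 1]
    if len(active) != 1:
        raise ValueError(
            f"expected exactly one active {prefix} column, found {len(active)}"
        )
    return active[0]


# Toy row with made-up values (not taken from the dataset).
# Column names are assumed to follow the Kaggle CSV naming.
row = {f"Wilderness_Area{i}": 0 for i in range(1, 5)}
row.update({f"Soil_Type{i}": 0 for i in range(1, 41)})
row["Wilderness_Area3"] = 1
row["Soil_Type29"] = 1

wilderness_code = decode_one_hot(row, "Wilderness_Area", 4)
soil_code = decode_one_hot(row, "Soil_Type", 40)
print(wilderness_code, soil_code)  # 3 29
```

The resulting codes can be looked up in the Wilderness Area and Soil Types tables that follow.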

Wilderness Area

Code    Name
1       Rawah Wilderness Area
2       Neota Wilderness Area
3       Comanche Peak Wilderness Area
4       Cache la Poudre Wilderness Area

Soil Types

Code    Name
1       Cathedral family - Rock outcrop complex, extremely stony.
2       Vanet - Ratake families complex, very stony.
3       Haploborolls - Rock outcrop complex, rubbly.
4       Ratake family - Rock outcrop complex, rubbly.
5       Vanet family - Rock outcrop complex, rubbly.
6       Vanet - Wetmore families - Rock outcrop complex, stony.
7       Gothic family.
8       Supervisor - Limber families complex.
9       Troutville family, very stony.
10      Bullwark - Catamount families - Rock outcrop complex, rubbly.
11      Bullwark - Catamount families - Rock land complex, rubbly.
12      Legault family - Rock land complex, stony.
13      Catamount family - Rock land - Bullwark family complex, rubbly.
14      Pachic Argiborolls - Aquolls complex.
15      Unspecified in the USFS Soil and ELU Survey.
16      Cryaquolls - Cryoborolls complex.
17      Gateview family - Cryaquolls complex.
18      Rogert family, very stony.
19      Typic Cryaquolls - Borohemists complex.
20      Typic Cryaquepts - Typic Cryaquolls complex.
21      Typic Cryaquolls - Leighcan family, till substratum complex.
22      Leighcan family, till substratum, extremely bouldery.
23      Leighcan family, till substratum - Typic Cryaquolls complex.
24      Leighcan family, extremely stony.
25      Leighcan family, warm, extremely stony.
26      Granile - Catamount families complex, very stony.
27      Leighcan family, warm - Rock outcrop complex, extremely stony.
28      Leighcan family - Rock outcrop complex, extremely stony.
29      Como - Legault families complex, extremely stony.
30      Como family - Rock land - Legault family complex, extremely stony.
31      Leighcan - Catamount families complex, extremely stony.
32      Catamount family - Rock outcrop - Leighcan family complex, extremely stony.
33      Leighcan - Catamount families - Rock outcrop complex, extremely stony.
34      Cryorthents - Rock land complex, extremely stony.
35      Cryumbrepts - Rock outcrop - Cryaquepts complex.
36      Bross family - Rock land - Cryumbrepts complex, extremely stony.
37      Rock outcrop - Cryumbrepts - Cryorthents complex, extremely stony.
38      Leighcan - Moran families - Cryaquolls complex, extremely stony.
39      Moran family - Cryorthents - Leighcan family complex, extremely stony.
40      Moran family - Cryorthents - Rock land complex, extremely stony.

Cover Types

Code    Name
1       Spruce/Fir
2       Lodgepole Pine
3       Ponderosa Pine
4       Cottonwood/Willow
5       Aspen
6       Douglas-fir
7       Krummholz



Problem Definition

Trees bring incredible value to life on Earth. They have several important functions, including:

  - Trees produce the oxygen we breathe by converting carbon dioxide to oxygen through photosynthesis.
  - Trees absorb and store carbon dioxide, a greenhouse gas that contributes to climate change, which helps slow its buildup in the atmosphere.
  - Trees provide homes for wildlife.
  - Trees help moderate the local climate by shading us from the hot sun, screening us from harsh wind, and sheltering us from rain.
  - Trees absorb and store water, which limits the transport of chemicals into streams and reduces flooding.

We need to help trees help us. Accurately predicting forest cover types supports more efficient and effective tree management, so that trees can continue to nourish the planet and support our lives. Accurate prediction can help with tasks including:

  - Conservation measures
  - Urban planning as Americans expand and continue to build
  - Conservation of plant diversity
  - Monitoring forest health and forest management

EDA

Sample size varies between wilderness areas: it is lowest for Neota and highest for Comanche Peak.


Cover types vary by wilderness area, and some cover types are absent from certain areas. The number of cover types per wilderness area varies between 3 and 6.
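
As a rough illustration of the two summaries above, the sketch below counts observations per wilderness area and collects the distinct cover types found in each. It is only a sketch: the `observations` pairs are made-up toy values rather than rows from the dataset, and `WILDERNESS_NAMES` simply mirrors the Wilderness Area code table.

```python
from collections import Counter, defaultdict

# Codes follow the Wilderness Area table in this report.
WILDERNESS_NAMES = {1: "Rawah", 2: "Neota", 3: "Comanche Peak", 4: "Cache la Poudre"}

# Toy (wilderness_code, cover_type) pairs; made up for illustration only.
observations = [
    (1, 1), (1, 2), (1, 2),
    (2, 1),
    (3, 2), (3, 6), (3, 7),
    (4, 3), (4, 3), (4, 6),
]

# Number of observations in each wilderness area.
samples_per_area = Counter(area for area, _ in observations)

# Distinct cover types observed in each wilderness area.
covers_per_area = defaultdict(set)
for area, cover in observations:
    covers_per_area[area].add(cover)

for area in sorted(samples_per_area):
    print(
        WILDERNESS_NAMES[area],
        samples_per_area[area],
        sorted(covers_per_area[area]),
    )
```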


We also explored the correlations between all numeric features in the training set.
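
For readers who want to see what a pairwise correlation computes, here is a minimal pure-Python sketch of the Pearson coefficient applied to every pair of features. The feature values are invented for illustration, the column names are assumptions, and `pearson` is a hypothetical helper; in practice a library routine would normally be used on the full training set.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values; not taken from the dataset.
features = {
    "Elevation": [2600, 2800, 3000, 3200, 3400],
    "Slope": [5, 12, 9, 20, 14],
    "Hillshade_9am": [220, 210, 200, 190, 180],
}

# Correlation for every ordered pair of features (a small correlation matrix).
names = list(features)
corr = {(a, b): pearson(features[a], features[b]) for a in names for b in names}

for a in names:
    print(a, [round(corr[(a, b)], 2) for b in names])
```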


The box plots show that cover type is most strongly associated with elevation. Horizontal distance to roadways, horizontal distance to hydrology, and horizontal distance to fire points also show some possible association with cover type.
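
The numbers behind a box plot can be sketched as a five-number summary per cover type. The example below is illustrative only: the `(cover_type, elevation)` pairs are made up, and the quartiles use the standard library's "inclusive" method, which may differ slightly from what a plotting library draws.

```python
from collections import defaultdict
from statistics import quantiles

# Made-up (cover_type, elevation in meters) pairs; not taken from the dataset.
samples = [
    (1, 3100), (1, 3200), (1, 3300), (1, 3400),
    (3, 2300), (3, 2400), (3, 2500), (3, 2600),
    (7, 3300), (7, 3400), (7, 3500), (7, 3600),
]

# Group elevations by cover type.
by_cover = defaultdict(list)
for cover, elevation in samples:
    by_cover[cover].append(elevation)

# Five-number summary (min, quartiles, max) for each cover type.
summary = {}
for cover, values in sorted(by_cover.items()):
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    summary[cover] = {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }

for cover, stats in summary.items():
    print(cover, stats)
```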


We then looked more closely at the four features mentioned above that show a possible association with cover type.


Figure: elevation plot (elevation.html)

Models

  1. KNN
  2. Logistic Regression
  3. Random Forest
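
The three model families listed above can be compared with a sketch like the following. It assumes scikit-learn is available and uses a synthetic dataset from `make_classification` as a stand-in for the forest data, so the hyperparameters and the accuracies it prints say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 3 classes, 10 numeric features.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# KNN and logistic regression are sensitive to feature scale, so they are
# wrapped in a pipeline with standardization; the random forest is not.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: {accuracy[name]:.3f}")
```

Scaling matters for the real data as well, since elevation and the distance features are in meters while the hillshade indices run from 0 to 255.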

References

Link to dataset: https://www.kaggle.com/c/forest-cover-type-prediction

https://www.analyticsvidhya.com/blog/2021/08/how-to-perform-exploratory-data-analysis-a-guide-for-beginners/

http://www.sangres.com/colorado/wilderness/neota.htm#.YlDwAVjMLG8

https://builtin.com/data-science/random-forest-algorithm