INTRO: For my final project I created two visualizations that explore various types of mining metrics in the state of California . All the data I used came from the DMR dataset, which can be found on California’s open data website. Key variables are used within this project where the reported years of mining, permitted acres, actively used acres, types of product mined, type of mining operation, and finally volume of material collected. The data was collected from an annual miners operators report, and submissions are required by law in California. The two main questions I sought out to explore were what was the overall difference between active and permitted acres, and do different mining practices produce different levels of material volume. I chose this topic because I’ve thought about going into the environmental field and was curious to learn more about which materials/operation types are the most impactful. (Data: https://lab.data.ca.gov/dataset/dmr-all-mines)
From lines 14-22 I loaded in my dataset and the necessary packages.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)library(highcharter)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Rows: 2205 Columns: 35
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (24): Mine_ID, MineName, Address, LAS_Name, MineStatus, Rec_Status, Oper...
dbl (11): X, Y, OBJECTID, Latitude, Longitude, LAS_CODE, ReportYear, ReportT...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
From lines 25-28 I used the selcet function to filter out the variables I’ll need
From lines 39-41 I summoned a table to show me each of the mined materials and decided it would be more useful to put them into groups
table(MS2$PriProduct)
Asbestos Bituminous Rock Borates
1 2 3
Cinders Clay Decomposed Granite
12 36 46
Decorative Rock Diatomite Dimension Stone
16 4 5
Dolomite Feldspar Fill Dirt
4 1 39
Gemstones Gold (Lode) Gold (Placer)
1 18 9
Gypsum Iron Ore Kyanite
8 3 1
Limestone NOT REPORTED Platinum Group Metals
26 19 1
Pumice Rare Earth Elements Rock
7 1 41
Saline Compounds Salt Sand and Gravel
3 1 552
Shale Silica Specialty Sand
13 8 7
Stone Talc Zeoltes
40 4 2
From lines 45-60 I manually placed each material into one of ten groups based on property. Fortunately I was able to find a website that would give me the definition of each item, called Mindat.org
When trying to do linear regression I discovered that Most my variables were listed down as characters, so I quickly switch PermitAcre and Acres_Dist to numerical.
From lines 72-77 I performed a linear regression that compared the distance between active and permitted acres, or unused acres, and compared it the material types.The lack of significant coefficients reinforces that differences in Materials do not have a meaningful impact on Acre_Difference. With large p-values and negative r-squared it is not the most accurate but I hypothesized that clay would have the biggest disparity between active and permitted acres
From lines 94-121 I combined elements a line and scatter plot to make what is known as a dumbbell plot, a plot that compares two related values and shows the difference between them. First I combined the two previously listed data frames into one. Then I calculated the gap between active acres and permitted acres and used the arrange command to sort them from biggest to smallest gap. Then I utilized geom_segment to create the connecting lines and made those lines gray. The following step was adding the points and adjusting their size to my preference. Finally I added the title, caption, legend and customized the sizes to my liking. I chose bright colors to stand out and went with grey simply because every dumbbell plot I’ve scene uses gray for the line.
dumbbell_data <-data.frame(Materials = Ovrl_Material$Materials,Active_Acres = Ovrl_Material$mean_acre,Permitted_Acres = Ovrl_Permitted$mean_permit_acre)dumbbell_data <- dumbbell_data %>%mutate(gap =abs(Active_Acres - Permitted_Acres)) %>%arrange(desc(gap)) ggplot(dumbbell_data) +geom_segment(aes(x = Active_Acres, xend = Permitted_Acres, y =reorder(Materials, gap), yend =reorder(Materials, gap)),color ="grey", size =1) +geom_point(aes(x = Active_Acres, y =reorder(Materials, gap), color ="Active Acres"), size =4) +geom_point(aes(x = Permitted_Acres, y =reorder(Materials, gap), color ="Permitted Acres"), size =4) +labs(title ="Dumbbell Chart: Permitted vs Active Acres",x ="Acres",y ="Materials",color ="Legend", caption ="Source: DMR's dataset from California.gov") +scale_color_manual(values =c("Active Acres"="violet", "Permitted Acres"="orange")) +theme_classic() +theme(plot.title =element_text(hjust =0.5, size =18, face ="bold"),axis.text =element_text(size =12),axis.title =element_text(size =14),legend.text =element_text(size =12),legend.title =element_text(size =14))
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
The first visulaization displays the the drastic differences between which materials have the most unused materials. I hypothesized thta clay would have the biggest disparity, and while it cam second, it was really nowhere close. Minerals had by far the largest unused acres, thanks in part to having the highest average of active and permitted acres. The inverse also exposes how Ore ming waste next to no space. My only assumption is becuase ores are so far below the surfaces, you mine in one small spot and just head straight down. It makes ense that golds, metals and minerals would requat and use most space as they are very rare materials that probably require vast areas just to find small nuggets—while more common materials like rock, drit, and stones don’t need nearly as much space
From lines 128-130 I summoned a table to show we each Operation type, and like before, decided my best curse of action was to sort these into groups.
table(MS2$Opt_Type)
BRINE WELL
4
BRINE WELL,EVAPORATION SYSTEM
2
DREDGE
4
DREDGE,OPEN PIT
2
DREDGE,OPEN PIT,PLANT OR MILL
1
DREDGE,PLANT OR MILL,TAILINGS PROCESSING
1
DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING
2
DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL
1
DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,PLANT OR MILL,QUARRY
1
HYDRAULIC
1
HYDRAULIC,OPEN PIT
1
OPEN PIT
537
OPEN PIT,PLANT OR MILL
15
OPEN PIT,PLANT OR MILL,QUARRY
3
OPEN PIT,PROSPECT
1
OPEN PIT,QUARRY
14
OPEN PIT,QUARRY,TAILINGS PROCESSING
1
OPEN PIT,TAILINGS PROCESSING
6
OPEN PIT,UNDERGROUND
1
PLANT OR MILL
8
PLANT OR MILL,PROSPECT
1
PLANT OR MILL,QUARRY
2
PLANT OR MILL,TAILINGS PROCESSING
1
PLANT OR MILL,UNDERGROUND
1
PROSPECT
2
QUARRY
174
QUARRY,TAILINGS PROCESSING
1
QUARRY,UNDETERMINED
1
STREAMBED OR GRAVEL BAR SKIMMING AND PITTING
85
STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL
1
STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL,QUARRY
1
STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,PLANT OR MILL
4
STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,QUARRY
2
STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,TAILINGS PROCESSING
1
TAILINGS PROCESSING
22
UNDERGROUND
6
UNDETERMINED
23
From lines 135-147 I created 7 groups based on operation types under the knew variable “Mining” using the function mutate. This one didn’t require much research outside of googling waht a dredge was (mining underwater).
MS7 <- MS2 %>%mutate(Mining =case_when( Opt_Type %in%c("BRINE WELL", "BRINE WELL,EVAPORATION SYSTEM") ~"Wells", Opt_Type %in%c("DREDGE", "DREDGE,OPEN PIT", "DREDGE,OPEN PIT,PLANT OR MILL", "DREDGE,PLANT OR MILL,TAILINGS PROCESSING", "DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING", "DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL", "DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,PLANT OR MILL,QUARRY", "HYDRAULIC", "HYDRAULIC,OPEN PIT") ~"Dredges", Opt_Type %in%c("PLANT OR MILL", "PLANT OR MILL,PROSPECT", "PLANT OR MILL,QUARRY", "PLANT OR MILL,TAILINGS PROCESSING", "PLANT OR MILL,UNDERGROUND", "PROSPECT") ~"Plant/Mills", Opt_Type %in%c("STREAMBED OR GRAVEL BAR SKIMMING AND PITTING", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL,QUARRY", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,PLANT OR MILL", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,TAILINGS PROCESSING", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,QUARRY") ~"Streamed", Opt_Type %in%c("QUARRY", "QUARRY,TAILINGS PROCESSING", "QUARRY,UNDETERMINED") ~"Quarries", Opt_Type %in%c("OPEN PIT", "OPEN PIT,PLANT OR MILL", "OPEN PIT,PLANT OR MILL,QUARRY", "OPEN PIT,PROSPECT", "OPEN PIT,QUARRY", " OPEN PIT,QUARRY,TAILINGS PROCESSING", "OPEN PIT,UNDERGROUND", "OPEN PIT,TAILINGS PROCESSING", "OPEN PIT,QUARRY,TAILINGS PROCESSING") ~"Open Pit", Opt_Type %in%c("UNDETERMINED", "UNDERGROUND", "TAILINGS PROCESSING") ~"Undetermined",TRUE~"Other"))
From lines 152-156 I created my second linear regression. This one proved to fair a little bit better with a positive R squared but still had large p-values, indicating that Mining type has no significant effect on Mech_Total, as none of the coefficients are statistically significant….Based on the p-values I hypotheized that Open pit would have the most mech_total
MS7$Mining <-as.factor(MS7$Mining)lm_model2 <-lm(Mech_Total ~ Mining, data = MS7)summary(lm_model2)
Call:
lm(formula = Mech_Total ~ Mining, data = MS7)
Residuals:
Min 1Q Median 3Q Max
-1678681 -1303670 -760169 -145984 103154279
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 625591 1572848 0.398 0.691
MiningOpen Pit 720130 1591782 0.452 0.651
MiningPlant/Mills 130007 2186956 0.059 0.953
MiningQuarries -260680 1634207 -0.160 0.873
MiningStreamed -558995 1685911 -0.332 0.740
MiningUndetermined -394157 1775654 -0.222 0.824
MiningWells 724135 2871614 0.252 0.801
Residual standard error: 5885000 on 927 degrees of freedom
Multiple R-squared: 0.007832, Adjusted R-squared: 0.001411
F-statistic: 1.22 on 6 and 927 DF, p-value: 0.2936
When Trying to make a grpah I kept running into the same issue: there was an obsured uptick of production in 2023, particularly amongst OPEN PIT, so I decided it was best to remove it for the sake of the visualization.
MS9 <- MS7 %>%filter(ReportYear !="2023")
From lines 168-211 I made an area graph using high charter. First I did group_by with Mining and ReportYear to calculate counts, then calculated mean Mech_Total per Mining type. Next I simply listed the colors I wanted to use. After this I created the stacked area chart with ReportYear on the x-axis and set the x-axis categories to ReportYear. When making making the y-axis I set the min to 0 and the max the count. Under plotOptions I set the stacking to normal, the line width to 2.5, and fillOpacity to 0.7 because to ensure each variable pops out a bit. Next I used the arrange function to get the years for proper stacking. Following that I got the mean Mech_Total for the current Mining type and then set the colors for the current mining type from the ‘colors’ vector. Finally I used brackets to extract the mean value of Mech_Total for a specific mining type from the mean_mech_total data frame. From then on it was just plugging in data, color and what I wanted in my tooltip. I ended up choosing the to add the mean mech_total and rounded up to a whole number to make it clearer.
The second visulaization reveals the sheer dominance of Open Pit mining in California since the late 90s with a mean mech total of almost 700K tons per year. With peaks going of the graph, it is important to remeber that this category alone made me remove 2023 data as its production dwarfed the previous years. The only mining category that remotely gives Open Pit a run for its money are Quaries, followed by Streemed who average 67.5K and 40K a year, respectively. Others mining categories like Dredges, Plant/Mills, and Wells, barely show up—if at all— when compared to the other forms of mining. Finally the last category of Undetermind may be for only specific types mining, hence why its so small. There are also some pretty noticeable dips in production, like in the years 2000, 2005, 2008, 2017 and 2020. The only category that seems to have recovered from 2017 dip would be Open Pit
CONCLUSION: Mining Operation Types seem to have an effect on Mech_total, but Mined Materials can be used get generalized sense of how much land may be wasted, just not too specific as the linear regression analysis pointed out. I struggled a lot with this assignment, especially considering I decided to redo both my visualizations only a day ago because I didn’t like how they turned out (which is just how it goes sometimes). The first thing I struggled with was figuring out a way to the log function, in hopes that it would allow me to use the 2023 data, but it only made my visualizations look wonky. The second thing I struggled with was trying to use categorical variables for the regression. It took me a while, and I was fortunate to remember the factor command. It’s unfortunate that the numbers didn’t have stronger correlations. I am pleased with my visualizations though and am glad I challenged myself one last time.
RESEARCHER: From my brief background reserch I was able to learn many new things, such as dredging is the process of excavating debris from the bottom ofs rivers, lakes, and harbors, to maintain navigation channels, and prevent flooding, in addition to being a form mining valuable resources (USDC). I also accidentally learn more about California’s rich mining history and how it stems far before the gold rush as well as how mining/railroad jobs made California one of the most diverse states in the country (The Nat). Lastly this project heavily expanded my geology knowledge. Like listed eraly, I found a website called MIndat, which made this stage of my reasearch a cake walk. One common trend I found is that almost none of the listed rocks, metals, mierals, ores, or others went below 15,000 feet (Mindat).
WORKS CITED:
“DMR All Mines.” DMR All Mines | CA Open Data, lab.data.ca.gov/dataset/dmr-all-mines.
“History of Mining in Southern California.” The Nat, www.sdnhm.org/exhibitions/all-that-glitters/history/history-of-mining-in-southern-california/.
“Mindat.Org.” Mindat.Org - Mines, Minerals and More, www.mindat.org/.
US Department of Commerce, National Oceanic and Atmospheric Administration. “What Is Dredging?” NOAA’s National Ocean Service, 6 Sept. 2013, oceanservice.noaa.gov/facts/dredging.html.