California Mines

Author

Asher Scott

INTRO: For my final project I created two visualizations that explore various types of mining metrics in the state of California . All the data I used came from the DMR dataset, which can be found on California’s open data website. Key variables are used within this project where the reported years of mining, permitted acres, actively used acres, types of product mined, type of mining operation, and finally volume of material collected. The data was collected from an annual miners operators report, and submissions are required by law in California. The two main questions I sought out to explore were what was the overall difference between active and permitted acres, and do different mining practices produce different levels of material volume. I chose this topic because I’ve thought about going into the environmental field and was curious to learn more about which materials/operation types are the most impactful. (Data: https://lab.data.ca.gov/dataset/dmr-all-mines)

From lines 14-22 I loaded in my dataset and the necessary packages.

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)
library(highcharter)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

library(stringr)
library(ggplot2)
setwd("/Users/asherscott/Desktop/Data 110")
Mines <- read_csv("DMR_Mines.csv")

Rows: 2205 Columns: 35
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (24): Mine_ID, MineName, Address, LAS_Name, MineStatus, Rec_Status, Oper...
dbl (11): X, Y, OBJECTID, Latitude, Longitude, LAS_CODE, ReportYear, ReportT...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

From lines 25-28 I used the selcet function to filter out the variables I’ll need

MS1 <- Mines %>% 
  select(MineName,ReportYear, Acres_Dist, PermitAcre, MineStatus, PriProduct, Opt_Type, Mech_Total)

Then from lines 32-35 I removed any missing variables

MS2 <- MS1 %>%
 filter(if_all(c(MineName, ReportYear, Acres_Dist, PermitAcre, MineStatus,PriProduct, Opt_Type, Mech_Total), ~ !is.na(.)))

From lines 39-41 I summoned a table to show me each of the mined materials and decided it would be more useful to put them into groups

table(MS2$PriProduct)


             Asbestos       Bituminous Rock               Borates 
                    1                     2                     3 
              Cinders                  Clay    Decomposed Granite 
                   12                    36                    46 
      Decorative Rock             Diatomite       Dimension Stone 
                   16                     4                     5 
             Dolomite              Feldspar             Fill Dirt 
                    4                     1                    39 
            Gemstones           Gold (Lode)         Gold (Placer) 
                    1                    18                     9 
               Gypsum              Iron Ore               Kyanite 
                    8                     3                     1 
            Limestone          NOT REPORTED Platinum Group Metals 
                   26                    19                     1 
               Pumice   Rare Earth Elements                  Rock 
                    7                     1                    41 
     Saline Compounds                  Salt       Sand and Gravel 
                    3                     1                   552 
                Shale                Silica        Specialty Sand 
                   13                     8                     7 
                Stone                  Talc               Zeoltes 
                   40                     4                     2

From lines 45-60 I manually placed each material into one of ten groups based on property. Fortunately I was able to find a website that would give me the definition of each item, called Mindat.org

MS6 <- MS2 %>%
  mutate(
    Materials = case_when(
      PriProduct %in% c("Rock", "Decorative Rock", "Bituminous Rock", "Pumice", "Diatomite", "Shale") ~ "Rocks",
      PriProduct %in% c("Clay") ~ "Clays",
      PriProduct %in% c("Gemstones", "Limestone", "Cinders", "Stone", "Dimension Stone") ~ "Stones",
      PriProduct %in% c("Rare Earth Elements", "Platinum Group Metals") ~ "Metals",
      PriProduct %in% c("Fill Dirt", "Decomposed Granite") ~ "Dirts",
      PriProduct %in% c("Specialty Sand", "Sand and Gravel") ~ "Sands",
      PriProduct %in% c("Iron Ore", "Zeoltes") ~ "Ores",
      PriProduct %in% c("Salt", "Saline Compounds", "Silica", "Dolomite", "Kyanite", "Feldspar", "Gypsum", "Talc", "Asbestos", "Borates" ) ~ "Minerals",
      PriProduct %in% c("Gold (Placer)", "Gold (Lode)") ~ "Gold",
      PriProduct %in% c("NOT REPORTED") ~ "Unreported",
      TRUE ~ "Other" ))

When trying to do linear regression I discovered that Most my variables were listed down as characters, so I quickly switch PermitAcre and Acres_Dist to numerical.

str(MS6)

tibble [934 × 9] (S3: tbl_df/tbl/data.frame)
 $ MineName  : chr [1:934] "AL'S CONCRETE (SIMS)" "COALINGA PIT #1" "SIERRA PACIFIC MATERIALS" "PETRY SAND PIT -  - RECLAIMED" ...
 $ ReportYear: num [1:934] 2023 2023 2023 2003 1995 ...
 $ Acres_Dist: num [1:934] 30 315 107 0 87 ...
 $ PermitAcre: chr [1:934] "30.00" "472.00" "100.00" "25.00" ...
 $ MineStatus: chr [1:934] "CLOSED NO INTENT TO RESUME" "ACTIVE" "ACTIVE" "RECLAIMED" ...
 $ PriProduct: chr [1:934] "Sand and Gravel" "Sand and Gravel" "Sand and Gravel" "Sand and Gravel" ...
 $ Opt_Type  : chr [1:934] "OPEN PIT" "OPEN PIT" "OPEN PIT" "OPEN PIT" ...
 $ Mech_Total: num [1:934] 33840 -332960 245870 8000 1328800 ...
 $ Materials : chr [1:934] "Sands" "Sands" "Sands" "Sands" ...

MS6$PermitAcre <- as.numeric(MS6$PermitAcre)

Warning: NAs introduced by coercion

MS6$Acres_Dist <- as.numeric(MS6$Acres_Dist)

From lines 72-77 I performed a linear regression that compared the distance between active and permitted acres, or unused acres, and compared it the material types.The lack of significant coefficients reinforces that differences in Materials do not have a meaningful impact on Acre_Difference. With large p-values and negative r-squared it is not the most accurate but I hypothesized that clay would have the biggest disparity between active and permitted acres

MS6$Materials <- as.factor(MS6$Materials)
MS6$Acre_Difference <- MS6$PermitAcre - MS6$Acres_Dist
lm_model <- lm(Acre_Difference ~ Materials, data = MS6)
summary(lm_model)


Call:
lm(formula = Acre_Difference ~ Materials, data = MS6)

Residuals:
   Min     1Q Median     3Q    Max 
  -948   -201   -128    -64  39035 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)            431.3      278.7   1.548    0.122
MaterialsDirts        -335.0      330.3  -1.014    0.311
MaterialsGold         -105.1      418.9  -0.251    0.802
MaterialsMetals        -17.3     1182.3  -0.015    0.988
MaterialsMinerals      478.4      397.1   1.205    0.229
MaterialsOres         -378.5      778.3  -0.486    0.627
MaterialsRocks        -299.9      330.9  -0.906    0.365
MaterialsSands        -220.4      287.1  -0.768    0.443
MaterialsStones       -343.3      330.9  -1.038    0.300
MaterialsUnreported   -361.0      465.4  -0.776    0.438

Residual standard error: 1625 on 915 degrees of freedom
  (9 observations deleted due to missingness)
Multiple R-squared:  0.00885,   Adjusted R-squared:  -0.0008989 
F-statistic: 0.9078 on 9 and 915 DF,  p-value: 0.5176

From lines 81-89 I grouped the data calculated the mean for materials and permitted.

Ovrl_Material <- MS6 %>%
  group_by(Materials) %>%
  summarise(mean_acre = mean(Acres_Dist, na.rm = TRUE))

Ovrl_Permitted <- MS6 %>%
  group_by(Materials) %>%
  summarise(mean_permit_acre = mean(PermitAcre, na.rm = TRUE))

From lines 94-121 I combined elements a line and scatter plot to make what is known as a dumbbell plot, a plot that compares two related values and shows the difference between them. First I combined the two previously listed data frames into one. Then I calculated the gap between active acres and permitted acres and used the arrange command to sort them from biggest to smallest gap. Then I utilized geom_segment to create the connecting lines and made those lines gray. The following step was adding the points and adjusting their size to my preference. Finally I added the title, caption, legend and customized the sizes to my liking. I chose bright colors to stand out and went with grey simply because every dumbbell plot I’ve scene uses gray for the line.

dumbbell_data <- data.frame(
  Materials = Ovrl_Material$Materials,
  Active_Acres = Ovrl_Material$mean_acre,
  Permitted_Acres = Ovrl_Permitted$mean_permit_acre)

dumbbell_data <- dumbbell_data %>%
  mutate(gap = abs(Active_Acres - Permitted_Acres)) %>%
  arrange(desc(gap))  

ggplot(dumbbell_data) +
  geom_segment(aes(x = Active_Acres, xend = Permitted_Acres, y = reorder(Materials, gap), yend = reorder(Materials, gap)),
               color = "grey", size = 1) +
  geom_point(aes(x = Active_Acres, y = reorder(Materials, gap), color = "Active Acres"), size = 4) +
  geom_point(aes(x = Permitted_Acres, y = reorder(Materials, gap), color = "Permitted Acres"), size = 4) +
  labs(title = "Dumbbell Chart: Permitted vs Active Acres",
    x = "Acres",
    y = "Materials",
    color = "Legend",  
    caption = "Source: DMR's dataset from California.gov") +
  scale_color_manual(values = c("Active Acres" = "violet", "Permitted Acres" = "orange")) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
    axis.text = element_text(size = 12),
    axis.title = element_text(size = 14),
    legend.text = element_text(size = 12),
    legend.title = element_text(size = 14))

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

The first visulaization displays the the drastic differences between which materials have the most unused materials. I hypothesized thta clay would have the biggest disparity, and while it cam second, it was really nowhere close. Minerals had by far the largest unused acres, thanks in part to having the highest average of active and permitted acres. The inverse also exposes how Ore ming waste next to no space. My only assumption is becuase ores are so far below the surfaces, you mine in one small spot and just head straight down. It makes ense that golds, metals and minerals would requat and use most space as they are very rare materials that probably require vast areas just to find small nuggets—while more common materials like rock, drit, and stones don’t need nearly as much space

From lines 128-130 I summoned a table to show we each Operation type, and like before, decided my best curse of action was to sort these into groups.

table(MS2$Opt_Type)


                                                                BRINE WELL 
                                                                         4 
                                             BRINE WELL,EVAPORATION SYSTEM 
                                                                         2 
                                                                    DREDGE 
                                                                         4 
                                                           DREDGE,OPEN PIT 
                                                                         2 
                                             DREDGE,OPEN PIT,PLANT OR MILL 
                                                                         1 
                                  DREDGE,PLANT OR MILL,TAILINGS PROCESSING 
                                                                         1 
                       DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING 
                                                                         2 
DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL 
                                                                         1 
  DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,PLANT OR MILL,QUARRY 
                                                                         1 
                                                                 HYDRAULIC 
                                                                         1 
                                                        HYDRAULIC,OPEN PIT 
                                                                         1 
                                                                  OPEN PIT 
                                                                       537 
                                                    OPEN PIT,PLANT OR MILL 
                                                                        15 
                                             OPEN PIT,PLANT OR MILL,QUARRY 
                                                                         3 
                                                         OPEN PIT,PROSPECT 
                                                                         1 
                                                           OPEN PIT,QUARRY 
                                                                        14 
                                       OPEN PIT,QUARRY,TAILINGS PROCESSING 
                                                                         1 
                                              OPEN PIT,TAILINGS PROCESSING 
                                                                         6 
                                                      OPEN PIT,UNDERGROUND 
                                                                         1 
                                                             PLANT OR MILL 
                                                                         8 
                                                    PLANT OR MILL,PROSPECT 
                                                                         1 
                                                      PLANT OR MILL,QUARRY 
                                                                         2 
                                         PLANT OR MILL,TAILINGS PROCESSING 
                                                                         1 
                                                 PLANT OR MILL,UNDERGROUND 
                                                                         1 
                                                                  PROSPECT 
                                                                         2 
                                                                    QUARRY 
                                                                       174 
                                                QUARRY,TAILINGS PROCESSING 
                                                                         1 
                                                       QUARRY,UNDETERMINED 
                                                                         1 
                              STREAMBED OR GRAVEL BAR SKIMMING AND PITTING 
                                                                        85 
       STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL 
                                                                         1 
STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL,QUARRY 
                                                                         1 
                STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,PLANT OR MILL 
                                                                         4 
                       STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,QUARRY 
                                                                         2 
          STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,TAILINGS PROCESSING 
                                                                         1 
                                                       TAILINGS PROCESSING 
                                                                        22 
                                                               UNDERGROUND 
                                                                         6 
                                                              UNDETERMINED 
                                                                        23

From lines 135-147 I created 7 groups based on operation types under the knew variable “Mining” using the function mutate. This one didn’t require much research outside of googling waht a dredge was (mining underwater).

MS7 <- MS2 %>%
  mutate(
    Mining = case_when(
      Opt_Type %in% c("BRINE WELL", "BRINE WELL,EVAPORATION SYSTEM") ~ "Wells",
      Opt_Type %in% c("DREDGE", "DREDGE,OPEN PIT", "DREDGE,OPEN PIT,PLANT OR MILL", "DREDGE,PLANT OR MILL,TAILINGS PROCESSING", "DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING", "DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL", "DREDGE,STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,PLANT OR MILL,QUARRY", "HYDRAULIC", "HYDRAULIC,OPEN PIT") ~ "Dredges",
      Opt_Type %in% c("PLANT OR MILL", "PLANT OR MILL,PROSPECT", "PLANT OR MILL,QUARRY", "PLANT OR MILL,TAILINGS PROCESSING", "PLANT OR MILL,UNDERGROUND", "PROSPECT") ~ "Plant/Mills",
      Opt_Type %in% c("STREAMBED OR GRAVEL BAR SKIMMING AND PITTING", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,OPEN PIT,PLANT OR MILL,QUARRY", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,PLANT OR MILL", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,TAILINGS PROCESSING", "STREAMBED OR GRAVEL BAR SKIMMING AND PITTING,QUARRY") ~ "Streamed",
      Opt_Type %in% c("QUARRY", "QUARRY,TAILINGS PROCESSING", "QUARRY,UNDETERMINED") ~ "Quarries",
      Opt_Type %in% c("OPEN PIT", "OPEN PIT,PLANT OR MILL", "OPEN PIT,PLANT OR MILL,QUARRY", "OPEN PIT,PROSPECT", "OPEN PIT,QUARRY", " OPEN PIT,QUARRY,TAILINGS PROCESSING", "OPEN PIT,UNDERGROUND", "OPEN PIT,TAILINGS PROCESSING", "OPEN PIT,QUARRY,TAILINGS PROCESSING") ~ "Open Pit",
       Opt_Type %in% c("UNDETERMINED", "UNDERGROUND", "TAILINGS PROCESSING") ~ "Undetermined",
      TRUE ~ "Other"))

From lines 152-156 I created my second linear regression. This one proved to fair a little bit better with a positive R squared but still had large p-values, indicating that Mining type has no significant effect on Mech_Total, as none of the coefficients are statistically significant….Based on the p-values I hypotheized that Open pit would have the most mech_total

MS7$Mining <- as.factor(MS7$Mining)
lm_model2 <- lm(Mech_Total ~ Mining, data = MS7)
summary(lm_model2)


Call:
lm(formula = Mech_Total ~ Mining, data = MS7)

Residuals:
      Min        1Q    Median        3Q       Max 
 -1678681  -1303670   -760169   -145984 103154279 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)          625591    1572848   0.398    0.691
MiningOpen Pit       720130    1591782   0.452    0.651
MiningPlant/Mills    130007    2186956   0.059    0.953
MiningQuarries      -260680    1634207  -0.160    0.873
MiningStreamed      -558995    1685911  -0.332    0.740
MiningUndetermined  -394157    1775654  -0.222    0.824
MiningWells          724135    2871614   0.252    0.801

Residual standard error: 5885000 on 927 degrees of freedom
Multiple R-squared:  0.007832,  Adjusted R-squared:  0.001411 
F-statistic:  1.22 on 6 and 927 DF,  p-value: 0.2936

When Trying to make a grpah I kept running into the same issue: there was an obsured uptick of production in 2023, particularly amongst OPEN PIT, so I decided it was best to remove it for the sake of the visualization.

MS9 <- MS7 %>% filter(ReportYear != "2023")

From lines 168-211 I made an area graph using high charter. First I did group_by with Mining and ReportYear to calculate counts, then calculated mean Mech_Total per Mining type. Next I simply listed the colors I wanted to use. After this I created the stacked area chart with ReportYear on the x-axis and set the x-axis categories to ReportYear. When making making the y-axis I set the min to 0 and the max the count. Under plotOptions I set the stacking to normal, the line width to 2.5, and fillOpacity to 0.7 because to ensure each variable pops out a bit. Next I used the arrange function to get the years for proper stacking. Following that I got the mean Mech_Total for the current Mining type and then set the colors for the current mining type from the ‘colors’ vector. Finally I used brackets to extract the mean value of Mech_Total for a specific mining type from the mean_mech_total data frame. From then on it was just plugging in data, color and what I wanted in my tooltip. I ended up choosing the to add the mean mech_total and rounded up to a whole number to make it clearer.

MS9_prepared <- MS9 %>%
  group_by(Mining, ReportYear) %>%
  summarise(Count = n(), .groups = "drop")

mean_mech_total <- MS9 %>%
  group_by(Mining) %>%
  summarise(mean_mech_total = mean(Mech_Total, na.rm = TRUE))

colors <- c("red", "steelblue", "forestgreen", "orange", "purple", "hotpink", "cyan")


highchart() %>%
  hc_chart(type = "area") %>%
  hc_title(text = "Dominant Operating Types in California (by Year)") %>%
  hc_xAxis(title = list(text = "Report Year"),
    categories = unique(MS9_prepared$ReportYear),  
    labels = list(format = "{value}")) %>%
  hc_yAxis(title = list(text = "Mining Operations"),
    labels = list(format = "{value}"),
    min = 0,  
    max = max(MS9_prepared$Count)) %>%
  hc_plotOptions(area = list(stacking = "normal",  
    marker = list(enabled = FALSE),
    lineWidth = 2.5,
    fillOpacity = 0.7)) %>%
  hc_caption(
    text = "Source: DMR's dataset from California.gov") %>%
  hc_add_series_list(lapply(unique(MS9_prepared$Mining), function(mining_type) {
      MS9_data <- MS9_prepared %>%
        filter(Mining == mining_type) %>%
        arrange(ReportYear)  

      mining_color <- colors[which(unique(MS9_prepared$Mining) == mining_type)]
      mean_mech <- mean_mech_total$mean_mech_total[mean_mech_total$Mining == mining_type]

      list(name = mining_type,
        data = MS9_data$Count,
        color = mining_color,
        tooltip = list(pointFormat = paste(
            "Number of Mining Operations: <b>{point.y}</b><br>",
            "Mean Mech Tota (per ton): <b>", round(mean_mech), "</b>")))})) %>%
  hc_add_theme(hc_theme_darkunica())

The second visulaization reveals the sheer dominance of Open Pit mining in California since the late 90s with a mean mech total of almost 700K tons per year. With peaks going of the graph, it is important to remeber that this category alone made me remove 2023 data as its production dwarfed the previous years. The only mining category that remotely gives Open Pit a run for its money are Quaries, followed by Streemed who average 67.5K and 40K a year, respectively. Others mining categories like Dredges, Plant/Mills, and Wells, barely show up—if at all— when compared to the other forms of mining. Finally the last category of Undetermind may be for only specific types mining, hence why its so small. There are also some pretty noticeable dips in production, like in the years 2000, 2005, 2008, 2017 and 2020. The only category that seems to have recovered from 2017 dip would be Open Pit

CONCLUSION: Mining Operation Types seem to have an effect on Mech_total, but Mined Materials can be used get generalized sense of how much land may be wasted, just not too specific as the linear regression analysis pointed out. I struggled a lot with this assignment, especially considering I decided to redo both my visualizations only a day ago because I didn’t like how they turned out (which is just how it goes sometimes). The first thing I struggled with was figuring out a way to the log function, in hopes that it would allow me to use the 2023 data, but it only made my visualizations look wonky. The second thing I struggled with was trying to use categorical variables for the regression. It took me a while, and I was fortunate to remember the factor command. It’s unfortunate that the numbers didn’t have stronger correlations. I am pleased with my visualizations though and am glad I challenged myself one last time.

RESEARCHER: From my brief background reserch I was able to learn many new things, such as dredging is the process of excavating debris from the bottom ofs rivers, lakes, and harbors, to maintain navigation channels, and prevent flooding, in addition to being a form mining valuable resources (USDC). I also accidentally learn more about California’s rich mining history and how it stems far before the gold rush as well as how mining/railroad jobs made California one of the most diverse states in the country (The Nat). Lastly this project heavily expanded my geology knowledge. Like listed eraly, I found a website called MIndat, which made this stage of my reasearch a cake walk. One common trend I found is that almost none of the listed rocks, metals, mierals, ores, or others went below 15,000 feet (Mindat).

WORKS CITED:

“DMR All Mines.” DMR All Mines | CA Open Data, lab.data.ca.gov/dataset/dmr-all-mines.

“History of Mining in Southern California.” The Nat, www.sdnhm.org/exhibitions/all-that-glitters/history/history-of-mining-in-southern-california/.

“Mindat.Org.” Mindat.Org - Mines, Minerals and More, www.mindat.org/.

US Department of Commerce, National Oceanic and Atmospheric Administration. “What Is Dredging?” NOAA’s National Ocean Service, 6 Sept. 2013, oceanservice.noaa.gov/facts/dredging.html.