Mineral ores around the world

Context

Minerals are natural resources that are fundamental to industry and human development. They occur in the Earth's crust as ores, which contain economically recoverable concentrations of valuable minerals. Ores are mined and processed to obtain metals such as iron, copper, gold, silver, and aluminum, as well as key elements such as lithium and cobalt, which are essential to modern technology.

Mineral deposits vary in abundance and type with the geology of each region, and mining them has a significant economic and environmental impact. It is therefore crucial to balance their exploitation with sustainability.

Minerals are precious gifts of the universe. This dataset describes mineral sites found all over the Earth; it has 22 columns and a very large number of rows.

Data import

library(readxl)
# The file is a CSV, so it is read with base R's read.csv()
data <- read.csv("Mineral ores round the world.csv")
head(data)
##             site_name latitude longitude region       country  state county
## 1    Lookout Prospect 55.05612 -132.1434   <NA> United States Alaska       
## 2 Lucky Find Prospect 55.52751 -132.6851   <NA> United States Alaska       
## 3 Mccullough Prospect 55.97751 -132.9991   <NA> United States Alaska       
## 4     Lucky Jim Claim 55.52195 -132.6865   <NA> United States Alaska       
## 5  Matilda Occurrence 55.14556 -132.0523   <NA> United States Alaska       
## 6     Marion Prospect 55.14695 -132.4851   <NA> United States Alaska       
##   com_type commod1      commod2      commod3 oper_type dep_type prod_size
## 1        M  Copper Gold, Silver                Unknown                  N
## 2        M  Copper         Gold                Unknown                  N
## 3        M  Copper                Zinc, Gold   Unknown                  N
## 4        M    Gold              Copper, Lead   Unknown                  N
## 5        M    Gold                             Unknown                  N
## 6        M  Copper                      Lead   Unknown                  N
##     dev_stat                              ore                    gangue
## 1 Occurrence  Chalcopyrite, Covellite, Pyrite          Quartz, Sericite
## 2 Occurrence             Chalcopyrite, Pyrite Calcite, Quartz, Siderite
## 3 Occurrence Chalcopyrite, Pyrite, Sphalerite                    Quartz
## 4 Occurrence        Galena, Malachite, Pyrite                          
## 5 Occurrence                           Pyrite                          
## 6 Occurrence     Chalcopyrite, Galena, Pyrite                          
##     work_type                                      names             ore_ctrl
## 1             Conundrum, Mammoth, Wakefield Minerals Co.                     
## 2 Underground                                            Vein Follows Contact
## 3                    Claims: Horseshoe, Copper, Lake Bay                     
## 4                                                                            
## 5                                                                            
## 6 Underground                     Nutqua Gold Mining Co.                     
##    hrock_type arock_type
## 1      Schist           
## 2     Diabase           
## 3   Siltstone           
## 4     Granite    Granite
## 5 Mica Schist           
## 6      Schist
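The preview above shows two encodings of missing values: `region` is `<NA>`, while `county`, `commod2`, and other columns hold empty strings. A minimal sketch, assuming one wants both treated as missing, is to pass `na.strings` to `read.csv()`. The small CSV written to a temporary file here is a made-up stand-in for the real file, and `sample_data` and `missing_per_column` are illustrative names.

```r
# Hypothetical three-row sample standing in for the real CSV
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "site_name,region,county,commod1",
  "Lookout Prospect,NA,,Copper",
  "Lucky Jim Claim,NA,,Gold",
  "Example Site,,Example County,"
), tmp)

# Treat both empty fields and the literal string "NA" as missing
sample_data <- read.csv(tmp, na.strings = c("", "NA"))

# Count missing values per column
missing_per_column <- colSums(is.na(sample_data))
print(missing_per_column)
```

With this option, empty fields no longer appear as a separate `""` category when a column is tabulated.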

The dataset contains 304,632 observations of 22 variables, which are:

colnames(data)
##  [1] "site_name"  "latitude"   "longitude"  "region"     "country"   
##  [6] "state"      "county"     "com_type"   "commod1"    "commod2"   
## [11] "commod3"    "oper_type"  "dep_type"   "prod_size"  "dev_stat"  
## [16] "ore"        "gangue"     "work_type"  "names"      "ore_ctrl"  
## [21] "hrock_type" "arock_type"

The variable types are as follows:

str(data)
## 'data.frame':    304632 obs. of  22 variables:
##  $ site_name : chr  "Lookout Prospect" "Lucky Find Prospect" "Mccullough Prospect" "Lucky Jim Claim" ...
##  $ latitude  : num  55.1 55.5 56 55.5 55.1 ...
##  $ longitude : num  -132 -133 -133 -133 -132 ...
##  $ region    : chr  NA NA NA NA ...
##  $ country   : chr  "United States" "United States" "United States" "United States" ...
##  $ state     : chr  "Alaska" "Alaska" "Alaska" "Alaska" ...
##  $ county    : chr  "" "" "" "" ...
##  $ com_type  : chr  "M" "M" "M" "M" ...
##  $ commod1   : chr  "Copper" "Copper" "Copper" "Gold" ...
##  $ commod2   : chr  "Gold, Silver" "Gold" "" "" ...
##  $ commod3   : chr  "" "" "Zinc, Gold" "Copper, Lead" ...
##  $ oper_type : chr  "Unknown" "Unknown" "Unknown" "Unknown" ...
##  $ dep_type  : chr  "" "" "" "" ...
##  $ prod_size : chr  "N" "N" "N" "N" ...
##  $ dev_stat  : chr  "Occurrence" "Occurrence" "Occurrence" "Occurrence" ...
##  $ ore       : chr  "Chalcopyrite, Covellite, Pyrite" "Chalcopyrite, Pyrite" "Chalcopyrite, Pyrite, Sphalerite" "Galena, Malachite, Pyrite" ...
##  $ gangue    : chr  "Quartz, Sericite" "Calcite, Quartz, Siderite" "Quartz" "" ...
##  $ work_type : chr  "" "Underground" "" "" ...
##  $ names     : chr  "Conundrum, Mammoth, Wakefield Minerals Co." "" "Claims: Horseshoe, Copper, Lake Bay" "" ...
##  $ ore_ctrl  : chr  "" "Vein Follows Contact" "" "" ...
##  $ hrock_type: chr  "Schist" "Diabase" "Siltstone" "Granite" ...
##  $ arock_type: chr  "" "" "" "Granite" ...

The dataset contains character (qualitative) variables and numeric (quantitative) variables, assigned as follows:

Character: site_name, region, country, state, county, com_type, commod1, commod2, commod3, oper_type, dep_type, prod_size, dev_stat, ore, gangue, work_type, names, ore_ctrl, hrock_type, arock_type

Numeric: latitude, longitude
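The same split can be derived from the data instead of being read off the `str()` output by hand. The following is a small sketch in base R; the two-row `sample_data` frame is a hypothetical stand-in with only a few of the dataset's columns.

```r
# Hypothetical two-row sample with a few of the dataset's columns
sample_data <- data.frame(
  site_name = c("Lookout Prospect", "Lucky Jim Claim"),
  latitude  = c(55.05612, 55.52195),
  longitude = c(-132.1434, -132.6865),
  commod1   = c("Copper", "Gold"),
  stringsAsFactors = FALSE
)

# TRUE for numeric columns, FALSE otherwise
is_num <- vapply(sample_data, is.numeric, logical(1))

numeric_cols   <- names(sample_data)[is_num]
character_cols <- names(sample_data)[!is_num]

print(numeric_cols)
print(character_cols)
```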

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
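# hrock_type is a character column, so `> 10` is evaluated as a string
# comparison against "10"; filtered_data is not used in the tables below
filtered_data <- data %>%
  filter(hrock_type > 10)
# Frequency table of host rock types with each type's share of all rows
Tabla_1 <- data %>%
  group_by(hrock_type) %>%
  summarise(Total = n()) %>%
  mutate(Porcentaje = round(Total / sum(Total) * 100, 3)) %>%
  arrange(hrock_type)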

print(Tabla_1)
## # A tibble: 3,120 × 3
##    hrock_type                                                   Total Porcentaje
##    <chr>                                                        <int>      <dbl>
##  1 ""                                                          235103     77.2  
##  2 "Alkali Rhyolite,Latite"                                         1      0    
##  3 "Alkali Syenite"                                                 2      0.001
##  4 "Alkali Syenite,Nepheline Syenite,Phonolite"                     1      0    
##  5 "Alkali Syenite,Quartzite,Limestone"                             1      0    
##  6 "Alkali-Granite (Alaskite)"                                      5      0.002
##  7 "Alkalic Intrusive Rock"                                         4      0.001
##  8 "Alkalic Intrusive Rock,Volcanic Breccia (Agglomerate),Tra…      1      0    
##  9 "Alkalic Volcanic Rock"                                          1      0    
## 10 "Alluvium"                                                    1245      0.409
## # ℹ 3,110 more rows
Tabla_1 <- Tabla_1 %>%
  filter(hrock_type %in% c("Basalt", "Chert"))
print(Tabla_1)
## # A tibble: 2 × 3
##   hrock_type Total Porcentaje
##   <chr>      <int>      <dbl>
## 1 Basalt       589      0.193
## 2 Chert        293      0.096
knitr::kable(Tabla_1, caption = "Frequency of rock types (hrock_type)")

Frequency of rock types (hrock_type)

hrock_type   Total   Porcentaje
Basalt         589        0.193
Chert          293        0.096
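The percentages above are relative to all rows, and 77.2% of rows have an empty `hrock_type`, so every recorded rock type gets a very small share. As a rough sketch, the share among rows with a recorded host rock could be computed as follows. The counts are illustrative only, not taken from the dataset, and `Pct_recorded` is a hypothetical column name.

```r
# Hypothetical counts in the shape of Tabla_1 (values are illustrative only)
counts <- data.frame(
  hrock_type = c("", "Basalt", "Chert", "Granite"),
  Total      = c(70, 15, 5, 10),
  stringsAsFactors = FALSE
)

# Share of all rows, computed the same way as Porcentaje in Tabla_1
counts$Porcentaje <- round(counts$Total / sum(counts$Total) * 100, 3)

# Share among rows with a recorded host rock only
recorded <- counts[counts$hrock_type != "", ]
recorded$Pct_recorded <- round(recorded$Total / sum(recorded$Total) * 100, 3)

print(recorded)
```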

Bar chart of hrock_type

# Bars ordered by descending frequency; geom_col() plots the precomputed totals
ggplot(Tabla_1, aes(x = reorder(hrock_type, -Total), y = Total, fill = hrock_type)) +
  geom_col() +
  labs(title = "Distribution of hrock_type", x = "Rock type", y = "Frequency") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
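Some `hrock_type` values list several rocks separated by commas, for example "Alkali Syenite,Quartzite,Limestone", so the exact match on "Basalt" and "Chert" counts only sites where that is the sole recorded host rock. One possible way to count each rock individually is to split the entries first. This is a sketch in base R; the `hrock` vector is a hypothetical sample.

```r
# Hypothetical sample of hrock_type values, including multi-rock entries
hrock <- c("Basalt", "Chert", "Basalt,Chert",
           "Alkali Syenite,Quartzite,Limestone", "", NA)

# Drop missing and empty entries before splitting
recorded <- hrock[!is.na(hrock) & hrock != ""]

# Split on commas and trim surrounding whitespace
rocks <- trimws(unlist(strsplit(recorded, ",", fixed = TRUE)))

# Count how often each individual rock type appears
rock_counts <- sort(table(rocks), decreasing = TRUE)
print(rock_counts)
```

Counts obtained this way refer to mentions of a rock type rather than to sites, since one site can contribute to several types.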