The Explosion of BIG Data

  • Government
  • Business and Finance
  • Healthcare
  • Manufacturing and Retail
  • Social Media
  • Telecommunications
  • Transportation
  • Education

Demand for Statisticians and Data Scientists

code
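library(readxl)     # read_excel()
library(ggplot2)    # plotting
library(forcats)    # fct_reorder()
library(patchwork)

# Job posting counts per country; columns used here: Country, Stat
job <- read_excel("job posting.xlsx")

# Horizontal bar chart of statistician postings, sorted by count
job |> 
  ggplot(aes(x = fct_reorder(Country, Stat),
             y = Stat,
             fill = Country)) +
  geom_col() +
  geom_text(aes(label = Stat),   # map labels from the data rather than job$Stat
            vjust = 0.5,
            hjust = 1.2,
            size = 4) +
  labs(x = "Country",
       y = NULL) +
  coord_flip() + 
  theme_classic() +
  theme(legend.position = "none") +
  ggtitle("Number of Job Postings for Statisticians as of Sept 2022")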

code
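library(readxl)     # read_excel()
library(ggplot2)    # plotting
library(forcats)    # fct_reorder()
library(patchwork)

# Job posting counts per country; columns used here: Country, DS
job <- read_excel("job posting.xlsx")

# Horizontal bar chart of data scientist postings, sorted by count
job |> 
  ggplot(aes(x = fct_reorder(Country, DS),
             y = DS,
             fill = Country)) +
  geom_col() +
  geom_text(aes(label = DS),   # map labels from the data rather than job$DS
            vjust = 0.5,
            hjust = 1.2,
            size = 4) +
  labs(x = "Country",
       y = NULL) +
  coord_flip() + 
  theme_classic() +
  theme(legend.position = "none") +
  ggtitle("Number of Job Postings for Data Scientists as of Sept 2022")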
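
The two charts above differ only in the column being plotted, so they could share one helper and be shown side by side with patchwork. The following is a minimal sketch under stated assumptions: the helper name `plot_postings()` and the stand-in data frame `job_demo` are hypothetical, and the values in `job_demo` are placeholders rather than real posting counts. In practice the `job` data read from the Excel file would be passed instead.

```r
library(ggplot2)
library(forcats)
library(patchwork)

# Placeholder data with the same columns as the job posting file.
# The values are made up for illustration only; they are NOT real counts.
job_demo <- data.frame(
  Country = c("Country A", "Country B", "Country C"),
  Stat    = c(10, 30, 20),
  DS      = c(40, 60, 50)
)

# Hypothetical helper: one sorted horizontal bar chart for a given count column.
plot_postings <- function(data, column, title) {
  ggplot(data, aes(x = fct_reorder(Country, .data[[column]]),
                   y = .data[[column]],
                   fill = Country)) +
    geom_col() +
    geom_text(aes(label = .data[[column]]), hjust = 1.2, size = 4) +
    labs(x = "Country", y = NULL, title = title) +
    coord_flip() +
    theme_classic() +
    theme(legend.position = "none")
}

p_stat <- plot_postings(job_demo, "Stat", "Statisticians")
p_ds   <- plot_postings(job_demo, "DS", "Data Scientists")

# patchwork overloads `+` so that two ggplot objects are placed side by side
combined <- p_stat + p_ds +
  plot_annotation(title = "Number of Job Postings as of Sept 2022")

combined
```

Passing the column name as a string and looking it up with `.data[[column]]` keeps the helper free of hard-coded column names, so the same function serves both charts.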

Skill Set Needed for a Successful Statistician

Mathematics and Statistics

  • Probability & statistical inference
  • Regression analysis & multivariate analysis
  • Time series analysis
  • Linear algebra & Calculus

Software

  • R/RStudio
  • Python
  • SAS
  • Stata

  • Dissect complex problems
  • Translate a research or business question into a statistical question
  • Discern between good and bad statistics
  • Critically assess validity and reliability of statistical methods and results
  • Translate statistical findings into clear, actionable recommendations
  • Present analysis results in layman’s terms
  • Build a strong foundation in data visualization
  • Write clear, structured reports of statistical analyses and results
  • Tell the story behind the data and highlight the relevance and implications of findings for decision-making

Data Visualization

code

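library(dplyr)    # select()
library(ggplot2)  # ggplot(), geom_point()
library(plotly)   # ggplotly()

# Static scatter plot of petal length against sepal length, coloured by species
p <- iris |> 
  select(Sepal.Length, Petal.Length, Species) |> 
  ggplot(aes(x = Sepal.Length, 
             y = Petal.Length)) +
  geom_point(aes(color = Species)) +
  labs(x = "Sepal Length",
       y = "Petal Length") +
  theme_classic()

# Convert the ggplot object into an interactive plotly widget
ggplotly(p)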
code
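library(gapminder)   # gapminder data and the country_colors palette
library(gganimate)   # transition_time(), ease_aes()

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  # A separate ggtitle() call would overwrite the animated 'Year' title,
  # so the static title and the per-frame label are set together here
  labs(title = "GDP per capita of Countries: 1952-2007",
       subtitle = 'Year: {frame_time}',
       x = 'GDP per capita', y = 'Life expectancy') +
  transition_time(year) +
  ease_aes('linear')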
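
An animated plot like the one above renders in an interactive session, but sharing it elsewhere usually means writing it to a file. The following is a minimal sketch, assuming the gifski package is installed for GIF output. The object names `anim_plot` and `anim`, the frame settings, and the file name `gapminder.gif` are illustrative choices and do not come from the original slides; the plot is also simplified (no facets) to keep rendering quick.

```r
library(gapminder)
library(gganimate)   # attaches ggplot2 as well

# Simplified version of the animated plot (no facets)
anim_plot <- ggplot(gapminder,
                    aes(gdpPercap, lifeExp, size = pop, colour = country)) +
  geom_point(alpha = 0.7, show.legend = FALSE) +
  scale_colour_manual(values = country_colors) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}",
       x = "GDP per capita", y = "Life expectancy") +
  transition_time(year) +
  ease_aes("linear")

# Render explicitly to control the number of frames, speed, and size
anim <- animate(anim_plot, nframes = 60, fps = 10,
                width = 800, height = 500,
                renderer = gifski_renderer())

# Write the rendered animation to a GIF in the working directory
anim_save("gapminder.gif", animation = anim)
```

Fewer frames render faster but look choppier; `nframes` and `fps` together determine how long the animation plays.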

How and where to acquire this skill set

  • BS in Statistics and related/allied fields
    • 36 units of GE courses
    • 39 units of Statistics core courses
    • 24 units of Mathematics and Computing courses
    • Electives & OJT
  • Pursue graduate studies
    • MS or PhD in Statistics
    • MS or PhD in Data Science
  • Coursera
  • DataCamp
  • edX
  • Khan Academy
  • Kaggle
  • Udacity
  • Datanovia
  • YouTube
  • r-bloggers.com
  • GitHub Pages

How and where to acquire this skill set

  • Join professional organizations: PSAI, ASA, ISI
  • Attend seminars & conferences
  • Join online communities: LinkedIn, Stack Overflow, Cross Validated
  • Regularly update your skill set with the latest statistical methods and software
  • Follow influential statisticians and data scientists on social media and subscribe to relevant podcasts
  • Read journals and publications
  • Contribute to open-source projects on platforms like GitHub
  • Keep up with recent trends: AI, machine learning, real-time analytics

References

Kalita, JK, Bhattacharyya, DK, and Roy, S (2023). Fundamentals of Data Science: Theory and Practice. Academic Press.

R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

Posit team (2023). RStudio: Integrated Development Environment for R. Posit Software, PBC, Boston, MA. URL http://www.posit.co/

https://psa.gov.ph/infographics?field_sector_value=Labor%20and%20Employment&page=1

https://quarto.org/docs/presentations/revealjs/

https://github.com/gadenbuie/xaringanthemer

https://github.com/chris-allones/RTalks/blob/main/dbm-special-lecture-2024/index.qmd

https://www.shutterstock.com/image-vector/futuristic-big-data-technology-concept-art-1044083485

https://www.linkedin.com/jobs/

https://www.jobstreet.com.ph/statistician-jobs

https://psa.gov.ph/career