What are the Top Tech companies to work for? What are the characteristics that make them so?
Needless to say, it has been a turbulent few years. Companies have either risen to the top or fallen off under competing pressures from all sides. Over the course of this class we discussed a number of companies, and I am curious to explore which are the top tech companies and what characteristics they share. Doing this research should help us ultimately land our dream jobs.
Coming from a finance background as a career changer, I find that researching companies helps us align with great employers. I also really enjoy the data cleaning part: drilling into a lot of information and having that “ah-ha” moment when I can carve out useful insights. I love creating dashboards, so I will create and embed a Tableau dashboard to showcase the findings.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.2.1 ✔ dplyr 1.1.2
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(stringr)
library(dplyr)
library(tidyr)
library(rvest)
##
## Attaching package: 'rvest'
##
## The following object is masked from 'package:readr':
##
## guess_encoding
library(stopwords)
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
library(wordcloud2)
1. https://www.kaggle.com/tomasmantero/top-tech-companies-stock-price?select=Technology+Sector+List.csv
To answer the question above, the following steps were followed:
- Acquire tech stock data.
- Filter for the highest-value (growth or market cap) companies.
- Verify the corresponding company rating on Glassdoor (if < 3.5, drop).
- For each company, scrape the “Pros” section of the top 10 reviews.
- Tidy and transform our collection of reviews.
- Visualize the most frequent, pertinent verbiage via table, bar plot, and word cloud.
- Analyze and export data to Tableau.
- Create a dashboard in Tableau Public and embed it into R Markdown.
- Conclude.
- We pulled in two CSV files: a list of the top 100 tech companies
with their market metrics, and a list of S&P 500 companies with
their sectors.
- From the exploration, the only interesting addition from the
sp_table dataset would have been its “Sector” variable, so I decided
against merging the two tables. Instead, I explore just the tech_table
dataset and add the sector variable manually.
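For comparison, here is a minimal sketch of what the merge would have looked like had the “Sector” column been pulled from the S&P 500 list instead of keyed in by hand. The two toy data frames are stand-ins for tech_table and sp_table, and their values are illustrative only. A left join keeps every tech company and leaves Sector as NA for any symbol missing from the S&P list, which is one plausible reason a manual column is simpler for only five rows.

```r
library(dplyr)

# Toy stand-ins for tech_table and sp_table (illustrative values only)
tech_toy <- data.frame(
  Symbol = c("AAPL", "MSFT", "PLTR"),
  Name   = c("Apple", "Microsoft", "Palantir"),
  stringsAsFactors = FALSE
)
sp_toy <- data.frame(
  Symbol = c("AAPL", "MSFT"),
  Sector = c("Information Technology", "Information Technology"),
  stringsAsFactors = FALSE
)

# Keep every tech row; Sector is NA where the symbol is absent from sp_toy
joined <- left_join(tech_toy, sp_toy, by = "Symbol")
joined
```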
#Read in csv files
tech_data <- read_csv("https://raw.githubusercontent.com/yinaS1234/data-607/main/project%20final/tech_sector_list.csv")
## Rows: 100 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Symbol, Name
## dbl (7): Price, Change, % Change, Volume, Avg Vol, Market Cap (Billions), PE...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
tech_table <- as_tibble(tech_data)
#head(tech_table)
sp_data <- read_csv("https://raw.githubusercontent.com/yinaS1234/data-607/main/project%20final/sp500_list.csv")
## Rows: 505 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): Symbol, Name, Sector
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sp_table <- as_tibble(sp_data)
#EDA of table --> size, sector --> dataframe of companies
ncol(tech_table)
## [1] 9
nrow(tech_table)
## [1] 100
summary(tech_table)
## Symbol Name Price Change
## Length:100 Length:100 Min. : 5.12 Min. :-8.500
## Class :character Class :character 1st Qu.: 72.42 1st Qu.:-0.290
## Mode :character Mode :character Median : 131.42 Median : 0.710
## Mean : 181.01 Mean : 1.048
## 3rd Qu.: 244.68 3rd Qu.: 2.215
## Max. :1020.00 Max. :15.100
##
## % Change Volume Avg Vol
## Min. :-0.029700 Min. : 483 Min. : 6552
## 1st Qu.:-0.001725 1st Qu.: 946206 1st Qu.: 1114250
## Median : 0.008050 Median : 1324000 Median : 1853000
## Mean : 0.012525 Mean : 5939219 Mean : 6542440
## 3rd Qu.: 0.019200 3rd Qu.: 3121500 3rd Qu.: 4757250
## Max. : 0.202400 Max. :127959000 Max. :150549000
##
## Market Cap (Billions) PE Ratio
## Min. : 14.60 Min. : 9.02
## 1st Qu.: 20.09 1st Qu.: 26.89
## Median : 32.08 Median : 33.84
## Mean : 94.60 Mean : 70.26
## 3rd Qu.: 75.33 3rd Qu.: 57.59
## Max. :1936.00 Max. :667.10
## NA's :26
#filter 1: "% Change"
high_growth <- filter(tech_table, `% Change` > 0.06) #top 3
#filter 2: "Market Cap"
high_val <- filter(tech_table, `Market Cap (Billions)` > 1000) #top 2
#Merge data frames
filtered <- rbind(high_growth, high_val)
#Add "Sector" column - manually keyed in
filtered$Sector <- c("Financial Services", "Big Data", "Semiconductor", "Big Tech", "Big Tech")
filtered
## # A tibble: 5 × 10
## Symbol Name Price Change `% Change` Volume `Avg Vol` Market Cap (Billions…¹
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SQ Squar… 208. 11.9 0.0609 1.22e7 9772000 93.8
## 2 PLTR Palan… 21.0 2.89 0.159 8.38e7 52088000 39.5
## 3 UMC Unite… 6.95 1.17 0.202 1.12e7 3153000 15.8
## 4 AAPL Apple… 114. -3.49 -0.0297 1.28e8 150549000 1936
## 5 MSFT Micro… 210. -0.28 -0.0013 2.57e7 31868000 1589
## # ℹ abbreviated name: ¹`Market Cap (Billions)`
## # ℹ 2 more variables: `PE Ratio` <dbl>, Sector <chr>
write.csv(tech_table, "/Users/linda/Desktop/techtable.csv")
write.csv(filtered, "/Users/linda/Desktop/filteredstock.csv")
From our top 100 tech companies list, we narrow down to five:
- Square, Palantir, and United Microelectronics were filtered in as
the top 3 highest-growth companies.
- Apple and Microsoft were by far the highest-value companies in our
list (each with a market cap greater than $1 trillion).
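The filters above rely on hand-picked cutoffs (0.06 and 1000). As a hedged alternative, the same “top 3 by growth, top 2 by market cap” selection can be expressed with dplyr's slice_max(), so the cutoffs need not be read off the data first. The toy data frame below reuses the five rows shown in the filtered output plus one hypothetical filler row ("XYZ"), and the column names pct_change and market_cap are simplified stand-ins for `% Change` and `Market Cap (Billions)`.

```r
library(dplyr)

# Five rows copied from the filtered output, plus one hypothetical filler row
toy <- data.frame(
  Symbol     = c("SQ", "PLTR", "UMC", "AAPL", "MSFT", "XYZ"),
  pct_change = c(0.0609, 0.159, 0.202, -0.0297, -0.0013, 0.01),
  market_cap = c(93.8, 39.5, 15.8, 1936, 1589, 20),
  stringsAsFactors = FALSE
)

high_growth <- slice_max(toy, pct_change, n = 3)  # top 3 by growth
high_val    <- slice_max(toy, market_cap, n = 2)  # top 2 by market cap
filtered    <- bind_rows(high_growth, high_val)
filtered$Symbol
```

Note that slice_max() returns rows sorted in descending order, so the row order differs from the threshold-based version even though the same companies are selected.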
- Verify the corresponding company rating on Glassdoor (if < 3.5,
drop).
United Microelectronics (with a rating of 2.5) is dropped, and we proceed with the four remaining companies.
- Web-scrape reviews.
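The four scraping blocks that follow repeat the same three calls for each company. A small helper could wrap that pattern; the sketch below is one way to do it, not the code used in this report. The function name scrape_text, the inline HTML, and the CSS selector are all hypothetical, chosen so the example runs offline with rvest's minimal_html(). Against a live site the page would come from read_html(url), and the selector would have to match whatever markup the site currently serves.

```r
library(rvest)

# Hypothetical helper: return the text of every node matching `css`
scrape_text <- function(page, css) {
  html_text2(html_elements(page, css))
}

# Inline stand-in for a downloaded review page (markup is invented)
page <- minimal_html('
  <div class="review"><span class="pros">Great benefits</span></div>
  <div class="review"><span class="pros">Smart people</span></div>
')

pros <- scrape_text(page, ".review .pros")
pros
```

With such a helper, each company would need only one call, for example scrape_text(read_html(url), selector), instead of three near-identical lines.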
##APPLE##
#1. Download HTML and convert to XML with read_html()
a <- read_html("https://www.glassdoor.com/Reviews/Apple-Reviews-E1138.htm")
#2. Extract specific nodes with html_nodes()
a_ext <- html_nodes(a,'.v2__EIReviewDetailsV2__fullWidth:nth-child(1) span')
#3. Extract review text from HTML
a_pros <- html_text(a_ext) #collect pros section of 1st 10 reviews
##MICROSOFT##
m <- read_html("https://www.glassdoor.com/Reviews/Microsoft-Reviews-E1651.htm")
m_ext <- html_nodes(m,'.v2__EIReviewDetailsV2__fullWidth:nth-child(1) span')
m_pros <- html_text(m_ext) #collect pros section of 1st 10 reviews
##PALANTIR##
p <- read_html("https://www.glassdoor.com/Reviews/Palantir-Technologies-Reviews-E236375.htm")
p_ext <- html_nodes(p,'.v2__EIReviewDetailsV2__fullWidth:nth-child(1) span')
p_pros <- html_text(p_ext) #collect pros section of 1st 10 reviews
##SQUARE##
s <- read_html("https://www.glassdoor.com/Reviews/Square-Reviews-E422050.htm")
s_ext <- html_nodes(s,'.v2__EIReviewDetailsV2__fullWidth:nth-child(1) span')
s_pros <- html_text(s_ext) #collect pros section of 1st 10 reviews
- Tidy and transform our collection of reviews.
  - merge to create one data frame
  - handle special characters
  - remove white space
  - change to lower case
  - remove stopwords (non-descriptive words)
  - remove NAs
  - output to a table
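The steps above can be sketched on a toy input before applying them to the scraped text. This is a simplified base R version under stated assumptions: the two review strings are invented, stop_toy is a tiny hand-picked list standing in for stopwords('en'), and non-letters are stripped in a single pass rather than the two passes used in the report.

```r
# Toy reviews (invented text) and a tiny hand-picked stopword list
reviews  <- c("Great benefits & smart people!",
              "The benefits are good; 401k match")
stop_toy <- c("the", "are", "and")

clean  <- tolower(gsub("[^[:alpha:]]+", " ", reviews))  # keep letters only
tokens <- unlist(strsplit(trimws(clean), "\\s+"))       # one word per element
tokens <- tokens[!tokens %in% stop_toy & nzchar(tokens)]

counts <- sort(table(tokens), decreasing = TRUE)        # word frequencies
counts
```

Stripping digits leaves fragments such as "k" (from "401k") behind, which is the same effect that produces stray tokens like "t", "s", and "ll" in the real data and is why those are dropped by hand later.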
#combine the four character vectors into one (c() avoids rbind()'s recycling when lengths differ)
merged_pros <- c(a_pros, m_pros, p_pros, s_pros)
#Tidy text via regular expressions
#Handle special characters and digits
merged_pros <- str_replace_all(merged_pros, "[^[:alnum:]]", " ") #remove non-alpha numeric characters
merged_pros <- str_replace_all(merged_pros, "[[:digit:]]", "") #remove digits
#Handle white space
merged_pros <- trimws(merged_pros)
merged_pros <- str_replace_all(merged_pros, "\\s+", " ") #compress whitespace
merged_pros <- str_replace_all(merged_pros, "' '", "','") #' ' ' --> ','
#Split each review into individual words and flatten back into one vector
merged_pros <- str_split(merged_pros, pattern=" ") #split each string into words at every space (returns a list)
merged_pros <- unlist(merged_pros) #convert list back to vector
merged_pros <- tolower(merged_pros) #convert vector to lowercase
stopwords_regex <- paste(stopwords('en'), collapse = '\\b|\\b')
stopwords_regex <- paste0('\\b', stopwords_regex, '\\b')
merged_pros <- stringr::str_replace_all(merged_pros, stopwords_regex, '')
#Rearrange words into a table format
merged_pros <- tibble(value = merged_pros) #one word per row, in a column named "value" for count()
count1 <- merged_pros %>% count(value, sort = TRUE)
##Drop rows with (perceived) non-pertinent verbiage:
#non-word and non-descriptive entries; "" is what removed stopwords leave behind
drop_words <- c("", "great", "good", "t", "s", "can", "lot", "amazing", "ll",
                "everyone", "everything", "get", "like", "palantir", "square",
                "apple", "ve", "truly", "best")
#keep words that appear at least 4 times and are not in drop_words
refined <- count1 %>%
  filter(n >= 4, !is.na(value), !value %in% drop_words) %>%
  select(value, n)
#output as kable table
refined %>%
  kbl() %>%
  kable_minimal()
| value | n |
|---|---|
| work | 25 |
| benefits | 20 |
| company | 20 |
| people | 15 |
| culture | 12 |
| employees | 9 |
| time | 9 |
| microsoft | 8 |
| opportunities | 8 |
| pay | 8 |
| life | 7 |
| stock | 7 |
| compensation | 6 |
| development | 6 |
| large | 6 |
| many | 6 |
| outside | 6 |
| software | 6 |
| career | 5 |
| competitive | 5 |
| day | 5 |
| different | 5 |
| employee | 5 |
| etc | 5 |
| go | 5 |
| growth | 5 |
| management | 5 |
| nice | 5 |
| options | 5 |
| product | 5 |
| working | 5 |
| years | 5 |
| companies | 4 |
| don | 4 |
| innovative | 4 |
| leadership | 4 |
| means | 4 |
| much | 4 |
| new | 4 |
| one | 4 |
| opportunity | 4 |
| place | 4 |
| plan | 4 |
| remote | 4 |
| smart | 4 |
| something | 4 |
| team | 4 |
| tech | 4 |
| use | 4 |
| well | 4 |
| worked | 4 |
| world | 4 |
- Visualize the most frequent, pertinent verbiage via table, bar plot, and
word cloud.
#visualize the frequency count
ggplot(refined) +
geom_bar(aes(reorder(value,n) , y = n, fill=value), stat = "identity", position = "dodge", width = 1) + coord_flip() +
theme(legend.position = "none") +
labs( title = "Word Count Frequency", x = "", y = "", fill = "Source")
#word cloud
wordcloud2(data=refined, color = "random-light", backgroundColor = "grey")
write.csv(refined, "/Users/linda/Desktop/refined.csv")
Hit “open in browser” for a full view of the dashboard.
Top companies: Apple, Microsoft, Palantir, and
Square.
- Terms like “work”, “company”, and “opportunities” suggest that these
companies offer meaningful work and growth. Employees feel like they’re
building toward something greater than themselves.
- Terms like “people”, “culture”, and “employees” suggest that these
companies have an inclusive culture and that their employees
feel a sense of belonging. They feel like they’re a part of something
and that their opinion matters.
- Terms like “benefits”, “pay”, and “time” suggest that these
companies take care of their employees (not just in word but in
deed).
Differentiating characteristics: meaningful work,
sense of belonging, employer care for employees
Employer
These findings can help employers tune their mission, culture, and
compensation packages to attract higher-caliber employees, while
employees now have an idea of what’s out there.
JobSeekers
For those uncertain about which direction to head or what their
specialty might be, why not choose a great place to work? That way, at
least they have a better chance of enjoying their day-to-day as they
gain clarity and experience.
Word spreads, and a long-term focus on the right characteristics could pay off handsomely for employer and employee alike.
Tomas Mantero. Top Tech Companies Stock Price, retrieved and stored as CSV on GitHub. https://www.kaggle.com/tomasmantero/top-tech-companies-stock-price?select=Technology+Sector+List.csv
Glassdoor. Web-scraped company reviews. https://www.glassdoor.com/member/home/companies.htm