For our Week 5 discussion, we each posted a dataset exemplifying “untidy” data. Project 2 requires us to take three of the peer-posted examples from Week 5, tidy the data, and then perform the requested analysis.
This portion focuses on Kory’s untidy example: a listing of African American therapists in LA.
Sources:
- African American Therapists in LA: https://www.psychologytoday.com/us/therapists/ca/los-angeles?category=african-american
- Classmate: Kory Martin & his post
Let’s take a look at the data.
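library(tidyverse) # provides read_csv, the dplyr verbs, separate_wider_delim, str_replace, and ggplot used below
therapists <- read_csv("https://raw.githubusercontent.com/d-ev-craig/DATA607/main/Projects/Project2%20-%20Untidy%20Data/therapists.csv")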
## Rows: 3200 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): X1, X2, X3
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(therapists)
## # A tibble: 6 × 3
## X1 X2 X3
## <chr> <chr> <chr>
## 1 Uriah Cty "Marriage & Family Therapist, MA, LMFT" Los …
## 2 <NA> "Maybe you remember this scene from a movie; picture it, a … & On…
## 3 <NA> "(213) 513-5553" <NA>
## 4 <NA> <NA> <NA>
## 5 James Birks "Marriage & Family Therapist, LMFT" Los …
## 6 <NA> "Accepting Teletherapy Clients Only. In today's world It ca… & On…
therapists
## # A tibble: 3,200 × 3
## X1 X2 X3
## <chr> <chr> <chr>
## 1 Uriah Cty "Marriage & Family Therapist, MA, LMFT" Los …
## 2 <NA> "Maybe you remember this scene from a movie; picture it,… & On…
## 3 <NA> "(213) 513-5553" <NA>
## 4 <NA> <NA> <NA>
## 5 James Birks "Marriage & Family Therapist, LMFT" Los …
## 6 <NA> "Accepting Teletherapy Clients Only. In today's world It… & On…
## 7 <NA> <NA> <NA>
## 8 <NA> <NA> <NA>
## 9 Taronda Jones "Clinical Social Work/Therapist, LCSW" Los …
## 10 <NA> "Are you looking for someone to help you navigate throug… & On…
## # … with 3,190 more rows
After taking a look, the variables we need to create seem to be the following:
- Name
- Focus
- Credentials
- Description
- Phone Number
- Location
- Mode
- Accepting Clients
We will work through the columns one at a time to extract our values, starting with the names in the first column.
name <- therapists %>% filter(!is.na(X1))
colnames(name)[1] <- 'name'
name
## # A tibble: 800 × 3
## name X2 X3
## <chr> <chr> <chr>
## 1 Uriah Cty Marriage & Family Therapist, MA, LMFT Los Angeles…
## 2 James Birks Marriage & Family Therapist, LMFT Los Angeles…
## 3 Taronda Jones Clinical Social Work/Therapist, LCSW Los Angeles…
## 4 Christina Harrison Clinical Social Work/Therapist, LCSW Los Angeles…
## 5 Eric Michael Katende Marriage & Family Therapist Associate, AMFT Los Angeles…
## 6 Brittany Williams Pre-Licensed Professional Los Angeles…
## 7 Claudia Williams Pre-Licensed Professional, MSW, ACSW Los Angeles…
## 8 Bradlisia Dixon Marriage & Family Therapist, LMFT Los Angeles…
## 9 Dr. Daryl M Rowe Psychologist, PhD Los Angeles…
## 10 Camille Tenerife Marriage & Family Therapist, LMFT Los Angeles…
## # … with 790 more rows
name <- name[,1]
In the second column, we are looking to pull out the following variables:
- Specialty
- Description
- Phone Number
We will approach this by creating sequence vectors that index the rows holding each value we are interested in. We then take those values, turn them into columns, and combine the columns to create our new data frame. I am choosing this method rather than pivot_wider because the values are stacked within a single column with no key identifying them, and each variable sits at a fixed position in every four-row block.
#coalesce method -not used
#nameExpanded <- name %>% slice(rep(1:n(),each = 4))
#nameExpanded
#therapists2 <- data.frame(nameExpanded,therapists[,-1])
#therapists2
#df %>%
#mutate(A = coalesce(A,B))
#Extract values by sequences
specialSeq <- seq(from=1,to=3200, by = 4)
specialty <- therapists$X2[specialSeq]
therapists2 <- cbind(name,specialty)
#Desc Sequence
descSeq <- seq(from=2,to=3200, by = 4)
desc <- therapists$X2[descSeq]
therapists2 <- data.frame(therapists2,desc)
#phone number Sequence
phoneSeq <- seq(from=3,to=3200, by = 4)
phone <- therapists$X2[phoneSeq]
therapists2 <- data.frame(therapists2,phone)
#The 4th cell in each sequence of column 2 is always NA
#scraps Sequence
#scrapSeq <- seq(from=4,to=3200, by = 4)
#scrap <- therapists$X2[scrapSeq]
#therapists2 <- data.frame(therapists2,scrap)
head(therapists2)
## name specialty
## 1 Uriah Cty Marriage & Family Therapist, MA, LMFT
## 2 James Birks Marriage & Family Therapist, LMFT
## 3 Taronda Jones Clinical Social Work/Therapist, LCSW
## 4 Christina Harrison Clinical Social Work/Therapist, LCSW
## 5 Eric Michael Katende Marriage & Family Therapist Associate, AMFT
## 6 Brittany Williams Pre-Licensed Professional
## desc
## 1 Maybe you remember this scene from a movie; picture it, a peaceful airplane ride becomes turbulent. The plane begins losing altitude rapidly. The flight attendant urgently instructs you to "put your oxygen mask on first" before helping others. That simple, but critical statement, is just as important in our everyday lives. Remembering to take time to yourself, non selfishly, can be difficult for anyone who tends to place others' needs first. Together, we'll explore your thoughts, feelings, decisions, choices, wants, and needs in your therapy. I will assist you in" finding your voice" and help you to "put your oxygen mask on first.
## 2 Accepting Teletherapy Clients Only. In today's world It can be so difficult to connect to our truth and live as our authentic selves. My mission is to create a non-judgmental, supportive and affirmative environment that will encourage you to heal, to grow and to reach your fullest potential. Through collaboration, empathetic listening and challenging negative patterns we will work together and help you reach your goals. Embarking on a therapeutic journey takes hard work, vulnerability and commitment. So if you are willing to put in the work I am committed to helping you succeed.
## 3 Are you looking for someone to help you navigate through difficult times? Do you find yourself feeling alone with no one to help you find solutions to your problems? Are you a coupIe struggling to have a healthy relationship? I am a Licensed Clinical Social Worker in the State of California, Oregon and Washington providing hope and encouragement to those in need. I have over 9 years' experience as a psychotherapist and have worked with all ages, from early childhood to the aged adult.
## 4 My primary goal in working in mental health is to put myself out of a job. Given that each of us intrinsically know what we need to heal and grow, my approach to psychotherapy is to couple your expertise of being you – your reality and lived experiences – with my skillset, in order to collaboratively work towards achieving your goals. Through a culturally-affirming and healing-centered stance, I utilize evidence-based treatments to facilitate our work.
## 5 Reaching out for help is one of the most humbling experiences we can go through in life. In a world that is constantly pushing narratives on us based on race, gender, sexuality and/or dominant cultural beliefs, it is life-affirming to engage preferred narratives that support and empower our own values. In difficult times it is easy to feel misunderstood or lost, longing to reconnect with our own resources. My work aims for those reconnections. My approach is collaborative, working from a place of respectful curiosity. I am not the expert on your lived experience. You are.
## 6 I am in the final year of my master's degree in social work. As a masters-level clinician, I want to collaborate with you to address and achieve your therapeutic goals. My experience includes working with survivors of domestic violence, sexual assault, and elder abuse, among other victimizations. I have facilitated domestic violence groups where I have educated survivors on the cycle of abuse, healthy relationships, and setting healthy boundaries with the people in their lives.
## phone
## 1 (213) 513-5553
## 2 <NA>
## 3 (323) 347-3314
## 4 (213) 320-6802
## 5 (213) 212-7852
## 6 <NA>
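As a compact alternative to building one index sequence per variable, the same fixed four-row pattern can be reshaped in a single step. This is only a sketch on a made-up vector: the `toy` values and the `scrap` column name are assumptions for illustration, not the real data. It relies on every record occupying exactly four rows, as in the data above.

```r
# Toy stand-in for therapists$X2: each record spans 4 rows
# (specialty, description, phone, NA). All values are made up.
toy <- c("Psychologist, PhD", "First description",  "(555) 010-0001", NA,
         "Counselor, LPC",    "Second description", NA,               NA)

# Fill a 4-column matrix row by row so each record becomes one row
wide <- as.data.frame(matrix(toy, ncol = 4, byrow = TRUE),
                      stringsAsFactors = FALSE)
names(wide) <- c("specialty", "desc", "phone", "scrap")

# Drop the always-empty fourth position
wide$scrap <- NULL

# Check against the sequence-based indexing used in this write-up
same <- identical(wide$specialty,
                  toy[seq(from = 1, to = length(toy), by = 4)])
print(wide)
```

Using `length(toy)` instead of a literal row count keeps the sketch working if the number of records changes, but it would silently misalign if any record had more or fewer than four rows.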
In the third column we want to pull the following variables:
- Location
- Online
- Client Acceptance
#Location Seq
locSeq <- seq(from=1,to=3200, by = 4)
loc <- therapists$X3[locSeq]
therapists2 <- data.frame(therapists2,loc)
#online Seq
onlineSeq <- seq(from=2,to=3200, by = 4)
online <- therapists$X3[onlineSeq]
therapists2 <- data.frame(therapists2,online)
#Client Seq
clientSeq <- seq(from=3,to=3200, by = 4)
client <- therapists$X3[clientSeq]
therapists2 <- data.frame(therapists2,client)
# 4th value in column is all NA
# scrapSeq <- seq(from=4,to=3200, by = 4)
# scrap <- therapists$X3[scrapSeq]
# therapists2 <- data.frame(therapists2,scrap)
# therapists2
A few columns contain NAs that I’d prefer to replace with text so the table is easier for an end user to read.
therapists2$specialty[is.na(therapists2$specialty)]<- 'None Listed'
therapists2$phone[is.na(therapists2$phone)]<- 'None Listed'
therapists2$online[is.na(therapists2$online)]<- 'No' # check this column
therapists2$client[is.na(therapists2$client)]<- 'Accepting'
therapists2$online <- str_replace(therapists2$online, '& ','')
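The NA replacements above all follow the same pattern, so they could be wrapped in a small helper. The sketch below uses a hypothetical function name (`fill_na`) and a made-up two-row data frame; it is an illustration of the pattern, not part of the original analysis.

```r
# Hypothetical helper: swap NA entries in a vector for a readable label
fill_na <- function(x, label) {
  x[is.na(x)] <- label
  x
}

# Made-up rows standing in for the phone and client columns
toy <- data.frame(
  phone  = c("(555) 010-0001", NA),
  client = c(NA, "Waitlist for new clients"),
  stringsAsFactors = FALSE
)

toy$phone  <- fill_na(toy$phone,  "None Listed")
toy$client <- fill_na(toy$client, "Accepting")
print(toy)
```

Treating a missing client status as 'Accepting' simply mirrors the choice made in the code above; it is an interpretation of the source listing rather than something the data states.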
We will also expand our location column into city, state, and zip code columns.
therapists2 <- therapists2 %>% separate_wider_delim(loc, delim = ',', names = c('city','var'))
therapists2
## # A tibble: 800 × 8
## name specialty desc phone city var online client
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Uriah Cty Marriage & Family… "May… (213… Los … " … Online Accep…
## 2 James Birks Marriage & Family… "Acc… None… Los … " … Online Accep…
## 3 Taronda Jones Clinical Social W… "Are… (323… Los … " … Online Accep…
## 4 Christina Harrison Clinical Social W… "My … (213… Los … " … No Not a…
## 5 Eric Michael Katende Marriage & Family… "Rea… (213… Los … " … Online Accep…
## 6 Brittany Williams Pre-Licensed Prof… "I a… None… Los … " … Online Accep…
## 7 Claudia Williams Pre-Licensed Prof… "“We… (424… Los … " … Online Accep…
## 8 Bradlisia Dixon Marriage & Family… "Are… (424… Los … " … Online Accep…
## 9 Dr. Daryl M Rowe Psychologist, PhD "As … (323… Los … " … No Not a…
## 10 Camille Tenerife Marriage & Family… "As … (424… Los … " … Online Accep…
## # … with 790 more rows
therapists2$var <- str_replace(therapists2$var, 'CA ','')
therapists2$var <- trimws(therapists2$var)
therapists2 <- therapists2 %>% dplyr::rename(zip = var) # dplyr syntax is new_name = old_name
state <- rep('CA',800)
therapists2 <- cbind(therapists2,state)
head(therapists2)
## name specialty
## 1 Uriah Cty Marriage & Family Therapist, MA, LMFT
## 2 James Birks Marriage & Family Therapist, LMFT
## 3 Taronda Jones Clinical Social Work/Therapist, LCSW
## 4 Christina Harrison Clinical Social Work/Therapist, LCSW
## 5 Eric Michael Katende Marriage & Family Therapist Associate, AMFT
## 6 Brittany Williams Pre-Licensed Professional
## desc
## 1 Maybe you remember this scene from a movie; picture it, a peaceful airplane ride becomes turbulent. The plane begins losing altitude rapidly. The flight attendant urgently instructs you to "put your oxygen mask on first" before helping others. That simple, but critical statement, is just as important in our everyday lives. Remembering to take time to yourself, non selfishly, can be difficult for anyone who tends to place others' needs first. Together, we'll explore your thoughts, feelings, decisions, choices, wants, and needs in your therapy. I will assist you in" finding your voice" and help you to "put your oxygen mask on first.
## 2 Accepting Teletherapy Clients Only. In today's world It can be so difficult to connect to our truth and live as our authentic selves. My mission is to create a non-judgmental, supportive and affirmative environment that will encourage you to heal, to grow and to reach your fullest potential. Through collaboration, empathetic listening and challenging negative patterns we will work together and help you reach your goals. Embarking on a therapeutic journey takes hard work, vulnerability and commitment. So if you are willing to put in the work I am committed to helping you succeed.
## 3 Are you looking for someone to help you navigate through difficult times? Do you find yourself feeling alone with no one to help you find solutions to your problems? Are you a coupIe struggling to have a healthy relationship? I am a Licensed Clinical Social Worker in the State of California, Oregon and Washington providing hope and encouragement to those in need. I have over 9 years' experience as a psychotherapist and have worked with all ages, from early childhood to the aged adult.
## 4 My primary goal in working in mental health is to put myself out of a job. Given that each of us intrinsically know what we need to heal and grow, my approach to psychotherapy is to couple your expertise of being you – your reality and lived experiences – with my skillset, in order to collaboratively work towards achieving your goals. Through a culturally-affirming and healing-centered stance, I utilize evidence-based treatments to facilitate our work.
## 5 Reaching out for help is one of the most humbling experiences we can go through in life. In a world that is constantly pushing narratives on us based on race, gender, sexuality and/or dominant cultural beliefs, it is life-affirming to engage preferred narratives that support and empower our own values. In difficult times it is easy to feel misunderstood or lost, longing to reconnect with our own resources. My work aims for those reconnections. My approach is collaborative, working from a place of respectful curiosity. I am not the expert on your lived experience. You are.
## 6 I am in the final year of my master's degree in social work. As a masters-level clinician, I want to collaborate with you to address and achieve your therapeutic goals. My experience includes working with survivors of domestic violence, sexual assault, and elder abuse, among other victimizations. I have facilitated domestic violence groups where I have educated survivors on the cycle of abuse, healthy relationships, and setting healthy boundaries with the people in their lives.
## phone city zip online client state
## 1 (213) 513-5553 Los Angeles 90048 Online Accepting CA
## 2 None Listed Los Angeles 90044 Online Accepting CA
## 3 (323) 347-3314 Los Angeles 90008 Online Accepting CA
## 4 (213) 320-6802 Los Angeles 90066 No Not accepting new clients CA
## 5 (213) 212-7852 Los Angeles 90004 Online Accepting CA
## 6 None Listed Los Angeles 90019 Online Accepting CA
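The location split above can also be sketched with a single regular expression in base R. This assumes each location string has the shape "City, ST 12345", which is inferred from the steps above rather than shown in full in the printed output; the two example strings are made up to match that shape.

```r
# Made-up location strings in the assumed "City, ST 12345" shape
loc <- c("Los Angeles, CA 90048", "Los Angeles, CA 90044")

# Three capture groups: city, two-letter state, five-digit zip
pattern <- "^(.*),\\s*([A-Z]{2})\\s+([0-9]{5})$"

# sub() keeps only the requested capture group from each string
parts <- data.frame(
  city  = sub(pattern, "\\1", loc),
  state = sub(pattern, "\\2", loc),
  zip   = sub(pattern, "\\3", loc),
  stringsAsFactors = FALSE
)
print(parts)
```

One difference from the approach above is that the state is read from the string instead of being filled in as a constant. Note that `sub()` returns the input unchanged when the pattern does not match, so any location with a different shape would need to be checked separately.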
#Factorizing
therapists2$zip <- as.factor(therapists2$zip)
therapists2$client <- as.factor(therapists2$client)
therapists2$online <- as.factor(therapists2$online)
From here, it’s a pretty tidy data set. I did not split name, since some of the ‘names’ are company names and are best left as a single entity so they remain easily recognizable. I also left the Unicode characters in the desc column, since they may be worth keeping if the data is imported into another platform.
The analysis chosen to be performed:
- Identify the percentage of therapists offering online versus on-site service
- Identify the percentage accepting new patients
- Break down the group by zip code
#Some totals
therapists2 %>% dplyr::group_by(online) %>% dplyr::summarize(n())
## # A tibble: 2 × 2
## online `n()`
## <fct> <int>
## 1 No 102
## 2 Online 698
therapists2 %>% dplyr::group_by(client) %>% dplyr::summarize(n())
## # A tibble: 3 × 2
## client `n()`
## <fct> <int>
## 1 Accepting 683
## 2 Not accepting new clients 78
## 3 Waitlist for new clients 39
therapists2 %>% dplyr::group_by(zip) %>% dplyr::summarize(total=n()) %>% arrange(desc(total))
## # A tibble: 65 × 2
## zip total
## <fct> <int>
## 1 90025 80
## 2 90001 45
## 3 90066 38
## 4 90008 37
## 5 90034 34
## 6 90045 33
## 7 90064 33
## 8 90048 26
## 9 90036 21
## 10 90024 20
## # … with 55 more rows
#Percentage of Therapists offering online
onlineOffer <- sum(therapists2$online =='Online') #Summing number of therapists offering online
noOnline <- sum(therapists2$online == 'No') #Summing number of therapists not offering online
#percentageNotOnline <- noOnline/800 #800 is the total number of therapists
percentageOnline <- (onlineOffer/800 * 100)
print(paste0('Percentage Offering Online Service is ',percentageOnline, '%'))
## [1] "Percentage Offering Online Service is 87.25%"
newPatients <- sum(therapists2$client =='Accepting') #Summing number of therapists accepting new patients
percentagePatient <- (newPatients/800 * 100)
print(paste0('Percentage Accepting New Patients is ',percentagePatient,'%'))
## [1] "Percentage Accepting New Patients is 85.375%"
#Percentage accepting both
both <- sum(therapists2$online == 'Online' & therapists2$client == 'Accepting')
percentageBoth <- (both/800 *100)
print(paste0('Percentage accepting new patients and offering online service is ', percentageBoth,'%'))
## [1] "Percentage accepting new patients and offering online service is 82.375%"
# Bar charts of online availability and client acceptance, ahead of the zip code breakdown
g <- ggplot(data=therapists2,aes(x=online, fill=online))
g + geom_bar()
g2 <- ggplot(data=therapists2,aes(x=client, fill=client))
g2 + geom_bar()
#Ordering the table by zip code, then by whether they offer online services, then by whether they're accepting new clients
therapists2 <- therapists2 %>% arrange(zip,online,client)
#Table to help find breakdown by each zip code
therapists2 %>% dplyr::group_by(zip,client,online) %>% dplyr::summarize(total = n())
## `summarise()` has grouped output by 'zip', 'client'. You can override using the
## `.groups` argument.
## # A tibble: 134 × 4
## # Groups: zip, client [122]
## zip client online total
## <fct> <fct> <fct> <int>
## 1 90001 Accepting No 1
## 2 90001 Accepting Online 40
## 3 90001 Not accepting new clients No 3
## 4 90001 Waitlist for new clients Online 1
## 5 90002 Accepting Online 12
## 6 90003 Accepting Online 3
## 7 90003 Waitlist for new clients Online 1
## 8 90004 Accepting Online 13
## 9 90005 Accepting Online 8
## 10 90005 Not accepting new clients No 1
## # … with 124 more rows
So in summary, of the 800 total therapists:
- 698 of them offer online service (87.25%)
- 683 of them are accepting new patients (85.375%), 78 are not accepting, and 39 have a waitlist
- Zip code 90025 has the highest total with 80 therapists in the area
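The percentages in this summary can be rechecked directly from the grouped counts reported earlier. The sketch below re-enters those counts by hand (the vector names are my own) and uses `prop.table()` so the total of 800 does not need to be hard-coded.

```r
# Counts copied from the grouped summaries earlier in this write-up
online_counts <- c("No" = 102, "Online" = 698)
client_counts <- c("Accepting" = 683,
                   "Not accepting new clients" = 78,
                   "Waitlist for new clients" = 39)

# prop.table() divides each count by the vector's total
online_pct <- prop.table(online_counts) * 100
client_pct <- prop.table(client_counts) * 100

print(round(online_pct, 3))
print(round(client_pct, 3))
```

With the real data frame, the same shares could be obtained without retyping counts, for example with `prop.table(table(therapists2$online))`.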