For my mini-lesson 2 class presentation, I wanted a simple data set that shows what new coders are interested in. I searched Kaggle and found the 2016 New Coder Survey, which has responses from more than 15,000 people (15,620 observations in total). The data is available here: https://www.kaggle.com/freecodecamp/2016-new-coder-survey-
library(ggplot2) # Data visualization
## Warning: package 'ggplot2' was built under R version 3.3.3
library(readr)
## Warning: package 'readr' was built under R version 3.3.3
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.3.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.3.3
df <- read_csv("~/2 MSSA/463/datasets/2016-FCC-New-Coders-Survey-Data.csv")
## Parsed with column specification:
## cols(
## .default = col_integer(),
## BootcampName = col_character(),
## CityPopulation = col_character(),
## CodeEventBootcamp = col_character(),
## CodeEventDjangoGirls = col_character(),
## CodeEventGameJam = col_character(),
## CodeEventMeetup = col_character(),
## CodeEventOther = col_character(),
## CodeEventRailsGirls = col_character(),
## CodeEventWorkshop = col_character(),
## CountryCitizen = col_character(),
## CountryLive = col_character(),
## EmploymentField = col_character(),
## EmploymentFieldOther = col_character(),
## EmploymentStatus = col_character(),
## EmploymentStatusOther = col_character(),
## Gender = col_character(),
## ID.x = col_character(),
## ID.y = col_character(),
## JobApplyWhen = col_character(),
## JobPref = col_character()
## # ... with 17 more columns
## )
## See spec(...) for full column specifications.
names(df)
## [1] "Age" "AttendedBootcamp"
## [3] "BootcampFinish" "BootcampFullJobAfter"
## [5] "BootcampLoanYesNo" "BootcampMonthsAgo"
## [7] "BootcampName" "BootcampPostSalary"
## [9] "BootcampRecommend" "ChildrenNumber"
## [11] "CityPopulation" "CodeEventBootcamp"
## [13] "CodeEventCoffee" "CodeEventConferences"
## [15] "CodeEventDjangoGirls" "CodeEventGameJam"
## [17] "CodeEventGirlDev" "CodeEventHackathons"
## [19] "CodeEventMeetup" "CodeEventNodeSchool"
## [21] "CodeEventNone" "CodeEventOther"
## [23] "CodeEventRailsBridge" "CodeEventRailsGirls"
## [25] "CodeEventStartUpWknd" "CodeEventWomenCode"
## [27] "CodeEventWorkshop" "CommuteTime"
## [29] "CountryCitizen" "CountryLive"
## [31] "EmploymentField" "EmploymentFieldOther"
## [33] "EmploymentStatus" "EmploymentStatusOther"
## [35] "ExpectedEarning" "FinanciallySupporting"
## [37] "Gender" "HasChildren"
## [39] "HasDebt" "HasFinancialDependents"
## [41] "HasHighSpdInternet" "HasHomeMortgage"
## [43] "HasServedInMilitary" "HasStudentDebt"
## [45] "HomeMortgageOwe" "HoursLearning"
## [47] "ID.x" "ID.y"
## [49] "Income" "IsEthnicMinority"
## [51] "IsReceiveDiabilitiesBenefits" "IsSoftwareDev"
## [53] "IsUnderEmployed" "JobApplyWhen"
## [55] "JobPref" "JobRelocateYesNo"
## [57] "JobRoleInterest" "JobRoleInterestOther"
## [59] "JobWherePref" "LanguageAtHome"
## [61] "MaritalStatus" "MoneyForLearning"
## [63] "MonthsProgramming" "NetworkID"
## [65] "Part1EndTime" "Part1StartTime"
## [67] "Part2EndTime" "Part2StartTime"
## [69] "PodcastChangeLog" "PodcastCodeNewbie"
## [71] "PodcastCodingBlocks" "PodcastDeveloperTea"
## [73] "PodcastDotNetRocks" "PodcastHanselminutes"
## [75] "PodcastJSJabber" "PodcastJsAir"
## [77] "PodcastNone" "PodcastOther"
## [79] "PodcastProgrammingThrowDown" "PodcastRubyRogues"
## [81] "PodcastSEDaily" "PodcastShopTalk"
## [83] "PodcastTalkPython" "PodcastWebAhead"
## [85] "ResourceBlogs" "ResourceBooks"
## [87] "ResourceCodeWars" "ResourceCodecademy"
## [89] "ResourceCoursera" "ResourceDevTips"
## [91] "ResourceEdX" "ResourceEggHead"
## [93] "ResourceFCC" "ResourceGoogle"
## [95] "ResourceHackerRank" "ResourceKhanAcademy"
## [97] "ResourceLynda" "ResourceMDN"
## [99] "ResourceOdinProj" "ResourceOther"
## [101] "ResourcePluralSight" "ResourceReddit"
## [103] "ResourceSkillCrush" "ResourceSoloLearn"
## [105] "ResourceStackOverflow" "ResourceTreehouse"
## [107] "ResourceUdacity" "ResourceUdemy"
## [109] "ResourceW3Schools" "ResourceYouTube"
## [111] "SchoolDegree" "SchoolMajor"
## [113] "StudentDebtOwe"
There were a lot of columns to choose from, but to keep it simple I used only Age and JobRoleInterest.
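df %>%
  select(Age, JobRoleInterest) %>%
  # Age is parsed as an integer, so test it for NA rather than comparing it to "null";
  # the string comparison on JobRoleInterest also drops its NA rows
  filter(!is.na(Age), JobRoleInterest != "null") %>%
  # Count respondents for each age and job-role combination
  group_by(Age, JobRoleInterest) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  ggplot(aes(Age, count, color = JobRoleInterest)) +
  geom_point() +
  ggtitle("Interest in the IT field for newcomers") +
  geom_smooth()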
## `geom_smooth()` using method = 'loess'
It is interesting to see that a lot of new coders were interested in becoming full-stack web developers, while interest in becoming a data analyst was somewhere in the middle.
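One way to check this reading of the plot is to tally respondents per role while ignoring age. The following is only a minimal sketch: `toy` is a small made-up data frame standing in for `df`, and its rows and role labels are illustrative assumptions, not values from the survey.

```r
library(dplyr)

# Toy stand-in for the survey data; these rows are invented for illustration only.
toy <- data.frame(
  Age = c(22L, 25L, 31L, 28L, NA, 35L),
  JobRoleInterest = c("Full-Stack Web Developer", "Data Analyst",
                      "Full-Stack Web Developer", NA,
                      "Front-End Web Developer", "Full-Stack Web Developer"),
  stringsAsFactors = FALSE
)

role_counts <- toy %>%
  filter(!is.na(Age), !is.na(JobRoleInterest)) %>%  # keep rows with both values present
  count(JobRoleInterest, sort = TRUE)               # one row per role, largest count first

print(role_counts)
```

Running the same pipeline on `df` instead of `toy` should give one row per job role, which makes the ranking easier to read than the scatter plot.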