For my mini-lesson 2 class presentation, I wanted a simple data set that shows what new coders are interested in. I searched Kaggle and found the 2016 New Coder Survey, which collected responses from 15,000+ people. In total, the data has 15,620 observations. The link is below: https://www.kaggle.com/freecodecamp/2016-new-coder-survey-

library(ggplot2) # Data visualization
## Warning: package 'ggplot2' was built under R version 3.3.3
library(readr) 
## Warning: package 'readr' was built under R version 3.3.3
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.3.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.3.3
df <- read_csv("~/2 MSSA/463/datasets/2016-FCC-New-Coders-Survey-Data.csv")
## Parsed with column specification:
## cols(
##   .default = col_integer(),
##   BootcampName = col_character(),
##   CityPopulation = col_character(),
##   CodeEventBootcamp = col_character(),
##   CodeEventDjangoGirls = col_character(),
##   CodeEventGameJam = col_character(),
##   CodeEventMeetup = col_character(),
##   CodeEventOther = col_character(),
##   CodeEventRailsGirls = col_character(),
##   CodeEventWorkshop = col_character(),
##   CountryCitizen = col_character(),
##   CountryLive = col_character(),
##   EmploymentField = col_character(),
##   EmploymentFieldOther = col_character(),
##   EmploymentStatus = col_character(),
##   EmploymentStatusOther = col_character(),
##   Gender = col_character(),
##   ID.x = col_character(),
##   ID.y = col_character(),
##   JobApplyWhen = col_character(),
##   JobPref = col_character()
##   # ... with 17 more columns
## )
## See spec(...) for full column specifications.
names(df)
##   [1] "Age"                          "AttendedBootcamp"            
##   [3] "BootcampFinish"               "BootcampFullJobAfter"        
##   [5] "BootcampLoanYesNo"            "BootcampMonthsAgo"           
##   [7] "BootcampName"                 "BootcampPostSalary"          
##   [9] "BootcampRecommend"            "ChildrenNumber"              
##  [11] "CityPopulation"               "CodeEventBootcamp"           
##  [13] "CodeEventCoffee"              "CodeEventConferences"        
##  [15] "CodeEventDjangoGirls"         "CodeEventGameJam"            
##  [17] "CodeEventGirlDev"             "CodeEventHackathons"         
##  [19] "CodeEventMeetup"              "CodeEventNodeSchool"         
##  [21] "CodeEventNone"                "CodeEventOther"              
##  [23] "CodeEventRailsBridge"         "CodeEventRailsGirls"         
##  [25] "CodeEventStartUpWknd"         "CodeEventWomenCode"          
##  [27] "CodeEventWorkshop"            "CommuteTime"                 
##  [29] "CountryCitizen"               "CountryLive"                 
##  [31] "EmploymentField"              "EmploymentFieldOther"        
##  [33] "EmploymentStatus"             "EmploymentStatusOther"       
##  [35] "ExpectedEarning"              "FinanciallySupporting"       
##  [37] "Gender"                       "HasChildren"                 
##  [39] "HasDebt"                      "HasFinancialDependents"      
##  [41] "HasHighSpdInternet"           "HasHomeMortgage"             
##  [43] "HasServedInMilitary"          "HasStudentDebt"              
##  [45] "HomeMortgageOwe"              "HoursLearning"               
##  [47] "ID.x"                         "ID.y"                        
##  [49] "Income"                       "IsEthnicMinority"            
##  [51] "IsReceiveDiabilitiesBenefits" "IsSoftwareDev"               
##  [53] "IsUnderEmployed"              "JobApplyWhen"                
##  [55] "JobPref"                      "JobRelocateYesNo"            
##  [57] "JobRoleInterest"              "JobRoleInterestOther"        
##  [59] "JobWherePref"                 "LanguageAtHome"              
##  [61] "MaritalStatus"                "MoneyForLearning"            
##  [63] "MonthsProgramming"            "NetworkID"                   
##  [65] "Part1EndTime"                 "Part1StartTime"              
##  [67] "Part2EndTime"                 "Part2StartTime"              
##  [69] "PodcastChangeLog"             "PodcastCodeNewbie"           
##  [71] "PodcastCodingBlocks"          "PodcastDeveloperTea"         
##  [73] "PodcastDotNetRocks"           "PodcastHanselminutes"        
##  [75] "PodcastJSJabber"              "PodcastJsAir"                
##  [77] "PodcastNone"                  "PodcastOther"                
##  [79] "PodcastProgrammingThrowDown"  "PodcastRubyRogues"           
##  [81] "PodcastSEDaily"               "PodcastShopTalk"             
##  [83] "PodcastTalkPython"            "PodcastWebAhead"             
##  [85] "ResourceBlogs"                "ResourceBooks"               
##  [87] "ResourceCodeWars"             "ResourceCodecademy"          
##  [89] "ResourceCoursera"             "ResourceDevTips"             
##  [91] "ResourceEdX"                  "ResourceEggHead"             
##  [93] "ResourceFCC"                  "ResourceGoogle"              
##  [95] "ResourceHackerRank"           "ResourceKhanAcademy"         
##  [97] "ResourceLynda"                "ResourceMDN"                 
##  [99] "ResourceOdinProj"             "ResourceOther"               
## [101] "ResourcePluralSight"          "ResourceReddit"              
## [103] "ResourceSkillCrush"           "ResourceSoloLearn"           
## [105] "ResourceStackOverflow"        "ResourceTreehouse"           
## [107] "ResourceUdacity"              "ResourceUdemy"               
## [109] "ResourceW3Schools"            "ResourceYouTube"             
## [111] "SchoolDegree"                 "SchoolMajor"                 
## [113] "StudentDebtOwe"

There were a lot of columns to choose from, but to keep it simple I used only two: Age and JobRoleInterest.
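
Before plotting, it can help to check how many rows actually have both fields filled in. The following is a minimal sketch of that check, not part of the original analysis. It uses a small made-up data frame, `toy`, as a stand-in for the survey data; the values and role labels are illustrative assumptions and may not match the survey's exact wording.

```r
library(dplyr)

# Hypothetical stand-in for the survey data; values are made up for illustration.
toy <- data.frame(
  Age = c(22L, 30L, NA, 41L, 30L),
  JobRoleInterest = c("Full-Stack Web Developer", NA, "Data Analyst",
                      "Full-Stack Web Developer", "Back-End Web Developer"),
  stringsAsFactors = FALSE
)

# Count missing values in each of the two columns.
missing_counts <- toy %>%
  summarise(
    missing_age  = sum(is.na(Age)),
    missing_role = sum(is.na(JobRoleInterest))
  )

# Number of rows where both fields are present.
complete_rows <- toy %>%
  filter(!is.na(Age), !is.na(JobRoleInterest)) %>%
  nrow()

print(missing_counts)
print(complete_rows)
```

Running the same two steps on `df` instead of `toy` would show how much of the survey remains once incomplete rows are dropped.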

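# Count respondents for each combination of age and job role interest,
# then plot the counts with a smoothed trend line per role.
df %>%
  select(Age, JobRoleInterest) %>%
  # Age is parsed as integer, so missing ages are NA rather than "null";
  # drop NA rows explicitly and also any literal "null" role labels.
  filter(!is.na(Age), !is.na(JobRoleInterest), JobRoleInterest != "null") %>%
  group_by(Age, JobRoleInterest) %>%
  summarize(count = n()) %>%
  arrange(desc(count)) %>%
  ggplot(aes(Age, count, color = JobRoleInterest)) +
  geom_point() +
  ggtitle("Interest in the IT field for newcomers") +
  geom_smooth()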
## `geom_smooth()` using method = 'loess'

It is interesting to see that many new coders were interested in becoming a Full-Stack Web Developer, while interest in becoming a data analyst fell somewhere in the middle.
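
One way to check a reading like this against numbers rather than the plot alone is to tally each role across all ages. The sketch below shows the idea on a small made-up data frame, `toy`; the values and role labels are illustrative assumptions, not survey results.

```r
library(dplyr)

# Hypothetical stand-in for the survey data; values are made up for illustration.
toy <- data.frame(
  JobRoleInterest = c("Full-Stack Web Developer", "Full-Stack Web Developer",
                      "Data Analyst", NA, "Full-Stack Web Developer",
                      "Back-End Web Developer", "Data Analyst"),
  stringsAsFactors = FALSE
)

# Drop missing roles, then count respondents per role, largest first.
role_totals <- toy %>%
  filter(!is.na(JobRoleInterest)) %>%
  count(JobRoleInterest, sort = TRUE)

print(role_totals)
```

Applied to `df`, the same two steps would give one row per job role with its total in column `n`, which makes it easier to see where a given role ranks.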