CollegeMajors

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

require(dplyr)

## Loading required package: dplyr

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

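## 1. Get the rows of collegemajors.csv whose Major column contains "STATISTICS" or "DATA"
setwd("C:/RData")  # machine-specific folder that holds collegemajors.csv
MajorsList <- read.csv("collegemajors.csv", header = TRUE)
# Case-insensitive match, so the pattern "Data" also finds "DATA PROCESSING"
subset(MajorsList, grepl("Data|STATISTICS", Major, ignore.case = TRUE))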

##    FOD1P                                         Major          Major_Category
## 44  6212 MANAGEMENT INFORMATION SYSTEMS AND STATISTICS                Business
## 52  2101      COMPUTER PROGRAMMING AND DATA PROCESSING Computers & Mathematics
## 59  3702               STATISTICS AND DECISION SCIENCE Computers & Mathematics

## 2. Write code that transforms the data below:

##[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"

##[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  

##[9] "elderberry"   "lime"         "lychee"       "mulberry"    

##[13] "olive"        "salal berry"

testdata <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"

[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"  

[9] "elderberry"   "lime"         "lychee"       "mulberry"    

[13] "olive"        "salal berry"'

library(stringr)

# Extract every double-quoted item; the "." lets the pattern span the space in two-word names
testdata_split <- unlist(str_extract_all(testdata, pattern = "\"([a-z]+.[a-z]+)\""))

testdata_split

##  [1] "\"bell pepper\""  "\"bilberry\""     "\"blackberry\""   "\"blood orange\""
##  [5] "\"blueberry\""    "\"cantaloupe\""   "\"chili pepper\"" "\"cloudberry\""  
##  [9] "\"elderberry\""   "\"lime\""         "\"lychee\""       "\"mulberry\""    
## [13] "\"olive\""        "\"salal berry\""

# Strip the escaped quote characters so only the bare names remain
testdata_final <- str_remove_all(testdata_split, "\"")
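The same transformation can be sketched end to end in base R. This is a minimal sketch, not the original solution: `regmatches()` and `gregexpr()` are swapped in for the stringr calls, the names `raw`, `quoted`, `fruits`, and `as_vector_code` are illustrative, and the `c(...)` text built at the end is an assumed target format rather than one stated in the exercise.

```r
# Printed-vector text, same content as testdata above
raw <- '[1] "bell pepper"  "bilberry"     "blackberry"   "blood orange"
[5] "blueberry"    "cantaloupe"   "chili pepper" "cloudberry"
[9] "elderberry"   "lime"         "lychee"       "mulberry"
[13] "olive"        "salal berry"'

# Pull out every double-quoted run of lowercase letters and spaces
quoted <- regmatches(raw, gregexpr('"[a-z ]+"', raw))[[1]]

# Drop the quote characters to leave the bare names
fruits <- gsub('"', "", quoted, fixed = TRUE)

# Assumed target format: a c(...) expression that can be pasted back into R
as_vector_code <- paste0("c(", paste0('"', fruits, '"', collapse = ", "), ")")

print(fruits)
cat(as_vector_code, "\n")
```

Using `[a-z ]+` instead of `[a-z]+.[a-z]+` states the intent directly: each item is lowercase letters and spaces between a pair of quotes.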



## 3. Describe, in words, what these expressions will match:

## "(.)\1\1"
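# As a regular expression, (.)\1\1 matches any one character followed by two repetitions of it, like "ccc" or "666". Typed as an R string it must be "(.)\\1\\1"; with single backslashes R reads \1 as the control character with octal code 1, not as a backreference.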
## "(.)(.)\\2\\1"
# This matches a pair of characters followed by the same pair in reverse order, like "cddc" or "2552".
## "(..)\1"
# As a regular expression, (..)\1 matches any two characters repeated once, like "dada" or "6767". The R string form is "(..)\\1".
## "(.).\\1.\\1"
# This matches five characters in which the first, third, and fifth are the same, like "71727".
## "(.)(.)(.).*\\3\\2\\1"
# This matches three characters, then any number of characters (possibly none), then the same three characters in reverse order, like "6547113456".
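These descriptions can be checked with `grepl()`. The sketch below uses base R and the example strings from the answers; the non-matching strings are added here for contrast. Every pattern is written with doubled backslashes, on the assumption that the single-backslash versions in the exercise are meant as regular expressions rather than as R string literals.

```r
# Same character three times in a row
r1 <- grepl("(.)\\1\\1", c("ccc", "666", "abc"))             # TRUE TRUE FALSE

# A pair of characters followed by the same pair reversed
r2 <- grepl("(.)(.)\\2\\1", c("cddc", "2552", "abcd"))       # TRUE TRUE FALSE

# Any two characters repeated once
r3 <- grepl("(..)\\1", c("dada", "6767", "abcd"))            # TRUE TRUE FALSE

# First, third, and fifth characters identical
r4 <- grepl("(.).\\1.\\1", c("71727", "12345"))              # TRUE FALSE

# Three characters, anything, then the same three reversed
r5 <- grepl("(.)(.)(.).*\\3\\2\\1", c("6547113456", "abcdef"))  # TRUE FALSE

# With single backslashes, R parses \1 as an octal escape (one control
# character), so the string contains no backreference at all
n_single <- nchar("(.)\1\1")    # 5
n_double <- nchar("(.)\\1\\1")  # 7

print(list(r1, r2, r3, r4, r5, n_single, n_double))
```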

## 4. Construct regular expressions to match words that:

## Start and end with the same character.
# "^(.).*\\1$"
## Contain a repeated pair of letters (e.g. "church" contains "ch" repeated twice.)
# "([A-Za-z][A-Za-z]).*\\1"
## Contain one letter repeated in at least three places (e.g. "eleven" contains three "e"s.)
# "([A-Za-z]).*\\1.*\\1"
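A quick way to sanity-check answers like these is to run them against a few words with known properties. The sketch below uses base R. The word list and the three patterns (backslashes doubled for R strings, anchors added to the first) are one possible set of answers chosen for illustration, not the only correct ones.

```r
# Small hand-picked word list (illustrative)
words <- c("area", "church", "eleven", "banana")

same_ends      <- "^(.).*\\1$"               # first character equals last
repeated_pair  <- "([A-Za-z][A-Za-z]).*\\1"  # some two-letter pair occurs twice
letter_3_times <- "([A-Za-z]).*\\1.*\\1"     # some letter occurs three times

hits_ends  <- words[grepl(same_ends, words)]       # "area"
hits_pair  <- words[grepl(repeated_pair, words)]   # "church" "banana"
hits_three <- words[grepl(letter_3_times, words)]  # "eleven" "banana"

print(list(hits_ends, hits_pair, hits_three))
```

The first pattern needs the `^` and `$` anchors; without them it matches any word in which some character appears twice.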


Jaya Veluri

9/12/2021
