Please deliver links to an R Markdown file (in GitHub and rpubs.com)
with solutions to the problems below. You may work in a small group, but
please submit separately with names of all group participants in your
submission.
1. Provide code that identifies the majors that contain either
“DATA” or “STATISTICS”
Use the 173 majors listed in fivethirtyeight.com’s College Majors
dataset [https://fivethirtyeight.com/features/the-economic-guide-to-picking-a-college-major/].
majors_df = read.csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv', header = TRUE)
Create a pattern that identifies the majors containing either “DATA”
or “STATISTICS”:
pattern <- "(DATA|STATISTICS)"
matches <- grep(pattern, majors_df$Major, value = TRUE, ignore.case = TRUE)
matches
## [1] "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS"
## [2] "STATISTICS AND DECISION SCIENCE"
## [3] "COMPUTER PROGRAMMING AND DATA PROCESSING"
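The same pattern can also be used to keep whole rows rather than just the matching strings. The sketch below is a minimal illustration that avoids the network download: `majors` and `sample_df` are hypothetical stand-ins for the `Major` column of `majors_df`, and only the three matching names are taken from the output above.

```r
# Hypothetical subset of the Major column; the real data has 173 rows.
majors <- c(
  "PETROLEUM ENGINEERING",
  "MANAGEMENT INFORMATION SYSTEMS AND STATISTICS",
  "STATISTICS AND DECISION SCIENCE",
  "COMPUTER PROGRAMMING AND DATA PROCESSING",
  "ZOOLOGY"
)
sample_df <- data.frame(Major = majors, stringsAsFactors = FALSE)

# grepl() returns one TRUE/FALSE per element, so it can index rows directly.
is_match <- grepl("DATA|STATISTICS", sample_df$Major)

# drop = FALSE keeps the result a data frame even with a single column.
matched_rows <- sample_df[is_match, , drop = FALSE]
matched_rows$Major
```

Using `grepl()` instead of `grep(..., value = TRUE)` is a design choice: the logical vector can be reused to filter any other column of the same data frame.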
3 Describe, in words, what these expressions will match:
(.)\1\1
This regular expression consists of three parts:
(.): a capturing group that matches any single character and captures it.
\1: a backreference to the first capturing group; it matches the same character that the group captured.
\1: a second backreference to the first capturing group, matching the same character again.
This regular expression matches the same character three times in a row, for example "aaa". Written as an R string, the backslashes must be doubled: "(.)\\1\\1".
"(.)(.)\\2\\1"
This regular expression consists of four parts (the doubled backslashes are R string escapes for \2 and \1):
(.): a capturing group that matches any single character and captures it.
(.): a second capturing group that matches any single character and captures it.
\2: a backreference to the second capturing group, matching the character it captured.
\1: a backreference to the first capturing group, matching the character it captured.
This regular expression matches a pair of characters followed by the same pair in reverse order, for example "abba". The two characters need not differ, so "aaaa" also matches.
(..)\1
This regular expression consists of two parts:
(..): a capturing group that matches any two consecutive characters and captures them.
\1: a backreference to the first capturing group, matching the same two characters again.
This regular expression matches any two characters that are immediately repeated, for example "coco" or "abab".
"(.).\\1.\\1"
This regular expression consists of five parts:
(.): a capturing group that matches any single character and captures it.
.: matches any single character.
\1: a backreference to the first capturing group, matching the captured character.
.: matches any single character.
\1: a second backreference to the first capturing group, matching the captured character again.
This regular expression matches five characters in which the first, third, and fifth are identical, for example "abaca".
"(.)(.)(.).*\\3\\2\\1"
This regular expression consists of five parts:
(.): a capturing group that matches any single character and captures it.
(.): a second capturing group that matches any single character and captures it.
(.): a third capturing group that matches any single character and captures it.
.*: matches zero or more characters of any kind.
\3\2\1: a sequence of backreferences matching the characters captured by the third, second, and first groups, in that order.
This regular expression matches three characters, followed by any number of characters (including none), followed by the same three characters in reverse order, for example "abccba" or "abcxyzcba". The match can occur anywhere in the string, not only at its start and end.
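The descriptions above can be checked by running each expression against a few strings. The sketch below is one way to do that in base R; the names in `patterns`, the `samples` vector, and the use of `perl = TRUE` are assumptions made for illustration and are not part of the exercise.

```r
# Each pattern is written as an R string, so regex backslashes are doubled.
patterns <- c(
  triple        = "(.)\\1\\1",
  mirrored_pair = "(.)(.)\\2\\1",
  doubled_pair  = "(..)\\1",
  alternating   = "(.).\\1.\\1",
  reversed_trio = "(.)(.)(.).*\\3\\2\\1"
)

# Hand-picked strings (hypothetical), one intended for each pattern,
# plus "plain", which should match none of them.
samples <- c("aaa", "abba", "coco", "abaca", "abcxyzcba", "plain")

# Rows are sample strings, columns are patterns;
# TRUE means the pattern matches somewhere in the string.
hits <- sapply(patterns, function(p) grepl(p, samples, perl = TRUE))
rownames(hits) <- samples
hits
```

With these samples, each of the first five strings matches only the pattern it was chosen for, which makes it easy to see the expressions describe different shapes of repetition.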
4 Construct regular expressions to match words that:
Start and end with the same character:
reg1 = "^(.)((.*\\1$)|\\1?$)"
Contain a repeated pair of letters:
reg2 = "\\b\\w*(\\w{2})\\w*\\1"
Contain one letter repeated in at least three places:
reg3 = "([a-z]).*\\1.*\\1"
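Patterns for these three tasks are easiest to trust after trying them on a handful of words. The sketch below is a hedged example, not the only possible answer: the variable names, the candidate patterns, and the `words` vector are all assumptions chosen for illustration, and the patterns assume single lowercase words.

```r
# Candidate patterns (assumed to be applied to single lowercase words).
same_ends     <- "^(.)((.*\\1$)|\\1?$)"  # first character equals last; also allows one-letter words
repeated_pair <- "([a-z]{2}).*\\1"       # some two-letter sequence occurs twice
three_times   <- "([a-z]).*\\1.*\\1"     # some letter occurs at least three times

# Hypothetical test words.
words <- c("area", "church", "eleven", "banana", "a", "stone")

# Keep only the words each pattern matches.
ends_match  <- words[grepl(same_ends, words, perl = TRUE)]
pair_match  <- words[grepl(repeated_pair, words, perl = TRUE)]
three_match <- words[grepl(three_times, words, perl = TRUE)]

ends_match
pair_match
three_match
```

Note that the third pattern places `.*` between the backreferences, so the three occurrences of the letter do not have to be adjacent.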