Assignment 1 – Data Analysis using R Programming

Dataset: Drugs, Side Effects & Medical Conditions

Author: Grp 4

Introduction

This report analyzes the relationship between drug activity and user ratings in the Drugs.com dataset, and explores how ratings vary with each drug's recorded alcohol interaction.

1. Load Dataset

drugs_side_effects_drugs_com <- readxl::read_excel("C:/Users/Chetna Patil/OneDrive - George Brown College/Desktop/R Assignment/R Assignment_Group 4/Assignment R_Grp 4/drugs_side_effects_drugs_com.xlsx")

Note: read_excel() emits a long run of "Expecting numeric" and "Expecting logical" warnings, all for spreadsheet row 2045 (cells G2045 through EE2045). That row holds scraped web-page JavaScript followed by a misaligned infliximab record (brand names Remicade, Avsola, Inflectra, Ixifi, Renflexis; condition Inflammatory Bowel Disease) that spills into columns 18 to 135. Those 118 columns have no header, so read_excel() assigns them the placeholder names `...18` through `...135`.

Remove blank columns

drugs_side_effects_drugs_com <- drugs_side_effects_drugs_com[, sapply(drugs_side_effects_drugs_com, function(x) !all(is.na(x)))]

Keep only relevant rows

drugs_side_effects_drugs_com <- drugs_side_effects_drugs_com[1:1116, ]
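
The two cleaning steps above can be sketched on a small stand-in table. This is only an illustration under stated assumptions: the data frame `raw`, its values, and the `extra_1`/`extra_2` columns are hypothetical, and rows are selected by a missing `drug_name` rather than by a fixed row range. Trimming rows first means a stray junk row cannot keep an otherwise empty column alive.

```r
# Toy stand-in for the imported sheet (hypothetical values, not the real data).
raw <- data.frame(
  drug_name = c("doxycycline", "spironolactone", NA),
  rating    = c(6.8, 7.2, NA),
  extra_1   = c(NA, NA, "stray text"),  # populated only by the junk row
  extra_2   = c(NA, NA, NA),            # blank in every row
  stringsAsFactors = FALSE
)

# 1. Keep only rows that carry a drug name (assumption: junk rows have none).
trimmed <- raw[!is.na(raw$drug_name), , drop = FALSE]

# 2. Drop columns that are entirely NA in the remaining rows.
keep_cols <- vapply(trimmed, function(x) !all(is.na(x)), logical(1))
cleaned <- trimmed[, keep_cols, drop = FALSE]

print(names(cleaned))
print(dim(cleaned))
```

`vapply()` is used instead of `sapply()` so that the result is always a logical vector, even for a data frame with zero columns.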

View the updated data frame structure

str(drugs_side_effects_drugs_com)
## tibble [1,116 × 17] (S3: tbl_df/tbl/data.frame)
##  $ drug_name                    : chr [1:1116] "doxycycline" "spironolactone" "minocycline" "Accutane" ...
##  $ medical_condition            : chr [1:1116] "Acne" "Acne" "Acne" "Acne" ...
##  $ side_effects                 : chr [1:1116] "(hives, difficult breathing, swelling in your face or throat) or a severe skin reaction (fever, sore throat, bu"| __truncated__ "hives ; difficulty breathing; swelling of your face, lips, tongue, or throat. Call your doctor at once if you h"| __truncated__ "skin rash, fever, swollen glands, flu-like symptoms, muscle aches, severe weakness, unusual bruising, or yellow"| __truncated__ "problems with your vision or hearing; muscle or joint pain, bone pain, back pain; increased thirst, increased u"| __truncated__ ...
##  $ generic_name                 : chr [1:1116] "doxycycline" "spironolactone" "minocycline" "isotretinoin (oral)" ...
##  $ drug_classes                 : chr [1:1116] "Miscellaneous antimalarials, Tetracyclines" "Aldosterone receptor antagonists, Potassium-sparing diuretics" "Tetracyclines" "Miscellaneous antineoplastics, Miscellaneous uncategorized agents" ...
##  $ brand_names                  : chr [1:1116] "Acticlate, Adoxa CK, Adoxa Pak, Adoxa TT, Alodox, Avidoxy, Doryx, Mondoxyne NL, Monodox, Morgidox, Okebo, Orace"| __truncated__ "Aldactone, CaroSpir" "Dynacin, Minocin, Minolira, Solodyn, Ximino, Vectrin, Myrac" NA ...
##  $ activity                     : num [1:1116] 0.87 0.82 0.48 0.41 0.39 0.35 0.3 0.26 0.2 0.17 ...
##  $ rx_otc                       : chr [1:1116] "Rx" "Rx" "Rx" "Rx" ...
##  $ pregnancy_category           : chr [1:1116] "D" "C" "D" "X" ...
##  $ csa                          : chr [1:1116] "N" "N" "N" "N" ...
##  $ alcohol                      : chr [1:1116] "X" "X" NA "X" ...
##  $ related_drugs                : chr [1:1116] "amoxicillin: https://www.drugs.com/amoxicillin.html | prednisone: https://www.drugs.com/prednisone.html | albut"| __truncated__ "amlodipine: https://www.drugs.com/amlodipine.html | lisinopril: https://www.drugs.com/lisinopril.html | losarta"| __truncated__ "amoxicillin: https://www.drugs.com/amoxicillin.html | prednisone: https://www.drugs.com/prednisone.html | doxyc"| __truncated__ "doxycycline: https://www.drugs.com/doxycycline.html | clindamycin topical: https://www.drugs.com/mtm/clindamyci"| __truncated__ ...
##  $ medical_condition_description: chr [1:1116] "Acne Other names: Acne Vulgaris; Blackheads; Breakouts; Cystic acne; Pimples; Whiteheads; Zits Acne is a skin c"| __truncated__ "Acne Other names: Acne Vulgaris; Blackheads; Breakouts; Cystic acne; Pimples; Whiteheads; Zits Acne is a skin c"| __truncated__ "Acne Other names: Acne Vulgaris; Blackheads; Breakouts; Cystic acne; Pimples; Whiteheads; Zits Acne is a skin c"| __truncated__ "Acne Other names: Acne Vulgaris; Blackheads; Breakouts; Cystic acne; Pimples; Whiteheads; Zits Acne is a skin c"| __truncated__ ...
##  $ rating                       : num [1:1116] 6.8 7.2 5.7 7.9 7.4 7.6 7.7 8 8.5 7.9 ...
##  $ no_of_reviews                : num [1:1116] 760 449 482 623 146 8 439 999 96 86 ...
##  $ drug_link                    : chr [1:1116] "https://www.drugs.com/doxycycline.html" "https://www.drugs.com/spironolactone.html" "https://www.drugs.com/minocycline.html" "https://www.drugs.com/accutane.html" ...
##  $ medical_condition_url        : chr [1:1116] "https://www.drugs.com/condition/acne.html" "https://www.drugs.com/condition/acne.html" "https://www.drugs.com/condition/acne.html" "https://www.drugs.com/condition/acne.html" ...
names(drugs_side_effects_drugs_com)
##  [1] "drug_name"                     "medical_condition"            
##  [3] "side_effects"                  "generic_name"                 
##  [5] "drug_classes"                  "brand_names"                  
##  [7] "activity"                      "rx_otc"                       
##  [9] "pregnancy_category"            "csa"                          
## [11] "alcohol"                       "related_drugs"                
## [13] "medical_condition_description" "rating"                       
## [15] "no_of_reviews"                 "drug_link"                    
## [17] "medical_condition_url"

4. Print Top 15 Rows

print(head(drugs_side_effects_drugs_com,15))
## # A tibble: 15 × 17
##    drug_name        medical_condition side_effects     generic_name drug_classes
##    <chr>            <chr>             <chr>            <chr>        <chr>       
##  1 doxycycline      Acne              (hives, difficu… doxycycline  Miscellaneo…
##  2 spironolactone   Acne              hives ; difficu… spironolact… Aldosterone…
##  3 minocycline      Acne              skin rash, feve… minocycline  Tetracyclin…
##  4 Accutane         Acne              problems with y… isotretinoi… Miscellaneo…
##  5 clindamycin      Acne              hives ; difficu… clindamycin… Topical acn…
##  6 Aldactone        Acne              hives ; difficu… spironolact… Aldosterone…
##  7 tretinoin        Acne              hives ; difficu… tretinoin t… Topical acn…
##  8 isotretinoin     Acne              problems with y… isotretinoi… Miscellaneo…
##  9 Bactrim          Acne              skin rash, feve… sulfamethox… Sulfonamides
## 10 Retin-A          Acne              hives; difficul… Retin-A      Topical acn…
## 11 Aczone           Acne              hives; difficul… dapsone top… Topical acn…
## 12 benzoyl peroxide Acne              Benzoyl peroxid… benzoyl per… Topical acn…
## 13 Differin         Acne              hives, itching;… adapalene t… Topical acn…
## 14 Epiduo           Acne              Benzoyl peroxid… adapalene a… Topical acn…
## 15 adapalene        Acne              hives , itching… adapalene t… Topical acn…
## # ℹ 12 more variables: brand_names <chr>, activity <dbl>, rx_otc <chr>,
## #   pregnancy_category <chr>, csa <chr>, alcohol <chr>, related_drugs <chr>,
## #   medical_condition_description <chr>, rating <dbl>, no_of_reviews <dbl>,
## #   drug_link <chr>, medical_condition_url <chr>

5. A User-Defined Function Using Variables from the Dataset

Example: filter highly rated drugs by the value of the alcohol column

filter_high_rated_drugs <- function(data, condition_value, rating_threshold = 4) {
  # Keep rows whose alcohol value equals condition_value and whose rating meets the threshold.
  # Caution: where the comparison evaluates to NA (e.g. a missing alcohol value),
  # logical indexing returns an all-NA row instead of dropping the row.
  data[data$alcohol == condition_value & data$rating >= rating_threshold, ]
}

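Example usage. The alcohol values visible in the str() output are "X" and NA, so "Yes" matches no drug. Because NA == "Yes" evaluates to NA, the rows returned here are all-NA rows produced by drugs with a missing alcohol value, not real matches: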

high_rated <- filter_high_rated_drugs(drugs_side_effects_drugs_com,"Yes", rating_threshold = 4)
head(high_rated)
## # A tibble: 6 × 17
##   drug_name medical_condition side_effects generic_name drug_classes brand_names
##   <chr>     <chr>             <chr>        <chr>        <chr>        <chr>      
## 1 <NA>      <NA>              <NA>         <NA>         <NA>         <NA>       
## 2 <NA>      <NA>              <NA>         <NA>         <NA>         <NA>       
## 3 <NA>      <NA>              <NA>         <NA>         <NA>         <NA>       
## 4 <NA>      <NA>              <NA>         <NA>         <NA>         <NA>       
## 5 <NA>      <NA>              <NA>         <NA>         <NA>         <NA>       
## 6 <NA>      <NA>              <NA>         <NA>         <NA>         <NA>       
## # ℹ 11 more variables: activity <dbl>, rx_otc <chr>, pregnancy_category <chr>,
## #   csa <chr>, alcohol <chr>, related_drugs <chr>,
## #   medical_condition_description <chr>, rating <dbl>, no_of_reviews <dbl>,
## #   drug_link <chr>, medical_condition_url <chr>
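
A possible NA-safe variant is sketched here on toy data. The function name `filter_high_rated_na_safe`, the table `drugs_toy`, and its values are hypothetical; the sketch assumes the alcohol column uses "X" for a recorded interaction and NA otherwise. Wrapping the condition in `which()` keeps only positions where it is TRUE, so NA comparisons drop out instead of producing all-NA rows.

```r
# Toy data (hypothetical values) mimicking the alcohol and rating columns.
drugs_toy <- data.frame(
  drug_name = c("drug_a", "drug_b", "drug_c", "drug_d"),
  alcohol   = c("X", NA, "X", NA),
  rating    = c(6.8, 7.2, 3.1, NA),
  stringsAsFactors = FALSE
)

filter_high_rated_na_safe <- function(data, alcohol_value, rating_threshold = 4) {
  # which() returns the indices where the condition is TRUE and ignores NA.
  keep <- which(data$alcohol == alcohol_value & data$rating >= rating_threshold)
  data[keep, , drop = FALSE]
}

high_rated_toy <- filter_high_rated_na_safe(drugs_toy, "X")
no_match_toy   <- filter_high_rated_na_safe(drugs_toy, "Yes")

print(high_rated_toy)
print(nrow(no_match_toy))
```

With a value that never occurs, such as "Yes", this version returns an empty data frame rather than rows of NA.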

7. Data Manipulation + Filter

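Filter prescription-only drugs with activity >= 70. As written, the filter compares the alcohol column (whose visible values are "X" and NA) with "prescription-only" and tests activity, which is stored as a fraction between 0 and 1, against 70, so no row matches and the result is empty. Prescription status is recorded in rx_otc ("Rx"), and the equivalent threshold on this scale is 0.70.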

drugs_side_effects_drugs_com_filtered <- subset(drugs_side_effects_drugs_com, alcohol == "prescription-only" & activity >= 70)
print(drugs_side_effects_drugs_com_filtered)
## # A tibble: 0 × 17
## # ℹ 17 variables: drug_name <chr>, medical_condition <chr>, side_effects <chr>,
## #   generic_name <chr>, drug_classes <chr>, brand_names <chr>, activity <dbl>,
## #   rx_otc <chr>, pregnancy_category <chr>, csa <chr>, alcohol <chr>,
## #   related_drugs <chr>, medical_condition_description <chr>, rating <dbl>,
## #   no_of_reviews <dbl>, drug_link <chr>, medical_condition_url <chr>
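
A filter that matches the stated intent might look like the following sketch. It runs on a hypothetical table `drugs_toy` with made-up values, and it assumes that `rx_otc == "Rx"` marks prescription-only drugs and that activity is a 0 to 1 fraction, as the str() output suggests.

```r
# Toy data (hypothetical values) with the two columns the filter needs.
drugs_toy <- data.frame(
  drug_name = c("drug_a", "drug_b", "drug_c", "drug_d"),
  rx_otc    = c("Rx", "OTC", "Rx", NA),
  activity  = c(0.87, 0.90, 0.41, 0.95),
  stringsAsFactors = FALSE
)

# subset() drops rows where the condition is FALSE or NA.
rx_high_activity <- subset(drugs_toy, rx_otc == "Rx" & activity >= 0.70)

print(rx_high_activity)
```

Only `drug_a` satisfies both conditions here; `drug_d` is dropped because its `rx_otc` value is missing.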

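8. Identify Dependent and Independent Variables and Combine Them into a New Data Frame

Dependent variable: activity. Independent variables: condition and drug.

Create a new data frame with the dependent variable 'activity' and two independent variables. Note that the code below labels the rating column as 'condition' and the alcohol column as 'drug', so those two columns hold user ratings and alcohol flags rather than medical conditions and drug names.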

library(dplyr)  # provides %>% and select()

new_df <- drugs_side_effects_drugs_com %>%
  select(activity, condition = rating, drug = alcohol)

new_df
## # A tibble: 1,116 × 3
##    activity condition drug 
##       <dbl>     <dbl> <chr>
##  1     0.87       6.8 X    
##  2     0.82       7.2 X    
##  3     0.48       5.7 <NA> 
##  4     0.41       7.9 X    
##  5     0.39       7.4 <NA> 
##  6     0.35       7.6 X    
##  7     0.3        7.7 <NA> 
##  8     0.26       8   X    
##  9     0.2        8.5 X    
## 10     0.17       7.9 <NA> 
## # ℹ 1,106 more rows
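
For comparison, a frame whose column names describe what the columns actually hold could be built as sketched below. The table `drugs_toy`, the name `model_df`, and the `alcohol_flag` column are hypothetical, and treating a missing alcohol value as "no recorded interaction" is an assumption, not something the dataset documents.

```r
# Toy data (hypothetical rows) with the three columns used in this step.
drugs_toy <- data.frame(
  activity = c(0.87, 0.82, 0.48, 0.41),
  rating   = c(6.8, 7.2, 5.7, 7.9),
  alcohol  = c("X", "X", NA, "X"),
  stringsAsFactors = FALSE
)

# Keep the original meaning of each column in its name.
# Assumption: NA in alcohol means no interaction was recorded.
model_df <- data.frame(
  activity     = drugs_toy$activity,      # dependent variable
  rating       = drugs_toy$rating,        # independent variable
  alcohol_flag = !is.na(drugs_toy$alcohol) # independent variable (TRUE/FALSE)
)

print(model_df)
```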

9. Remove Missing Values

Remove rows that have a missing value in any column

drugs_clean <- na.omit(drugs_side_effects_drugs_com)

10. Remove duplicated rows

drugs_clean <- drugs_clean[!duplicated(drugs_clean), ]
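
Because na.omit() drops a row when any column is NA, columns such as brand_names and alcohol (both of which show NA in the str() output) can remove many otherwise usable rows. The sketch below, on hypothetical toy data, contrasts that behaviour with restricting the check to the columns an analysis needs, then removes duplicates. Which columns count as "needed" is an assumption made for illustration.

```r
# Toy data (hypothetical values): one missing brand name, one missing activity,
# and one exact duplicate row.
drugs_toy <- data.frame(
  drug_name   = c("drug_a", "drug_b", "drug_b", "drug_c"),
  brand_names = c(NA, "Brand B", "Brand B", "Brand C"),
  activity    = c(0.87, 0.82, 0.82, NA),
  rating      = c(6.8, 7.2, 7.2, 5.7),
  stringsAsFactors = FALSE
)

# Strict: any NA in any column removes the row.
strict_clean <- na.omit(drugs_toy)

# Targeted: only require the columns used in the analysis to be present.
needed <- c("activity", "rating")
targeted_clean <- drugs_toy[complete.cases(drugs_toy[, needed]), , drop = FALSE]

# Remove exact duplicate rows from the targeted result.
deduped <- targeted_clean[!duplicated(targeted_clean), , drop = FALSE]

print(nrow(strict_clean))
print(nrow(targeted_clean))
print(nrow(deduped))
```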

11. Reorder Rows in Descending Order by Multiple Columns

df_sorted <- drugs_side_effects_drugs_com %>%
  arrange(desc(activity), desc(rating), desc(no_of_reviews))

df_sorted
## # A tibble: 1,116 × 17
##    drug_name    medical_condition side_effects         generic_name drug_classes
##    <chr>        <chr>             <chr>                <chr>        <chr>       
##  1 Benadryl     Colds & Flu       "hives; difficult b… diphenhydra… Anticholine…
##  2 Vyvanse      ADHD              "hives; difficult b… lisdexamfet… CNS stimula…
##  3 docusate     Constipation      "hives ; difficult … docusate (o… Laxatives   
##  4 atorvastatin Cholesterol       "hives; difficulty … atorvastatin Statins     
##  5 carboplatin  Cancer            "Carboplatin may ca… carboplatin… Alkylating …
##  6 Aricept      Alzheimer's       "hives; difficult b… donepezil (… Cholinester…
##  7 Truvada      AIDS/HIV          "hives ; difficult … emtricitabi… Antiviral c…
##  8 Lamictal     Bipolar Disorder  "mood or behavior c… lamotrigine  Triazine an…
##  9 Adderall     ADHD              "hives; difficult b… amphetamine… CNS stimula…
## 10 Symbicort    COPD              "hives ; difficulty… budesonide … Bronchodila…
## # ℹ 1,106 more rows
## # ℹ 12 more variables: brand_names <chr>, activity <dbl>, rx_otc <chr>,
## #   pregnancy_category <chr>, csa <chr>, alcohol <chr>, related_drugs <chr>,
## #   medical_condition_description <chr>, rating <dbl>, no_of_reviews <dbl>,
## #   drug_link <chr>, medical_condition_url <chr>
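
The same multi-key descending sort can be sketched in base R with order(), which may help show how ties are broken: rows are compared on the first key, and later keys only matter when earlier ones are equal. The table `drugs_toy` and its values are hypothetical.

```r
# Toy data (hypothetical values): drug_b and drug_c tie on activity.
drugs_toy <- data.frame(
  drug_name     = c("drug_a", "drug_b", "drug_c"),
  activity      = c(0.50, 0.90, 0.90),
  rating        = c(8.0, 6.0, 7.0),
  no_of_reviews = c(10, 20, 5),
  stringsAsFactors = FALSE
)

# Negating numeric keys sorts them in descending order.
sort_index <- order(-drugs_toy$activity, -drugs_toy$rating, -drugs_toy$no_of_reviews)
sorted_toy <- drugs_toy[sort_index, , drop = FALSE]

print(sorted_toy)
```

Here `drug_c` comes before `drug_b` because the two tie on activity and `drug_c` has the higher rating.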

12. Rename Some Columns in the Dataset

df_renamed <- drugs_side_effects_drugs_com %>%
  rename(
    act = activity,
    rate = rating,
    reviews = no_of_reviews,
    alcohol_status = alcohol
  )
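
A base R equivalent of this renaming is sketched below for readers without dplyr loaded. The lookup vector `rename_map` and the one-row table `drugs_toy` are hypothetical; the old-to-new name pairs mirror the ones used above.

```r
# Toy one-row data frame (hypothetical values) with the columns to rename.
drugs_toy <- data.frame(
  drug_name     = "drug_a",
  activity      = 0.87,
  rating        = 6.8,
  no_of_reviews = 760,
  alcohol       = "X",
  stringsAsFactors = FALSE
)

# Names of the vector are the old column names; values are the new ones.
rename_map <- c(
  activity      = "act",
  rating        = "rate",
  no_of_reviews = "reviews",
  alcohol       = "alcohol_status"
)

# Locate each old name and fail early if one is missing.
positions <- match(names(rename_map), names(drugs_toy))
stopifnot(!anyNA(positions))
names(drugs_toy)[positions] <- unname(rename_map)

print(names(drugs_toy))
```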

13. Add New Variables to the Data Frame Using a Mathematical Function

Add a new variable “new_variable” by doubling the “activity” column. Because the filtered data frame from step 7 has no rows, the summary below reports Length 0 and NA for every column.

drugs_side_effects_drugs_com_filtered <- drugs_side_effects_drugs_com_filtered %>% mutate(new_variable = activity * 2)

summary(drugs_side_effects_drugs_com_filtered)
##   drug_name         medical_condition  side_effects       generic_name      
##  Length:0           Length:0           Length:0           Length:0          
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  drug_classes       brand_names           activity      rx_otc         
##  Length:0           Length:0           Min.   : NA   Length:0          
##  Class :character   Class :character   1st Qu.: NA   Class :character  
##  Mode  :character   Mode  :character   Median : NA   Mode  :character  
##                                        Mean   :NaN                     
##                                        3rd Qu.: NA                     
##                                        Max.   : NA                     
##  pregnancy_category     csa              alcohol          related_drugs     
##  Length:0           Length:0           Length:0           Length:0          
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  medical_condition_description     rating    no_of_reviews  drug_link        
##  Length:0                      Min.   : NA   Min.   : NA   Length:0          
##  Class :character              1st Qu.: NA   1st Qu.: NA   Class :character  
##  Mode  :character              Median : NA   Median : NA   Mode  :character  
##                                Mean   :NaN   Mean   :NaN                     
##                                3rd Qu.: NA   3rd Qu.: NA                     
##                                Max.   : NA   Max.   : NA                     
##  medical_condition_url  new_variable
##  Length:0              Min.   : NA  
##  Class :character      1st Qu.: NA  
##  Mode  :character      Median : NA  
##                        Mean   :NaN  
##                        3rd Qu.: NA  
##                        Max.   : NA

View the column names of the updated data frame

names(drugs_side_effects_drugs_com_filtered)
##  [1] "drug_name"                     "medical_condition"            
##  [3] "side_effects"                  "generic_name"                 
##  [5] "drug_classes"                  "brand_names"                  
##  [7] "activity"                      "rx_otc"                       
##  [9] "pregnancy_category"            "csa"                          
## [11] "alcohol"                       "related_drugs"                
## [13] "medical_condition_description" "rating"                       
## [15] "no_of_reviews"                 "drug_link"                    
## [17] "medical_condition_url"         "new_variable"
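
Deriving variables on a frame that still has rows gives a more informative summary. The sketch below uses hypothetical toy data; the column names `activity_pct` and `log_reviews` are illustrative choices, and it assumes activity is stored as a 0 to 1 fraction.

```r
# Toy data (hypothetical values) with the two numeric columns used here.
drugs_toy <- data.frame(
  drug_name     = c("drug_a", "drug_b"),
  activity      = c(0.87, 0.82),
  no_of_reviews = c(760, 449),
  stringsAsFactors = FALSE
)

# Express activity as a percentage.
drugs_toy$activity_pct <- drugs_toy$activity * 100

# Compress the review counts with a log transform; +1 guards against log10(0).
drugs_toy$log_reviews <- log10(drugs_toy$no_of_reviews + 1)

summary(drugs_toy[, c("activity_pct", "log_reviews")])
```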

14. Create a Training Set Using a Random Number Generator

set.seed(123)  # For reproducibility
train_indices <- sample(seq_len(nrow(drugs_side_effects_drugs_com)), size = floor(0.7 * nrow(drugs_side_effects_drugs_com)))
training_set <- drugs_side_effects_drugs_com[train_indices, ]
print(dim(training_set))
## [1] 781  17
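
A training set is usually paired with a held-out test set made of the remaining rows. The sketch below shows one way to derive both from the same sampled indices on a hypothetical ten-row table; the 70/30 proportion follows the code above, while `toy`, `training_toy`, and `test_toy` are illustrative names.

```r
set.seed(123)  # for reproducibility

# Toy data (hypothetical values): ten rows to split.
toy <- data.frame(id = 1:10, rating = seq(5, 9.5, by = 0.5))

# Sample 70% of the row indices without replacement.
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.7 * nrow(toy)))

training_toy <- toy[train_idx, , drop = FALSE]
test_toy     <- toy[-train_idx, , drop = FALSE]  # every row not sampled

print(dim(training_toy))
print(dim(test_toy))
```

Negative indexing guarantees that the two sets do not overlap and together cover every row.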

17 - Scatter Plot

# Plotting 'activity' vs 'rating'
scatter_plot <- ggplot(drugs_side_effects_drugs_com, aes(x = activity, y = rating)) +
  geom_point(color = "blue") +
  labs(title = "Scatter Plot: Activity vs Rating",
       x = "Activity",
       y = "Rating")

# Explicitly print the scatter plot
print(scatter_plot)
## Warning: Removed 556 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(drugs_side_effects_drugs_com, aes(x = activity, y = rating)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(
    title = "Scatter Plot: Activity vs Rating with Trend Line",
    x = "Activity",
    y = "Rating"
  )
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 556 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 556 rows containing missing values or values outside the scale range
## (`geom_point()`).

There doesn’t seem to be a strong connection between how active a drug is and how people rate it. If anything, higher activity is linked to slightly lower ratings, but the effect is very small and the ratings are widely spread.
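
The visual impression can be backed by a number. The sketch below counts the rows a plot would drop for missing values and computes the Pearson correlation on the remaining rows. It uses a small made-up data frame named toy, a hypothetical stand-in for the real data, so the values it prints are not results from this report.

```r
# Toy stand-in for the real data; the values are made up for illustration.
toy <- data.frame(
  activity = c(0.87, 0.82, NA, 0.41, 0.39, NA),
  rating   = c(6.8, 7.2, 5.7, NA, 7.4, 7.6)
)

# Rows that would be dropped from the plot: any row missing either variable.
n_dropped <- sum(!complete.cases(toy[, c("activity", "rating")]))

# Pearson correlation computed on the complete rows only.
r <- cor(toy$activity, toy$rating, use = "complete.obs")

print(n_dropped)
print(r)
```

Applied to the real data frame, the same two calls would show how many rows lie behind the "Removed ... rows" warnings and how weak the linear association is.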

18 - Bar Plot

# Ensure 'alcohol' is a factor variable in the data frame
if (!is.factor(drugs_side_effects_drugs_com$alcohol)) {
  drugs_side_effects_drugs_com$alcohol <- as.factor(drugs_side_effects_drugs_com$alcohol)
}

ggplot(drugs_side_effects_drugs_com, aes(x = alcohol, y = rating)) +
  stat_summary(fun = mean, geom = "bar", fill = "skyblue") +
  labs(title = "Mean Drug Rating by Alcohol Category", y = "Mean Rating", x = "Alcohol Category")
## Warning: Removed 556 rows containing non-finite outside the scale range
## (`stat_summary()`).

unique(drugs_side_effects_drugs_com$alcohol)
## [1] X    <NA>
## Levels: X
# or for factors
levels(as.factor(drugs_side_effects_drugs_com$alcohol))
## [1] "X"
names(drugs_side_effects_drugs_com)
##  [1] "drug_name"                     "medical_condition"            
##  [3] "side_effects"                  "generic_name"                 
##  [5] "drug_classes"                  "brand_names"                  
##  [7] "activity"                      "rx_otc"                       
##  [9] "pregnancy_category"            "csa"                          
## [11] "alcohol"                       "related_drugs"                
## [13] "medical_condition_description" "rating"                       
## [15] "no_of_reviews"                 "drug_link"                    
## [17] "medical_condition_url"
drugs_side_effects_drugs_com <- drugs_side_effects_drugs_com %>% slice(1:1116)

ggplot(drugs_side_effects_drugs_com, aes(x = alcohol, y = rating)) +
  stat_summary(fun = mean, geom = "bar", fill = "skyblue") +
  labs(title = "Mean Drug Rating by Alcohol Category", y = "Mean Rating", x = "Alcohol Category")
## Warning: Removed 556 rows containing non-finite outside the scale range
## (`stat_summary()`).

# Convert 'X' to "Alcohol Reaction", NA/blank to "No Alcohol Reaction"
drugs_side_effects_drugs_com$alcohol_reaction <- ifelse(
  is.na(drugs_side_effects_drugs_com$alcohol) |
    drugs_side_effects_drugs_com$alcohol == "",
  "No Alcohol Reaction",
  "Alcohol Reaction"
)
table(drugs_side_effects_drugs_com$alcohol_reaction)
## 
##    Alcohol Reaction No Alcohol Reaction 
##                 541                 575
ggplot(drugs_side_effects_drugs_com, aes(x = alcohol_reaction, y = rating)) +
  stat_summary(fun = mean, geom = "bar", fill = "skyblue") +
  labs(
    title = "Mean Drug Rating by Alcohol Reaction",
    x = "Alcohol Reaction",
    y = "Mean Rating"
  )
## Warning: Removed 556 rows containing non-finite outside the scale range
## (`stat_summary()`).

means <- drugs_side_effects_drugs_com %>%
  group_by(alcohol_reaction) %>%
  summarise(mean_rating = mean(rating, na.rm = TRUE))

ggplot(means, aes(x = alcohol_reaction, y = mean_rating)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  geom_text(aes(label = round(mean_rating, 2)), vjust = -0.5, size = 5) +
  labs(
    title = "Mean Drug Rating by Alcohol Reaction",
    x = "Alcohol Reaction",
    y = "Mean Rating"
  ) +
  coord_cartesian(ylim = c(6, 7.5))

# Verify the variable type
str(drugs_side_effects_drugs_com$alcohol)
##  Factor w/ 1 level "X": 1 1 NA 1 NA 1 NA 1 1 NA ...
# Create a bar plot of mean rating by 'alcohol' category
bar_plot <- ggplot(drugs_side_effects_drugs_com, aes(x = as.factor(alcohol), y = rating)) +
  stat_summary(fun = mean, geom = "bar", fill = "orange") +
  labs(title = "Bar Plot: Mean Rating by Alcohol",
       x = "Alcohol",
       y = "Mean Rating")

# Explicitly print the bar plot
print(bar_plot)

Mean ratings are about the same whether or not a drug has an alcohol reaction.
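
Comparing bar heights by eye does not say whether a small gap is more than noise. One way to check, sketched here on simulated data, is a Welch two-sample t-test on the two groups. The data frame toy, the seed, and the simulated means are all hypothetical, so the printed numbers are not results from this report.

```r
# Simulated stand-in for the real data; the means, spread, and seed are
# assumptions chosen only for illustration.
set.seed(1)
toy <- data.frame(
  alcohol_reaction = factor(rep(c("Alcohol Reaction", "No Alcohol Reaction"),
                                each = 30)),
  rating = c(rnorm(30, mean = 6.9, sd = 2), rnorm(30, mean = 7.0, sd = 2))
)

# Mean rating per group, ignoring missing ratings.
group_means <- tapply(toy$rating, toy$alcohol_reaction, mean, na.rm = TRUE)

# Welch t-test (the default in t.test) for a difference in group means.
welch <- t.test(rating ~ alcohol_reaction, data = toy)

print(group_means)
print(welch$p.value)
```

A large p-value from the same test on the real data would support the reading that the two groups are rated similarly.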

19 - Least Squares Linear Regression

# Fitting a linear regression model for rating vs activity
lm_model <- lm(rating ~ activity, data = drugs_side_effects_drugs_com)
# Print the model summary (coefficients, R-squared, and p-values)
print(summary(lm_model))
## 
## Call:
## lm(formula = rating ~ activity, data = drugs_side_effects_drugs_com)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.1746 -1.0891  0.2365  1.5473  3.7914 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.1746     0.1135  63.195  < 2e-16 ***
## activity     -1.1104     0.4136  -2.685  0.00748 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.186 on 558 degrees of freedom
##   (556 observations deleted due to missingness)
## Multiple R-squared:  0.01275,    Adjusted R-squared:  0.01098 
## F-statistic: 7.207 on 1 and 558 DF,  p-value: 0.007478

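Drugs with higher activity tend to get slightly lower ratings: the slope is -1.11 (p = 0.0075), so the effect is statistically significant. However, R-squared is only 0.013, meaning activity explains about 1.3% of the variation in ratings and does not help predict them in a meaningful way.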
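
In a regression with a single predictor, R-squared equals the squared Pearson correlation, so the reported R-squared of 0.01275 corresponds to a correlation of about -0.11 (negative because the slope is negative). The sketch below checks that identity on simulated data. The data frame toy, the seed, and the simulated coefficients are hypothetical and are not the report's data.

```r
# Simulated data with a weak negative trend; all values are assumptions
# chosen only to illustrate the identity.
set.seed(42)
toy <- data.frame(activity = runif(100))
toy$rating <- 7 - 1 * toy$activity + rnorm(100, sd = 2)

# Fit the same kind of model as in the report: rating explained by activity.
fit <- lm(rating ~ activity, data = toy)

# Pull the slope and R-squared out of the fitted model.
slope <- coef(fit)[["activity"]]
r_squared <- summary(fit)$r.squared

# Pearson correlation between predictor and response.
r <- cor(toy$activity, toy$rating)

print(slope)
print(c(r_squared = r_squared, r_squared_from_cor = r^2))
```

The two printed R-squared values should agree up to floating-point error, which is why a small R-squared and a weak correlation describe the same thing here.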

Conclusion

In summary, there is no strong relationship between drug activity and user ratings. Drugs with or without alcohol reactions are rated similarly.

The End