Load the Hurricane Harvey resource contact info into R.
setwd("~/Desktop")
library(readxl)
Harvey <- read_excel("~/Desktop/Harvey.xlsx")
[110, 3]: expecting numeric: got 'anne.imber@gmail.com'
[113, 3]: expecting numeric: got '1415 CALIFORNIA ST Houston, TX 77006'
[114, 3]: expecting numeric: got '3317 Montrose Blvd. Houston, TX 77006'
[115, 3]: expecting numeric: got '2612 Smith St. Houston, TX 77006'
[116, 3]: expecting numeric: got '6400 Fannin St. Houston, TX77030'
[117, 3]: expecting numeric: got '6128 Broadway St Galveston, TX 77551'
head(Harvey)
Look at column names
names(Harvey)
[1] "Who" "Number" "Email"
[4] "Category" "Reason" "City"
[7] "Notes" "Source"
Look for missing values
sum(is.na(Harvey))
[1] 481
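A single total does not show where the gaps are. As a minimal sketch, assuming a small made-up data frame (`toy` and its values are hypothetical, not the Harvey sheet), the missing cells can be counted per column:

```r
# Hypothetical toy data standing in for the Harvey sheet (values are made up)
toy <- data.frame(
  Who    = c("Shelter A", "Clinic B", NA),
  Number = c(NA, "555-0100", NA),
  Email  = c("a@example.org", NA, NA),
  stringsAsFactors = FALSE
)

# Total number of missing cells, same idea as sum(is.na(Harvey))
total_na <- sum(is.na(toy))

# Missing cells per column
na_per_column <- colSums(is.na(toy))

# Share of missing cells per column (0 = none missing, 1 = all missing)
na_share <- colMeans(is.na(toy))

print(total_na)
print(na_per_column)
print(round(na_share, 2))
```

Running `colSums(is.na(Harvey))` on the real data would show which of the eight columns account for most of the 481 missing cells.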
This has too many missing values for practice. I’m going to switch to another data set.
Practice Guided by https://www.youtube.com/watch?v=S7l1z301G30
setwd("~/Desktop")
library(readxl)
Films <- read_excel("~/Desktop/Movies.xlsx")
head(Films)
names(Films)
[1] "Title" "Year"
[3] "Rating" "Runtime"
[5] "Critic.Score" "Box.Office"
Rename a column
names(Films)[5]<- "Critic.Score"
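Renaming by position breaks if the column order ever changes. A possible alternative, sketched on a made-up data frame (`toy` and the old name "Critic Score" are hypothetical), is to rename by matching the current name:

```r
# Hypothetical toy data; check.names = FALSE keeps the space in the column name
toy <- data.frame(
  Title = c("Film A", "Film B"),
  `Critic Score` = c(55, 80),
  check.names = FALSE
)

# Rename by matching the old name instead of relying on position 5
names(toy)[names(toy) == "Critic Score"] <- "Critic.Score"

print(names(toy))
```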
Count Missing Values
sum(is.na(Films))
[1] 0
I’m going to delete some values for practice and reload the excel file.
setwd("~/Desktop")
library(readxl)
Movies1 <- read_excel("~/Desktop/Movies.xlsx")
head(Movies1)
names(Movies1)[5]<- "Critic.Score"
sum(is.na(Movies1))
[1] 3
Delete rows with NA values.
q <- na.omit(Movies1)
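It helps to check how many rows `na.omit()` actually removes. A minimal sketch on made-up data (`toy` is hypothetical, not the Movies file):

```r
# Hypothetical toy data with one NA in each of two different rows
toy <- data.frame(
  Title      = c("A", "B", "C", "D"),
  Runtime    = c(100, NA, 95, 120),
  Box.Office = c(10.5, 3.2, NA, 40.0)
)

rows_before <- nrow(toy)

# na.omit() drops every row that has at least one NA in any column
complete <- na.omit(toy)
rows_after <- nrow(complete)

# complete.cases() identifies the same rows without dropping them
dropped_rows <- which(!complete.cases(toy))

print(c(before = rows_before, after = rows_after))
print(dropped_rows)
```

Because a row is dropped when any column is missing, listwise deletion can remove more data than the NA count alone suggests.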
Replace NA values in Runtime with the column mean
Movies2 <- Movies1
Movies2$Runtime[is.na(Movies2$Runtime)] <- mean(Movies2$Runtime, na.rm = TRUE)
sum(is.na(Movies2))
[1] 2
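The imputation step only touches one column, which is why some NA values remain. A small sketch on made-up data (`toy` and its values are hypothetical) shows the pattern:

```r
# Hypothetical toy data: one NA in Runtime, one NA in Critic.Score
toy <- data.frame(
  Runtime      = c(100, NA, 95, 120),
  Critic.Score = c(60, 45, NA, 90)
)

# Work on a copy so the original stays available for comparison
imputed <- toy

# Fill missing Runtime values with the mean of the observed ones
missing_runtime <- is.na(imputed$Runtime)
imputed$Runtime[missing_runtime] <- mean(imputed$Runtime, na.rm = TRUE)

print(imputed$Runtime)
print(sum(is.na(imputed)))  # the NA in Critic.Score is still there
```

Mean imputation keeps the column mean unchanged but shrinks its variance, so it is a convenience for practice rather than a neutral choice.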
Create a categorical variable from Critic.Score (the basis for dummy variables)
Movies3 <- cut(Movies2$Critic.Score, breaks = c(0, 50, 75, 100), labels = c("Bad", "Decent", "Good"))
Movies3
[1] Bad Good Bad Decent
[5] Bad Bad Decent Bad
[9] Good Decent Bad Bad
[13] Good Bad Bad Bad
[17] Bad Bad Good Bad
[21] Bad Good Bad Bad
[25] Decent Bad Good Bad
[29] Bad Bad Bad Good
[33] Bad Bad Bad Bad
[37] Bad Good Decent Bad
[41] Decent Good Good Bad
[45] Bad Decent Good Bad
[49] Bad Good Bad Bad
[53] Bad Bad Decent Decent
[57] Bad Bad Bad Bad
[61] Bad Decent Bad Decent
[65] Bad Bad Bad Decent
[69] Good Good Bad Decent
[73] Good Bad Decent Bad
[77] Good Good Good Good
[81] Decent Decent Decent Decent
[85] Good Good Decent Bad
[89] Good Good Decent Decent
[93] Good Good Decent Bad
[97] Bad Good Decent Bad
[101] Bad Bad Bad Bad
[105] Decent Bad <NA> Bad
[109] Bad Bad Bad Decent
[113] Bad Bad Bad Bad
[117] Bad Decent Bad Decent
[121] Decent Bad Good Bad
[125] Bad Decent Bad Bad
[129] Decent Bad Bad Bad
[133] Bad Bad Bad Bad
[137] Good Decent Good Bad
[141] Good Bad Good Bad
[145] Bad Bad Bad Bad
[149] Decent Bad Decent Bad
[153] Decent Decent Decent Decent
[157] Decent Bad Decent Bad
[161] Bad Bad Bad Bad
[165] Decent Good Decent Decent
[169] Bad Bad Bad Good
[173] Decent Bad Bad Bad
[177] Bad Good Decent Decent
[181] Decent Good Bad Bad
[185] Bad Bad Bad Bad
[189] Good Bad Bad Decent
[193] Bad Bad Bad Decent
[197] Decent Decent Bad Bad
[201] Decent Bad Decent Decent
[205] Good Bad Bad Bad
[209] Bad Bad Good Decent
[213] Bad Decent Decent Bad
[217] Good Good Good Decent
[221] Bad Bad Bad Decent
[225] Good Bad Bad Bad
[229] Good Bad Good Decent
[233] Bad Bad Bad Decent
[237] Good Decent Bad Bad
[241] Decent Good Good Bad
[245] Decent Good Bad Good
[249] Decent Good Good Bad
[253] Bad Decent Bad Good
[257] Bad Decent Good Decent
[261] Decent Good Decent Good
[265] Decent Bad Bad Decent
[269] Decent Bad Bad Bad
[273] <NA> Bad Bad Bad
[277] Decent Decent Bad Bad
[281] Bad Bad Bad Bad
[285] Bad Decent Bad Decent
[289] Bad Bad Decent Bad
[293] Bad Bad Good Bad
[297] Bad Decent Bad Good
[301] Bad Good Bad Bad
[305] Good Bad Bad Bad
[309] Bad Bad Bad Bad
[313] Decent Decent Good Decent
[317] Bad Good Bad Good
[321] Decent Bad Bad Good
[325] Bad Bad Decent Good
[329] Bad Good Bad Bad
[333] Decent Bad Decent Good
[337] Bad Bad Good Decent
[341] Bad Decent Good Decent
[345] Bad Bad Decent Bad
[349] Good Decent Bad Decent
[353] Bad Bad Decent Bad
[357] Decent Bad Bad Bad
[361] Bad Good Bad <NA>
[365] Bad Bad Bad Bad
[369] <NA> Bad Bad Decent
[373] Good Decent Decent Bad
[377] Decent Decent Bad Decent
[381] Decent Decent Decent Decent
[385] Bad <NA> Decent Bad
[389] Bad Bad Decent Good
[393] Good Decent Good Good
[397] Good Good Bad Good
[401] Bad Bad Good Decent
[405] Bad Decent Decent Bad
[409] Good Bad Bad Good
[413] Bad Bad Bad Good
[417] Good Decent Bad Good
[421] <NA> Decent Good Good
[425] Good Good Good Decent
[429] Good Good Bad Good
[433] Good Decent Decent Bad
[437] Bad Bad Decent Decent
[441] Decent Good Good Decent
[445] Good Good Bad Good
[449] Good Bad Decent Bad
[453] Bad Bad Bad Bad
[457] Bad Decent Bad Bad
[461] Bad Bad Decent Bad
[465] Bad Bad Bad Decent
[469] Bad Bad Bad Bad
[473] Bad Bad Decent Bad
[477] Bad Decent Bad Bad
[481] Bad Bad Bad Bad
[485] Bad Bad Bad Bad
[489] Good Bad Good Bad
[493] Decent Bad Decent Bad
[497] Good Good Bad Decent
[501] Decent Bad Bad Bad
[505] Good Decent Bad Bad
[509] Bad Bad Bad Decent
[513] Bad Bad Bad Bad
[517] Decent Good Bad Bad
[521] Bad Bad Bad Good
[525] Bad Decent Decent Decent
[529] Good Bad Bad Bad
[533] Bad Good Bad Bad
[537] Bad Decent Bad Good
[541] Bad Bad Bad Good
[545] Good Decent Bad Bad
[549] Good Bad Decent Bad
[553] Bad Decent Decent Good
[557] Good Bad Bad Decent
[561] Good Bad Bad Decent
[565] Bad Bad Bad Bad
[569] Bad Bad Bad Good
[573] Good Decent Bad Decent
[577] Decent Good Bad Bad
[581] Decent Bad Good Bad
[585] Good Bad Decent Decent
[589] Bad Decent Good Bad
[593] Bad Good Bad Bad
[597] Decent Decent Good Decent
[601] Bad Good Good Good
[605] Good Good Good Bad
[609] Decent Good Bad Decent
[613] Decent Decent Bad Decent
[617] Bad Good Good Decent
[621] Bad Good Decent Good
[625] Bad Bad Good Decent
[629] Bad Bad Bad Bad
[633] Good Decent Bad Bad
[637] Bad Decent Bad Bad
[641] Bad Bad Bad Bad
[645] Bad Decent Bad Decent
[649] Bad Bad Decent Bad
[653] Good Decent Bad Bad
[657] Good Bad Bad Decent
[661] Bad Decent Bad Bad
[665] Bad Bad Decent Bad
[669] Decent Bad Bad Bad
[673] Good Bad Decent Bad
[677] Good Bad Bad Bad
[681] Bad Good Bad Decent
[685] Bad Bad Bad Decent
[689] Decent Bad Good Good
[693] Bad Bad Decent Bad
[697] Bad Good Bad Bad
[701] Bad Good Bad Decent
[705] Good Good Bad Decent
[709] Good Bad Good Bad
[713] Bad Good Bad Bad
[717] Good Bad Bad Decent
[721] Bad Bad <NA> Decent
[725] Bad Bad Bad Decent
[729] Good Decent Bad Decent
[733] Decent Decent Bad Bad
[737] Good Decent Good Bad
[741] Good Bad Good Bad
[745] Bad Bad Good Good
[749] Bad Bad Bad Bad
[753] Decent Decent Bad Bad
[757] Bad Bad Good Bad
[761] Decent Decent Decent Bad
[765] Bad Decent Bad Bad
[769] Bad Bad Bad Good
[773] Bad Decent Decent Decent
[777] Bad Decent Bad Bad
[781] Bad Good Bad Good
[785] Good Good Good Bad
[789] Good Bad Bad Good
[793] Bad Good Decent Good
[797] Decent Bad Good Decent
[801] Decent Bad Bad Decent
[805] Bad Decent Good Decent
[809] Decent Decent Decent Decent
[813] Bad Decent Bad Bad
[817] Good Bad Bad Bad
[821] Decent Decent Bad Bad
[825] Bad Bad Bad Decent
[829] Good <NA> Bad Decent
[833] Bad Bad Bad Bad
[837] Bad Bad Bad Decent
[841] Bad Decent Bad Bad
[845] Bad Decent Bad Decent
[849] Bad Bad Bad Bad
[853] Decent Decent Bad Decent
[857] Bad Bad Bad Good
[861] Bad Bad Bad Decent
[865] Bad Decent Bad Good
[869] Bad Bad Decent Good
[873] Good Good Decent Decent
[877] Bad Good Bad Bad
[881] Bad Decent Good Decent
[885] Good Bad Bad Bad
[889] Decent Good Decent Good
[893] Bad Good Decent Bad
[897] Decent Bad Decent Bad
[901] Bad Good Good Bad
[905] Bad Bad Bad Bad
[909] Good Good Bad Good
[913] Good Bad Bad Good
[917] Good Bad Bad Decent
[921] Decent Bad Bad Decent
[925] Decent Bad Decent Bad
[929] Good Good Bad Bad
[933] Good Good Bad Decent
[937] Decent Decent Bad Bad
[941] Bad Decent Good Bad
[945] Bad Decent Bad Decent
[949] Bad Bad <NA> Decent
[953] Bad Decent Bad Bad
[957] Decent Decent Bad Good
[961] Bad Decent Bad Good
[965] Good Good Bad Bad
[969] Bad Good Bad Bad
[973] Decent Bad Decent Decent
[977] Good Good Decent Good
[981] Bad Bad Bad Bad
[985] Decent Bad Good Bad
[989] Bad Decent Good Bad
[993] Bad Good Bad Good
[997] Decent Bad Decent Decent
[ reached getOption("max.print") -- omitted 2238 entries ]
Levels: Bad Decent Good
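By default `cut()` builds intervals that are open on the left, so a score equal to the lowest break (here 0) falls outside every bin and becomes `<NA>`. That is one possible source of the `<NA>` entries in the printout, alongside scores that were already missing. The sketch below uses made-up scores (`scores` is hypothetical) to show the effect of `include.lowest = TRUE`, a compact `table()` summary, and one way to get actual 0/1 dummy columns with `model.matrix()`:

```r
# Hypothetical scores chosen to sit on and around the break points
scores <- c(0, 35, 50, 51, 75, 76, 100)
bin_labels <- c("Bad", "Decent", "Good")

# Default: intervals are (0,50], (50,75], (75,100], so 0 is not covered
default_bins <- cut(scores, breaks = c(0, 50, 75, 100), labels = bin_labels)

# include.lowest = TRUE closes the first interval: [0,50]
closed_bins <- cut(scores, breaks = c(0, 50, 75, 100), labels = bin_labels,
                   include.lowest = TRUE)

# A frequency table is easier to read than printing every element
bin_counts <- table(closed_bins, useNA = "ifany")

# One 0/1 indicator column per level (no intercept column)
dummies <- model.matrix(~ closed_bins - 1)

print(default_bins)
print(bin_counts)
print(dummies)
```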
Regress Box.Office on Critic.Score
reg1 <- lm(Box.Office ~ Critic.Score, data = Movies2)
summary(reg1)
Call:
lm(formula = Box.Office ~ Critic.Score, data = Movies2)
Residuals:
    Min      1Q  Median      3Q     Max 
 -59.33  -38.04  -20.06   12.98  707.21 
Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  21.85118    2.32123   9.414   <2e-16 ***
Critic.Score  0.37880    0.04082   9.280   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 64.25 on 3234 degrees of freedom
(2 observations deleted due to missingness)
Multiple R-squared: 0.02594, Adjusted R-squared: 0.02564
F-statistic: 86.12 on 1 and 3234 DF, p-value: < 2.2e-16
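The fitted model is Box.Office = 21.85118 + 0.37880 × Critic.Score. The slope is statistically significant (p < 2e-16), but R-squared is only 0.026, so Critic.Score explains about 2.6% of the variance in Box.Office.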
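To see what the fitted line implies, the reported coefficients can be plugged in by hand. This is only a sketch: the helper `predict_box_office` and the example scores 50 and 80 are my own illustration, and with the `reg1` object available `predict(reg1, newdata = ...)` would do the same job.

```r
# Coefficients copied from the summary(reg1) output
intercept <- 21.85118
slope     <- 0.37880

# Hypothetical helper: fitted Box.Office for a given Critic.Score
predict_box_office <- function(critic_score) {
  intercept + slope * critic_score
}

# Fitted values for two illustrative critic scores
predicted <- predict_box_office(c(50, 80))
print(predicted)

# Difference implied by a 30-point gap in Critic.Score
print(slope * 30)
```

Given the residual standard error of 64.25, individual films will sit far from these fitted values.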
Box.Office is strongly right-skewed (median 16.1, mean 40.68, max 760.5), so if we needed a more symmetric distribution we could try a log transform:
summary(Movies1$Box.Office)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
  0.0002   1.0000  16.1000  40.6800  51.5000 760.5000        1 
summary(log10(Movies1$Box.Office+1))
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
0.0000786 0.3010000 1.2330000 1.1060000 1.7200000 2.8820000         1 
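The `+ 1` keeps values near zero from becoming large negative numbers, and the transform can be undone with `10^y - 1`. A short sketch, using the five quantiles printed in the first summary as stand-in data (the vector `box_office` is my own construction, not the full column):

```r
# Quantiles copied from summary(Movies1$Box.Office): min, Q1, median, Q3, max
box_office <- c(0.0002, 1, 16.1, 51.5, 760.5)

# Forward transform, same as in the notes
logged <- log10(box_office + 1)

# Back-transform to the original scale
recovered <- 10^logged - 1

print(round(logged, 4))
print(recovered)

# The raw range spans hundreds of units; the logged range spans under 3
print(diff(range(box_office)))
print(diff(range(logged)))
```

The logged values for 1, 16.1, 51.5 and 760.5 match the 1st Qu., Median, 3rd Qu. and Max. of the second summary; the minimum differs slightly because 0.0002 is a rounded display value.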