library(useful)  # provides build.x()

boros <- tibble::tribble(
  ~Boro,           ~Pop,    ~Size, ~Random,
  'Manhattan',     1600000, 23,    7,
  'Brooklyn',      2600000, 78,    24,
  'Queens',        2330000, 104,   pi,
  'Bronx',         1455000, 42,    21,
  'Staten Island', 475000,  60,    3
)
boros
## # A tibble: 5 x 4
## Boro Pop Size Random
## <chr> <dbl> <dbl> <dbl>
## 1 Manhattan 1600000 23 7
## 2 Brooklyn 2600000 78 24
## 3 Queens 2330000 104 3.14
## 4 Bronx 1455000 42 21
## 5 Staten Island 475000 60 3
The boros dataset is used below, with the single explanatory variable Pop, to build the design matrix. Notice that build.x adds an intercept column in which every value is 1.
build.x(~Pop, data=boros)
## (Intercept) Pop
## 1 1 1600000
## 2 1 2600000
## 3 1 2330000
## 4 1 1455000
## 5 1 475000
## attr(,"assign")
## [1] 0 1
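The shape of this output (an intercept column of ones plus an "assign" attribute) matches what base R's model.matrix() returns for the same formula. The following is a minimal sketch using base R only; it assumes, rather than confirms, that build.x follows the same conventions, and it rebuilds a small copy of the data with data.frame() so that it runs on its own.

```r
# Base R only: no extra packages are assumed here.
boros_df <- data.frame(
  Boro = c("Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"),
  Pop  = c(1600000, 2600000, 2330000, 1455000, 475000),
  Size = c(23, 78, 104, 42, 60)
)

# Design matrix for one explanatory variable
X <- model.matrix(~ Pop, data = boros_df)

colnames(X)                    # intercept column first, then Pop
all(X[, "(Intercept)"] == 1)   # the intercept column is a column of ones
attr(X, "assign")              # maps each column to a formula term (0 = intercept)
```

The "assign" attribute is what links a column back to its term in the formula, which becomes useful once a single term expands into several columns.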
Now with two variables, Pop and Size:
build.x(~Pop+Size, data = boros)
## (Intercept) Pop Size
## 1 1 1600000 23
## 2 1 2600000 78
## 3 1 2330000 104
## 4 1 1455000 42
## 5 1 475000 60
## attr(,"assign")
## [1] 0 1 2
Crossing two variables with * (both main effects plus their interaction)
## crossing 2 variables: Pop, Size and their product Pop:Size
build.x(~Pop*Size, data = boros)
## (Intercept) Pop Size Pop:Size
## 1 1 1600000 23 36800000
## 2 1 2600000 78 202800000
## 3 1 2330000 104 242320000
## 4 1 1455000 42 61110000
## 5 1 475000 60 28500000
## attr(,"assign")
## [1] 0 1 2 3
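Interaction of two variables with : (only the product, without the main effects)
## interaction only: Pop:Size is the product of Pop and Size, not a division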
build.x(~Pop:Size, data = boros)
## (Intercept) Pop:Size
## 1 1 36800000
## 2 1 202800000
## 3 1 242320000
## 4 1 61110000
## 5 1 28500000
## attr(,"assign")
## [1] 0 1
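In R formulas the : operator forms the product of the two columns, which the values above confirm (1600000 x 23 = 36800000). The sketch below checks this with base R's model.matrix(), used here as a stand-in on the assumption that build.x interprets formulas the same way. It also shows I(), which is the standard formula wrapper for literal arithmetic such as a ratio.

```r
# Base R only; a small copy of the data so the example runs on its own.
boros_df <- data.frame(
  Pop  = c(1600000, 2600000, 2330000, 1455000, 475000),
  Size = c(23, 78, 104, 42, 60)
)

# Pop:Size is the element-wise product of the two columns
X_int <- model.matrix(~ Pop:Size, data = boros_df)
product_matches <- all.equal(unname(X_int[, "Pop:Size"]),
                             boros_df$Pop * boros_df$Size)
product_matches

# Pop*Size is shorthand for Pop + Size + Pop:Size
same_columns <- identical(
  colnames(model.matrix(~ Pop * Size, data = boros_df)),
  colnames(model.matrix(~ Pop + Size + Pop:Size, data = boros_df))
)
same_columns

# An actual ratio needs I(), which protects arithmetic inside a formula
X_ratio <- model.matrix(~ I(Pop / Size), data = boros_df)
colnames(X_ratio)
```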
Crossing three variables with * (all main effects, all two-way interactions and the three-way interaction)
## crossing 3 variables
build.x(~Pop*Size*Random, data = boros)
## (Intercept) Pop Size Random Pop:Size Pop:Random Size:Random
## 1 1 1600000 23 7.000000 36800000 11200000 161.0000
## 2 1 2600000 78 24.000000 202800000 62400000 1872.0000
## 3 1 2330000 104 3.141593 242320000 7319911 326.7256
## 4 1 1455000 42 21.000000 61110000 30555000 882.0000
## 5 1 475000 60 3.000000 28500000 1425000 180.0000
## Pop:Size:Random
## 1 257600000
## 2 4867200000
## 3 761270732
## 4 1283310000
## 5 85500000
## attr(,"assign")
## [1] 0 1 2 3 4 5 6 7
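The eight columns above follow from how R expands the * operator. A quick way to see the expansion without building any matrix is base R's terms() function, sketched below; the ^2 variant is shown as one standard way to stop at two-way interactions.

```r
# Expand the formula symbolically; no data is needed for this step.
tt <- terms(~ Pop * Size * Random)

full_terms <- attr(tt, "term.labels")   # every term the formula expands to
term_order <- attr(tt, "order")         # 1 = main effect, 2 = two-way, 3 = three-way
full_terms
term_order

# (a + b + c)^2 keeps main effects and two-way interactions only
two_way_terms <- attr(terms(~ (Pop + Size + Random)^2), "term.labels")
two_way_terms
```

Adding the intercept to the seven terms of the full expansion gives the eight columns of the matrix.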
Notice that the intercept appears by default in every output. What if we want to suppress it?
### Adding -1 to the formula removes the intercept from the output
build.x(~Pop+Size, data = boros)
## (Intercept) Pop Size
## 1 1 1600000 23
## 2 1 2600000 78
## 3 1 2330000 104
## 4 1 1455000 42
## 5 1 475000 60
## attr(,"assign")
## [1] 0 1 2
build.x(~Pop+Size-1, data = boros)
## Pop Size
## 1 1600000 23
## 2 2600000 78
## 3 2330000 104
## 4 1455000 42
## 5 475000 60
## attr(,"assign")
## [1] 1 2
Notice that the second output has no intercept column; this is the effect of adding -1 to the formula.
A categorical variable such as Boro is expanded into indicator (dummy) columns:
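R formulas also accept 0 + as an equivalent way to drop the intercept. The sketch below compares the two spellings with base R's model.matrix(); that build.x accepts the 0 + form in the same way is an assumption, since only -1 is demonstrated here.

```r
# Base R only; a small copy of the data so the example runs on its own.
boros_df <- data.frame(
  Pop  = c(1600000, 2600000, 2330000, 1455000, 475000),
  Size = c(23, 78, 104, 42, 60)
)

X_minus1 <- model.matrix(~ Pop + Size - 1, data = boros_df)
X_zero   <- model.matrix(~ 0 + Pop + Size, data = boros_df)

colnames(X_minus1)                 # no "(Intercept)" column
same_values <- all.equal(X_minus1, X_zero)
same_values                        # both spellings give the same matrix
```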
build.x(~Boro, data = boros)
## (Intercept) BoroBrooklyn BoroManhattan BoroQueens BoroStaten Island
## 1 1 0 1 0 0
## 2 1 1 0 0 0
## 3 1 0 0 1 0
## 4 1 0 0 0 0
## 5 1 0 0 0 1
## attr(,"assign")
## [1] 0 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$Boro
## [1] "contr.treatment"
## Displaying the content of the variable Boro
boros$Boro
## [1] "Manhattan" "Brooklyn" "Queens" "Bronx"
## [5] "Staten Island"
Notice that new indicator columns are created in the output. Each column name is the concatenation of the variable name Boro and one level of the categorical variable, that is, one borough name.
Displaying the content of Boro shows that “Bronx” is a value in the data but has no column in the build.x output. This means the function has selected “Bronx” as the baseline (reference) level. The choice is alphabetical: the levels are sorted, “Bronx” comes first, and so it is used as the baseline and dropped from the build.x output. A row with zeros in all four indicator columns (row 4 here) therefore represents the Bronx.
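The alphabetical choice comes from how R orders factor levels, and it can be changed by reordering them. The following is a minimal sketch using base R's factor(), relevel() and model.matrix(); whether build.x honors a releveled factor in exactly the same way is an assumption that should be checked against its output.

```r
# Base R only; borough names as a plain character vector.
boro <- c("Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island")

# Converting to a factor sorts the levels; the first level is the baseline
default_levels <- levels(factor(boro))
default_levels

# Make Manhattan the baseline instead of Bronx
boros_df <- data.frame(Boro = relevel(factor(boro), ref = "Manhattan"))
releveled_cols <- colnames(model.matrix(~ Boro, data = boros_df))
releveled_cols   # BoroBronx now has a column, BoroManhattan does not
```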
build.x(~Boro, data = boros, contrasts = FALSE)
## (Intercept) BoroBronx BoroBrooklyn BoroManhattan BoroQueens
## 1 1 0 0 1 0
## 2 1 0 1 0 0
## 3 1 0 0 0 1
## 4 1 1 0 0 0
## 5 1 0 0 0 0
## BoroStaten Island
## 1 0
## 2 0
## 3 0
## 4 0
## 5 1
## attr(,"assign")
## [1] 0 1 1 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$Boro
## Bronx Brooklyn Manhattan Queens Staten Island
## Bronx 1 0 0 0 0
## Brooklyn 0 1 0 0 0
## Manhattan 0 0 1 0 0
## Queens 0 0 0 1 0
## Staten Island 0 0 0 0 1
This time every level of the categorical variable has its own column, including the former baseline “Bronx”.
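Keeping every level together with the intercept makes the columns linearly dependent, because the indicator columns sum to the intercept column. The sketch below reproduces the full indicator coding with base R's model.matrix() and its contrasts.arg argument, then checks the rank. It illustrates the idea only and is not taken from the build.x source.

```r
# Base R only; Boro stored as a factor.
boros_df <- data.frame(
  Boro = factor(c("Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"))
)

# contrasts(..., contrasts = FALSE) returns an identity matrix: one column per level
full_coding <- contrasts(boros_df$Boro, contrasts = FALSE)

X_full <- model.matrix(~ Boro, data = boros_df,
                       contrasts.arg = list(Boro = full_coding))

n_cols <- ncol(X_full)       # intercept + 5 indicator columns
x_rank <- qr(X_full)$rank    # smaller than n_cols: one column is redundant
n_cols
x_rank

# The five indicator columns add up to the intercept column
rowSums(X_full[, -1]) == X_full[, "(Intercept)"]
```

An unpenalized linear model cannot estimate all six coefficients from such a matrix, which is why the treatment contrasts shown earlier drop one level by default.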
build.x(~Boro+Pop-1, data = boros, contrasts = FALSE)
## BoroBronx BoroBrooklyn BoroManhattan BoroQueens BoroStaten Island
## 1 0 0 1 0 0
## 2 0 1 0 0 0
## 3 0 0 0 1 0
## 4 1 0 0 0 0
## 5 0 0 0 0 1
## Pop
## 1 1600000
## 2 2600000
## 3 2330000
## 4 1455000
## 5 475000
## attr(,"assign")
## [1] 1 1 1 1 1 2
## attr(,"contrasts")
## attr(,"contrasts")$Boro
## Bronx Brooklyn Manhattan Queens Staten Island
## Bronx 1 0 0 0 0
## Brooklyn 0 1 0 0 0
## Manhattan 0 0 1 0 0
## Queens 0 0 0 1 0
## Staten Island 0 0 0 0 1
Note: in this dense matrix every value is stored, including each zero, which makes it memory- and processor-intensive on large datasets. Setting sparse = TRUE returns a sparse matrix instead:
build.x(~Boro+Pop-1, data = boros, contrasts = FALSE, sparse = TRUE)
## 5 x 6 sparse Matrix of class "dgCMatrix"
## BoroBronx BoroBrooklyn BoroManhattan BoroQueens BoroStaten Island
## 1 . . 1 . .
## 2 . 1 . . .
## 3 . . . 1 .
## 4 1 . . . .
## 5 . . . . 1
## Pop
## 1 1600000
## 2 2600000
## 3 2330000
## 4 1455000
## 5 475000
The output is a sparse matrix (class "dgCMatrix"): a matrix that stores only its non-zero entries. The zeros are not stored and are printed as dots (.).
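The saving is negligible on five rows but grows quickly with the number of rows and factor levels. The sketch below compares a dense and a sparse design matrix on simulated data, using model.matrix() from base R and sparse.model.matrix() from the Matrix package. The data, the variable names g and x, and the size n are made up for illustration, and the calls stand in for build.x with and without sparse = TRUE.

```r
library(Matrix)  # recommended package shipped with R; provides sparse.model.matrix()

set.seed(1)
n <- 10000
# Hypothetical data: one factor with 26 levels and one numeric column
sim <- data.frame(
  g = factor(sample(letters, n, replace = TRUE)),
  x = rnorm(n)
)

dense_X  <- model.matrix(~ g + x - 1, data = sim)          # stores every zero
sparse_X <- sparse.model.matrix(~ g + x - 1, data = sim)   # stores non-zeros only

dim(dense_X)
class(sparse_X)

# Memory footprint of each representation
object.size(dense_X)
object.size(sparse_X)

# Same values, different storage
same_content <- all.equal(as.matrix(sparse_X), dense_X, check.attributes = FALSE)
same_content
```

Each row has only two non-zero entries here (one indicator and x), so the dense form spends most of its memory on zeros.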