| Member.1 | Member.2 |
|---|---|
| Archit Chawla | Robby Connor |
2024-04-23
ISSUE: Identify a model with protected columns, then compare and contrast how different H2O models perform.
Protected Columns
| text | date | gender | age | horoscope | job | text_length | sentiment_score | month | day | year | day_of_week | gender_female | gender_male | age_group | age_group_0_18 | age_group_19_25 | age_group_26_35 | age_group_36_50 | age_group_50plus | horoscope_factors | avg_sentiment_score_all | max_text_length_all | sentiment_trend |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| goin on n | 2001-02-20 | female | 16 | Pisces | Student | 1203 | -1 | 2 | 20 | 2001 | Tue | 1 | 0 | 0-18 | 1 | 0 | 0 | 0 | 0 | 0 | 4.8504 | 12663 | NA |
| i write re | 2001-02-20 | female | 16 | Pisces | Student | 41 | -1 | 2 | 20 | 2001 | Tue | 1 | 0 | 0-18 | 1 | 0 | 0 | 0 | 0 | 0 | 4.8504 | 12663 | NA |
| some idiot | 2001-02-20 | female | 16 | Pisces | Student | 346 | 2 | 2 | 20 | 2001 | Tue | 1 | 0 | 0-18 | 1 | 0 | 0 | 0 | 0 | 0 | 4.8504 | 12663 | NA |
| am i prett | 2001-02-20 | female | 16 | Pisces | Student | 32 | 1 | 2 | 20 | 2001 | Tue | 1 | 0 | 0-18 | 1 | 0 | 0 | 0 | 0 | 0 | 4.8504 | 12663 | NA |
| 1 you attr | 2001-02-20 | female | 16 | Pisces | Student | 530 | 5 | 2 | 20 | 2001 | Tue | 1 | 0 | 0-18 | 1 | 0 | 0 | 0 | 0 | 0 | 4.8504 | 12663 | NA |
| Column_Name | Data_Type | Column_Type |
|---|---|---|
| text | text | Feature |
| date | date | Feature |
| gender | text | Feature |
| age | int64 | Feature |
| horoscope | text | Feature |
| job | text | Target |
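library(readr)            # provides read_csv()
library(h2o); h2o.init()  # as.h2o() needs a running H2O cluster
processed_data <- read_csv('processed_data.csv')
blog_authorship_data <- as.h2o(processed_data)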
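The models `m1`, `m2`, and `m3` and the frame `test_h2o` are used in the accuracy sections but are not defined in the code shown. The following is only a minimal sketch of how they might be produced, not the setup actually used for this report. The helper name `fit_job_models`, the choice of `job` as the response, the predictor columns, the 80/20 split, and the seed are all assumptions made for illustration.

```r
# Hypothetical helper: split an H2OFrame and fit the three model families
# compared in this report. It requires the h2o package and a running
# cluster (h2o.init()) at call time.
fit_job_models <- function(frame,
                           response = "job",                            # assumed target
                           predictors = c("gender", "age", "horoscope"), # assumed features
                           train_ratio = 0.8,                            # assumed split
                           seed = 42) {
  # Classification models need a categorical (enum) response.
  frame[, response] <- h2o::h2o.asfactor(frame[, response])

  # Random train/test split; the seed makes it reproducible.
  splits <- h2o::h2o.splitFrame(frame, ratios = train_ratio, seed = seed)
  train_h2o <- splits[[1]]
  test_h2o <- splits[[2]]

  list(
    m1 = h2o::h2o.glm(x = predictors, y = response,
                      training_frame = train_h2o, family = "multinomial"),
    m2 = h2o::h2o.naiveBayes(x = predictors, y = response,
                             training_frame = train_h2o),
    m3 = h2o::h2o.gbm(x = predictors, y = response,
                      training_frame = train_h2o, seed = seed),
    test_h2o = test_h2o
  )
}
```

Under these assumptions, `fits <- fit_job_models(blog_authorship_data)` would return the three models together with the held-out frame, so that `fits$m1` and `fits$test_h2o` play the roles of `m1` and `test_h2o`.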
Accuracy of GLM Model
hugging_face_predictions <- h2o.predict(m1, test_h2o)
perf_glm <- h2o.performance(m1, newdata = test_h2o)
Accuracy of Naive Bayes Model
hugging_face_predictions <- h2o.predict(m2, test_h2o)
perf_nb <- h2o.performance(m2, newdata = test_h2o)
Accuracy of GBM Model
hugging_face_predictions <- h2o.predict(m3, test_h2o)
perf_gbm <- h2o.performance(m3, newdata = test_h2o)
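To compare the three models side by side, the performance objects can be collected into one small table. The sketch below assumes the models are multinomial classifiers, for which `h2o.logloss()` and `h2o.mean_per_class_error()` are defined. The helper name `compare_models` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: build one row per model from a named list of
# performance objects. `metrics` is a named list of functions that each
# take a performance object and return a single number. The defaults use
# h2o accessors and are only evaluated if `metrics` is not supplied.
compare_models <- function(perfs,
                           metrics = list(
                             logloss = h2o::h2o.logloss,
                             mean_per_class_error = h2o::h2o.mean_per_class_error
                           )) {
  rows <- lapply(names(perfs), function(model_name) {
    # Apply every metric function to this model's performance object.
    values <- lapply(metrics, function(metric) metric(perfs[[model_name]]))
    data.frame(model = model_name, values, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

With the objects above, a call such as `compare_models(list(GLM = perf_glm, NaiveBayes = perf_nb, GBM = perf_gbm))` would return a data frame with one row per model. Lower values are better for both default metrics.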
conf_matrix <- as.data.frame.matrix(h2o.confusionMatrix(perf_nb))
# conf_matrix %>%
#   kbl() %>%
#   kable_material_dark()
Because the response has so many classes, the confusion matrix is difficult to read, so it is excluded.
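One possible alternative to printing the full matrix is to report only the classes with the highest error. The sketch below assumes the layout that `h2o.confusionMatrix()` returns for multinomial models: one row per actual class, a final `Totals` row, and a numeric `Error` column. The helper name `worst_classes` is hypothetical.

```r
# Hypothetical helper: reduce a wide confusion matrix to the n classes
# with the highest per-class error rate.
worst_classes <- function(cm, n = 5) {
  cm <- as.data.frame(cm)
  # Drop the summary row so only real classes remain.
  per_class <- cm[rownames(cm) != "Totals", , drop = FALSE]
  out <- data.frame(
    class = rownames(per_class),
    error = per_class$Error,
    stringsAsFactors = FALSE
  )
  # Sort from worst (highest error) to best and keep the first n rows.
  out <- out[order(out$error, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  head(out, n)
}
```

For example, `worst_classes(h2o.confusionMatrix(perf_nb), n = 10)` would list the ten classes the Naive Bayes model misclassifies most often, which fits in a short table.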