This script looks for patterns in your data. It goes through each value in each column and builds charts showing the proportion of times that each value occurs with the dependent variable. Continuous (numeric) variables are split into quartiles (the number of groups, 4 here, is easily changed), then factorized and evaluated as categorical data. Values that represent less than 1% of the data are omitted, as they are often rare events and clutter the charts.
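The binning and proportion steps described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the built-in iris data stands in for your file, and every variable name (`n_groups`, `bins`, `prop_true`, and so on) is hypothetical.

```r
# Illustrative sketch only; all names here are assumptions.
data <- iris                               # built-in data, stands in for your file
outcome <- data$Sepal.Length >= 6.5        # dependent variable as a logical test
n_groups <- 4                              # 4 gives quartiles; change for other splits

# Cut a numeric column into quantile-based bins and treat it as a factor.
probs <- seq(0, 1, length.out = n_groups + 1)
breaks <- unique(quantile(data$Petal.Width, probs = probs))
bins <- cut(data$Petal.Width, breaks = breaks, include.lowest = TRUE)

# Drop levels that cover less than 1% of the rows.
share <- table(bins) / length(bins)
keep <- bins %in% names(share)[share >= 0.01]

# Proportion of rows in each remaining bin where the test is TRUE.
prop_true <- tapply(outcome[keep], droplevels(bins[keep]), mean)
print(round(prop_true, 2))
# barplot(prop_true) would draw these proportions as a simple chart.
```

Using `unique()` on the quantile breaks guards against columns with many tied values, where two quantiles can coincide and `cut()` would otherwise fail.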
Using the space below, enter the file name and the variable you wish to test. Enter the fileName with its extension and put it in quotes (e.g. "this is my data.csv"). Then type the logical test you want to observe (e.g. "Sepal.Length >= 6.5"), using the column name as it appears in your data. Note that, by default, R column names do not contain spaces or most punctuation; these characters are replaced by a period when R reads in the file. Thus, use a period here or edit the column names before bringing in the file.
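As a rough illustration of the renaming and of how a quoted test can be applied, the sketch below uses base R only. The `fileName` value follows the example above; `testText`, `cleaned`, and `outcome` are hypothetical names, and evaluating the test with `eval(parse(...))` is an assumption about one possible approach, not necessarily what the script does.

```r
# How base R rewrites column names on import: read.csv() uses
# check.names = TRUE by default, which applies make.names().
cleaned <- make.names(c("Sepal Length", "Petal-Width", "Species"))
print(cleaned)   # "Sepal.Length" "Petal.Width" "Species"

# Inputs in the style described above (hypothetical variable names).
fileName <- "this is my data.csv"   # not read here; iris stands in for the file
testText <- "Sepal.Length >= 6.5"   # uses the period-separated column name

# One possible way to turn the quoted test into a TRUE/FALSE vector.
data <- iris
outcome <- eval(parse(text = testText), envir = data)
print(table(outcome))
```

If a test refers to a column by its original spelling (with a space, for example), the evaluation fails because no such column exists after import, which is why the period form is needed.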
Plots are only shown for values that have more than 2 observations.