Note that, to be thorough, I explain the data cleaning and the calculation of new variables. However, if you would like to see just the visualization and the two paragraphs that describe it, please jump to the third section, titled “A Single Visualization”.
About the Author: Abigail Richard received her PhD in mathematics from the University of Cincinnati, and she greatly enjoys working on interdisciplinary projects. If you would like to work with her on a collaborative research project related to this study (or any other study), please contact her at richaab@mail.uc.edu.
There is a wealth of evidence in psychology and management indicating that setting difficult goals is important for performance (Jung et al., 2010; Locke & Latham, 1990; O’Leary-Kelly et al., 1994; Mento et al., 1987; Locke & Latham, 2002; Gellatly & Meyer, 1992). However, although extensive research has shown the importance of setting difficult goals, relatively few psychological studies demonstrate the potential disadvantages of being overly ambitious (Ivancevich, 1982; Sales, 1970; Fowles, 1982; Bandura & Locke, 2003). One reason for this is that these disadvantages can be difficult to measure experimentally. When goals are unrealistic, those involved often simply disregard the overly ambitious goal and instead set more realistic personal goals (Locke et al., 1984). This ‘change in goals’ can make it difficult to measure the full negative effect of unrealistic goal-setting (Gellatly & Meyer, 1992). Given this, further studies are needed to examine more closely the effect of unrealistic goal-setting in situations where those involved cannot easily disregard the overly ambitious goals. We consider agency and company settings to be a potentially appropriate venue for such a study, as long as management does its best to stress the importance of reaching the unrealistic goals. To examine ambitious goal-setting in an agency environment, we use the FY20 MMR Performance Indicators data set. This data set is openly available for download on the NYC Open Data website at the following link:
https://data.cityofnewyork.us/City-Government/FY20-MMR-Performance-Indicators/qe3p-6ugh
A detailed variable dictionary can also be found at the preceding link. The data set contains a list of various agencies in New York City, their performance in various fiscal years, the desired direction for their performance goals, and their target performance goals for the fiscal years 2020 and 2021. After downloading the data from the above link, we import the data and view the variable names using the below code. The output below displays the variable names.
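# Using the below code, we import the data set. The argument fileEncoding = "UTF-8-BOM" strips the byte order mark at the start of the file, which would otherwise garble the first variable name.
mmr_2020 <- read.csv("C:/Users/richa/Dropbox/My PC (DESKTOP-B9LT0L1)/Documents/Data Wrangling/Goals Homework/FY20_MMR_Performance_Indicators.csv", fileEncoding = "UTF-8-BOM")
# Using the below code, we view the variable names in the data set.
names(mmr_2020)
## [1] "Agency" "MMR.Goal" "Critical"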
## [4] "Performance.Indicator" "FY16" "FY17"
## [7] "FY18" "FY19" "FY20"
## [10] "TGT20" "TGT21" "X5yr.Trend"
## [13] "Desired.Direction"
For the purposes of this study, the only variables that we will be concerned with are “FY19”, “FY20”, “TGT20”, and “Desired.Direction”. Using the below code, we remove all columns from the data set that we do not plan to use, and we check that the data set now only contains the desired variables. The below output indicates the names of the desired remaining variables.
# Using the below code, we remove all variables that we do not need for our analysis.
mmr_2020 <- mmr_2020[, c(8, 9, 10, 13)]
# Using the below code, we display the variable names that are now in our data set.
names(mmr_2020)
## [1] "FY19" "FY20" "TGT20"
## [4] "Desired.Direction"
In the above output, we notice that the variable names do not follow snake_case, and so we use the below code to rename them accordingly.
# Using the below code, we change the variable names to follow snake_case.
names(mmr_2020) <- c("fy_19", "fy_20", "tgt_20", "desired_direction")
We have now changed the variable names in the data set to be “fy_19”, “fy_20”, “tgt_20”, and “desired_direction”. The variable “fy_19” contains the performance results for fiscal year 2019, the variable “fy_20” contains the performance results for fiscal year 2020, and the variable “tgt_20” contains the agency goal for fiscal year 2020. The variable “desired_direction” is “Up” when it would be best for the agency to increase the numeric value of its performance, and “Down” when it would be best for the agency to decrease the numeric value of its performance. The below code and output allow us to see the unique values that are contained in the variable “desired_direction”.
# Using the below code, we display the unique values contained in the variable desired_direction.
unique(mmr_2020$desired_direction)
## [1] "*" "Up" "Down"
In the above output, we see that a "*" occurs whenever there is a missing value for “desired_direction”. In our analysis, we will perform a calculation that will require values to be present in each of the variables in our data set. As a consequence, we need to remove the observations having missing values. Using the below code, we remove the observations with missing values in the variable “desired_direction”, and we check the remaining values in “desired_direction”. The below output indicates that “desired_direction” now only contains the values “Up” and “Down” (i.e., there are no longer any missing values in the variable “desired_direction”).
# Using the below code, we remove the observations with missing values in the variable desired_direction.
mmr_2020 <- mmr_2020[mmr_2020$desired_direction != "*",]
# Using the below code, we check the unique values that are now present in the variable desired_direction.
unique(mmr_2020$desired_direction)
## [1] "Up" "Down"
Using the below code and output, we examine the structure of the data set.
# Using the below code, we examine the structure of the data set.
str(mmr_2020)
## 'data.frame': 1125 obs. of 4 variables:
## $ fy_19 : chr "2,234" "20,185" "0:27" "0:23" ...
## $ fy_20 : chr "2,201" "10,553" "1:38" "1:03" ...
## $ tgt_20 : chr "*" "↑" "↓" "↓" ...
## $ desired_direction: chr "Up" "Up" "Down" "Down" ...
We see that each of the variables has been assigned the type character. However, in our analysis, we will perform calculations using the variables “fy_19”, “fy_20”, and “tgt_20”. As a consequence, we will want to change the type of these variables to numeric. We do this using the below code, which produces an NA value wherever the entry cannot be parsed as a number. This includes placeholder symbols such as “*” and “↑”, times such as “0:27”, and values written with thousands separators such as “2,234”.
# Using the below code, we change the type of the variable fy_19 to numeric.
mmr_2020$fy_19 <- as.numeric(mmr_2020$fy_19)
## Warning: NAs introduced by coercion
# Using the below code, we change the type of the variable fy_20 to numeric.
mmr_2020$fy_20 <- as.numeric(mmr_2020$fy_20)
## Warning: NAs introduced by coercion
# Using the below code, we change the type of the variable tgt_20 to numeric.
mmr_2020$tgt_20 <- as.numeric(mmr_2020$tgt_20)
## Warning: NAs introduced by coercion
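One consequence of the coercion above is that comma-formatted numbers are discarded along with the genuinely non-numeric entries. The following is a minimal sketch, assuming one wanted to retain such values by stripping the commas first. The helper name parse_count and the sample vector are hypothetical and are not part of the original analysis; applying this step to the real data would change the number of observations that remain after cleaning.

```r
# Hypothetical helper (not part of the original analysis): remove thousands
# separators before converting, so that "2,234" becomes 2234 instead of NA.
parse_count <- function(x) {
  suppressWarnings(as.numeric(gsub(",", "", x, fixed = TRUE)))
}

# Sample values of the kind shown in the str() output above.
sample_values <- c("2,234", "20,185", "0:27", "*")

# Direct coercion turns every one of these entries into NA.
direct <- suppressWarnings(as.numeric(sample_values))

# Stripping commas first keeps the counts; times and placeholders remain NA.
cleaned <- parse_count(sample_values)
```

Entries recorded as times (for example “0:27”) would still need their own conversion rule, which this sketch does not attempt.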
As noted earlier, our calculations require values to be present in each of the variables in our data set. We therefore remove the observations having missing values in the variables “fy_19”, “fy_20”, and “tgt_20” using the below code.
# Using the below code, we remove the observations having missing values in the variable fy_19.
mmr_2020 <- mmr_2020[!is.na(mmr_2020$fy_19),]
# Using the below code, we remove the observations having missing values in the variable fy_20.
mmr_2020 <- mmr_2020[!is.na(mmr_2020$fy_20),]
# Using the below code, we remove the observations having missing values in the variable tgt_20.
mmr_2020 <- mmr_2020[!is.na(mmr_2020$tgt_20),]
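The three filtering steps above could also be written as a single step. The following is a minimal sketch using the base R function complete.cases(), which returns TRUE for rows that contain no NA in the selected columns. The data frame toy and its values are hypothetical stand-ins that only borrow the column names used in this report.

```r
# Hypothetical stand-in data with the same column names as mmr_2020.
toy <- data.frame(
  fy_19 = c(10, NA, 30, 40),
  fy_20 = c(12, 22, NA, 44),
  tgt_20 = c(11, 21, 31, NA),
  desired_direction = c("Up", "Down", "Up", "Down")
)

# Keep only the rows in which all three numeric variables are present.
numeric_vars <- c("fy_19", "fy_20", "tgt_20")
toy_complete <- toy[complete.cases(toy[, numeric_vars]), ]
```

In this toy example only the first row survives, since every other row is missing one of the three values.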
Using the below code, we load the package tidyverse to help us in creating the visualization displayed in the section titled “A Single Visualization.”
# Using the below code, we load the package tidyverse.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
The below output indicates that after removing all missing values, there are now 122 observations left in the data set. The code that we used to produce this output is displayed directly below.
# Using the below code, we find the number of rows in the data set.
nrow(mmr_2020)
## [1] 122
We now create a new variable called goal_diff, which is the difference between the goal in “tgt_20” and the performance in “fy_19”. We set this difference to be positive when the value for “tgt_20” is in the appropriate desired direction recorded in the variable “desired_direction”, and we set this difference to be negative when the value for “tgt_20” is not in the appropriate desired direction (such cases occur if an agency set a goal for 2020 that was actually worse than its performance results in 2019). We do this using the below code.
# Using the below code, we create a new variable called goal_diff. We use the vectorized function ifelse() to handle all observations at once, so the number of remaining observations does not need to be hard-coded.
# When desired_direction is "Up", goal_diff is tgt_20 minus fy_19. When desired_direction is "Down", goal_diff is fy_19 minus tgt_20. An explanation for goal_diff was already provided in the preceding paragraph.
goal_diff <- ifelse(
  mmr_2020$desired_direction == "Up",
  mmr_2020$tgt_20 - mmr_2020$fy_19,
  mmr_2020$fy_19 - mmr_2020$tgt_20
)
To convert these differences in goals (calculated in the preceding code) into a proportional relative difference, we divide each difference by the performance in fiscal year 2019. We do this using the below code, in which we create a new variable called “goal_rel_diff” to contain these relative differences for goals.
# Using the below code, we create a new variable to store the relative differences for the tgt_20 goals (i.e., relative to the performance in fy_19).
goal_rel_diff <- goal_diff / mmr_2020$fy_19
We now create a new variable called act_diff, which is the difference between the actual performance in fiscal years 2019 and 2020 (i.e., the difference between the variables “fy_19” and “fy_20”). We set this difference to be positive when the performance change between fiscal years 2019 and 2020 is in the appropriate desired direction recorded in the variable “desired_direction” (such cases occur when the performance of the agency in fiscal year 2020 is better than that in fiscal year 2019), and we set this difference to be negative when the performance between fiscal years 2019 and 2020 indicates changes occurring in the opposite direction from what is desired (such cases occur when the performance of the agency in fiscal year 2020 is worse than that in fiscal year 2019). We do this using the below code.
# Using the below code, we create a new variable called act_diff. We use the vectorized function ifelse() to handle all observations in the data set mmr_2020 at once, so the number of remaining observations does not need to be hard-coded.
# When desired_direction is "Up", act_diff is fy_20 minus fy_19. When desired_direction is "Down", act_diff is fy_19 minus fy_20. An explanation for act_diff was already provided in the preceding paragraph.
act_diff <- ifelse(
  mmr_2020$desired_direction == "Up",
  mmr_2020$fy_20 - mmr_2020$fy_19,
  mmr_2020$fy_19 - mmr_2020$fy_20
)
To convert these differences in performance between fiscal years (calculated in the preceding code) into a proportional relative difference, we divide each difference by the performance in fiscal year 2019. We do this using the below code, in which we create a new variable called “act_rel_diff” to contain these relative differences for actual performance.
# Using the below code, we create a new variable to store the relative differences for the performance results (i.e., comparing the performance in fy_20 relative to the performance in fy_19).
act_rel_diff <- act_diff / mmr_2020$fy_19
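To make the sign convention concrete, the following sketch works through two made-up indicators, one with a desired direction of “Up” and one with “Down”. The data frame toy, its values, and the names sign_dir, goal_rel_diff_toy, and act_rel_diff_toy are hypothetical and are not taken from the MMR data; the arithmetic is intended to mirror the definitions given above.

```r
# Two hypothetical indicators: one where higher is better, one where lower is better.
toy <- data.frame(
  fy_19 = c(100, 50),
  fy_20 = c(110, 45),
  tgt_20 = c(120, 40),
  desired_direction = c("Up", "Down")
)

# +1 when an increase is desired, -1 when a decrease is desired.
sign_dir <- ifelse(toy$desired_direction == "Up", 1, -1)

# Relative difference of the goal: signed (tgt_20 - fy_19), divided by fy_19.
goal_rel_diff_toy <- sign_dir * (toy$tgt_20 - toy$fy_19) / toy$fy_19

# Relative difference of performance: signed (fy_20 - fy_19), divided by fy_19.
act_rel_diff_toy <- sign_dir * (toy$fy_20 - toy$fy_19) / toy$fy_19
```

Both toy indicators set a goal 20% better than their 2019 result and achieved a 10% improvement, so both relative differences come out positive regardless of the desired direction. Note that an indicator whose 2019 value is zero would make these ratios undefined (Inf or NaN in R); how such rows should be treated is a separate decision that this sketch does not address.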
In this section, we create a single visualization comparing the relative differences calculated in the preceding section. This visualization is displayed in the output below and was produced using the following code.
# Using the below code, we create a data frame containing the two relative differences calculated in the preceding section. Doing so will allow us to use the ggplot() function to create a visualization that compares these two relative differences.
rel_diffs <- data.frame(goal_rel_diff, act_rel_diff)
# Using the below code, we create a visualization that compares the two relative differences calculated in the preceding section.
ggplot(data = rel_diffs, aes(x = goal_rel_diff, y = act_rel_diff)) +
  geom_point() +
  ggtitle("Relative Differences for Goals and Performance Among NYC Agencies", subtitle = "An Examination of the Influence of Goal-Setting on Performance") +
  theme(plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) +
  labs(x = "Relative Difference of Goals", y = "Relative Difference of Performance") +
  geom_smooth()
The above visualization is a scatter plot for which the horizontal axis is the “Relative Difference of Goals” (i.e., the variable goal_rel_diff calculated in the preceding section), and the vertical axis is the variable “Relative Difference of Performance” (i.e., the variable act_rel_diff calculated in the preceding section). We remind the reader that the variable goal_rel_diff is a fraction whose numerator represents the difference between the target goal for fiscal year 2020 and the performance in fiscal year 2019, and whose denominator is the performance in fiscal year 2019. We similarly remind the reader that the variable act_rel_diff is a fraction whose numerator represents the difference in performance between fiscal years 2019 and 2020, and whose denominator is the performance in fiscal year 2019. We also remind the reader that a description of the data set and motivation for this report can be found at the very beginning of the first section, along with a link to the website where the data set can be downloaded.
A smooth curve is fitted to the data, and this curve is depicted in blue in the preceding visualization. We notice that the left-most portion of the curve resembles a convex parabola. There is an extensive literature demonstrating that as the difficulty of goals is increased, performance also often increases (Jung et al., 2010; Locke & Latham, 1990; O’Leary-Kelly et al., 1994; Mento et al., 1987; Locke & Latham, 2002; Gellatly & Meyer, 1992). Hence, the right-most side of this convex parabolic-like shape is to be expected, as it indicates this increased performance in the setting of more difficult goals. The left-most side of the convex parabolic-like shape (the portion showing a decrease in the “Relative Difference of Performance” as the “Relative Difference of Goals” increases) appears to be due to one extreme outlier. This suggests that further research is needed to determine whether the left-most side of the parabolic-like shape actually occurs often in real-life agency settings, or whether it is not generally observed. On the right-most side of the above visualization (after the large convex parabolic-like shape), we see that the effect of increasing the “Relative Difference of Goals” tends to die down, and the “Relative Difference of Performance” even begins to decrease slightly as the “Relative Difference of Goals” increases. This indicates that after a certain point, the effect on performance of increasing goal difficulty tends to diminish, and setting goals that are too high can actually be slightly detrimental to performance. In comparison to the vast amount of literature indicating that the setting of challenging goals is important for increased performance, relatively few psychological and managerial studies examine the potentially detrimental impacts of extreme goal-setting (Ivancevich, 1982; Sales, 1970; Fowles, 1982; Bandura & Locke, 2003).
Even among those studies which investigate these detrimental effects, the findings often indicate only a slight negative impact, and some psychological studies suggest that this impact is slight simply because those involved may choose to disregard overly ambitious goals in favor of more reasonable personal goals (Gellatly & Meyer, 1992). This effect may potentially be more fully studied in a corporate or agency setting (rather than in individual-level psychological studies), where management can make it difficult to set more realistic personal goals by continually reinforcing the extremely high agency goals. Our work helps to fill this gap in the literature, which exists because relatively few studies measure the detrimental impacts of unrealistic goal-setting. In particular, our findings indicate that when the difficulty of goals in New York City agencies is increased beyond a certain maximal point, performance can be negatively impacted.
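Because the discussion above attributes the left-most portion of the curve to a single extreme point, one informal way to probe that observation is to refit the smoother without that point and compare the two curves. The following is a minimal sketch on synthetic data, not on the MMR data. It uses the base R function loess(), which is the kind of smoother that geom_smooth() selects by default for fewer than 1,000 observations in the ggplot2 version loaded earlier. All object names and simulated values here are hypothetical.

```r
# Synthetic data (hypothetical): 60 evenly spaced points plus one far-left outlier.
set.seed(1)
x <- c(-3, seq(-0.5, 1, length.out = 60))
y <- c(2, 0.3 * x[-1] + rnorm(60, sd = 0.1))
sim <- data.frame(x = x, y = y)

# Fit a loess curve to all points, including the outlier.
fit_all <- loess(y ~ x, data = sim)

# Drop the left-most point and refit.
sim_trim <- sim[sim$x != min(sim$x), ]
fit_trim <- loess(y ~ x, data = sim_trim)

# Compare the two fitted curves on a grid covered by both data sets.
grid <- data.frame(x = seq(-0.4, 0.9, by = 0.1))
pred_all <- predict(fit_all, newdata = grid)
pred_trim <- predict(fit_trim, newdata = grid)

# Largest vertical gap between the two curves over the grid.
max_gap <- max(abs(pred_all - pred_trim))
```

A large value of max_gap relative to the spread of the data would suggest that the shape of the curve depends heavily on that one point. This is only an informal sensitivity check and does not replace a formal analysis of influence.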
Bandura A, Locke EA (2003) Negative self-efficacy and goal effects revisited. J. of Applied Psychology 88(1):87-99.
Fowles DC (1982) Heart rate as an index of anxiety: Failure of a hypothesis. Cacioppo JT, Petty RE, eds. Perspectives in Cardiovascular Psychophysiology (Guilford Press, New York), 93-126.
Gellatly IR, Meyer JP (1992) The effects of goal difficulty on physiological arousal, cognition, and task performance. J. of Applied Psychology 77(5):694-704.
Ivancevich JM (1982) Subordinates’ reactions to performance appraisal interviews: A test of feedback and goal-setting techniques. J. of Applied Psychology 67(5):581-587.
Jung JH, Schneider C, Valacich J (2010) Enhancing the motivational affordance of information systems: The effects of real-time performance feedback and goal setting in group collaboration environments. Management Sci. 56(4):724-742.
Locke EA, Frederick E, Buckner E, Bobko P (1984) Effect of previously assigned goals on self-set goals and performance. J. of Applied Psychology 69(4):694-699.
Locke EA, Latham GP (1990) A Theory of Goal Setting and Task Performance (Prentice-Hall, Englewood Cliffs, NJ).
Locke EA, Latham GP (2002) Building a practically useful theory of goal setting and task motivation: A 35-year odyssey. American Psychologist 57(9):705-717.
Mayor’s Office of Operations (2020) FY20 MMR performance indicators. NYC Open Data. Retrieved on November 15, 2020: https://data.cityofnewyork.us/City-Government/FY20-MMR-Performance-Indicators/qe3p-6ugh.
Mento AJ, Steel RP, Karren RJ (1987) A meta-analytic study of the effects of goal setting on task performance: 1966-1984. Organizational Behavior and Human Decision Processes 39(1):52-83.
O’Leary-Kelly AM, Martocchio JJ, Frink DD (1994) A review of the influence of group goals on group performance. Academy of Management J. 37(5):1285-1301.
Sales SM (1970) Some effects of role overload and role underload. Organizational Behavior and Human Performance 5(6):592-608.