Key points:

The Qualtrics data scrub is insufficient

  • 584 complete responses
  • 302 Qualtrics ‘good completes’
  • within the ‘good completes’:
    • 30 definitely bad
    • 10-20 likely bad

Qualtrics seems fine with throwing out responses we deem bad and running additional people through.

Refinements have been made to better identify good completes:

  • Graduation year
  • State of residence

Pilot survey is a very helpful starting point

Detection model will be re-trained on ‘Paths’ responses

There are probably false positives in the Qualtrics data scrub, maybe ~5-10% (we haven’t looked closely at this yet).



Pilot Survey Data Quality

This document provides an overview of Russ and Jamie’s observations while scanning the pilot data.

There is, of course, no perfect way to identify a fraudulent response. However, our goal was to create simple identifiers of poor quality data. To do this, we:
- Created variables that should be diagnostic of a bad response.
- Validated by checking the free text responses for the identified individuals.
- Identified 6 to 8 diagnostic criteria that were validated in the free text.

Qualtrics has its own process for identifying poor quality data. We are given the entire dataset, including the respondents Qualtrics flagged, for our own learning. In the figures in the following sections, a value of 1 is a respondent marked as ‘good quality’ by Qualtrics, and a value of 5 is a respondent flagged by Qualtrics as ‘bad quality’ and not included in the final dataset of 300 that we received.

Overview:

When creating diagnostic variables, we focused on a few qualities of the data.
- The length of free-text responses.
- Inconsistent or impossible combinations of responses.
- Location provided by latitude and longitude.

Some suspicious responses we observed can be considered when flagging but cannot serve as diagnostic of fraudulent data:
- C-suite jobs

What can we do with these learnings?
- We can provide our formulas/flagging criteria to Qualtrics, who can use these additional parameters to flag poor quality data. The outcome is that we can spend less time on data quality measures.
- We can still run pre-written scripts on the new waves of data to flag any responses we don’t like early on and have Qualtrics re-field until we are satisfied with our full sample.


Types of bad response:

  • poor quality humans: too fast; inconsistent answers to pairs of questions; weird text from external sources, speech-to-text
  • fraudulent
    • mass responses from the same source: same lat-long or IP host & similar text responses.
    • machine generated responses: odd text; inconsistent answers to pairs of questions.

Text responses received:

  • what could allstate have done to make you give a higher score and increase your likelihood to recommend them?
  • The only reason why you don’t have no money I have to understand The only reason why you are Roy and she has The only reason why you are Roy and she has wedding.
  • Very good and nice thanks so much I appreciate you guys and I’ll continue to recommend them to my family and friends Very good and nice thanks so much I appreciate you guys and I’ll continue to recommend them to my family and friends I feel very good and nice about my career and i like them so much.
  • University university of university university office university university of university university office University university of university university office university university of university university office University university of university university office university university of university.
  • Good project for a few days now and then I’ll be home in a few days now and then I’ll be home in a few days now and then I’ll
  • Hi I hope you have a great day to be out of town for a few days now and then I’ll be home in a bit I think I have a great day to be out of
  • Hey you have a great day to be out of town for a few days now and then I’ll be home in a few days now and then I’ll be home.
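Several of the responses above loop over the same phrase. One possible way to screen for that, sketched in base R, is to measure the share of word trigrams that repeat within a response. The function name `repeat_share`, the trigram length, and any cutoff applied to the score are illustrative assumptions, not part of the pilot scripts.

```r
# Share of word n-grams in a response that already appeared earlier in it.
# Hypothetical helper; n = 3 is an assumed default.
repeat_share <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(0)
  starts <- seq_len(length(words) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(words[i:(i + n - 1)], collapse = " "),
    character(1)
  )
  mean(duplicated(grams))
}

looping <- "in a few days now and then I'll be home in a few days now and then I'll be home"
normal  <- "My biggest influence was my mother"

repeat_share(looping)  # well above zero
repeat_share(normal)   # zero, no trigram repeats
```

A high score is only a hint: the response should still be read before it is marked bad.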

Possible AI gen text:

  • jobtitle: ChatGPT”! As an AI developed by OpenAI, I’m here to assist you with any questions or tasks you have.

  • The biggest challenge is the way to make the most out to you guys as well and you will have to work.

  • The biggest influence on my initial career choice was my passion for helping others and making a positive impact in people’s lives.

  • The biggest influence on my initial career choice were my parents.

  • The biggest influence on my initial career would be the finacial constraint,because i was training to get something going so that i can start servicing my student loans.

  • The biggest influence was my peer friends’ status in securing jobs early.

  • The biggest influence was my teacher, he provide information and was a positive role model in choosing my career.

  • The biggest influence was myself; what I wanted to do and believed I would be successful and advancement professionally into management and financially.

  • The biggest influence was pressure from my family for perfection.

  • the biggest influence was the help i got as a child from social worker and counselors so that was what my goal was and i achieved it.

  • The biggest influence would have been taking accounting classes and finding it came easy to me and I actually really liked it. And it helped that my professor was great at teaching the subject.

  • My biggest influence is my parents they guide me in all i do.

  • My biggest influence is that information.

  • My biggest influence is the best way of making my work good.

  • My biggest influence on my career is to make sure everyone in my oganizatio get his right.

  • My biggest influence was my mother. She supported me each and every step of the way. She truly had my back through everything and always kept me encouraged.

  • My biggest influence was my parents well my mom because she believed in me like no one’s ever before and she’s been with me all these years and still is.

  • MY BIGGEST INFLUENCE WAS MYSELF, I DID NOT RELY ON ANYONE WHEN IT CAME TO MAKING DECISIONS REGARDING WORK.

  • “My biggest influence were peers and friends that were driven and motivated. It taught me to be like them.”

  • My biggest influence would be my friends motivated and encouraged me to tried to be as successful as I could be.

RW generated AI text:

biggest_influence prompt: write a response to this survey question: Overall, what would you say was the biggest influence on your initial career choice?

  • The biggest influence on my initial career choice was my personal passion and interest in the field. This intrinsic motivation was supported by encouraging mentors and practical experiences, like internships and volunteer work, which solidified my decision.

and then 4 more…

  • My initial career choice was primarily influenced by my strong personal interest and passion for the field, bolstered by supportive mentors and hands-on experiences such as internships and volunteer work.
  • The main influence on my initial career choice was my deep passion for the field. Encouragement from mentors and practical experiences like internships played a significant role in solidifying my decision.
  • My career choice was largely driven by a personal passion for the field, with additional support from mentors and practical experiences such as internships and volunteer opportunities.
  • A strong passion for the field was the biggest influence on my initial career choice, supported by inspiring mentors and valuable hands-on experiences through internships and volunteering.

turn_point_story: Prompt: write a response to a survey questions: Please identify a particular episode in your story that you now see as a turning point in your career. Please briefly describe what happened, where and when, who was involved, and what you were thinking and feeling. Also, please say a word or two about what you think this event says about you as a person or about your life. If you cannot identify a key turning point that stands out clearly, please describe some event in your life wherein you went through an important change of some kind.

  • Original: A pivotal turning point in my career occurred during my first major project as a team lead in the summer of 2019. I coordinated a diverse team to develop a new software solution for a high-profile client. My manager, Sarah, entrusted me with this responsibility and provided ongoing support. Despite the pressure, we delivered the project on time, exceeding client expectations. This experience taught me about leadership, resilience, and trust in a team, boosting my confidence and reflecting my growth professionally and personally.
  • Version 1: In summer 2019, leading my first major project marked a turning point in my career. I coordinated a team to deliver a software solution for a high-profile client, with my manager Sarah’s support. Despite the pressure, we succeeded, teaching me about leadership and resilience, and boosting my confidence.
  • Version 2: A key career moment was leading a major project in 2019. With my manager Sarah’s support, I guided my team to deliver a critical software solution. This challenge taught me valuable lessons in leadership and teamwork, significantly boosting my confidence and professional growth.
  • Version 3: The summer of 2019 was a turning point when I led a project for a major client. My manager, Sarah, supported me as I managed the team. Successfully delivering the project under pressure taught me about leadership and resilience, enhancing my confidence and professional development.
  • Version 4: Leading a high-profile project in 2019 was a career-defining moment. My manager Sarah’s trust and support were crucial as I coordinated our team. Successfully completing the project taught me vital leadership and resilience lessons, marking significant personal and professional growth.

Questions:

  • false positives (Qualtrics): the Qualtrics data scrub labeled ~60% of responses as bad. Likely ~5-10% of these are false positives.

  • false positives (WPA): ?? relative importance of FP vs. FN

  • copy-paste: enable??

  • skip logic: Remove these??

    • age: What is your current age? {contains skip logic}.
    • education: What is the highest level of education you have completed? {contains skip logic}


Determination of bad response:

Age at college degree: age - (2024 - graduate) should be > 19 and < 35 (no data in pilot)

  • age: What is your current age?
  • graduate: In what year did you complete your Bachelor’s degree?

Advanced degree at too young an age: (PhD OR MD) AND age < 28.

This flagged ~4% of responses in the pilot.
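The two checks above can be sketched in base R as follows. This is a minimal illustration on toy data: the fielding year of 2024, the `education` labels "PhD" and "MD", and the flag column names are assumptions, and age at graduation is taken to be current age minus years since graduation.

```r
# Toy data; column names follow the survey variables, values are made up.
pilot <- data.frame(
  age       = c(45, 26, 30, 24),
  graduate  = c(2001, 2020, 2023, 2005),
  education = c("Bachelor", "PhD", "MD", "Bachelor"),
  stringsAsFactors = FALSE
)

survey_year <- 2024  # assumed fielding year

# Age when the Bachelor's degree was completed
pilot$age_at_degree <- pilot$age - (survey_year - pilot$graduate)

# Flag ages at graduation outside the plausible window (> 19 and < 35)
pilot$flag_grad_age <- !(pilot$age_at_degree > 19 & pilot$age_at_degree < 35)

# Flag an advanced degree held at an implausibly young age
pilot$flag_young_adv <- pilot$education %in% c("PhD", "MD") & pilot$age < 28

pilot
```

In the toy data the fourth respondent would have graduated at age 5, and the second holds a PhD at 26, so each is flagged by one of the two rules.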


Lat-Long: ex-US

A US resident could complete a survey while traveling ex-US, but in the pilot all 32 ex-US responses were bad.

  • US_lat_low <- 24.5
  • US_lat_high <- 49.5
  • US_long_low <- -125
  • US_long_high <- -66

Need to fix to include AK & HI.

ex-US responses in pilot:

  • 25 Lithuania
  • 5 Nigeria
  • 2 other Africa
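One possible way to extend the bounding-box check to Alaska and Hawaii is to test each location against several boxes. In this base R sketch the contiguous-US bounds are the ones listed above, while the Alaska and Hawaii bounds are rough assumptions that should be checked before use. The names `us_boxes`, `in_any_us_box`, `lat`, and `long` are illustrative.

```r
# Bounding boxes as c(lat_low, lat_high, long_low, long_high).
# Contiguous US bounds come from the notes; AK and HI bounds are rough assumptions.
us_boxes <- list(
  contiguous = c(24.5, 49.5, -125, -66),
  alaska     = c(51, 71.5, -180, -129),
  hawaii     = c(18.5, 22.5, -160.5, -154.5)
)

# TRUE when a point falls inside at least one of the boxes
in_any_us_box <- function(lat, long) {
  hits <- lapply(us_boxes, function(b) {
    lat >= b[1] & lat <= b[2] & long >= b[3] & long <= b[4]
  })
  Reduce(`|`, hits)
}

# Toy locations: Vilnius, Lagos, rural Kansas, Anchorage, Honolulu
pilot <- data.frame(
  lat  = c(54.69, 6.52, 38.50, 61.22, 21.31),
  long = c(25.28, 3.38, -98.00, -149.90, -157.86)
)
pilot$ex_us <- !in_any_us_box(pilot$lat, pilot$long)
pilot
```

Rectangular boxes also cover parts of Canada and Mexico, so this check only catches clearly ex-US locations.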


Repeat Lat-Long:

Especially from remote places.

  • 26 responses from the same Lat-Long in Lithuania (free text is bad).
  • 21 responses from the same Lat-Long in rural Kansas (free text is short & similar).
  • 5 responses from the same Lat-Long in Nigeria (free text is bad).
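Counting how many responses share a location could look like the base R sketch below. The column names `lat` and `long`, the rounding to three decimals, and the cutoff of 3 used for the toy data are all assumptions. The notes do not fix a threshold.

```r
# Toy data: three responses from one location, two singletons
pilot <- data.frame(
  lat  = c(54.69, 54.69, 54.69, 38.50, 6.52),
  long = c(25.28, 25.28, 25.28, -98.00, 3.38)
)

# Build a location key; rounding guards against tiny floating-point differences
loc_key <- paste(round(pilot$lat, 3), round(pilot$long, 3))

# Number of responses sharing each respondent's location
loc_counts <- table(loc_key)
pilot$n_same_loc <- as.integer(loc_counts[loc_key])

repeat_cutoff <- 3  # hypothetical threshold for this toy example
pilot$flag_repeat_loc <- pilot$n_same_loc >= repeat_cutoff
pilot
```

Clusters flagged this way still need a look at the free text, since a legitimate shared location (an office or campus) would also produce repeats.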

Work_exp - job_tenure: work experience beyond current job

  • work_exp: How many years of post-college work experience do you have?
  • job_tenure: For how many years have you been employed by your current employer? Please provide the total number of years since you first started working for this employer, even if you have been promoted or changed jobs with this same employer since then.

# fix work experience for users that entered a calendar year
mutate(work_exp_fix = case_when(work_exp > 1000 ~ 2024 - work_exp, work_exp < 1000 ~ work_exp))
# fix job tenure: recode 99 as 0 and treat values of 66 or more as missing
mutate(job_tenure_fix = case_when(job_tenure == 99 ~ 0, job_tenure >= 66 ~ NA_real_, job_tenure < 66 ~ job_tenure))

Total work_exp must be >= job_tenure.
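Once the two fixed columns exist, the consistency check itself is a subtraction. A minimal base R sketch on toy data follows. The names `exp_beyond_job` and `flag_tenure` are illustrative, and treating a missing tenure as "not flagged" is an assumption.

```r
# Toy data using the fixed columns produced by the recodes above
pilot <- data.frame(
  work_exp_fix   = c(10, 2, 3, 5),
  job_tenure_fix = c(4, 6, 0, NA)
)

# Years of work experience beyond the current job
pilot$exp_beyond_job <- pilot$work_exp_fix - pilot$job_tenure_fix

# Negative values mean tenure exceeds total experience, which is inconsistent.
# Missing values are left unflagged here (an assumption).
pilot$flag_tenure <- !is.na(pilot$exp_beyond_job) & pilot$exp_beyond_job < 0
pilot
```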


Age-work_exp: age at first work experience

work_exp: How many years of post-college work experience do you have?

Negative values are impossible
Minimum should be around 20 years
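The age check can be sketched the same way in base R. The cutoff of 20 comes from the "around 20 years" note above and may need loosening. The column `work_exp_fix` is assumed to hold work experience after the calendar-year fix, and the flag names are illustrative.

```r
# Toy data: current age and (fixed) years of post-college work experience
pilot <- data.frame(
  age          = c(40, 25, 30),
  work_exp_fix = c(15, 30, 12)
)

# Implied age at first post-college job
pilot$age_first_work <- pilot$age - pilot$work_exp_fix

# Negative values are impossible; values below about 20 are suspicious
min_start_age <- 20  # assumed cutoff
pilot$flag_impossible  <- pilot$age_first_work < 0
pilot$flag_early_start <- pilot$age_first_work < min_start_age
pilot
```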


Paired questions that should be inversely correlated:

Career decision control:

Initial_cont_1: I had very little control over the forces that influenced my career decisions.
initial_cont_3: I had control over the decisions I made about my career.

Initial_cont_1 & Initial_cont_2 are highly correlated.
Initial_cont_1 & initial_cont_3 are inversely correlated, except for bad responses. However, this could easily be a misreading of the scale.

Red & orange boxes show combinations that should not have good responses.


Intent to quit vs. regret:

intent_to_quit_2: I do not intend to leave my occupation.
regret_1: I often think about switching occupational paths.

These are inversely correlated, except for bad responses. However, this could easily be a misreading of the scale.
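A simple way to flag the contradictory combinations for both pairs is sketched below in base R. It assumes the items are on a 1-5 agreement scale, which these notes do not state. The helper `flag_pair` and its cutoffs are hypothetical. A response is flagged when it agrees with both statements or disagrees with both.

```r
# Flag responses that answer two opposing items in the same direction.
# Assumes a 1-5 agreement scale; low/high cutoffs are illustrative.
flag_pair <- function(a, b, low = 2, high = 4) {
  (a >= high & b >= high) | (a <= low & b <= low)
}

# Toy data using the item names from the notes
pilot <- data.frame(
  Initial_cont_1   = c(1, 5, 3, 1),
  initial_cont_3   = c(5, 5, 3, 1),
  intent_to_quit_2 = c(5, 5, 2, 3),
  regret_1         = c(1, 4, 5, 3)
)

pilot$flag_control <- flag_pair(pilot$Initial_cont_1, pilot$initial_cont_3)
pilot$flag_quit    <- flag_pair(pilot$intent_to_quit_2, pilot$regret_1)
pilot
```

Because a misread scale produces the same pattern, a flag from one pair alone is weak evidence. It is more useful in combination with the other checks.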

jobtitle: “What is your current job title? If you currently have multiple jobs / roles, please choose the one that is most significant to your career. Please also focus on this job when answering any questions about your current job / line of work. If you are not currently working, please list your most recent job title.”


Free text character count (only for paths block):

# note: keepNA = FALSE makes nchar() count a missing response as 2 characters
mutate(
  turn_point_story_nchar = nchar(turn_point_story, keepNA = FALSE),
  turn_point_prior_nchar = nchar(turn_point_prior, keepNA = FALSE),
  turn_point_after_nchar = nchar(turn_point_after, keepNA = FALSE),
  total_char = turn_point_story_nchar + turn_point_prior_nchar + turn_point_after_nchar
)
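A variant of the character count that treats a missing answer as zero characters, plus a short-text flag, might look like this in base R. The helper `n_char0` and the cutoff of 20 characters are assumptions for illustration. The notes do not specify a threshold.

```r
# Character count that treats a missing response as 0 characters
n_char0 <- function(x) ifelse(is.na(x), 0L, nchar(x))

# Toy free-text answers for the three paths questions
pilot <- data.frame(
  turn_point_story = c("I led a project in 2019.", NA, "ok"),
  turn_point_prior = c("Nervous.", NA, NA),
  turn_point_after = c("Confident.", "", "fine"),
  stringsAsFactors = FALSE
)

pilot$total_char <- n_char0(pilot$turn_point_story) +
  n_char0(pilot$turn_point_prior) +
  n_char0(pilot$turn_point_after)

min_total_char <- 20  # hypothetical cutoff
pilot$flag_short_text <- pilot$total_char < min_total_char
pilot[, c("total_char", "flag_short_text")]
```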