Key points:
Qualtrics data scrub is insufficient
- 584 complete responses
- 302 Qualtrics ‘good completes’
- within the ‘good completes’
- 30 definitely bad
- 10-20 likely bad
Qualtrics seems fine with throwing out responses we deem bad and
running additional people through
Refinements have been made to better identify good completes
- Graduation year
- State of residence
Pilot survey is a very helpful starting point
Detection model will be re-trained on ‘Paths’ responses
There are probably false positives in the Qualtrics data scrub,
maybe ~5-10% (haven’t looked closely at this yet)
Pilot Survey Data Quality
This document provides an overview of Russ and Jamie’s observations
while scanning the pilot data.
There is, of course, no perfect way to identify a fraudulent
response. However, our goal was to create simple identifiers of poor
quality data. To do this, we:
- Created variables that should be diagnostic of a bad response.
- Validated them by checking the free-text responses for the identified
individuals.
- Identified 6 to 8 diagnostic criteria that were validated in the free
text.
Qualtrics has its own process for identifying poor quality data. We
are given the entire dataset, including the respondents Qualtrics
flagged, for our own learning. In the following sections showing
figures, a value of 1 is a respondent who was marked as ‘good quality’
by Qualtrics, and a value of 5 is a respondent who was flagged by
Qualtrics as ‘bad quality’ and was not included in the final dataset of
300 that we received.
Overview:
When creating diagnostic variables, we focused on a few qualities of
the data.
- The length of free-text responses.
- Inconsistent or impossible combinations of responses.
- Location provided by latitude and longitude.
Some suspicious responses we observed can be considered when flagging
but cannot serve as diagnostic of fraudulent data:
- C-suite jobs
What can we do with these learnings?
- We can provide our formulas/flagging criteria to Qualtrics, who can use
these additional parameters to flag poor quality data. As a result, we
can spend less time on data quality measures.
- We can still run pre-written scripts on the new waves of data to flag
any responses we don’t like early on and have Qualtrics re-field until
we are satisfied with our full sample.
Types of bad response:
- poor quality humans: too fast; inconsistent answers
to pairs of questions; weird text from external sources, speech-to-text
- fraudulent
- mass responses from the same source: same lat-long or IP host &
similar text responses.
- machine generated responses: odd text; inconsistent answers to pairs
of questions.
Text responses received:
- what could allstate have done to make you give a higher score and
increase your likelihood to recommend them?
- The only reason why you don’t have no money I have to understand The
only reason why you are Roy and she has The only reason why you are Roy
and she has wedding.
- Very good and nice thanks so much I appreciate you guys and I’ll
continue to recommend them to my family and friends Very good and nice
thanks so much I appreciate you guys and I’ll continue to recommend
them to my family and friends I feel very good and nice about my career
and i like them so much.
- University university of university university office university
university of university university office University university of
university university office university university of university
university office University university of university university office
university university of university.
- Good project for a few days now and then I’ll be home in a few days
now and then I’ll be home in a few days now and then I’ll
- Hi I hope you have a great day to be out of town for a few days now
and then I’ll be home in a bit I think I have a great day to be out
of
- Hey you have a great day to be out of town for a few days now and
then I’ll be home in a few days now and then I’ll be home.
Possible AI gen text:
jobtitle: ChatGPT”! As an AI developed by OpenAI, I’m here to
assist you with any questions or tasks you have.
The biggest challenge is the way to make the most out to you guys
as well and you will have to work.
The biggest influence on my initial career choice was my passion
for helping others and making a positive impact in people’s lives.
The biggest influence on my initial career choice were my
parents.
The biggest influence on my initial career would be the finacial
constraint,because i was training to get something going so that i can
start servicing my student loans.
The biggest influence was my peer friends’ status in securing
jobs early.
The biggest influence was my teacher, he provide information and
was a positive role model in choosing my career.
The biggest influence was myself; what I wanted to do and
believed I would be successful and advancement professionally into
management and financially.
The biggest influence was pressure from my family for
perfection.
the biggest influence was the help i got as a child from social
worker and counselors so that was what my goal was and i achieved
it.
The biggest influence would have been taking accounting classes
and finding it came easy to me and I actually really liked it. And it
helped that my professor was great at teaching the subject.
My biggest influence is my parents they guide me in all i
do.
My biggest influence is that information.
My biggest influence is the best way of making my work
good.
My biggest influence on my career is to make sure everyone in my
oganizatio get his right.
My biggest influence was my mother. She supported me each and
every step of the way. She truly had my back through everything and
always kept me encouraged.
My biggest influence was my parents well my mom because she
believed in me like no one’s ever before and she’s been with me all
these years and still is.
MY BIGGEST INFLUENCE WAS MYSELF, I DID NOT RELY ON ANYONE WHEN IT
CAME TO MAKING DECISIONS REGARDING WORK.
“My biggest influence were peers and friends that were driven and
motivated. It taught me to be like them.”
My biggest influence would be my friends motivated and encouraged
me to tried to be as successful as I could be.
RW generated AI text:
biggest_influence prompt: write a response to this
survey question: Overall, what would you say was the biggest influence
on your initial career choice?
- The biggest influence on my initial career choice was my personal
passion and interest in the field. This intrinsic motivation was
supported by encouraging mentors and practical experiences, like
internships and volunteer work, which solidified my decision.
and then 4 more…
- My initial career choice was primarily influenced by my strong
personal interest and passion for the field, bolstered by supportive
mentors and hands-on experiences such as internships and volunteer
work.
- The main influence on my initial career choice was my deep passion
for the field. Encouragement from mentors and practical experiences like
internships played a significant role in solidifying my decision.
- My career choice was largely driven by a personal passion for the
field, with additional support from mentors and practical experiences
such as internships and volunteer opportunities.
- A strong passion for the field was the biggest influence on my
initial career choice, supported by inspiring mentors and valuable
hands-on experiences through internships and volunteering.
turn_point_story: Prompt: write a response to a
survey questions: Please identify a particular episode in your story
that you now see as a turning point in your career. Please briefly
describe what happened, where and when, who was involved, and what you
were thinking and feeling. Also, please say a word or two about what you
think this event says about you as a person or about your life. If you
cannot identify a key turning point that stands out clearly, please
describe some event in your life wherein you went through an important
change of some kind.
- Original: A pivotal turning point in my career occurred during my
first major project as a team lead in the summer of 2019. I coordinated
a diverse team to develop a new software solution for a high-profile
client. My manager, Sarah, entrusted me with this responsibility and
provided ongoing support. Despite the pressure, we delivered the project
on time, exceeding client expectations. This experience taught me about
leadership, resilience, and trust in a team, boosting my confidence and
reflecting my growth professionally and personally.
- Version 1: In summer 2019, leading my first major project marked a
turning point in my career. I coordinated a team to deliver a software
solution for a high-profile client, with my manager Sarah’s support.
Despite the pressure, we succeeded, teaching me about leadership and
resilience, and boosting my confidence.
- Version 2: A key career moment was leading a major project in 2019.
With my manager Sarah’s support, I guided my team to deliver a critical
software solution. This challenge taught me valuable lessons in
leadership and teamwork, significantly boosting my confidence and
professional growth.
- Version 3: The summer of 2019 was a turning point when I led a
project for a major client. My manager, Sarah, supported me as I managed
the team. Successfully delivering the project under pressure taught me
about leadership and resilience, enhancing my confidence and
professional development.
- Version 4: Leading a high-profile project in 2019 was a
career-defining moment. My manager Sarah’s trust and support were
crucial as I coordinated our team. Successfully completing the project
taught me vital leadership and resilience lessons, marking significant
personal and professional growth.
Questions:
false positives (Qualtrics): the Qualtrics data scrub labeled ~60% of
responses as bad. There are likely ~5-10% false positives.
false positives (WPA): ?? relative importance of FP vs. FN
copy-paste enabled??
skip logic: Remove these??
- age: What is your current age? {contains skip logic}.
- education: What is the highest level of education you have
completed? {contains skip logic}

Determination of bad response:
Age at college degree: age - (2024 - graduate) should be > 19 and
< 35 (no data in pilot)
- age: What is your current age?
- graduate: In what year did you complete your Bachelor’s degree?
Advanced degree at too young an age: (PhD OR MD) AND
age < 28.
This flagged ~4% in the pilot.
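The two age checks above could be sketched in base R roughly as follows. This is only an illustration: the 2024 survey year, the `education` labels ("PhD", "MD"), the data frame `d`, and the flag column names are assumptions, and the example rows are made up.

```r
survey_year <- 2024  # assumed survey year

# Hypothetical respondents, for illustration only.
d <- data.frame(
  age       = c(40, 25, 30, 26),
  graduate  = c(2006, 2021, 1990, 2020),
  education = c("Bachelor's", "PhD", "Master's", "MD"),
  stringsAsFactors = FALSE
)

# Age when the Bachelor's degree was completed.
d$age_at_degree <- d$age - (survey_year - d$graduate)

# Flag when the age at degree falls outside the (19, 35) window.
d$flag_degree_age <- d$age_at_degree <= 19 | d$age_at_degree >= 35

# Flag PhD or MD holders who report being younger than 28.
d$flag_young_adv <- d$education %in% c("PhD", "MD") & d$age < 28
```

In this made-up data the third row is flagged on degree age (the implied age at graduation is negative) and the second and fourth rows are flagged as advanced degrees at too young an age.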
Lat-Long: ex-US
A US resident could complete a survey while traveling ex-US. In the
pilot, all 32 ex-US responses were bad.
- US_lat_low <- 24.5
- US_lat_high <- 49.5
- US_long_low <- -125
- US_long_high <- -66
Need to fix to include AK & HI.
Ex-US responses in pilot:
- 25 Lithuania
- 5 Nigeria
- 2 other Africa
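One possible way to extend the bounding-box check to Alaska and Hawaii is to test each point against several boxes. The sketch below is in base R. The contiguous-US limits are the ones listed above; the Alaska and Hawaii limits are rough assumptions (they ignore the part of the Aleutians west of the 180th meridian), and `us_boxes`, `in_us`, and the example points are hypothetical names and data.

```r
# Approximate bounding boxes. The Alaska and Hawaii limits are assumptions,
# not values taken from the pilot analysis.
us_boxes <- data.frame(
  region    = c("contiguous", "alaska", "hawaii"),
  lat_low   = c(24.5, 51.0, 18.5),
  lat_high  = c(49.5, 71.5, 22.5),
  long_low  = c(-125, -180, -161),
  long_high = c(-66, -129, -154)
)

# TRUE when a point falls inside at least one of the boxes.
in_us <- function(lat, long, boxes = us_boxes) {
  vapply(seq_along(lat), function(i) {
    any(lat[i] >= boxes$lat_low & lat[i] <= boxes$lat_high &
        long[i] >= boxes$long_low & long[i] <= boxes$long_high)
  }, logical(1))
}

# Made-up example points: Kansas, Anchorage, Honolulu, Vilnius, Lagos.
pts <- data.frame(
  lat  = c(38.5, 61.2, 21.3, 54.7, 6.5),
  long = c(-98.0, -149.9, -157.9, 25.3, 3.4)
)
pts$ex_us <- !in_us(pts$lat, pts$long)
```

Rectangular boxes also cover parts of Canada and Mexico, so this remains a coarse screen rather than a true border check.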

Repeat Lat-Long,
especially from remote places.
- 26 responses from the same Lat-Long in Lithuania (free text is
bad).
- 21 responses from the same Lat-Long in rural Kansas (free text is
short & similar).
- 5 responses from the same Lat-Long in Nigeria (free text is
bad).
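Repeated locations like these can be counted with a grouped tally. The base R sketch below is illustrative only: the data frame `resp`, the column names, the coordinates, and the cut-off of 3 responses per location are all assumptions, not values from the pilot.

```r
# Hypothetical responses; coordinates are made up for illustration.
resp <- data.frame(
  id   = 1:6,
  lat  = c(54.7, 54.7, 54.7, 38.5, 40.1, 54.7),
  long = c(25.3, 25.3, 25.3, -98.0, -75.2, 25.3)
)

# Build a key for each exact lat-long pair and count responses per key.
loc_key <- paste(resp$lat, resp$long, sep = "_")
resp$n_same_loc <- ave(seq_along(loc_key), loc_key, FUN = length)

# Assumed cut-off: flag locations shared by 3 or more responses.
max_per_loc <- 3
resp$flag_repeat_loc <- resp$n_same_loc >= max_per_loc
```

A flagged location is a prompt to read the free text for that group, since several legitimate respondents can share one geolocated point (for example a shared network).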
Work_exp - job_tenure: work experience beyond current
job
- work_exp: How many years of post-college work experience do you
have?
- job_tenure: For how many years have you been employed by your
current employer? Please provide the total number of years since you
first started working for this employer, even if you have been promoted
or changed jobs with this same employer since then.
Fix work experience for users that entered a calendar
year:
mutate(work_exp_fix = case_when(work_exp > 1000 ~ 2024 - work_exp,
work_exp <= 1000 ~ work_exp))
Fix job tenure (99 is recoded to 0; values of 66 or more are set to
missing):
mutate(job_tenure_fix = case_when(job_tenure == 99 ~ 0,
job_tenure >= 66 ~ NA_real_, job_tenure < 66 ~ job_tenure))
Total work_exp must be >= job_tenure.
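A minimal base R sketch of the consistency check, applying the same recodes with `ifelse()` instead of `case_when()` so it runs without extra packages. The data frame `d`, the `flag_tenure` column, and the example rows are hypothetical.

```r
survey_year <- 2024  # year used in the calendar-year fix

# Made-up rows: consistent, calendar year entered, tenure > experience,
# tenure coded 99, implausible tenure.
d <- data.frame(
  work_exp   = c(10, 2015, 3, 5, 8),
  job_tenure = c(4, 2, 7, 99, 70)
)

# Convert calendar-year entries into years of experience.
d$work_exp_fix <- ifelse(d$work_exp > 1000,
                         survey_year - d$work_exp,
                         d$work_exp)

# Recode 99 to 0 and set values of 66 or more to missing.
d$job_tenure_fix <- ifelse(d$job_tenure == 99, 0,
                    ifelse(d$job_tenure >= 66, NA, d$job_tenure))

# Flag rows where tenure with the current employer exceeds total experience.
# Rows with a missing value are left unflagged.
d$flag_tenure <- !is.na(d$work_exp_fix) & !is.na(d$job_tenure_fix) &
  d$job_tenure_fix > d$work_exp_fix
```

Only the third made-up row is flagged here (7 years of tenure against 3 years of experience).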

Age-work_exp: age at first work experience
work_exp: How many years of post-college work experience do you
have?
Negative values are impossible.
The minimum plausible value should be around 20 years.
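As a rough base R illustration of this check, the implied age at first work experience is `age - work_exp`. The threshold of exactly 20, the data frame `d`, the flag names, and the example rows are assumptions. In practice the corrected `work_exp_fix` value would probably be used in place of the raw `work_exp`.

```r
# Made-up respondents, for illustration only.
d <- data.frame(
  age      = c(45, 30, 22, 38),
  work_exp = c(20, 8, 25, 19)
)

# Implied age when post-college work experience began.
d$age_first_work <- d$age - d$work_exp

# Assumed threshold for "around 20 years".
min_start_age <- 20

# Negative values are impossible; values below the threshold are suspicious.
d$flag_impossible <- d$age_first_work < 0
d$flag_too_young  <- d$age_first_work < min_start_age
```

Keeping the two flags separate lets the impossible cases be dropped outright while the merely suspicious ones are reviewed alongside the free text.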
