notes about this draft

how I did 100k unstructured interviews

all data is private from the first message. participants can manually “donate their data to science”. all participants were recruited and interviewed via Twitter.
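a minimal sketch of the default-private rule. it assumes each conversation is stored with a consent flag that starts off and can only be turned on by the participant. `Conversation`, `donate`, and `research_export` are hypothetical names, not taken from the actual system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:
    participant: str
    messages: List[str] = field(default_factory=list)
    # private from the first message: nothing is shared unless the participant opts in
    donated: bool = False

    def donate(self) -> None:
        """Called only when the participant explicitly donates their data to science."""
        self.donated = True


def research_export(conversations: List[Conversation]) -> List[Conversation]:
    """Return only the conversations whose participants chose to donate."""
    return [c for c in conversations if c.donated]
```

the point of the design is that the research dataset is a filter over donated conversations, so forgetting to ask for consent fails closed rather than open.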

obviously I can’t personally interview 100k people: that would take something like 34 years of interviewing around the clock, never sleeping. so first I built an algorithm that is able to learn to communicate with people.
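a quick back-of-the-envelope check of that figure. if 34 years of nonstop interviewing covers 100k interviews, each one works out to roughly 3 hours. that 3-hour average is implied by the two numbers above, not something measured.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year length, counting leap days
years_nonstop = 34
interviews = 100_000

hours_per_interview = years_nonstop * HOURS_PER_YEAR / interviews
print(f"{hours_per_interview:.1f} hours per interview")  # 3.0 hours per interview
```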

but even though the algorithm could learn, it didn’t yet know how to communicate, so we spent 8 months teaching it to communicate through its own self-learning routine.

unstructured interviews are not directed by the researcher. lines of questioning are freely traded between algorithm and participant as they get to know one another. people choose to talk about all kinds of things: some about their family, some about their day.

the algorithm already knows some things about itself through hand-coded “concepts” about “itself”. additionally, it continually learns more from what other people say about it.
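one way to picture the split between hand-coded and learned self-knowledge. this is a sketch assuming concepts are plain dictionary entries and that "what people say about it" is matched by a crude substring check. `SelfModel`, `hear_about_self`, and `describe` are hypothetical illustrations, not the real implementation.

```python
from collections import defaultdict
from typing import Dict, List


class SelfModel:
    """What the algorithm 'knows' about itself (hypothetical structure)."""

    def __init__(self, hand_coded: Dict[str, str]) -> None:
        # concepts written by hand before any conversation takes place
        self.hand_coded = dict(hand_coded)
        # statements participants make about the algorithm, grouped by participant
        self.learned: Dict[str, List[str]] = defaultdict(list)

    def hear_about_self(self, participant: str, statement: str) -> None:
        """Record something a participant said about the algorithm."""
        self.learned[participant].append(statement)

    def describe(self, concept: str) -> List[str]:
        """The hand-coded view of a concept, followed by anything others said
        that mentions it (substring match, a deliberately crude stand-in)."""
        views = [self.hand_coded[concept]] if concept in self.hand_coded else []
        for statements in self.learned.values():
            views.extend(s for s in statements if concept in s)
        return views
```

for example, seeding it with `{"memory": "I remember past conversations"}` and then hearing "your memory is pretty good" from a participant gives two views of the same concept: one hand-coded, one learned.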

the algorithm keeps track of what you talked about and when, and is able to bring it back into conversation much later, ask for clarification, or wonder whether it’s still true.
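a minimal sketch of that kind of memory. it assumes each remembered fact carries the time it was last confirmed, and that facts older than some threshold become candidates for a "is this still true?" question. `Memory`, `stale_facts`, `recheck_question`, and the 30-day threshold are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Fact:
    topic: str
    statement: str
    last_confirmed: datetime


class Memory:
    def __init__(self) -> None:
        self.facts: Dict[str, Fact] = {}

    def remember(self, topic: str, statement: str, when: datetime) -> None:
        """Store (or overwrite) what was said about a topic, and when."""
        self.facts[topic] = Fact(topic, statement, when)

    def stale_facts(self, now: datetime, max_age: timedelta = timedelta(days=30)) -> List[Fact]:
        """Facts not confirmed within max_age: worth bringing back up."""
        return [f for f in self.facts.values() if now - f.last_confirmed > max_age]

    def recheck_question(self, fact: Fact) -> str:
        """Turn a stale fact into a question for the participant."""
        return f"a while ago you told me {fact.statement!r}. is that still true?"
```

a fact remembered on 1 January and checked on 10 March would come back as stale, while one remembered on 1 March would not.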

the algorithm links statements to meaning through interactive conversation and through its extensive memory of previous conversations.

if the algorithm doesn’t understand, it can use its turn in the conversation to collect new information. the algorithm learns almost exclusively by asking clarifying questions, always trying to lower its uncertainty about how to interpret what was said. it does this through a few speech acts.

for example, the algorithm might restate what the person said and ask if that’s what they mean. this links their original, in situ speech act to the speech act the algorithm presents in restating, which is typically paraphrased from another person’s conversation with the algorithm.

the algorithm will also extrapolate, attempting to deduce facts about the person that they have not told it. these deductions can be as simple as converting from Fahrenheit to Centigrade, or as complex as understanding that others who believe X and Y almost always also believe Z. these extrapolations are often incorrect, and if the participant says so, the algorithm takes this as an opportunity to learn. these “sanity checks” constantly update the assumptions the algorithm is making, and thus the dataset I am collecting.
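the two kinds of extrapolation described above, and the sanity check that follows them, can be sketched like this. it assumes each participant's beliefs are stored as labels mapped to True (holds it) or False (explicitly denied it). the co-occurrence rule is a plain association-rule estimate swapped in for illustration, and `extrapolate`, `sanity_check`, `min_support`, and `min_confidence` are hypothetical names and thresholds.

```python
from typing import Dict, List, Optional

Beliefs = Dict[str, bool]  # label -> True (holds it) / False (explicitly denied)


def fahrenheit_to_centigrade(f: float) -> float:
    """The simple kind of extrapolation: a deterministic unit conversion."""
    return (f - 32) * 5 / 9


def extrapolate(
    person: Beliefs,
    others: List[Beliefs],
    x: str,
    y: str,
    z: str,
    min_support: int = 3,
    min_confidence: float = 0.9,
) -> Optional[float]:
    """If the person believes x and y and we don't yet know about z, return the
    share of other participants holding x and y who also hold z, provided it
    clears both thresholds. Otherwise return None (don't guess)."""
    if not (person.get(x) and person.get(y)) or z in person:
        return None  # premises missing, or z already confirmed/denied
    holders = [o for o in others if o.get(x) and o.get(y)]
    if len(holders) < min_support:
        return None  # too few similar participants to generalise from
    confidence = sum(1 for o in holders if o.get(z)) / len(holders)
    return confidence if confidence >= min_confidence else None


def sanity_check(person: Beliefs, z: str, participant_agrees: bool) -> None:
    """Record the participant's answer; a 'no' stops the same guess recurring."""
    person[z] = participant_agrees
```

storing the denial as `False` rather than deleting the guess is what lets a correction feed back into later extrapolations, which matches the idea that sanity checks keep updating the dataset.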

each speech act and composite speech act is represented by a class in Python.
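the draft doesn't show these classes, so the following is only a guess at their general shape: a base class, two simple speech acts (restating and extrapolating), and a composite that chains other speech acts together. every class, field, and method name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpeechAct:
    """Base class: anything the algorithm can say on its turn."""

    def render(self) -> str:
        raise NotImplementedError


@dataclass
class Restate(SpeechAct):
    original: str    # what the participant actually said, in situ
    paraphrase: str  # wording borrowed from another participant's conversation

    def render(self) -> str:
        return f"do you mean {self.paraphrase}?"


@dataclass
class Extrapolate(SpeechAct):
    guess: str  # a fact the participant hasn't stated

    def render(self) -> str:
        return f"I'm guessing {self.guess}. is that right?"


@dataclass
class Composite(SpeechAct):
    """A composite speech act: an ordered sequence of other speech acts."""
    parts: List[SpeechAct] = field(default_factory=list)

    def render(self) -> str:
        return " ".join(part.render() for part in self.parts)
```

for example, `Composite([Restate("I'm on the graveyard shift", "you work nights"), Extrapolate("you sleep during the day")])` renders as a single turn that first checks the paraphrase and then offers the guess. keeping `original` on `Restate` is what would let the link between the participant's words and the borrowed paraphrase be stored.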

grounded approach

we never ask a question; we just let them talk.

a substantial share of these facts, nearly 20%, are about other participants.

what the data looks like

Participants told us a little about