After more than a year of the covid-19 pandemic, millions of people are searching for employment in the United States. AI-powered interview software claims to help employers sift through applications to find the best people for the job. Companies specializing in this technology reported a surge in business during the pandemic.
But as the demand for these technologies increases, so do questions about their accuracy and reliability. In the latest episode of MIT Technology Review’s podcast “In Machines We Trust,” we tested software from two firms specializing in AI job interviews, MyInterview and Curious Thing. And we found variations in the predictions and job-matching scores that raise concerns about what exactly these algorithms are evaluating.
Getting to know you
MyInterview measures traits considered in the Big Five Personality Test, a psychometric evaluation often used in the hiring process. These traits include openness, conscientiousness, extroversion, agreeableness, and emotional stability. Curious Thing also measures personality-related traits, but instead of the Big Five, candidates are evaluated on other metrics, like humility and resilience.
The algorithms analyze candidates’ responses to determine personality traits. MyInterview also compiles scores indicating how closely a candidate matches the characteristics identified by hiring managers as ideal for the position.
To complete our tests, we first set up the software. We uploaded a fake job posting for an office administrator/researcher on both MyInterview and Curious Thing. Then we constructed our ideal candidate by choosing personality-related traits when prompted by the system.
On MyInterview, we selected characteristics like attention to detail and ranked them by level of importance. We also selected interview questions, which are displayed on the screen while the candidate records video responses. On Curious Thing, we selected characteristics like humility, adaptability, and resilience.
One of us, Hilke, then applied for the position and completed interviews for the role on both MyInterview and Curious Thing.
Our candidate completed a phone interview with Curious Thing. She first did a regular job interview and received a 8.5 out of 9 for English competency. In a second try, the automated interviewer asked the same questions, and she responded to each by reading the Wikipedia entry for psychometrics in German.
Yet Curious Thing awarded her a 6 out of 9 for English competency. She completed the interview again and received the same score.
Our candidate turned to MyInterview and repeated the experiment. She read the same Wikipedia entry aloud in German. The algorithm not only returned a personality assessment, but it also predicted our candidate to be a 73% match for the fake job, putting her in the top half of all the applicants we had asked to apply.