(Multilingual Lab, Eastern KY University)
For any two sentences, you can derive from the participant-ratings a numerical measure M of much they differ in their difficulty.
We want users of the Corpus to be able to examine video/audio clips of pairs of speakers who:
From each pair, pick eight sentences and match them (sentence from native to sentence from foreign) so that the sum of the eight M’s is as small as possible.
With 40 speaker pairs and 12 sentences from each speaker, a brute-force approach would require the construction of about 395 billion matchings.
Not enough time!
“matching three workers with three jobs”
A small trick reduces the problem (for each speaker pair) to finding the optimal matching in a complete bipartite 8-8 graph.
There are
\[8! = 8 \times 7\times 6 \times 5 \times 4 \times 3 \times 2 = 40,320\]
possible matchings. That’s a lot …
… but the Hungarian Algorithm (Kuhn and Munkres, 1956) searches quickly and systematically for an optimal one.
Let’s look at a case with 5 workers and 5 jobs:
Another 5-worker, 5-job example, with a hitch:
I did all of my work
in the R programming language. (It helped that Kurt Hornik has a fine implementation of the Hungarian algorithm in his R-package clue.)
(1% mathematical modeling, 99% secretarial work)