Data-Driven Optimization of the Spanish Language Learning Curve

There is a sequence in the movie “The Matrix” in which the protagonist, Neo, gets plugged into a computer through a brain interface and learns new skills in a matter of seconds. In the absence of direct access to the brain we have to work with more conventional ways of transferring information, but this does not mean that we cannot improve the speed of learning a new skill.

When we started, we wanted to figure out how to speed up the learning process. Our goal was first to measure how fast people can learn, and then to improve teaching techniques to speed it up. I am a former physicist and my biases are obviously in the realm of the exact sciences, but after a few tries we figured out that it is much easier to measure knowledge of something less abstract. Learning a new language (in our case Spanish) came up as the main candidate.

The first challenge was to come up with a good metric to describe a person's knowledge of a new language. Suppose English is your native language and you are trying to learn Spanish. What is the easiest thing to learn? Common sense suggests that guessing the English translation of a Spanish word might be the easiest, especially if there is suitable context and the question is multiple choice. Let's say you were told that the Spanish word “blanco” is a color. Even without knowing the exact answer you might guess that “blanco” sounds very much like “blank”, and that something devoid of any color would be “white”. Down the road, when more complicated constructs are introduced, you can start guessing the translation from the context of the sentence and from the words you already know.

It does not work as well when you are trying to translate from your native language into the foreign one. If someone asks you for the Spanish word for “white”, then unless you know the answer it is nearly impossible to guess the correct translation. It is clear from this example that knowledge of a foreign language can be divided into parts, some more difficult than others. Common sense and experience tell us that the most difficult tasks are speaking and writing. Everyone has met a foreigner (I was one) who understands a lot but has difficulty speaking or writing. If we want to measure knowledge of a foreign language, we need a metric that reflects this difference in difficulty.

After some experimentation we have put together a very comprehensive metric which quantifies every aspect of language learning and takes into account several idiosyncrasies shown by learners:

  • Translating from a native to a foreign language and back are two different skills.
  • From easiest to most difficult, language skills usually follow this order: reading, listening, writing, speaking. Some people might find listening easier than reading, or speaking easier than writing. Personal learning preferences like these are where an individually optimized learning curve shines the most.
  • Grammar rules can be quantified as well: using the right articles with genders, the right endings for plurals, the right tense for verbs, etc.

We provided our content in short, bite-sized exercises similar to Duolingo and managed to break every exercise down into a set of what we called “bits of knowledge”, or BoKs. We created a comprehensive database of all possible BoKs about the language and constantly updated a snapshot of it for each individual user.
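To make the idea concrete, here is a minimal sketch of how exercises, BoKs and a per-user snapshot could be represented. The class names, the fields (`skill`, `direction`, `item`) and the accuracy measure are hypothetical illustrations chosen for this post, not the actual schema we used.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BoK:
    """One 'bit of knowledge'. All field names are illustrative assumptions."""
    skill: str      # e.g. "reading", "listening", "writing", "speaking"
    direction: str  # e.g. "es->en" or "en->es" (treated as separate skills)
    item: str       # the word or grammar rule being exercised


@dataclass
class Exercise:
    prompt: str
    boks: tuple  # every BoK this exercise touches


@dataclass
class UserSnapshot:
    """Per-user tally of attempts and correct answers for each BoK."""
    attempts: dict = field(default_factory=dict)
    correct: dict = field(default_factory=dict)

    def record(self, exercise: Exercise, was_correct: bool) -> None:
        # Credit (or not) every BoK contained in the exercise.
        for bok in exercise.boks:
            self.attempts[bok] = self.attempts.get(bok, 0) + 1
            if was_correct:
                self.correct[bok] = self.correct.get(bok, 0) + 1

    def accuracy(self, bok: BoK) -> float:
        # Fraction of correct answers; 0.0 for a BoK never seen.
        n = self.attempts.get(bok, 0)
        return self.correct.get(bok, 0) / n if n else 0.0


blanco = BoK(skill="reading", direction="es->en", item="blanco")
exercise = Exercise(prompt="What does 'blanco' mean?", boks=(blanco,))
snapshot = UserSnapshot()
snapshot.record(exercise, was_correct=True)
snapshot.record(exercise, was_correct=False)
```

Keeping the translation direction inside the BoK is one simple way to treat “Spanish to English” and “English to Spanish” as two different pieces of knowledge about the same word.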

Once we had quantified what initially looked like very unstructured data, it became easy to apply any quantitative instrument from a researcher's toolbox. We started adjusting the content to maximize the number of BoKs a user would retain by the end of the session. For a starting point, we took our heuristic for structuring the course from Rosetta Stone and similar sites. We decided to go with a language we did not know ourselves, so that we could be our own first test subjects.

Once we started optimizing our learning curve we “rediscovered” several obvious things any good teacher knows already:

It is good to start from something very familiar.

In the case of Spanish, we have all been exposed to Spanish words in popular culture, from the casual “Adios amigos” to the classic “Casablanca” to the more recent “Mar-a-Lago”. These are the fastest hooks for bootstrapping the rest of the language. In case you do not know it, let's do an experiment. The name “Mar-a-Lago” is Spanish for “Sea-to-Lake,” referring to the fact that the resort extends from the Atlantic Ocean to what was known as Lake Worth. We will see if this helps a bit later.

The more engaged people are, the faster they learn.

It boils down to the observation that if material is too easy, people get bored and distracted, and if material is too difficult, people get frustrated and quickly give up. There is a narrow range of difficulty where people get engaged. Good game designers know this really well when they throw carefully calibrated hordes of monsters in front of the player, so that people feel challenged but not overwhelmed. In our case difficulty is very easy to balance by adjusting the material so that the ratio of correct answers is around 80%-90%.
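A minimal sketch of this kind of balancing is shown below. It assumes the material is tagged with integer difficulty levels and that the ratio is measured over a sliding window of recent answers; the class name, the window size and the level range are assumptions made for illustration, and only the 80%-90% target band comes from our experiments.

```python
from collections import deque


class DifficultyController:
    """Nudges the difficulty level to keep recent accuracy inside a target band."""

    def __init__(self, low=0.80, high=0.90, window=20, min_level=1, max_level=10):
        self.low = low
        self.high = high
        self.min_level = min_level
        self.max_level = max_level
        self.level = min_level
        self.recent = deque(maxlen=window)  # most recent answers only

    def update(self, was_correct: bool) -> int:
        """Record one answer and return the difficulty level to use next."""
        self.recent.append(bool(was_correct))
        if len(self.recent) < self.recent.maxlen:
            return self.level  # not enough data yet

        ratio = sum(self.recent) / len(self.recent)
        if ratio > self.high and self.level < self.max_level:
            self.level += 1      # too easy: risk of boredom
            self.recent.clear()  # re-measure at the new level
        elif ratio < self.low and self.level > self.min_level:
            self.level -= 1      # too hard: risk of frustration
            self.recent.clear()
        return self.level


controller = DifficultyController(window=5)
for _ in range(5):
    controller.update(True)  # five correct answers in a row
```

Clearing the window after each change is a deliberate choice in this sketch: answers given at the old difficulty say little about how the learner copes with the new one.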

Changing context frequently does improve memorization. 

Repetition is needed as long as it does not become a drill. Challenging the brain to recognize words in different contexts not only makes learning more entertaining, it also leads to faster memorization and longer retention. The timing of repetition is also important. Show the same word too soon and you risk making it boring. Pause too long and some people will forget it. It took us a few optimization cycles to figure out the right rhythm of repetition, and we are sure that it could be improved further.
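The trade-off can be illustrated with a simple expanding-interval rule, which is a generic heuristic and not the rhythm we actually ended up with. The gap is counted in challenges between two appearances of the same word; the function name and the `min_gap` and `max_gap` bounds are assumptions picked for the example.

```python
def next_gap(previous_gap: int, was_correct: bool,
             min_gap: int = 3, max_gap: int = 40) -> int:
    """Return how many challenges to wait before showing the same word again.

    A correct answer doubles the gap (the word can rest longer); a wrong
    answer halves it (the word should come back sooner). The result is
    clamped so a word is never repeated immediately nor left out so long
    that it is likely to be forgotten within a single session.
    """
    gap = previous_gap * 2 if was_correct else previous_gap // 2
    return max(min_gap, min(max_gap, gap))


# Simulate one word answered correctly three times, then missed once.
gap = 3
history = []
for answer in (True, True, True, False):
    gap = next_gap(gap, answer)
    history.append(gap)
```

The lower bound guards against the “too soon, too boring” case and the upper bound against the “too long, forgotten” case; the right values have to be found from learner data.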

In our experiments with Bazilinga we used about 90-120 minutes' worth of material with about 250 simple challenges. It was short enough to do in one sitting, but long enough that people would start forgetting material they had already learned by the middle of the session. We deliberately kept the interface minimalistic, limiting content to simple text-based challenges. This reduced the number of variables we had to deal with, without loss of research value. We also added a feedback mechanism so that users could bet on the correctness of their answers. This allowed us to measure each user's level of confidence and feed it back into the optimization scheme.
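One hypothetical way to turn such a bet into a number is sketched below. It assumes the bet is expressed as a value between 0 and 1; the function name and the scoring rule are illustrative assumptions, not the formula we used.

```python
def knowledge_signal(was_correct: bool, bet: float) -> float:
    """Combine correctness and confidence into one signed score in [-1, 1].

    bet is the confidence the user staked on the answer, from 0.0 to 1.0.
    A confident correct answer scores high, a correct guess scores low,
    and a confident wrong answer scores strongly negative, which flags a
    misconception rather than a simple gap.
    """
    if not 0.0 <= bet <= 1.0:
        raise ValueError("bet must be between 0.0 and 1.0")
    return bet if was_correct else -bet


confident_hit = knowledge_signal(True, 0.9)    # likely known
lucky_guess = knowledge_signal(True, 0.1)      # correct, but probably a guess
misconception = knowledge_signal(False, 0.9)   # wrong and sure about it
```

A score like this separates a lucky guess from real knowledge, which plain right-or-wrong data cannot do.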

By using the learning data from just a few hundred people we were able to increase the amount of memorized material by more than 150%, and it looks like we might be able to double that using a bigger data set. Individualizing the learning material is the most obvious area for further growth. Do not forget that we did it all with a very simple text-based interface. Adding graphical and video content would provide another avenue for improvement.

The feedback from users was also very encouraging. From “dull” and “boring” it graduated to “fun” and “engaging”. It seems that by optimizing for speed we inadvertently made the learning process more interesting. We also started getting comments like: “It feels like IQ test…”, “I had to think all the time..”, which is exactly what we were aiming for. We did not try to find the easiest way to learn the language; we looked for the fastest.

Now, do you still remember what “Mar-a-Lago” means?