Attending the 4th Advanced Seminar on National K-12 English Teaching Powered by InfoTech to give a keynote presentation (1,500 in the audience; 40,000 watching online), I got to know Zhu Qifei, the founder and CEO of the host company, Arivoc China. In our conversations, I grilled him about his business model and his principal product: Kouyu100, a desktop and mobile language-learning tutor. Zhu started his company five years ago and now has 200 employees and 40 million paid subscribers, each paying 200 yuan (US$29) a year. The company is in a heavy growth stage, currently operating in 126 cities but set to continue expanding throughout the country, with 2,000 commission-based promoters who sign up new members. Zhu comes from an academic family; it turns out he lived for seven years at Beijing Normal University, where I taught for three years and where his father was a nuclear physicist. Zhu studied computing and completed his PhD at the University of California in the area of computer-based speech recognition. He then headed up a DARPA project at Berkeley on speech recognition of telephone messages. Essentially, the system could listen to 200 million telephone conversations at a time and identify key words and phrases to be flagged and followed up on by law enforcement officials. (I’m not sure if he was supposed to kill me after telling me this.) He said that his interest in creating the language-learning app was in part to put his knowledge of speech recognition to a more benign use. He was also motivated by what he saw as China’s need to develop new approaches to language teaching and learning. The program is a language tutor and is heavily customizable by the teachers and students using it. In many ways it resembles other language-learning programs with speech recognition, so I was curious to know how it made use of AI technology. At the first dinner, I asked to see how the AI worked.
He took out his phone and showed me a photo he’d taken on the train on the way to the conference. He explained that even though only a small corner of the train window appeared in the photo, the program was able to identify it with the label, “A photo of a field taken out of a train window.” Still skeptical, I asked him to take a photo of me and see what it said.
This was the result. He took the photo within the application and, after about three seconds, the rest of the information appeared. This is the visual part of the AI program working with millions of images and algorithms to make sense of what is photographed. You can see that it offers two options: “A man sitting at a table with a plate of food.” and “A man sitting at a table in front of a plate of food.” In this case, both are correct, and you can choose whichever you prefer. In other cases, the program may identify different points of interest in the photo, so the options can differ. As soon as the statements are ready, you can tap them to hear them read out in Siri-quality voices. The program then listens to you repeat the sentence you have chosen and identifies any pronunciation errors. Zhu purposely mispronounced “food,” and the program displayed the sentence showing that he’d said all the words correctly except “food,” which was indicated in red.
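The word-level feedback Zhu demonstrated, comparing what the recognizer heard against the chosen sentence and flagging any mismatch in red, can be sketched roughly as follows. This is a minimal illustration in Python; the function name and the simple word-by-word matching are my own assumptions, not Kouyu100’s actual implementation, which would score pronunciation at the acoustic level rather than on recognized text.

```python
# Minimal sketch of word-level pronunciation feedback: compare the words
# the speech recognizer heard against the target sentence and flag any
# word that does not match. (Hypothetical logic; a real system would
# score the audio itself, not just the transcribed words.)

def flag_mispronounced(target: str, recognized: str) -> list[tuple[str, bool]]:
    """Return (word, ok) pairs; ok=False marks a likely mispronunciation."""
    target_words = target.lower().rstrip(".").split()
    heard_words = recognized.lower().rstrip(".").split()
    results = []
    for i, word in enumerate(target_words):
        heard = heard_words[i] if i < len(heard_words) else ""
        results.append((word, word == heard))
    return results

feedback = flag_mispronounced(
    "A man sitting at a table with a plate of food",
    "A man sitting at a table with a plate of fud",  # 'food' mispronounced
)
print([word for word, ok in feedback if not ok])  # the words to show in red
```

In the app, the words paired with False would be the ones rendered in red, just as “food” was in Zhu’s demonstration.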
The program is not perfect but, as it is AI-enabled, it learns. It does this by soliciting corrections from users, which it then adds to its database. To illustrate, Zhu showed a photo he had taken of the chairs in the conference venue. The program mistakenly came up with, “A room filled with wooden benches.” He corrected “wooden benches” to “chairs,” and moments later someone using the program to photograph a similar auditorium elsewhere in China would get the right answer. I would imagine that 40 million users will help the AI program learn extremely quickly. Zhu is presenting at TESOL in Chicago in 2018. I’ll certainly be in the audience!
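The crowd-correction loop Zhu described can be sketched in miniature: a user’s correction is written to a shared store, so later lookups for a similar image return the corrected caption instead of the model’s guess. This is a deliberately simplified Python sketch under my own assumptions; a real system would match images by learned visual features and retrain the model, not look up an exact key.

```python
# Minimal sketch of a crowd-correction loop: user corrections go into a
# shared store, and captioning prefers a stored correction over the
# model's guess. (Hypothetical structure; real systems match images by
# learned features and fold corrections back into model training.)

corrections: dict[str, str] = {}  # shared store: image signature -> caption

def caption_for(signature: str, model_caption: str) -> str:
    """Prefer a user-supplied correction over the model's own caption."""
    return corrections.get(signature, model_caption)

def submit_correction(signature: str, better_caption: str) -> None:
    """Record a user's fix so every later user benefits from it."""
    corrections[signature] = better_caption

# The model first guesses wrong for the auditorium photo...
print(caption_for("auditorium", "A room filled with wooden benches."))
# ...one user corrects it, and a similar photo elsewhere gets the fix.
submit_correction("auditorium", "A room filled with chairs.")
print(caption_for("auditorium", "A room filled with wooden benches."))
```

With 40 million users feeding corrections into a shared store like this, it is easy to see how the program could improve as quickly as Zhu suggested.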