Teaching Notes - Teaching about encoding language on computers

I teach a 100-level course called Language and Computers to a class of undergrads at all stages of their degree programs and from diverse backgrounds. About half the class are from CS and related disciplines, but there are also students from Journalism and Mass Communication, Physics, Biology, Chemical Engineering, and even Advertising, to name a few. This post (and a few future ones) is mostly my notes on teaching them (it is my first time with this course) and on what to change for the next iteration, whenever that happens.

I largely rely on the following book, with a few added topics and a few deleted ones.

Language and Computers. Authors: Markus Dickinson, Chris Brew and Detmar Meurers

The first chapter is about encoding language on computers. The book neatly and concisely introduces students to the question - “what does a computer see where you see a piece of text or an audio snippet of human speech?”. It answers this question by walking us through what it means to encode text, slowly introduces us to Unicode, and then has a brief discussion of speech signals and analyzing spectrograms. The chapter ends with a discussion of the relation between written and spoken language, talking about Automatic Speech Recognition (ASR) and Text to Speech (TTS) synthesis.
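To make the "what does a computer see" question concrete, here is a minimal Python sketch (my own illustration, not taken from the book). The helper name `describe` and the sample characters are assumptions chosen for the demo. It shows the Unicode code point of a character and the bytes that one common encoding, UTF-8, stores for it.

```python
# What a computer "sees" for a character: a code point, then encoded bytes.
# The sample characters are arbitrary picks from four different scripts.
samples = [
    "A",        # Latin capital letter A
    "\u00e9",   # Latin small letter e with acute (é)
    "\u0c15",   # Telugu letter ka (క)
    "\u0639",   # Arabic letter ain (ع)
]

def describe(ch):
    """Return (code point label, UTF-8 bytes) for a single character."""
    code_point = f"U+{ord(ch):04X}"
    utf8_bytes = ch.encode("utf-8")
    return code_point, utf8_bytes

for ch in samples:
    code_point, utf8_bytes = describe(ch)
    hex_bytes = " ".join(f"{b:02x}" for b in utf8_bytes)
    print(ch, code_point, hex_bytes, f"({len(utf8_bytes)} byte(s))")
```

The point for students is that the same abstract character table (Unicode) can be stored with different numbers of bytes per character, depending on the script and the encoding.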

I followed more or less the same flow while teaching. My major disappointment at the end of the chapter, after spending about a week teaching the topic, was that I did not talk about “inputting language” on a computer. A lot of time was spent discussing how the writing systems of the world differ, but very little was spent on the question “How exactly can we type these languages on the computer?”. Strictly speaking, that is not part of “encoding” a language, but I thought it was an obvious question. I then realized it is not so obvious to a class of young students who have perhaps never had to bother about typing in another script. Had I taught this course in India, this would have been a more obvious concern. However, one student did ask how the browser knows when it sees languages that are read from right to left (e.g., Arabic). I think that is also an important point to discuss in the future.
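For the right-to-left question, one short answer is that Unicode assigns every character a bidirectional category, and software can use it to decide direction. The sketch below is a simplified illustration using Python's standard `unicodedata` module. The function name `strong_direction` is my own, and the "first strongly directional character" heuristic is only a small piece of the full Unicode Bidirectional Algorithm; real browsers also take markup such as the HTML `dir` attribute into account.

```python
import unicodedata

def strong_direction(text):
    """Guess a base direction from the first strongly directional character.

    Simplified sketch: 'L' is a strong left-to-right category, while
    'R' and 'AL' (Arabic letter) are strong right-to-left categories.
    Digits, spaces and punctuation are skipped as weak or neutral.
    """
    for ch in text:
        category = unicodedata.bidirectional(ch)
        if category == "L":
            return "ltr"
        if category in ("R", "AL"):
            return "rtl"
    return "neutral"

print(strong_direction("hello"))                          # Latin letters
print(strong_direction("\u0645\u0631\u062d\u0628\u0627"))  # Arabic letters
print(strong_direction("123 ..."))                        # no strong character
```

This is enough to show a class that direction is a property attached to the characters themselves, not something the browser guesses from the language name.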

Further, considering that the next chapter is actually on writers’ aids, text input methods for different languages would have been a nice transition topic - that is what I tried to do. However, I need to teach the course a second time to bring more structure into that transition. I kept wavering about how to do this, and in the end, it felt like I did it in a hurry. I also thought the speech part needed a bit more space in the chapter. I did talk about speech signals, but again it felt rather rushed compared to the text discussion, just like in the book.

After teaching this topic, here are my thoughts on how to modify it next time (perhaps spending an extra class):

  • Introduce different applications in which language and computers interact (I showed Google Home and IBM Watson demo videos)
  • Talk about how to encode speech on computers (with more examples)
  • Talk about right-to-left versus left-to-right rendering
  • Introduce textual data by talking about ASR and TTS
  • Talk about how to encode written text on computers
  • Talk about how to input written text on computers - that will be a good point to start Topic 2. (I am still undecided about how to bring in ASR and TTS though, if I decide to go this route next time)

I should see how my thoughts on teaching this topic evolve over time!

Written on September 3, 2017