Last 2 days @COLING 2020

I attended COLING 2020 two weeks ago and started blogging about day-to-day learnings. After two days, I gave up due to conference fatigue and a family situation. However, I decided to wrap up the conference-related posts today with the few papers I checked in the last two days of the main event. Here they are:

Cross-lingual Transfer

Cross-lingual Transfer Learning for Grammatical Error Correction
There has been a lot of interest in cross-lingual transfer for various NLP tasks these days. This paper proposes it for a task where its use is not obvious: grammatical error correction. The authors explored the task for three languages (English, Czech, and Russian) by learning from different source languages with varying similarity to the target language. Through a series of experiments, they show that it is possible to transfer grammatical knowledge between languages to some extent. I found this line of work very interesting; clearly more needs to be done to draw stronger conclusions, so I will look forward to related work in the future.

Detecting Urgency Status of Crisis Tweets: A Transfer Learning Approach for Low Resource Languages
This paper developed a new Twitter dataset of crisis-situation tweets annotated at four levels of urgency for English. In addition, the authors tested a classification approach for detecting this urgency status and studied the cross-lingual transfer of this approach to Sinhala and Odia. While I felt it was an interesting idea, I was somewhat surprised/disappointed by what felt like sloppy writing to me. For example, I did not understand how they resolved inter-annotator differences (and agreement in general seemed somewhat low) to arrive at a final categorization, nor why they did a 4-way annotation if the task was a 2-way classification. I did not read the paper thoroughly, but I also did not really understand how the annotations for Sinhala/Odia were made, and the percentage of “true” tweets (in a true/false binary classification) seemed too low to trust the reported classification accuracies. Finally, there is no mention of code mixing in the data or in the paper, which I find strange considering how many tweets from Indian handles use more than one language in a single tweet. So, overall, it was a big disappointment.

Key Phrase Extraction
SaSAKE: Syntax and Semantics Aware Keyphrase Extraction from Research Papers
This paper proposes an interesting approach to keyphrase extraction that combines transformer and graph encoder architectures and incorporates syntactic and semantic dependency information. The authors suggest that this approach can identify the boundaries of multi-word keyphrases effectively, and they show SOTA results on three datasets. While I found the approach very interesting, I have the same complaint about KPE that I have always had: why are the numbers on these standard datasets so low (F-scores typically in the 40s) despite so much advancement in the algorithms? Is there a way to do some form of extrinsic evaluation for KPE to understand whether the problem lies with the datasets or with the approaches themselves? (I feel the problem is with the datasets and the way they are annotated, but I don’t have evidence yet.)

Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning
This paper proposes an approach for keyphrase identification and classification (i.e., classifying each keyphrase after it has been extracted/identified) through “intermediate task transfer learning”, i.e., transfer learning by first training on related auxiliary tasks. I haven’t read anything about keyphrase classification before (although it makes complete sense), so I found the idea interesting. Considering that the categories associated with keyphrases are not constant and keep changing with the dataset, I am curious to know what a “generic” solution of the future would look like.
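
Since “intermediate task transfer learning” may sound abstract, here is a minimal sketch of the general recipe as I understand it (not the paper’s actual setup): fine-tune a shared encoder on an auxiliary task first, then reuse that adapted encoder with a fresh head for the target task. The encoder, dimensions, and label counts below are made-up placeholders for illustration.

```python
# Rough sketch of intermediate-task transfer: the encoder fine-tuned on the
# auxiliary task is reused for the target task; only the task head is replaced.
# All sizes and label counts here are invented, not taken from the paper.
import torch
import torch.nn as nn

class TaskModel(nn.Module):
    def __init__(self, encoder, hidden_size, num_labels):
        super().__init__()
        self.encoder = encoder                           # shared, transferable part
        self.head = nn.Linear(hidden_size, num_labels)   # task-specific part

    def forward(self, x):
        return self.head(self.encoder(x))

# Stand-in for a pre-trained sentence encoder (in practice: BERT/SciBERT etc.)
encoder = nn.Sequential(nn.Linear(300, 768), nn.ReLU())

# Phase 1: fine-tune encoder + head on the auxiliary (intermediate) task
aux_model = TaskModel(encoder, 768, num_labels=9)
# ... run the usual training loop on the auxiliary-task data here ...

# Phase 2: keep the adapted encoder, attach a fresh head for keyphrase classification
kp_model = TaskModel(aux_model.encoder, 768, num_labels=5)
out = kp_model(torch.randn(4, 300))   # (4, 5) logits for 4 dummy examples
```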

Text Classification

Go Simple and Pre-Train on Domain-Specific Corpora: On the Role of Training Data for Text Classification
This paper examines whether pre-trained language models work well for classification tasks with limited labeled data, across genres (social media, news, reviews). The authors conclude that simpler approaches (fastText with domain-specific word embeddings) may work better in such limited-data scenarios than models such as BERT. They also conclude that pre-training on domain-specific (unlabeled) corpora improves performance significantly for this task. I only followed the poster, and it felt like a sensible conclusion to me. It also matches my personal experience with other text classification tasks, although I have not done any thorough analysis with multiple datasets.
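
For anyone curious what that “simple” baseline looks like in practice, here is a rough sketch using the fastText Python package with domain-specific embeddings. The file names and hyperparameters are placeholders, and this is my reading of the general recipe, not the authors’ exact configuration.

```python
# A fastText classifier initialized with domain-specific word vectors.
# File names are placeholders; this is not the paper's exact setup.
import fasttext

# "domain_vectors.vec" would come from unsupervised fastText training on a large
# unlabeled, in-domain corpus (e.g., via the fastText CLI), saved in .vec format.
# "train.txt" holds one labeled example per line: "__label__positive great product ..."
clf = fasttext.train_supervised(
    input="train.txt",
    dim=100,                                  # must match the dimension of the .vec file
    pretrainedVectors="domain_vectors.vec",   # domain-specific embeddings
    epoch=25,
    lr=0.5,
    wordNgrams=2,
)

print(clf.predict("this is a new document to classify"))
```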

Using Eye-tracking Data to Predict the Readability of Brazilian Portuguese Sentences in Single-task, Multi-task and Sequential Transfer Learning Approaches
This paper presents an approach for sentence-level readability prediction for Brazilian Portuguese in different model settings (single-task, multi-task, and sequential transfer learning). The authors also compare these models using data collected from an eye-tracking experiment, validating whether the linguistic features used in their models can effectively predict the eye-tracking measures. I will have to get back to this paper at some point later, but it seems like the approach resulted in substantial improvements on sentence-level readability prediction for Brazilian Portuguese, and all the source code is publicly available.

Native-like Expression Identification by Contrasting Native and Proficient Second Language Speakers
Depending on your language and cultural background, it can sometimes be challenging to understand certain expressions in a language. This paper tackles part of that problem by proposing a “native-like” expression identifier built with a BiLSTM classifier. While I find the task itself interesting, I found the dataset creation and the underlying assumption of “native-like” somewhat difficult to accept. What they describe as “native” could just as well be culture-specific expressions (e.g., an Indian English expression may not be familiar to an American English speaker either). It was also not clear what counts as “native”, as the authors put the US, UK, etc. into one single group, whereas we all know that some expressions used in one country are totally unfamiliar in another, even though all these people are technically native speakers. So, at this point, it seems like an interesting problem to solve, but one without suitable, sufficiently large data for NLP to tackle it.

Industry Track
Scalable Cross-lingual Treebank Synthesis for Improved Production Dependency Parsers
This paper from IBM Research proposes data augmentation techniques that use synthetic treebanks to improve production-grade dependency parsers. Their proposed parser shows improvements in parsing for 7 languages. I found this interesting, but considering that it is in the industry track, I would have loved to know how fast their parser is compared with other approaches, where it is deployed (if it is deployed at all), and so on.

Surveys
Explainable Automated Fact-Checking: A Survey
There has been a lot of interest in recent years in automatic fact checking and fake news detection. This paper presents a survey of advances in automated fact checking, some common datasets used for this purpose, and an outline of the challenges/open problems ahead. I only followed it cursorily, but it seemed like a neat survey of the topic, so I will definitely check it out in detail soon!

Among the papers I found interesting, I noticed that there are 4 related to grammatical error detection/correction, and 3 each related to key phrase extraction and text simplification. Other topics seem somewhat scattered. Overall, I felt it was an interesting conference, and I really think an industry track should exist at all conferences, with an emphasis on talking about deployment-related issues in that track!

Written on December 10, 2020