Quick Notes - Day 12
All previous posts in this series here
theme: question generation
Evaluation methodologies in Automatic Question Generation 2013-2018
Authors: Jacopo Amidei, Paul Piwek, Alistair Willis
Published at: 11th International Conference on Natural Language Generation url
This paper surveyed the evaluation methodologies used in different automatic question generation approaches over a five-year period and concluded that no standardized evaluation approach is being followed. This makes it hard to compare different systems, and hence the paper calls for a common framework for testing the performance of question generation systems.
While I generally agree that there should be some standard way of evaluating question generation, I don’t think it is as straightforward as the authors suggest, since these systems serve different purposes (e.g., educational, clarification, searching etc., listed on page 2), and there are different forms of questions (yes/no, fill in the blank, multiple choice, short answer etc.). So, a more practical idea would be to have some standard guidelines that are operationalized differently for different question forms and application scenarios. Some food for thought!
Recent advances in Neural Question Generation
Authors: Liangming Pan, Wenqiang Lei, Tat-Seng Chua, Min-Yen Kan
Published at: ArXiv (may not be peer reviewed) url
There has been a lot of work on using neural network methods for automatic question generation in the past 3-4 years. This paper surveys that research, discussing the general methods followed, the corpora used, emerging trends etc., and points to two potential future directions.
The paper does a good job of presenting a lot of diverse recent work and is definitely useful to a reader. However, two things I found lacking are:
- some discussion of evaluation methods and the lack of a standard, replicable procedure for them (or at least a citation of the previous paper?)
- a call for some form of generalizability analysis of these approaches, i.e., how well they perform when the question generation model is given outside data (not its training/testing data) — a rough sketch of what such a check could look like is below.
I think these are both essential questions to ask on the topic.
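To make the generalizability point a bit more concrete, here is a minimal sketch (my own illustration, not from either paper) of how one might compare a question generation model on its in-domain test split versus an out-of-domain corpus, using corpus BLEU from NLTK as a stand-in automatic metric. The `generate_question` function is a hypothetical wrapper around whatever QG model is being evaluated.

```python
# Sketch of a simple generalizability check for a QG model.
# Assumes: `generate_question(passage) -> str` is a hypothetical wrapper
# around the model under evaluation, and BLEU is used purely as an example
# metric (the surveyed papers use a variety of automatic and human metrics).
from nltk.translate.bleu_score import corpus_bleu

def evaluate(generate_question, passages, reference_questions):
    """Return corpus BLEU of generated questions against reference questions."""
    hypotheses = [generate_question(p).split() for p in passages]
    references = [[q.split()] for q in reference_questions]  # one reference per passage
    return corpus_bleu(references, hypotheses)

# Compare in-domain vs. out-of-domain scores; a large gap would suggest the
# model does not generalize much beyond its training distribution.
# in_domain_score = evaluate(generate_question, indomain_passages, indomain_questions)
# outside_score = evaluate(generate_question, outside_passages, outside_questions)
```

This is obviously a simplification (single references, one automatic metric), but even such a basic comparison would tell us more about robustness than test-set numbers alone.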
Looks like there is still a lot to do in this area!