School of Science and Technology 科技學院
Computing Programmes 電腦學系

AI-Powered Question Generator

Jason Chun Wai KWOK

  
Programme: Bachelor of Computing with Honours in Internet Technology
Supervisor: Dr. Keith Lee
Areas: Intelligent Applications
Year of Completion: 2020
Award: 2nd Runner Up, IEEE (HK) Computational Intelligence Chapter 17th FYP Competition

Objectives

The aim of this project is to develop a system that can automatically generate questions from text using AI technology. The system can generate wh-questions and grammar questions, and it is mainly designed for educational purposes, for example, helping teachers generate quiz or assignment questions for their students and helping parents generate practice questions for their children. It can also be used to generate questions for building a reading comprehension dataset, a kind of dataset that is widely used in Natural Language Processing (NLP) research.

The proposed automated question generation system's objectives are fourfold:

  • Develop a subsystem that can generate wh-questions
  • Develop a subsystem that can generate grammar questions
  • Integrate the subsystems into a web backend
  • Build an easy-to-use web user interface

Video Demonstration

Background and Methodology

The key technology of the system is T5, a Transformer encoder-decoder model. It is the major component used for generating wh-questions. Like many other Transformer-based models, T5 follows a pre-training and fine-tuning pattern, so it benefits from transfer learning.

Figure 1: T5 framework showing the input and output in machine translation, classification, and text summarization tasks

Question answering is the sibling task of question generation, so a dataset built for question answering can also be used to train a question generation model. In our approach, we used the SQuAD 2.0 dataset (Rajpurkar et al. 2018) to fine-tune T5. SQuAD 2.0 is a reading comprehension dataset consisting of questions, answers, and articles, and it is also a benchmark dataset for question answering.
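To make the reuse of a question answering dataset concrete, the sketch below turns one SQuAD-style article into text-to-text training pairs, with the answer and context as the input and the question as the target. The nested `paragraphs` / `qas` / `answers` layout follows the published SQuAD 2.0 JSON format; the `answer: ... context: ...` prompt layout and the function name `squad_to_qg_pairs` are assumptions made for illustration, since the exact input format used in the project is not stated here.

```python
from typing import Dict, List, Tuple


def squad_to_qg_pairs(article: Dict) -> List[Tuple[str, str]]:
    """Turn one SQuAD-style article into (input_text, target_text) pairs."""
    pairs: List[Tuple[str, str]] = []
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            # SQuAD 2.0 also contains unanswerable questions. They have no
            # answer span to condition on, so this sketch skips them.
            if qa.get("is_impossible", False) or not qa["answers"]:
                continue
            answer = qa["answers"][0]["text"]
            # Assumed prompt layout (not taken from the project itself).
            source = f"answer: {answer} context: {context}"
            pairs.append((source, qa["question"]))
    return pairs


# A tiny hand-written record in the SQuAD 2.0 layout.
sample_article = {
    "title": "T5",
    "paragraphs": [{
        "context": "T5 is a Transformer encoder-decoder model.",
        "qas": [
            {"question": "What kind of model is T5?", "id": "q1",
             "is_impossible": False,
             "answers": [{"text": "a Transformer encoder-decoder model",
                          "answer_start": 6}]},
            {"question": "Who funded T5?", "id": "q2",
             "is_impossible": True, "answers": []},
        ],
    }],
}

qg_pairs = squad_to_qg_pairs(sample_article)
```

Each resulting pair can then be tokenized and fed to a sequence-to-sequence fine-tuning loop. Swapping the roles of question and answer in this way is what lets a question answering dataset serve question generation.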

Another key technology of this project is React, one of the most popular JavaScript libraries for building user interfaces. We used it to build our single-page application (SPA) user interface.

Other technologies used in this project include part-of-speech (POS) tagging and named-entity recognition (NER). These language processing tools help to extract linguistic features from the text.

System Architecture

In this project, the major component is a Transformer-based model called T5 (Text-To-Text Transfer Transformer), a Transformer encoder-decoder model similar to the original Transformer architecture. Although other Transformer-based approaches existed, to our knowledge none had used T5 for question generation. Combined with other language processing components such as part-of-speech (POS) tagging and named-entity recognition (NER) tagging, the system is able to generate wh-questions and multiple-choice grammar questions. The user can control the generated questions by selecting their answers. The whole system is deployed on a web server as a web application that is accessible through a browser. The user interface is a single-page application written in React, with some additional functions for working with the generated questions.
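The flow described above (tag the text, let the user adjust the keywords, then generate one question per keyword) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: `tag_entities` and `generate_question` are hypothetical stand-ins for the NER tagger and the fine-tuned T5 model, and the toy implementations exist only so that the example runs without any model.

```python
from typing import Callable, Iterable, List, Tuple


def generate_wh_questions(
    text: str,
    tag_entities: Callable[[str], List[str]],
    generate_question: Callable[[str, str], str],
    added: Iterable[str] = (),
    removed: Iterable[str] = (),
) -> List[Tuple[str, str]]:
    """Return (question, answer) pairs, one per selected keyword."""
    removed_set = set(removed)
    keywords: List[str] = []
    # Start from the auto-tagged entities, then apply the user's edits.
    for keyword in list(tag_entities(text)) + list(added):
        if keyword not in removed_set and keyword not in keywords:
            keywords.append(keyword)
    return [(generate_question(keyword, text), keyword) for keyword in keywords]


def toy_tagger(text: str) -> List[str]:
    """Hypothetical tagger: capitalised words after the first word."""
    words = [w.strip(".,") for w in text.split()]
    return [w for w in words[1:] if w[:1].isupper()]


def toy_generator(answer: str, context: str) -> str:
    """Hypothetical generator standing in for the question model."""
    return f"What does the text say about {answer}?"


text = "The T5 model was released by Google in 2019."
pairs = generate_wh_questions(
    text, toy_tagger, toy_generator, added=["2019"], removed=["T5"]
)
```

Passing the tagger and generator in as callables keeps the keyword logic independent of any particular model, so either component could be replaced without touching the rest.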

Figure 2: The overall design of the system

Figure 3: A higher-level overview of the system

System Design and Implementation

To set up the prototype, the web server must be hosted first. It is a Python web server, so the host machine needs Python and all the required libraries installed. Once a server is running, other users can simply open the server URL in a browser to use the web application.
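As an illustration of what such a Python backend might look like, here is a minimal sketch built only on the standard library's `http.server`. The web framework actually used by the project is not named, so the `/generate` route, the JSON field names, and the placeholder question text are all assumptions; a real server would call the question generation model where the placeholder is.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_generate(payload) -> dict:
    """Request logic kept separate from HTTP so it can be tested directly."""
    if not isinstance(payload, dict):
        return {"error": "request body must be a JSON object"}
    text = payload.get("text", "")
    keywords = payload.get("keywords", [])
    if not text or not keywords:
        return {"error": "both 'text' and 'keywords' are required"}
    # Placeholder: the real system would run the model for each keyword.
    questions = [
        {"answer": k, "question": f"<question about {k}>"} for k in keywords
    ]
    return {"questions": questions}


class QuestionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/generate":  # hypothetical route
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        result = handle_generate(payload)
        body = json.dumps(result).encode("utf-8")
        self.send_response(400 if "error" in result else 200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), QuestionHandler).serve_forever()
```

Calling `serve()` starts the server on the local machine, after which a single-page front end could send a POST request with the text and the selected keywords to the assumed `/generate` route.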

Figure 4: The user interface

The navigation bar shows the three steps of the system: input text, select keywords, and results. Each step is clickable, so the user can switch between pages. Because it is a single-page application, user interactions do not trigger a full page reload. After entering a text, the user clicks the “Proceed” button to go to the next step.

Figure 5: The user interface of the second step: select keywords

The user then receives the text with keywords auto-tagged by NER. These keywords will be the answers for the generated questions. The user can control which questions are generated by adding or deleting keywords. To delete a keyword, the user clicks the red cross button next to it, or clicks the “Remove all” button to remove all keywords. To add a new keyword, the user highlights any words, and a green add button appears next to the highlighted words. After that, the user clicks the “Proceed” button to start generating questions.

Figure 6: The result page showing the generated wh-questions

Figure 7: When the user clicks the “Source” button near a question, the source of the question is shown with the answer highlighted

Figure 8: The user can click on any question and edit it in place, so there is no need to copy the question elsewhere to edit it. The user can also click the “Delete” button to delete the whole question, and the question numbers are adjusted automatically.

Figure 9: The user can click the “Show Answer” checkbox to show or hide all the answers, and click the “Copy to Clipboard” button to copy all the questions to the clipboard.

Figure 10: The result page showing the generated grammar questions.

The system generates one question for each input sentence. The “Shuffle” button shuffles the order of the choices. As on the wh-questions page, the user can choose to show or hide the answers and copy the questions to the clipboard.
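One way a multiple-choice grammar question with shuffled choices could be built from POS tags is sketched below. This is an illustrative assumption, not the project's actual method: the choice of prepositions as the tested word class, the fixed distractor pool, and the function name `make_grammar_question` are all hypothetical, and the tagged sentence is written by hand in place of real tagger output (the `ADP` label follows the Universal POS tag set).

```python
import random
from typing import Dict, List, Optional, Tuple

# Illustrative distractor pool; a real system would likely use a richer one.
PREPOSITIONS = ["in", "on", "at", "by", "with", "for"]


def make_grammar_question(
    tagged: List[Tuple[str, str]],
    rng: random.Random,
    n_choices: int = 4,
) -> Optional[Dict]:
    """Blank the first known preposition and offer shuffled choices."""
    for i, (token, pos) in enumerate(tagged):
        if pos == "ADP" and token.lower() in PREPOSITIONS:
            answer = token.lower()
            pool = [p for p in PREPOSITIONS if p != answer]
            choices = rng.sample(pool, n_choices - 1) + [answer]
            rng.shuffle(choices)  # same idea as a "Shuffle" action
            # Joining tokens with spaces is a simplification.
            stem = " ".join(
                "____" if j == i else t for j, (t, _) in enumerate(tagged)
            )
            return {"stem": stem, "choices": choices, "answer": answer}
    return None  # no suitable word in this sentence


# Hand-tagged sentence standing in for real POS tagger output.
tagged_sentence = [
    ("She", "PRON"), ("arrived", "VERB"), ("at", "ADP"),
    ("the", "DET"), ("station", "NOUN"),
]
question = make_grammar_question(tagged_sentence, random.Random(0))
```

Storing the answer by value rather than by position means the choices can be reshuffled at any time without losing track of which one is correct.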

Evaluation

Figure 11: Result of the survey

Figure 12: Result of the survey

Table 1. The result of the response time test

The results show that most of the respondents are satisfied with the usefulness, usability, and look-and-feel of our system. There are also some comments on the auto-selected keywords, the loading time, and the quality of the questions, which indicate that there is still room for improvement. We also conducted a question type test and a response time test. These tests reveal limitations and weaknesses similar to those raised in the survey, especially in the keyword/answer selection process and the long processing time.

Conclusion and Future Development

The keyword/answer selection process relies only on NER tagging, so it cannot select all possible answers, and this affects the quality of the generated questions. We therefore suggest improving this part in the future by using another selection approach, which would also reduce the user's involvement in the generation process.

As for the response time, the evaluation shows that generating each question takes about 1 second, which is not ideal. We therefore suggest improving this area by optimizing the program or by finding another approach to generate questions, such as a typical encoder-decoder solution with some additional features, which should be much faster because the model is smaller.

We hope that in the near future question generation technology will become powerful enough to help teachers generate questions for their students, so that teachers can have more time to focus on teaching.