Classification of Short Answers for Semi-Automated Grading and Feedback in online Assessment

School of Science and Technology 科技學院
Computing Programmes 電腦學系

TSUI Yiu Chuen, LAI Ka Wai, CHENG Mang Kwan


Programme	Bachelor of Computing with Honours in Internet Technology
Supervisor	Dr. Andrew Lui
Areas	Teaching and Classroom Support
Year of Completion	2017
Award Received	First Runner-Up, IEEE Computational Intelligence Chapter FYP Competition 2017

Objectives

The aim of the project is to develop a semi-automated short answer grading algorithm that reduces, and makes better use of, instructors' grading effort. Traditionally, grading short answers takes a lot of time and effort, and we think the process can be made more efficient. Semi-automated grading combines traditional grading with automated grouping of answers. We will design a framework for clustering short answers into groups, so that answers which are similar and share common features fall into the same cluster. We want to explore the ideal feature set so that the answers are clustered according to their similarity. Instructors then grade only a representative answer of each group, and that grade is applied to all the answers in the cluster.
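The grading step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `propagate_grades`, the answer ids and the grades are all hypothetical, and the cluster assignment is assumed to come from a separate clustering step.

```python
def propagate_grades(cluster_of, representative_grade):
    """Copy each cluster's representative grade to every answer in that cluster.

    cluster_of: dict mapping answer id -> cluster id (from any clustering step).
    representative_grade: dict mapping cluster id -> grade given by the instructor.
    Answers in clusters that have not been graded are left out of the result,
    so they can still be graded manually.
    """
    grades = {}
    for answer_id, cluster_id in cluster_of.items():
        if cluster_id in representative_grade:
            grades[answer_id] = representative_grade[cluster_id]
    return grades


# Hypothetical data: five answers in two clusters; the instructor graded
# one representative answer per cluster.
cluster_of = {"a1": 0, "a2": 0, "a3": 1, "a4": 1, "a5": 1}
representative_grade = {0: 1.0, 1: 0.0}
grades = propagate_grades(cluster_of, representative_grade)
print(grades)
```

In this sketch the instructor grades two answers instead of five. The saving grows with cluster size, but the result is only as good as the clusters are pure.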

We also want to investigate the balance between the number of clusters that need to be graded and the accuracy of the clusters. We hope instructors can grade as few clusters as possible and still achieve excellent grading performance.

To achieve this aim, the main objective is to classify and cluster students' short answers and then grade them automatically. The project has defined a number of sub-objectives as follows:

The design of the feature set for clustering the answers, for example, identifying which factors are essential and which are not for determining the similarity of answers.
The selection of data to be collected and processed.
Evaluation of the design. The evaluation assesses the accuracy of the feature set.
Evaluation of the cluster quality indices. The evaluation assesses which index can estimate the purity of the clusters.
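
As one illustration of a cluster quality index that needs no grade labels, the sketch below computes the mean silhouette coefficient with Euclidean distance. The choice of the silhouette coefficient is an assumption made here for illustration only, since this summary does not name the indices that were evaluated. The function name `silhouette` and the toy points are hypothetical.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient of a clustering (Euclidean distance).

    For each point, a = mean distance to the other points of its own cluster,
    b = smallest mean distance to the points of any other cluster, and the
    point's score is (b - a) / max(a, b). Points in singleton clusters score 0.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster contributes a score of 0
        # The distance from p to itself is 0, so dividing by len(own) - 1
        # gives the mean distance to the *other* members.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical 2-D feature vectors: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette(points, [0, 0, 1, 1])  # clusters match the groups
bad = silhouette(points, [0, 1, 0, 1])   # clusters cut across the groups
print(good > bad)
```

Whether such an index actually tracks purity on real answer data is exactly what this sub-objective sets out to evaluate.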

Background and Methodology

This is a research-based project which aims to cluster short answers in order to amplify graders' effort. Graders would prefer to have fewer clusters to grade (i.e. a minimised K value). However, the lower the number of clusters, the harder it is to obtain pure clusters. A pure cluster is a group that contains only answers deserving the same grade.
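
To make the notion of purity concrete, the sketch below uses the standard purity measure: the fraction of answers whose grade matches the majority grade of their cluster. It assumes the true grades are known, as in a labelled evaluation dataset. The function name `purity` and the example data are hypothetical.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, grades):
    """Fraction of answers whose grade equals the majority grade of their cluster."""
    if len(cluster_ids) != len(grades):
        raise ValueError("cluster_ids and grades must have the same length")
    if not cluster_ids:
        raise ValueError("need at least one answer")
    by_cluster = defaultdict(Counter)
    for cluster_id, grade in zip(cluster_ids, grades):
        by_cluster[cluster_id][grade] += 1
    # In each cluster, only the answers sharing the majority grade count as correct.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical data: five answers with known grades (1 = correct, 0 = wrong).
grades = [1, 1, 0, 0, 0]
two_clusters = purity([0, 0, 0, 1, 1], grades)   # one wrong answer sits in cluster 0
five_clusters = purity([0, 1, 2, 3, 4], grades)  # every answer is its own cluster
print(two_clusters, five_clusters)
```

The second call shows the trade-off: with one cluster per answer the purity is trivially perfect, but the grader saves no effort.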

To achieve this, we investigated the following aspects of balancing these two dimensions:

Feature model. It is important to determine which features are capable of distinguishing the similarity between answers.
Clustering algorithm. Selecting the clustering algorithm is key to success, as the choice of algorithm affects the performance of the clustering result.
Cluster quality measurements. The purity of the clusters measures whether the answers within a cluster have the same grade. However, in real-life cases the answers are not labelled, so we need to find a measurement that is able to estimate the purity.
Implementation technologies. The technologies needed to implement the prototype system.
Dataset. The datasets to be used in the project.
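
How a feature model and a clustering algorithm fit together can be sketched as below. Two simplifications are assumed. First, the features are plain word counts (bag of words), used as a stand-in because the Typed Dependencies features of this project require a dependency parser. Second, K-Means is initialised with the first k vectors so that the example is deterministic. The function names and the sample answers are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(answers):
    """Turn each answer into a word-count vector over the shared vocabulary."""
    tokenized = [answer.lower().split() for answer in answers]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[word]) for word in vocab])
    return vectors


def kmeans(vectors, k, iterations=20):
    """Plain K-Means (Lloyd's algorithm); the first k vectors seed the centroids."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical answers to "How does a stack order its items?"
answers = [
    "last in first out",
    "first in first out",
    "it is last in first out",
    "it is first in first out",
]
labels = kmeans(bag_of_words(answers), k=2)
print(labels)
```

Here the two "last in first out" answers share one cluster and the two "first in first out" answers share the other, so a grader would only need to mark one answer from each.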

Conclusion and Future Development

Semi-automated grading is still a challenging problem. This research suggests that the use of Typed Dependencies as features is a promising approach and worth investigating further. Compared to fully automated grading, semi-automated grading combines human grading with computer assistance, which provides graders with higher flexibility and accuracy. Their effort is also amplified compared to the traditional grading process. Our project aim is achieved. This research makes two main contributions, which are discussed in the following sub-chapters.

There are some limitations in our project. Our algorithm cannot identify synonyms and antonyms. This matters because students may use different words to express the same meaning. Abbreviations cannot be detected either. In some datasets, students frequently use abbreviations in their answers. For example, “us” can be read literally as a pronoun or as the abbreviation of “United States”. Handling these cases could improve the accuracy of the algorithm.

Also, we did not investigate the possible use of other clustering algorithms; we used only K-Means in our experiments. It is possible that other clustering algorithms could perform better than K-Means.

Given the limitations stated above, we suggest that future work address them. Synonym, antonym and abbreviation features can be investigated to check whether they help to identify the answers more accurately.

Investigating clustering algorithms is another direction for future work, since the choice of algorithm can be a major factor in performance.