Interactive High-Quality Text Classification for Healthcare Information Support
Automatic text classification (TC) is a key component in the sharing, dissemination, and management of information. Previous studies have developed many promising TC techniques, whose ideal performance is to accept every document that should be accepted (i.e., recall = 100%) and reject every document that should be rejected (i.e., precision = 100%). Unfortunately, this ideal is seldom achieved because of several problems that are often unavoidable in practice: imperfect selection of training documents (e.g., noise, over-fitting, and content ambiguity) and imperfect system settings (e.g., parameter setting and feature selection). There therefore appears to be a performance limit for most automatic classifiers, which makes automatic TC unsuitable for domains in which each of the classifier’s erroneous decisions may incur high costs or cause serious problems. An example application is healthcare information support through the Internet, in which patients require validated information and professional consultation from medical experts who often have very little time. In this project, we explore how to achieve high-quality TC (i.e., high precision and recall) by asking domain experts to confirm the classifier’s decisions. The major challenge lies in achieving high-quality TC without imposing too heavy a cognitive load on the experts. We plan to comprehensively survey, develop, and evaluate suitable confirmation mechanisms. The contributions of the project are of both theoretical and practical significance for classifying information into suitable categories, especially when the quality of TC is a critical concern in the application domain.
Keywords: Text Classification, Classification Quality, Interactive Confirmation, Cognitive Loading
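As an illustration only, the trade-off described in the abstract can be sketched in a few lines of Python. The abstract does not specify a confirmation mechanism, so everything here is an assumption: the `Decision` record, the `confidence` score, the fixed `threshold` rule for choosing which decisions to send to the expert, and the toy data are all hypothetical. The sketch computes precision and recall, then shows how confirming only the low-confidence decisions changes both measures and how many confirmations the expert is asked to make.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    doc_id: str
    accept: bool       # the classifier's decision for one category
    confidence: float  # hypothetical confidence score in [0, 1]
    truth: bool        # gold label; also stands in for the expert's answer


def precision_recall(decisions):
    """Return (precision, recall) of the accept decisions against the gold labels."""
    tp = sum(d.accept and d.truth for d in decisions)
    fp = sum(d.accept and not d.truth for d in decisions)
    fn = sum((not d.accept) and d.truth for d in decisions)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


def confirm_uncertain(decisions, threshold):
    """Ask the (simulated) expert to confirm decisions below the threshold.

    Assumption: the expert always answers correctly, so a confirmed
    decision is replaced by the gold label. Returns the revised decisions
    and the number of confirmations requested, a crude proxy for the
    expert's cognitive load.
    """
    revised, asked = [], 0
    for d in decisions:
        if d.confidence < threshold:
            asked += 1
            d = Decision(d.doc_id, d.truth, 1.0, d.truth)
        revised.append(d)
    return revised, asked


# Toy data, invented for illustration.
decisions = [
    Decision("d1", True, 0.95, True),
    Decision("d2", True, 0.55, False),   # false accept, low confidence
    Decision("d3", False, 0.60, True),   # false reject, low confidence
    Decision("d4", False, 0.90, False),
    Decision("d5", True, 0.80, True),
]

before = precision_recall(decisions)
revised, asked = confirm_uncertain(decisions, threshold=0.7)
after = precision_recall(revised)
print(before, after, asked)
```

In this toy case the two errors happen to be the two least confident decisions, so two confirmations are enough to reach perfect precision and recall. Real classifiers can be confidently wrong, which is one reason a fixed threshold is only a starting point for the confirmation mechanisms the project proposes to study.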