Text Classification and Ranking for Disease Diagnosis Support


Rey-Long Liu



The Internet has become a main channel through which healthcare professionals and consumers obtain medical information services. For users to discover in-depth and relevant medical knowledge, identifying the target disease is a fundamental step, since the management and treatment of specific diseases are the foremost information needs of users. Once the target diseases are identified, various information services can be provided online, including related information for disease management (for patients), medical information for medical education (for medical students), and recommendation of suitable medical experts for consultation (for patients). However, developing a disease diagnosis support system is challenging: a large number of risk factors, symptoms, and signs (RSS) must be considered for individual diseases, and the RSS items are both correlated and evolving, since two diseases may have overlapping RSS items and the RSS items of a disease may be updated by new medical research. Therefore, in this project we propose a text-based approach to developing a disease ranking system that provides disease diagnosis support online. The technical motivations for the text-based approach are that (1) new medical findings are often written in text form, (2) the input to a disease diagnosis system is often a description of RSS (from patients or medical students) that is also in text form, and (3) many text analysis techniques have been developed, but how to enhance them to build a disease diagnosis system deserves exploration. We systematically develop techniques to extract diagnosis passages from medical texts, identify critical and correlated RSS items for individual diseases, and rank diseases with respect to given RSS descriptions. An online system will also be implemented based on these techniques to provide disease diagnosis support to users.
The research is of both theoretical and practical significance, and as an interdisciplinary study it may have an impact on both the information technology and healthcare communities.
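
The ranking step described above, ordering diseases by how well their RSS items match a free-text description, can be illustrated with a minimal sketch. This is not the project's method: it swaps in plain TF-IDF weighting with cosine similarity as a generic baseline, and the function names (`tokenize`, `rank_diseases`) and the three-entry `TOY_RSS` table are hypothetical, made up for illustration only and not clinical data.

```python
import math
from collections import Counter

# Hypothetical toy RSS descriptions, for illustration only (not clinical data).
TOY_RSS = {
    "influenza": "fever cough sore throat muscle aches fatigue",
    "migraine": "headache nausea sensitivity to light",
    "type 2 diabetes": "increased thirst frequent urination fatigue obesity",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace after dropping commas and periods."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def rank_diseases(disease_texts, query):
    """Return (disease, score) pairs sorted by TF-IDF cosine similarity to the query."""
    docs = {name: Counter(tokenize(text)) for name, text in disease_texts.items()}
    n_docs = len(docs)

    # Document frequency: in how many disease descriptions each term occurs.
    df = Counter()
    for counts in docs.values():
        df.update(counts.keys())

    # Smoothed inverse document frequency; terms shared by many diseases weigh less.
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1.0 for term, d in df.items()}

    def vectorize(counts):
        # Terms unseen in any disease description are ignored.
        return {t: c * idf[t] for t, c in counts.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)

    query_vec = vectorize(Counter(tokenize(query)))
    scores = [(name, cosine(query_vec, vectorize(c))) for name, c in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_diseases(TOY_RSS, "fever and cough with muscle aches"):
        print(f"{name}: {score:.3f}")
```

A baseline of this kind treats every term independently, so it does not capture the correlated and evolving RSS items that the project identifies as the main challenge; it only shows the input and output shape of a text-based disease ranker.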


Keywords: disease diagnosis support, risk factors of diseases, symptoms and signs of diseases, diagnosis passages extraction, disease ranking system.

