Core Entity Similarity Analysis for Retrieval of Highly Related Biomedical Literature
Biomedical literature is an essential resource for biomedical research. Because the information needs of biomedical researchers often concern specific topics or issues, retrieving the references (articles) that are highly related to those needs is a fundamental step for researchers. In this project, we plan to develop an information retrieval technique that retrieves articles highly related to a given target biomedical article r. An article d is said to be highly related to a target article r if d and r share similar core contents, including research goals, research methods, and main findings. Retrieving such highly related articles is challenging, and it is essential for the curators, authors, reviewers, and readers of r. The technical framework developed in the project is named CESA (Core Entity Similarity Analysis), as it retrieves those articles that have high core entity similarity with r. For an article x, the core entities of x are those biomedical entities (e.g., proteins, genes, diseases, and chemicals) that are related to the conclusions of x. The contributions of CESA are significant to the information retrieval community and of practical value to the curation of biomedical evidence published in the literature, the writing and review of biomedical papers, and the reading of highly related biomedical articles.
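To make the idea of ranking by core entity similarity concrete, the following is a minimal sketch only. It assumes that the core entities of each article have already been extracted as a set of strings, and it uses Jaccard overlap as a stand-in similarity measure; the abstract does not specify how CESA identifies core entities or computes their similarity. The function names, article identifiers, and entity sets are all hypothetical.

```python
def core_entity_similarity(entities_r, entities_d):
    """Jaccard overlap between two core-entity sets.

    Assumption: Jaccard is a stand-in measure chosen for illustration,
    not the similarity function defined by CESA.
    """
    r, d = set(entities_r), set(entities_d)
    if not r and not d:
        return 0.0  # no entities on either side: treat as unrelated
    return len(r & d) / len(r | d)


def rank_candidates(target_entities, candidates):
    """Return (article_id, score) pairs sorted by descending similarity."""
    scored = [
        (article_id, core_entity_similarity(target_entities, entities))
        for article_id, entities in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical core entities for a target article r and three candidates.
target_r = {"BRCA1", "breast cancer", "olaparib"}
candidates = {
    "d1": {"BRCA1", "breast cancer", "PARP1"},
    "d2": {"TP53", "lung cancer"},
    "d3": {"BRCA1", "breast cancer", "olaparib"},
}

ranking = rank_candidates(target_r, candidates)
for article_id, score in ranking:
    print(article_id, round(score, 2))
```

In this toy setting, a candidate that shares all core entities with r ranks first and one that shares none ranks last. A set-overlap measure treats every entity as equally important, so it ignores how strongly each entity relates to an article's conclusions.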