Core Content Analysis for Retrieval of Highly Related Biomedical Literature
In this project, we plan to extend our previous studies on the retrieval of highly related biomedical articles. Given a target biomedical article r, an article d is said to be highly related to r only if d and r share similar core contents, including their research goals and main findings. Retrieval of highly related articles is both challenging and essential for the curators, authors, reviewers, and readers of r. We plan to develop a technique that retrieves highly related biomedical articles based on their titles and abstracts, which are commonly available on the Internet. The technique recognizes core contents in the abstracts and, based on these core contents, estimates the similarity between articles. The technique is novel when compared with previous studies and existing search engines (e.g., PubMed), which often estimate inter-article similarity based on certain statistical data of terms (e.g., the rareness and frequency of each term). The contributions of the project are significant to the information retrieval community and of practical value to the curation of biomedical evidence, the writing and review of biomedical papers, and the reading of highly related biomedical articles.
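For reference, the term-statistics style of similarity mentioned above (rareness and frequency of each term) can be sketched as follows. This is only a minimal illustration, assuming a smoothed TF-IDF weighting with cosine similarity. It is neither the actual ranking used by PubMed nor the core-content technique proposed in this project, and the function names and toy abstracts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document.

    Frequency: raw count of the term in the document.
    Rareness: smoothed inverse document frequency over the collection.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy abstracts (hypothetical): the first plays the role of the target r.
abstracts = [
    "Gene X mutation increases risk of breast cancer in women",
    "Mutation of gene X is associated with breast cancer risk",
    "Dietary fiber intake and gut microbiome diversity in mice",
]
vecs = tfidf_vectors(abstracts)
scores = [cosine(vecs[0], v) for v in vecs[1:]]
print(scores)  # the first candidate should score higher than the second
```

Such a baseline treats every term in the title and abstract alike, regardless of whether it describes the research goal, the main finding, or background material. The proposed technique differs in that it would first recognize the core contents and then estimate similarity from them.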