
Computer Science Department Colloquium Series Presents Dr. Cornelia Caragea


“Keyphrase Extraction in Document Networks”

Dr. Cornelia Caragea

Assistant Professor

University of North Texas


Keyphrase extraction is the problem of automatically extracting descriptive phrases or concepts from documents. Keyphrases act as a concise summary of a document and have been used successfully in many applications, such as query formulation, document clustering, classification, recommendation, indexing, and summarization. Previous approaches to keyphrase extraction generally use the textual content of a target document or a local neighborhood that consists of textually similar documents. We posit that, in addition to a document’s textual content and textually similar neighbors, other informative neighborhoods exist that have the potential to improve keyphrase extraction. For example, in a scholarly domain, research papers are not isolated. Rather, they are highly interconnected in giant citation networks, in which papers cite or are cited by other papers in appropriate citation contexts, i.e., short text segments surrounding a citation’s mention. These contexts often serve as brief summaries of a cited paper. We exploit the information available in document networks and show remarkable improvements in the performance of our models over strong baselines in both supervised and unsupervised settings. Through our research, we also identify several aspects of the keyphrase extraction task that pose additional challenges, including the subjectivity of keyphrase assignment, which we quantify by crowdsourcing keyphrases for news and fashion magazine articles with many annotators per document.


Cornelia Caragea is an Assistant Professor in the Computer Science and Engineering department at the University of North Texas, where she directs the Machine Learning group. Her research interests lie at the intersection of artificial intelligence, machine learning, information retrieval, and natural language processing, with applications to text and image analysis, scientific data analysis, and social media. She has published research papers in prestigious venues such as AAAI, IJCAI, WWW, EMNLP, ICDM, and ACM Transactions on the Web. Cornelia has reviewed for many journals, including Nature, ACM TIST, JAIR, and IEEE TKDE, has served on several NSF panels, and has been a program committee member for top conferences such as AAAI, IJCAI, ACL, NAACL, EMNLP, and SIGIR. She has also organized several workshops on scholarly big data at conferences such as IJCAI, AAAI, and IEEE BigData. Cornelia earned a Bachelor of Science degree in Computer Science and Mathematics from the University of Bucharest and a Ph.D. in Computer Science from Iowa State University. Prior to joining the University of North Texas in Fall 2012, she was a post-doctoral researcher at the Pennsylvania State University.

Date: Friday, April 14th, 2017

Time: 11:00am to 12:00pm

Location: ECSS 2.410

Refreshments will be served at 10:45am.