Category:Identifying Meaningful Citations
Marco Valenzuela, Vu Ha and Oren Etzioni, Identifying Meaningful Citations, http://go.nature.com/2th2voa
Abstract
We introduce the novel task of identifying important citations in scholarly literature, i.e., citations that indicate that the cited work is used or extended in the new effort. We believe this task is a crucial component in algorithms that detect and follow research topics and in methods that measure the quality of publications. We model this task as a supervised classification problem at two levels of detail: a coarse one with classes (important vs. non-important), and a more detailed one with four importance classes. We annotate a dataset of approximately 450 citations with this information, and release it publicly. We propose a supervised classification approach that addresses this task with a battery of features that range from citation counts to where the citation appears in the body of the paper, and show that, our approach achieves a precision of 65% for a recall of 90%.
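Summary and Comments
This paper applies machine learning to the problem of identifying key citations: some citations are the real foundation of the citing work, while others only provide general background or are cited in passing. The question is how to tell them apart.
Specifically, the paper uses supervised learning to classify citations into two broad classes (important and non-important), and then more finely into four levels. Besides direct citations, it also considers indirect (non-explicit) citations, such as mentions of the cited algorithm's name or the authors' names. The other features are similar to those used in Measuring academic influence: Not all citations are equal. In the results, the number of direct citations turns out to be a very effective feature.
The paper also annotates a dataset that can support further research. That said, the amount of labeled data is quite small. The same is true of another work in this area, Category:Measuring academic influence: Not all citations are equal. In other words, it should be fairly easy for further work to surpass these studies.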
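As a rough illustration of the two count features mentioned above (direct citations and indirect mentions), here is a minimal Python sketch. It is not the paper's implementation: the function names `citation_features` and `is_important`, the bracketed citation marker format, the sample text, and the threshold `min_mentions=3` are all assumptions made for this example, and a fixed threshold stands in for the trained supervised classifier the paper actually uses.

```python
import re


def citation_features(body_text, direct_marker, indirect_terms):
    """Count direct and indirect mentions of one cited work in a paper body.

    direct_marker: the in-text citation marker, e.g. "[3]" (assumed format).
    indirect_terms: names that refer to the cited work without a marker,
                    e.g. an algorithm name or an author surname.
    """
    direct = body_text.count(direct_marker)
    indirect = sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", body_text, flags=re.IGNORECASE))
        for term in indirect_terms
    )
    return {"direct_count": direct, "indirect_count": indirect}


def is_important(features, min_mentions=3):
    """Toy decision rule (hypothetical): many mentions suggest an important citation."""
    return features["direct_count"] + features["indirect_count"] >= min_mentions


# Made-up example text, not taken from any real paper.
body = (
    "We build on LDA [3] for topic discovery. Unlike [5], our model extends LDA "
    "with author information. Following Blei et al., we use Gibbs sampling [3]."
)

f3 = citation_features(body, "[3]", ["LDA", "Blei"])
f5 = citation_features(body, "[5]", [])

print(f3, is_important(f3))
print(f5, is_important(f5))
```

In this toy text, reference [3] is cited directly twice and mentioned by name three more times, so it passes the threshold, while [5] appears only once and does not. A real system would feed such counts, together with the other features, into a trained classifier rather than a hand-set cutoff.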