Category: The AI2 system at SemEval-2017 Task 10 (ScienceIE): semi-supervised end-to-end entity and relation extraction

From Big Physics

Waleed Ammar, Matthew E. Peters, Chandra Bhagavatula, R. Power, The AI2 system at SemEval-2017 Task 10 (ScienceIE): semi-supervised end-to-end entity and relation extraction


This paper describes our submission for the ScienceIE shared task (SemEval2017 Task 10) on entity and relation extraction from scientific papers. Our model is based on the end-to-end relation extraction model of Miwa and Bansal (2016) with several enhancements such as semi-supervised learning via neural language models, character-level encoding, gazetteers extracted from existing knowledge bases, and model ensembles. Our official submission ranked first in end-to-end entity and relation extraction (scenario 1), and second in the relation-only extraction (scenario 3).


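  • Overview: This paper improves on a classic end-to-end joint entity and relation extraction model (Miwa and Bansal, 2016) to extract concepts and the relations between them from scientific papers. The original model is an end-to-end neural network that stacks bidirectional tree-structured LSTM-RNNs on top of bidirectional sequential LSTM-RNNs, so that it captures both the word sequence and substructures of the dependency tree. This paper extends it in four ways: semi-supervised learning via neural language models; character-level word encoding with a CNN; entity features derived from gazetteers extracted from existing knowledge bases; and model ensembles.
  • Task: identify entities in scientific papers and classify their types (Task / Material / Process), and extract the relations between entities (Hyponym-of / Synonym-of).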
  • Entity model

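Each input word is represented by concatenating its pre-trained GloVe embedding with a character-level encoding produced by a CNN. A BiLSTM serves as the sequence encoder to capture context; its output at each time step combines the forward and backward hidden states at that step. Entity recognition is cast as a sequence labeling task using the BILOU scheme (Begin, Inside, Last, Outside, Unit), with the entity type appended to the position tag (e.g. B-Process). The per-token tag scores produced by the BiLSTM layer are fed into a CRF layer, and the highest-scoring tag sequence is taken as the final prediction.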
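To make the tagging scheme and the CRF decoding step concrete, here is a minimal pure-Python sketch. It is not the authors' code: the function names `spans_to_bilou` and `viterbi_decode` are hypothetical, and the emission scores (which would come from the trained BiLSTM) and the transition scores (which would be learned CRF parameters) are assumed to be supplied as plain nested lists.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def spans_to_bilou(n_tokens: int, spans: Sequence[Span]) -> List[str]:
    """Encode entity spans as BILOU tags with the entity type appended, e.g. 'B-Process'."""
    tags = ["O"] * n_tokens  # Outside by default
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"U-{etype}"  # single-token entity (Unit)
        else:
            tags[start] = f"B-{etype}"  # first token (Begin)
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{etype}"  # interior tokens (Inside)
            tags[end - 1] = f"L-{etype}"  # final token (Last)
    return tags


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the tag-index sequence maximizing the summed emission + transition scores.

    emissions[t][j]: score of tag j at token t (assumed to come from the BiLSTM layer).
    transitions[i][j]: score of moving from tag i to tag j (assumed learned CRF weights).
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best path score ending in each tag at t = 0
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, back = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at step t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            back.append(best_i)
        score = new_score
        backpointers.append(back)
    # Start from the best final tag and follow the back-pointers to the beginning.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for back in reversed(backpointers):
        path.append(back[path[-1]])
    return path[::-1]
```

For example, `spans_to_bilou(5, [(2, 5, "Process")])` yields `['O', 'O', 'B-Process', 'I-Process', 'L-Process']`. The sketch covers decoding only. Training a CRF additionally needs the forward algorithm to compute the log-partition function, which is omitted here.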

  • Relation model


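  • Drawback: for reasons of model complexity and computational efficiency, the paper trains the two models separately. Entity extraction and relation extraction therefore still happen one after the other rather than fully jointly.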
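Because the two models are trained separately, inference behaves like a pipeline: the relation step only sees the entities the entity step predicted, so an entity the tagger misses can never take part in a relation. The minimal sketch below illustrates that data flow. `tag_fn` and `relation_fn` are hypothetical stand-ins for the trained entity tagger and relation classifier. Enumerating every ordered pair of predicted entities is an illustrative assumption, not necessarily the paper's candidate-generation rule.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)
Relation = Tuple[Span, Span, str]  # (head, tail, label)


def bilou_to_spans(tags: Sequence[str]) -> List[Span]:
    """Decode BILOU tags such as 'B-Process' back into (start, end, type) spans."""
    spans: List[Span] = []
    start: Optional[int] = None
    start_type = ""
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "U":
            spans.append((i, i + 1, etype))
            start = None
        elif prefix == "B":
            start, start_type = i, etype
        elif prefix == "L" and start is not None and etype == start_type:
            spans.append((start, i + 1, etype))
            start = None
        elif prefix != "I":
            start = None  # 'O' or an ill-formed tag closes any open span


    return spans


def extract_pipeline(
    tokens: Sequence[str],
    tag_fn: Callable[[Sequence[str]], List[str]],
    relation_fn: Callable[[Sequence[str], Span, Span], Optional[str]],
) -> Tuple[List[Span], List[Relation]]:
    """Stage 1 tags entities; stage 2 classifies ordered pairs of *predicted* entities."""
    spans = bilou_to_spans(tag_fn(tokens))
    relations: List[Relation] = []
    # Ordered pairs, since a relation such as Hyponym-of is directional.
    for head, tail in permutations(spans, 2):
        label = relation_fn(tokens, head, tail)
        if label is not None:
            relations.append((head, tail, label))
    return spans, relations
```

Errors compound in this design. If `tag_fn` outputs `O` for an entity, no call to `relation_fn` ever involves it. This is the cost of sequential extraction that the drawback above describes.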

Figure: Schematic diagram of the model



Figure: AI2-system concept map
