Category:Hybrid neural tagging model for open relation extraction

From Big Physics


Jia S, Shijia E, Ding L, et al. Hybrid neural tagging model for open relation extraction. Expert Systems with Applications, 2022.


Abstract

Open Relation Extraction (ORE) task remains a challenge to obtain a semantic representation by discovering arbitrary relations from the unstructured text. Conventional methods heavily depend on feature engineering or syntactic parsing, which are inefficient or error-cascading. Recently, leveraging supervised deep learning methods to address the ORE task is a promising way. However, there are two main challenges: (1) The lack of enough labeled corpus to support supervised training; (2) The exploration of specific neural architecture that adapts to the characteristics of open relation extracting. In this paper, we build a large-scale, high-quality training corpus in a fully automated way. And we design a tagging scheme to assist in transforming the ORE task into a sequence tagging processing. Furthermore, we propose a hybrid neural network model (HNN4ORT) for open relation tagging. The model employs the Ordered Neurons LSTM to encode potential syntactic information to capture the associations among the arguments and relations. It also emerges a novel Dual Aware Mechanism, including Local-aware Attention and Global-aware Convolution. The dual awarenesses complement each other. Takes the sentence-level semantics as a global perspective, and at the same time, the model implements salient local features to achieve sparse annotation. Experiment results on various testing sets show that our model achieves state-of-the-art performance compared to conventional methods or other neural models.

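Research questions

  1. Use a neural network architecture to address open relation extraction.
  2. Build an effective dataset for open relation extraction.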

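Methods

  1. Data annotation: when a sentence contains multiple relations, each relation is annotated separately.
  2. Model architecture: GloVe word embeddings + ON_LSTM + local word attention + CNN for extracting global information + CRF.
  3. Dataset construction: three existing open relation extraction tools (OLLIE, Open IE-4, and ClausIE) are run over the data, only the triples extracted by all three tools are kept, and the result is finally screened manually.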
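
The agreement filter in the dataset-construction step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the notes do not say how the paper decides that two tools extracted "the same" triple, so the sketch assumes exact string equality after lowercasing and whitespace normalization. The names `normalize` and `agreed_triples` and the sample extractor outputs are hypothetical and are not taken from OLLIE, Open IE-4, ClausIE, or the paper's code.

```python
from typing import Iterable, Set, Tuple

# (argument 1, relation phrase, argument 2)
Triple = Tuple[str, str, str]


def normalize(triple: Triple) -> Triple:
    """Lowercase and collapse whitespace (assumed matching rule, not the paper's)."""
    arg1, rel, arg2 = (" ".join(part.lower().split()) for part in triple)
    return (arg1, rel, arg2)


def agreed_triples(*tool_outputs: Iterable[Triple]) -> Set[Triple]:
    """Return only the triples that every extractor produced."""
    normalized = [{normalize(t) for t in output} for output in tool_outputs]
    if not normalized:
        return set()
    return set.intersection(*normalized)


# Hypothetical outputs of three extractors for a single sentence.
ollie = [("Obama", "was born in", "Hawaii"), ("Obama", "served as", "president")]
openie4 = [("Obama", "was born in", "Hawaii")]
clausie = [("obama", "was born  in", "Hawaii"), ("Obama", "was", "born")]

# Candidates that would then go on to manual screening.
candidates = agreed_triples(ollie, openie4, clausie)
print(candidates)
```

Requiring agreement from all three tools trades recall for precision: a triple found by only one or two tools is dropped even if it is correct, which fits the goal of a high-quality training corpus.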

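Results

The model achieves state-of-the-art performance compared with existing open relation extraction tools. The paper also includes an ablation analysis of each model component and an error analysis of examples the model gets wrong.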

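Summary and comments

  • The idea of solving open relation extraction through sequence tagging is worth borrowing.
  • This is the best-performing open relation extraction work seen so far, but it relies mainly on the ON_LSTM model.
  • The dataset is currently not public; only a demo has been released.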
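
To make the sequence-tagging idea concrete, here is a minimal sketch of how a sentence with several relations could be turned into one tag sequence per relation and decoded back into triples. The label set (`B-`/`I-` prefixes on `E1`, `R`, `E2`, plus `O`) is an assumption chosen for illustration and may differ from the tagging scheme defined in the paper; `tag_relation`, `decode`, and the example sentence are likewise hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # token span, start inclusive, end exclusive


def tag_relation(n_tokens: int, arg1: Span, rel: Span, arg2: Span) -> List[str]:
    """Build one tag sequence for a single (arg1, relation, arg2) triple."""
    tags = ["O"] * n_tokens
    for label, (start, end) in (("E1", arg1), ("R", rel), ("E2", arg2)):
        for i in range(start, end):
            tags[i] = ("B-" if i == start else "I-") + label
    return tags


def decode(tokens: List[str], tags: List[str]) -> Tuple[str, str, str]:
    """Recover the triple from a tag sequence by collecting tokens per label."""
    parts = {"E1": [], "R": [], "E2": []}
    for token, tag in zip(tokens, tags):
        if tag != "O":
            parts[tag.split("-", 1)[1]].append(token)
    return (" ".join(parts["E1"]), " ".join(parts["R"]), " ".join(parts["E2"]))


tokens = "Marie Curie was born in Warsaw and won the Nobel Prize".split()

# Two relations in the same sentence, each annotated separately.
relations = [
    ((0, 2), (2, 5), (5, 6)),   # (Marie Curie, was born in, Warsaw)
    ((0, 2), (7, 8), (8, 11)),  # (Marie Curie, won, the Nobel Prize)
]
sequences = [tag_relation(len(tokens), *spans) for spans in relations]

for tags in sequences:
    print(list(zip(tokens, tags)))
    print(decode(tokens, tags))
```

Because each relation gets its own sequence, arguments shared between relations (here `Marie Curie`) never compete for a single tag slot, and most positions in any one sequence stay `O`.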

Paper: https://www.sciencedirect.com/science/article/pii/S0957417422003797?via%3Dihub
Code: https://github.com/TJUNLP/NSL4OIE

Concept map
