Construction and Optimization of Co-occurrence-attribute-interaction Model for Column Semantic Recognition

doi:10.13328/j.cnki.jos.006787

Journal of Software (软件学报), 2023, Vol. 34, No. 3: 1010-1026.

Authors: GAO Shan, YUAN Wanzhu, LU Wei, WANG Lan, ZHANG Jing, DU Xiaoyong
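About the authors: GAO Shan (1997-), female, M.S., research interests: natural language processing, data standardization; WANG Lan (1993-), female, M.S., research interest: natural language processing; YUAN Wanzhu (1998-), female, master's student, research interest: natural language processing; ZHANG Jing (1973-), female, Ph.D., associate professor, doctoral supervisor, CCF professional member, research interests: data mining, natural language processing; LU Wei (1981-), male, Ph.D., professor, doctoral supervisor, CCF professional member, research interests: fundamental database theory, big data system development, query processing in spatio-temporal settings, cloud database systems and applications; DU Xiaoyong (1963-), male, Ph.D., professor, doctoral supervisor, CCF Fellow, research interests: intelligent information retrieval, high-performance databases, unstructured data management.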
Corresponding authors: LU Wei, lu-wei@ruc.edu.cn; DU Xiaoyong, duyong@ruc.edu.cn
Fund project: National Key Research and Development Program of China (2020YFB2104101)

Abstract (translated from the Chinese original):

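Government data governance is entering a new stage, moving from "physical data aggregation" to "logical semantic interoperability". Because isolated government information systems (silos) have long operated "autonomously", their metadata suffers from three problems: metadata is missing, metadata shares a name but differs in meaning, and metadata shares a meaning but differs in name. Logical semantic interoperability addresses these problems by technical means. Without rebuilding or modifying the code of the original systems and without physically aggregating the government data, it unifies the semantic representation of metadata across the silos and makes that metadata semantically interoperable. This work aligns the metadata of each silo to existing standard metadata. Specifically, standard metadata names are treated as semantic labels, and semantic recognition is performed on the column projections of each silo's relational data. This establishes a semantic alignment between column names and standard metadata and enables standardized governance of silo metadata. Existing semantic recognition techniques based on column projections capture neither the column-order independence of relational data nor the correlations among attribute semantic labels. To address this, a two-stage model consisting of a prediction stage and a correction stage is proposed. In the prediction stage, a co-occurrence-attribute-interaction model (CAI model) uses a parallelized self-attention mechanism to ensure that interaction among co-occurring attributes does not depend on column order. In the correction stage, a correction mechanism that exploits the co-occurrence of semantic labels refines the predictions of the CAI model. Experiments were conducted on government benchmark data and on several public English datasets, including Magellan. The results show that the two-stage model with the correction mechanism outperforms the best existing model by up to 20.03% in macro average and up to 13.36% in weighted average.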
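The abstract names the two stages but gives neither the exact CAI architecture nor the correction rule. The plain-Python code below is a minimal sketch built on stated assumptions. It uses single-head scaled dot-product attention with random weights, one label softmax per column, a hypothetical label co-occurrence matrix `C`, and a hypothetical mixing weight `alpha`. Its purpose is to show two things. First, self-attention without positional encodings gives column-order-independent predictions. Second, label co-occurrence is one way stage-1 predictions could be reweighted. The multiplicative reweighting in `correct` is an illustrative rule chosen here, not the authors' correction mechanism.

```python
import math
import random


def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]


def column_self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the columns of one table.

    X holds one embedding per column. No positional encoding is added, so
    reordering the columns only reorders the output rows (permutation equivariance).
    """
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    scale = math.sqrt(len(K[0]))
    A = [softmax([sum(q * k for q, k in zip(qi, kj)) / scale for kj in K]) for qi in Q]
    return matmul(A, V)


def predict(X, params):
    """Stage 1 (sketch): contextualize every column, then softmax over labels."""
    H = column_self_attention(X, params["Wq"], params["Wk"], params["Wv"])
    return [softmax(row) for row in matmul(H, params["Wout"])]


def correct(P, C, alpha=0.5):
    """Stage 2 (hypothetical rule): reweight each column's label distribution by how
    strongly each label co-occurs with the soft labels of the other columns.

    C[j][l] in [0, 1] is an assumed co-occurrence score between labels j and l;
    alpha must satisfy 0 <= alpha < 1 so every weight stays positive.
    """
    n_labels = len(P[0])
    corrected = []
    for i, p in enumerate(P):
        others = [q for k, q in enumerate(P) if k != i]
        support = [1.0] * n_labels  # a single-column table has no co-occurrence evidence
        if others:
            support = [
                sum(q[j] * C[j][l] for q in others for j in range(n_labels)) / len(others)
                for l in range(n_labels)
            ]
        scores = [p[l] * (1 - alpha + alpha * support[l]) for l in range(n_labels)]
        total = sum(scores)
        corrected.append([s / total for s in scores])
    return corrected


def random_matrix(rng, rows, cols):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, n_labels, n_cols = 4, 3, 5
params = {name: random_matrix(rng, d, d) for name in ("Wq", "Wk", "Wv")}
params["Wout"] = random_matrix(rng, d, n_labels)
X = random_matrix(rng, n_cols, d)  # hypothetical column embeddings
C = [[rng.random() for _ in range(n_labels)] for _ in range(n_labels)]  # hypothetical scores

perm = [2, 0, 4, 1, 3]
P = predict(X, params)
P_perm = predict([X[i] for i in perm], params)
# Column perm[k] of the original table receives the same prediction in either order.
order_independent = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for k, row in enumerate(P_perm)
    for a, b in zip(P[perm[k]], row)
)
print(order_independent)  # True
P_corrected = correct(P, C)
```

Column-order independence holds only because nothing in stage 1 depends on position. Adding positional embeddings, as in sequence models, would break it. The correction step averages over the other columns, so it is order independent as well.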

Abstract:

Government data governance is undergoing a transition from "physical data aggregation" to "logical semantic unification". Long-term "autonomy" of government information silos has led to a wide spectrum of metadata curation issues, such as attributes with the same name but different meanings, and attributes with different names but the same meaning. Logical semantic unification does not rebuild or modify legacy information systems, nor does it physically aggregate data from isolated information systems. Instead, it solves this problem by unifying the semantic expression of the metadata in government information silos, achieving standardized metadata governance. This work semantically aligns the metadata of each government information silo to existing standard metadata. Specifically, the standard metadata names are treated as semantic labels, and the columns of relations in each government information silo are semantically recognized. This establishes the semantic alignment between column names and standard metadata and achieves standardized governance of silo metadata.
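The alignment step described above can be pictured as a mapping from silo columns to standard labels. The sketch below assumes the recognition model has already produced one predicted standard label per `(silo, column_name)` pair. The silo names, column names, and labels are hypothetical illustration data. The helpers `build_alignment` and `homonyms` are illustrative only and are not part of the paper. The sketch shows how the two metadata problems appear in the result: different names with the same meaning end up under one label, and the same name with different meanings ends up under several labels.

```python
from collections import defaultdict


def build_alignment(predictions):
    """Group silo columns by their predicted standard-metadata label.

    predictions maps (silo, column_name) -> predicted standard label.
    Returns label -> sorted list of (silo, column_name) aligned to that label.
    """
    aligned = defaultdict(list)
    for (silo, column), label in predictions.items():
        aligned[label].append((silo, column))
    return {label: sorted(cols) for label, cols in aligned.items()}


def homonyms(predictions):
    """Column names that were aligned to more than one standard label across silos."""
    labels = defaultdict(set)
    for (_, column), label in predictions.items():
        labels[column].add(label)
    return {column: sorted(ls) for column, ls in labels.items() if len(ls) > 1}


# Hypothetical predictions for two silos.
predictions = {
    ("silo_a", "xm"): "name",
    ("silo_b", "person_name"): "name",            # different names, same meaning
    ("silo_a", "date"): "birth_date",
    ("silo_b", "date"): "registration_date",      # same name, different meanings
}
alignment = build_alignment(predictions)
print(alignment["name"])       # [('silo_a', 'xm'), ('silo_b', 'person_name')]
print(homonyms(predictions))   # {'date': ['birth_date', 'registration_date']}
```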

Cite this article:

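GAO Shan, YUAN Wanzhu, LU Wei, WANG Lan, ZHANG Jing, DU Xiaoyong. Construction and Optimization of Co-occurrence-attribute-interaction Model for Column Semantic Recognition. Journal of Software (软件学报), 2023, 34(3): 1010-1026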

History:

Received: 2022-05-15
Last revised: 2022-07-29
Published online: 2022-10-26
Published: 2023-03-06