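Knowledge base error detection with relation sensitive embedding
Information Technology and Network Security, 2020, Issue 10
Miao Qi, Yang Xinyue
School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, Liaoning, China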
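Abstract: Accuracy and quality are especially important for a knowledge base. Although the incompleteness of knowledge bases has been studied extensively, few researchers have considered detecting the errors that a knowledge base contains, and traditional methods usually fail to capture the intrinsic correlations among erroneous facts. This paper proposes NSIL, a relation-sensitive embedding method for knowledge bases, which captures the correlations between relations in the knowledge base in order to detect errors and thereby improve accuracy and quality. The method has two stages: correlation processing and error detection. In the correlation processing stage, NSIL's correlation function yields the degree of correlation between relations as a score; in the error detection stage, errors are detected on the basis of these correlation scores, and the missing component is predicted for triples that lack a subject or an object. Finally, the method is validated extensively on "FB15K", a benchmark dataset generated from the Freebase knowledge base, and the results show that it achieves high performance in detecting erroneous knowledge in knowledge bases.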
CLC number: TP183
Document code: A
DOI: 10.19358/j.issn.2096-5133.2020.10.005
Citation format: Miao Qi, Yang Xinyue. Knowledge base error detection with relation sensitive embedding[J]. Information Technology and Network Security, 2020, 39(10): 23-27, 37.
Knowledge base error detection with relation sensitive embedding
Miao Qi, Yang Xinyue
School of Electronic and Information Engineering, Liaoning Technical University, Huludao 125105, China
Abstract: Accuracy and quality are very important for a knowledge base. Although there has been much research on the incompleteness of knowledge bases, few researchers have considered detecting the errors they contain, and traditional methods usually cannot capture the internal correlations among those errors effectively. In this paper, a relation-sensitive embedding method for knowledge bases, NSIL, is proposed to obtain the correlations among the relations in a knowledge base, so as to detect its errors and improve its accuracy and quality. The method is divided into two stages: correlation processing and error detection. In the correlation processing stage, the correlation function of NSIL is used to obtain the degree of correlation between relations in the form of a score; in the error detection stage, errors are detected on the basis of the correlation scores, and the missing component is predicted for triples that lack a subject or an object. Finally, the method is verified on the benchmark data set "FB15K", which is generated from Freebase, one of the largest knowledge bases. The results show that the method achieves high performance in knowledge base error detection.
Key words: knowledge base; embedding model; error detection
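0 Introduction
Knowledge bases have become an increasingly important and widely used data source for research and applications such as semantic search, entity linking, question answering, and natural language processing. To make large knowledge bases easier to manipulate, researchers have proposed a new research direction: knowledge base embedding. The key idea is to embed the components of a KB (knowledge base), namely its entities and relations, into a continuous vector space, which simplifies manipulation while preserving the original structure of the KB. The resulting entity and relation embeddings can then be applied to tasks such as KB completion, relation extraction, entity classification, and entity resolution. Although large knowledge bases contain hundreds of millions of facts, this is still far from enough in an era of information explosion. Most research focuses on adding missing edges to knowledge bases, and few studies consider the outdated or incorrect information they contain [1-3]. Many knowledge base expansion studies project facts into a k-dimensional vector space and find correlations between relations by clustering, which makes efficient and effective processing difficult.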
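The general idea of using embeddings both to flag suspicious triples and to predict a missing subject or object can be illustrated with a minimal sketch. This is not the NSIL method, whose correlation function is not given in this excerpt; it assumes a simple translation-style score (head + relation should land near tail), and the entity names, the hand-picked 2-D vectors, the threshold, and the helper names `score`, `detect_errors`, and `predict_tail` are all hypothetical.

```python
import math

# Hypothetical 2-D embeddings, chosen by hand for illustration only.
# In practice these vectors would be learned from the knowledge base.
ENTITY = {
    "Paris": (1.0, 0.0),
    "France": (1.0, 1.0),
    "Berlin": (3.0, 0.0),
    "Germany": (3.0, 1.0),
}
RELATION = {"capital_of": (0.0, 1.0)}


def score(head, relation, tail):
    """Euclidean distance between (head + relation) and tail.

    Assumed translation-style score: lower means more plausible.
    """
    h, r, t = ENTITY[head], RELATION[relation], ENTITY[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def detect_errors(triples, threshold=1.0):
    """Return the triples whose distance exceeds the (assumed) threshold."""
    return [t for t in triples if score(*t) > threshold]


def predict_tail(head, relation):
    """Predict a missing object as the entity closest to head + relation."""
    candidates = [e for e in ENTITY if e != head]
    return min(candidates, key=lambda e: score(head, relation, e))


triples = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("Paris", "capital_of", "Germany"),  # deliberately wrong fact
]
suspicious = detect_errors(triples)
predicted = predict_tail("Berlin", "capital_of")
```

With these toy vectors the two correct triples have distance 0, the wrong one has distance 2 and is flagged, and the predicted object for (Berlin, capital_of, ?) is Germany. A relation-sensitive approach such as the one described in the abstract would replace this fixed score with one that also accounts for correlations between relations.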
The full text of this article can be downloaded from: http://theprogrammingfactory.com/resource/share/2000003133