Application of Electronic Technique
Knowledge graph visualization fusion method for multi-source heterogeneous data
Liang Hao1, Fu Da2
1. Shenzhen Pengrui Information Technology Co., Ltd.; 2. Beijing Jingneng Energy Technology Research Co., Ltd.
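Abstract: To address data redundancy, conflict, and missing associations, a knowledge graph visualization fusion method for multi-source heterogeneous data is studied, with the aim of improving the reliability of data fusion. The Web Ontology Language (OWL) is used to build domain ontology libraries and a global ontology library for the multi-source heterogeneous data, so that knowledge entity extraction and knowledge fusion take place within the same framework. A long short-term memory network and conditional random field (LSTM-CRF) model, constrained by the ontology library, extracts knowledge entities that conform to the domain definitions from the multi-source heterogeneous data. A knowledge fusion model based on hierarchical filtering then visually fuses the extracted entities, resolving redundant information and inconsistencies in the data and forming an accurate, complete, and reliable visualized fusion knowledge graph, which helps reveal latent data associations and fill in missing ones. Experimental results show that as the proportion of missing data rises, both the scaling coefficient and the attribute coverage begin to fall; their lowest values are 0.86 and 0.87, both significantly above the corresponding thresholds. When processing four data sources, the proposed method achieves a visual clarity of 93%~97% and an information fusion degree of 92%~96%, both better than the comparison methods. This indicates that the method can effectively extract knowledge entities from multi-source heterogeneous data, build a knowledge graph, and achieve visualized fusion of the data; that under different proportions of missing data the scaling coefficient and attribute coverage of the fusion remain high, meaning the visualized fusion performs well; and that the method effectively improves both data visualization quality and the degree of information integration.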
CLC number: TP391  Document code: A  DOI: 10.16157/j.issn.0258-7998.245966
Chinese citation format: 梁浩,付達. 面向多源異構數據的知識圖譜可視化融合方法[J]. 電子技術應用,2025,51(6):47-53.
English citation format: Liang Hao, Fu Da. Knowledge graph visualization fusion method for heterogeneous data from multiple sources[J]. Application of Electronic Technique, 2025, 51(6): 47-53.
Knowledge graph visualization fusion method for heterogeneous data from multiple sources
Liang Hao1, Fu Da2
1. Plant Resource Technology Co., Ltd.; 2. Beijing Jingneng Energy Technology Research Co., Ltd.
Abstract: To solve the problems of data redundancy, conflict, and missing associations, a knowledge graph visualization fusion method for multi-source heterogeneous data is studied to improve the reliability of data fusion. Domain ontology libraries and a global ontology library for the multi-source heterogeneous data are established using the Web Ontology Language (OWL), so that knowledge entity extraction and knowledge fusion are carried out under the same framework. Using a Long Short-Term Memory network (LSTM) and Conditional Random Field (CRF) model, knowledge entities conforming to the domain definitions are extracted from the multi-source heterogeneous data under the constraint of the ontology library. A knowledge fusion model based on hierarchical filtering is used to visually fuse the extracted knowledge entities, resolve redundant information and inconsistencies in the multi-source heterogeneous data, and form an accurate, complete, and reliable visualized fusion knowledge graph, which helps to find potential data associations and fill in missing ones. The experimental results show that as the proportion of missing data increases, the scaling coefficient and attribute coverage begin to decrease; the lowest scaling coefficient and attribute coverage are 0.86 and 0.87, both significantly higher than the corresponding thresholds. When dealing with four data sources, the visual clarity of the proposed method is 93%~97% and the information fusion degree is 92%~96%, both better than the comparison methods. This shows that the method can effectively extract the knowledge entities of multi-source heterogeneous data, establish the knowledge graph, and realize the visualization fusion of multi-source heterogeneous data; that under different proportions of missing data the scaling coefficient and attribute coverage of the fusion remain high, meaning the visualized fusion performs well; and that the method effectively improves data visualization quality and the degree of information integration.
Key words: multi-source heterogeneous data; knowledge graph; visualization fusion; ontology library; long short-term memory network; conditional random field

Introduction

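In practical applications, data often come from many different sources and are heterogeneous, diverse, and complex, which poses major challenges for data processing, analysis, and application [1]. Multi-source heterogeneous data fusion methods emerged in response; they aim to use advanced techniques to integrate and present data from different sources, in different formats, and with different structures, giving users intuitive, comprehensive, and in-depth insight into the data [2].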

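Multi-source heterogeneous data fusion not only helps eliminate data silos and make data interoperable [3], but can also markedly improve the efficiency and accuracy of data processing, providing strong data support for decision support, scientific research, industrial innovation, and other fields. For example, Mo Huiling et al. fuse data within a federated learning framework: each participant extracts data features using tensor Tucker decomposition; a central server collects and aggregates the model parameters from all participants into a global model; and the global model is optimized over multiple iterations to complete the fusion [4]. Heterogeneous data, however, contain redundant or conflicting information, and Tucker decomposition and the federated learning framework cannot fully avoid the effects of that redundancy and conflict, which degrades the fusion result. Wang Shu et al. use information entropy to assess the relative importance of each evidence source, compute divergence to obtain evidence credibility and refine the evidence, derive the amount of differential information, determine the final weight of each data source, and then fuse the data [5]. The information entropy approach focuses on assessing the amount of information and cannot directly identify redundancy between data, so redundant data are still retained during fusion, which increases processing complexity and computational cost. Kuang Guangsheng et al. use a graph clustering algorithm to identify similarity in the data and then fuse similar data items [6]. Graph clustering relies mainly on similarity relations between data; when associations are missing from the dataset, the algorithm cannot reliably place the affected items in the same cluster, so the fusion result does not fully reflect the true relationships among the data. Gong et al. proposed a multi-granularity, visually guided method for named entity recognition that performs entity-level fusion on a multimodal heterogeneous graph; it builds a comprehensive multimodal representation by integrating cross-modal semantic interactions between text and vision at different visual granularities [7]. It uses the multimodal heterogeneous graph to describe precisely the semantic relations between entity-level words and visual objects, and a heterogeneous graph attention network to realize fine-grained cross-modal semantic interaction, which significantly improves recognition accuracy; the implementation, however, is complex, which may reduce efficiency in application.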
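The entropy-based weighting idea reviewed above can be illustrated with a minimal sketch. It is not the method of reference [5], which also uses divergence-based credibility; it only assumes that each source reports a discrete probability distribution over the same hypotheses and that a source with lower Shannon entropy (a more decisive source) should receive a higher weight. The source names and numbers are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weights(sources):
    """Weight each source by 1 - normalized entropy, then normalize to sum to 1.

    Assumption for this sketch: every source is a distribution over the
    same n >= 2 hypotheses.
    """
    n = len(next(iter(sources.values())))
    if n < 2:
        raise ValueError("need at least two hypotheses")
    max_entropy = math.log(n)
    # Clamp at 0 to absorb floating-point error for uniform distributions.
    raw = {
        name: max(0.0, 1.0 - shannon_entropy(p) / max_entropy)
        for name, p in sources.items()
    }
    total = sum(raw.values())
    if total == 0:
        # Every source is uniform: fall back to equal weights.
        return {name: 1.0 / len(sources) for name in sources}
    return {name: r / total for name, r in raw.items()}


def fuse(sources):
    """Fuse the source distributions as an entropy-weighted average."""
    weights = entropy_weights(sources)
    n = len(next(iter(sources.values())))
    return [sum(weights[s] * sources[s][i] for s in sources) for i in range(n)]


# Hypothetical example: three sources voting on three hypotheses.
example_sources = {
    "source_a": [0.90, 0.05, 0.05],   # decisive
    "source_b": [0.40, 0.30, 0.30],   # weakly informative
    "source_c": [1 / 3, 1 / 3, 1 / 3],  # uninformative
}
example_weights = entropy_weights(example_sources)
example_fused = fuse(example_sources)
```

Note that this weighting looks only at how decisive each source is. Two sources that carry the same information both keep their weight, which is the redundancy limitation the paragraph above points out.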

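Data redundancy and conflict are common problems in multi-source data fusion. Through deduplication, error correction, and the construction of a relation network, a knowledge graph can reduce redundancy and conflict and improve the accuracy and reliability of fusion. By building a network of relations between entities, a knowledge graph can also uncover latent associations in the data and thereby fill in missing associations. This paper therefore studies a knowledge graph visualization fusion method for multi-source heterogeneous data that makes full use of the available data resources, avoids wasting data, and improves data utilization.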
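As a rough illustration of the deduplication, conflict resolution, and association discovery described above, the following sketch merges entity records from several sources. It is not the paper's hierarchical-filtering fusion model: name normalization, majority voting, and linking by shared attribute value are simplifying assumptions chosen for this example, and all record contents and function names are hypothetical.

```python
from collections import Counter, defaultdict


def normalize(name):
    """Canonical entity key: case-folded with collapsed whitespace."""
    return " ".join(name.casefold().split())


def fuse_records(records):
    """Merge (source, entity, attribute, value) records into one attribute map per entity.

    Duplicate entities are merged by normalized name; conflicting values
    are resolved by majority vote across sources.
    """
    votes = defaultdict(Counter)
    for _source, entity, attribute, value in records:
        votes[(normalize(entity), attribute)][value] += 1
    fused = defaultdict(dict)
    for (entity, attribute), counter in votes.items():
        fused[entity][attribute] = counter.most_common(1)[0][0]
    return dict(fused)


def link_entities(fused, attribute):
    """Propose edges between entities that share a value for `attribute`."""
    by_value = defaultdict(list)
    for entity, attrs in fused.items():
        if attribute in attrs:
            by_value[attrs[attribute]].append(entity)
    edges = set()
    for entities in by_value.values():
        ordered = sorted(entities)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                edges.add((first, second))
    return edges


# Hypothetical records from three sources; one value conflicts.
example_records = [
    ("plant_db", "Boiler 1", "location", "Unit A"),
    ("sensor_log", "boiler  1", "location", "Unit A"),
    ("manual", "Boiler 1", "location", "Unit B"),
    ("plant_db", "Turbine 2", "location", "Unit A"),
    ("sensor_log", "Turbine 2", "rated_power_mw", "300"),
]
example_graph = fuse_records(example_records)
example_edges = link_entities(example_graph, "location")
```

Here the three spellings of the boiler collapse into one entity, the minority location value is discarded, and the shared location yields a candidate edge between the two entities that no single source stated explicitly.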


For the full text of this article, please download:

http://theprogrammingfactory.com/resource/share/2000006561


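Author information:

Liang Hao1, Fu Da2

(1. Shenzhen Pengrui Information Technology Co., Ltd., Shenzhen, Guangdong 518055;

2. Beijing Jingneng Energy Technology Research Co., Ltd., Beijing 100020)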

