Entity relation extraction method for quality text of industrial manufacturing based on PCNN
Information Technology and Network Security
Zhang Tong1, Song Mingyan1, Wang Jun1,2, Bai Yang1
(1. Beijing Jinghang Research Institute of Computing and Communication, Beijing 100071, China; 2. School of Management, Harbin Institute of Technology, Harbin 150006, China)
Abstract: Relation extraction from quality reports in industrial manufacturing sectors such as automobiles and machinery is of great significance for research on quality knowledge graphs and quality question answering systems in those industries. Because no public dataset is available for building quality knowledge graphs in the industrial manufacturing field, this paper collects quality texts, annotates them professionally, and constructs a specialized relation extraction dataset for industrial manufacturing quality knowledge graphs. Based on this dataset, a Piecewise Convolutional Neural Network (PCNN) is used for relation extraction. Then, in light of the characteristics of the Chinese language, an improved PCNN model (C-PCNN) is proposed to improve relation extraction performance on Chinese corpora. On the constructed dataset, the accuracy, recall, and F1 score of C-PCNN are all better than those of the PCNN and RNN baselines, which indicates the feasibility and effectiveness of the method. This research has practical significance for personnel engaged in the manufacturing industry.
CLC number: TP391
Document code: A
DOI: 10.19358/j.issn.2096-5133.2021.03.002
Citation format: Zhang Tong, Song Mingyan, Wang Jun, et al. Entity relation extraction method for quality text of industrial manufacturing based on PCNN[J]. Information Technology and Network Security, 2021, 40(3): 8-13.
Key words: industrial manufacturing; quality text; relation extraction; Piecewise Convolutional Neural Network

0 Introduction

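Products of industrial manufacturing sectors such as automobiles and machinery are high-precision, high-reliability products that span multiple technical fields, and they are characterized by complex structures, long production cycles, and many production states [1]. With the development of the information age, the volume of quality data generated during production and development keeps growing. However, because unified data management is currently lacking, quality information is scattered across business systems in the form of electronic or paper documents. This dispersed quality information contains key details such as the causes of quality problems, the affected components, and the measures taken. Extracting useful information from it, so as to provide data support for industrial manufacturing and help the relevant personnel supervise production effectively and resolve quality problems quickly, is a pressing need for quality management in the industrial manufacturing field. Starting from quality texts, this paper uses relation extraction to mine the semantic relations between entities in the text, laying a solid foundation for the subsequent construction of quality knowledge graphs and quality question answering systems.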




For the full text of this article, please download: http://theprogrammingfactory.com/resource/share/2000003422





