
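Integrating XML data with machine learning comes down to turning semi-structured, sometimes seemingly "loose" information into structured features that a model can understand and learn from. Preprocessing is the most important stage: it determines the quality and efficiency of later model training, and in essence it converts XML's hierarchical semantics into flat numeric vectors.
Solution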
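When bringing XML data into a machine learning pipeline, I personally think the most critical step is "deconstruction" followed by "reconstruction". XML's hierarchical structure and rich semantic tags are its strength, but they are also its "baggage" when it faces machine learning. We have to take it apart, pull out the pieces that actually carry meaning, and reorganize them into the tidy feature matrix a model expects.
This process usually involves a few key steps: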
1. XML parsing: This is the first hurdle. We need a tool that reads the XML file and maps its internal structure into memory. The common parsing approaches are DOM (Document Object Model) and SAX (Simple API for XML). DOM loads the whole XML file into memory at once and builds a tree, which makes random access and modification convenient. SAX is event-driven: while reading the XML it fires a series of events (start tag, end tag, text content, and so on) in which we can process data. Its memory footprint is small, but we have to maintain state ourselves. Which one to choose usually depends on the size of the XML file and the complexity of the operations we need. For most machine learning tasks, DOM is more intuitive when files are small; when files are huge, SAX or the more efficient streaming parsing methods in a library such as Python's lxml are the first choice.
2. Feature extraction: This is where art meets science. We need to identify, from XML tags, attributes, and text content, the information that is useful for the prediction task.
- Structural features: For example, how often a particular tag occurs, the nesting depth of tags, or whether a particular parent-child relationship is present. These reflect how the data is organized.
- Content features: These are the most direct. If an XML tag contains text, we may need to vectorize it (for example with TF-IDF, Word2Vec, or BERT embeddings). If it contains numbers, we extract them directly and normalize or standardize them.
- Attribute features: Attribute values often carry key information, for example id and status in <item id="123" status="active">. They can be categorical features that need one-hot encoding or label encoding.
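3. Feature selection and dimensionality reduction: The extracted features can be very numerous; some are redundant and some even introduce noise. With feature selection (for example the chi-squared test or mutual information) or dimensionality reduction (for example PCA, or t-SNE, which is mainly used for visualization), we can keep the most important features, reduce model complexity, and avoid the "curse of dimensionality".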
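4. Format conversion: The final goal is to convert these features into a format that a machine learning model can consume directly. The most common is a two-dimensional feature matrix (such as a Pandas DataFrame or a NumPy array) in which each row is one sample and each column is one feature.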
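The steps above can be sketched end to end on a toy document. This is only a minimal illustration using the standard library's ElementTree; the `<orders>` document, the feature names, and the helpers `max_depth` and `order_features` are all hypothetical and not part of the original article.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; element and attribute names are made up.
XML_DOC = """
<orders>
  <order status="active"><item/><item/><note>rush</note></order>
  <order status="closed"><item><option/></item></order>
</orders>
"""

def max_depth(elem):
    """Nesting depth of the subtree rooted at elem (a leaf has depth 1)."""
    return 1 + max((max_depth(child) for child in elem), default=0)

def order_features(order):
    """Flatten one <order> element into a dict of numeric features."""
    tag_counts = Counter(node.tag for node in order.iter())
    return {
        "n_items": tag_counts["item"],                      # structural: tag frequency
        "depth": max_depth(order),                          # structural: nesting depth
        "has_note": int(order.find("note") is not None),    # structural: parent-child presence
        "is_active": int(order.get("status") == "active"),  # attribute feature
    }

# Step 1: parse. Step 2: extract features per sample. Step 4: build the matrix.
root = ET.fromstring(XML_DOC)
rows = [order_features(order) for order in root.findall("order")]
columns = ["n_items", "depth", "has_note", "is_active"]
matrix = [[row[c] for c in columns] for row in rows]
print(matrix)
```

Each inner list is one sample and each position is one feature, which is the row-and-column layout described in step 4. Feature selection (step 3) is left out here because two samples are too few for it to mean anything.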
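Over the whole pipeline, we are really weaving XML's "semantic web" into the "feature table" that a machine learning model needs, which takes both an understanding of the data itself and trade-offs against what the model requires.
How can we efficiently extract the features machine learning needs from complex XML structures?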
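Faced with a complex XML structure, I personally think efficient feature extraction is like hunting for a specific treasure in a dense forest. It is not just about "finding"; it takes strategy and tools.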
First, path expressions (such as XPath) are our sharpest tool. They let us pinpoint any node or attribute in the XML tree the way GPS would. For example, we may only care about the value of <price> under the <product> tag, or the creation date of every <order> node with status="active". XPath grabs this specific information directly and saves us from writing the tree traversal by hand. In Python, the lxml library has very strong XPath support and is comfortable to use.
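As a rough illustration of the two queries mentioned above, here is a sketch that uses the limited XPath subset built into the standard library's ElementTree, so it runs without extra dependencies. The `<shop>` document and the `<created>` element name are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the <created> element name is an assumption.
XML_DOC = """
<shop>
  <product><name>Pen</name><price>1.50</price></product>
  <product><name>Ink</name><price>4.25</price></product>
  <order status="active"><created>2024-01-05</created></order>
  <order status="closed"><created>2024-01-07</created></order>
  <order status="active"><created>2024-02-11</created></order>
</shop>
"""

root = ET.fromstring(XML_DOC)

# The <price> value under every <product>.
prices = [float(p.text) for p in root.findall("./product/price")]

# The creation date of every <order> whose status attribute is "active".
active_dates = [c.text for c in root.findall("./order[@status='active']/created")]

print(prices)
print(active_dates)
```

With lxml, the same selections would typically be written with full XPath 1.0, for instance `root.xpath("//order[@status='active']/created/text()")`, which returns the text values directly.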
Second, understanding the "context" of XML is crucial. The meaning of a tag often depends on its parent, its siblings, or even its ancestors. For example, a <value> tag under <temperature> represents a temperature reading, while under <humidity> it represents humidity. In that case we cannot extract value in isolation; we need to use its parent node information as a feature too. This usually requires traversing the tree (depth-first or breadth-first) and building paths or recording context along the way. You can think of it as recording not only "what" but also "where".
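One possible way to record "where" as well as "what" is a depth-first walk that keys every leaf value by its full path. The sketch below assumes a made-up `<reading>` wrapper element, and `flatten_with_paths` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same <value> tag appears in two different contexts.
XML_DOC = """
<reading>
  <temperature><value>21.5</value></temperature>
  <humidity><value>40</value></humidity>
</reading>
"""

def flatten_with_paths(elem, prefix=""):
    """Depth-first walk yielding (path, text) for every leaf that has text."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if text:
            yield path, text
    for child in children:
        yield from flatten_with_paths(child, path)

# Note: dict() keeps only the last value if the same path occurs more than once.
features = dict(flatten_with_paths(ET.fromstring(XML_DOC)))
print(features)
```

The two `<value>` elements end up under different keys, so a downstream model sees temperature and humidity as separate features rather than one ambiguous "value" column.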
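Next, deeper processing of text content is indispensable. If the XML contains a lot of descriptive text, such as product descriptions or user reviews, simple word-frequency counts may not be enough. We should consider more advanced text vectorization techniques: for example, using pretrained word embedding models (such as Word2Vec or GloVe) to map words into a vector space, or using Transformer models (such as BERT) to capture more complex semantic relationships. These vectors can be added to our dataset as high-dimensional features, which greatly enriches what the model can express.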
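Embedding models need external libraries and pretrained weights, so as a lighter baseline that is one step above raw word counts, here is a small TF-IDF sketch in pure Python. It is a swapped-in, simpler technique, not Word2Vec or BERT. The `tfidf` helper is hypothetical, it assumes non-empty documents, and it uses the plain `tf * log(N / df)` form, whereas libraries such as scikit-learn apply smoothed variants by default.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per document) for tokenized docs.

    Assumes every document has at least one token.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[tok] / len(doc) * idf[tok] for tok in vocab])
    return vocab, vectors

descriptions = [
    "This is a description for item one.",
    "Another description.",
]
# Very naive tokenization: lowercase, drop periods, split on whitespace.
tokenized = [d.lower().replace(".", "").split() for d in descriptions]
vocab, vectors = tfidf(tokenized)
print(vocab)
print(vectors)
```

A word that appears in every document, such as "description" here, gets weight zero, which is the property that makes TF-IDF more informative than raw counts.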
Finally, make good use of the categorical information in tags and attributes. XML tag names carry semantics of their own, such as item, user, or transaction, and can be treated as categorical features. Likewise, attribute values (such as type="book" or status="completed") are important categorical information. We can one-hot encode or embed these tag names and attribute values so that the model can understand how they differ and relate. For example, we can count the occurrences of particular tags, or use a tag path (such as /root/products/product/name) as a feature.
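To make the one-hot idea concrete, the following sketch encodes two attributes across a handful of elements. The `<catalog>` document and the `one_hot_rows` helper are hypothetical, and the column order is simply the sorted set of observed (attribute, value) pairs.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two categorical attributes per element.
XML_DOC = """
<catalog>
  <item type="book" status="completed"/>
  <item type="toy" status="pending"/>
  <item type="book" status="pending"/>
</catalog>
"""

def one_hot_rows(elements, attributes):
    """One-hot encode the given attributes across all elements."""
    # Column order is fixed by sorting the observed (attribute, value) pairs.
    columns = sorted(
        {(a, e.get(a)) for e in elements for a in attributes if e.get(a) is not None}
    )
    rows = [[int(e.get(a) == v) for a, v in columns] for e in elements]
    names = [f"{a}={v}" for a, v in columns]
    return names, rows

items = ET.fromstring(XML_DOC).findall("item")
names, rows = one_hot_rows(items, ["type", "status"])
print(names)
print(rows)
```

In a real pipeline the column list would be fitted on training data and reused unchanged at prediction time, so that unseen values do not shift the columns.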
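To be honest, there is no once-and-for-all solution here; it is more about applying these tools and strategies flexibly according to the business scenario and the characteristics of the data. I often find that spending time understanding the internal logic of the XML data is far more effective than blindly trying one extraction method after another.
What are the pros and cons of DOM and SAX parsers in XML preprocessing?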
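In the world of XML preprocessing, DOM and SAX are like two completely different ways of traveling, each suited to its own scenarios. When choosing, I often find myself torn between memory usage and convenience.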
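DOM (Document Object Model) parser:
Pros:
- Easy to use: It loads the entire XML document into memory and builds a complete tree. This lets us work with it like an object, with random access and modification through nodes, attributes, and so on. If you need to jump around the XML tree frequently, query specific nodes, or even modify the document after parsing, DOM is clearly the more convenient choice.
- Global view: It provides a complete view of the whole document, which makes complex hierarchical relationships easier to understand and process.
Cons:
- High memory consumption: This is DOM's biggest pain point. The whole document sits in memory, so for large XML files (hundreds of MB or even GB) it can exhaust system memory and make the program crash or crawl. When preprocessing massive datasets for machine learning, this is a very real challenge.
- Performance overhead: Building the full DOM tree itself costs time and compute.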
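The random access and in-place modification described above can be illustrated with the standard library's `xml.dom.minidom`. This is only a small sketch; the inline document reuses the record/value shape purely for illustration.

```python
from xml.dom import minidom

XML_DOC = (
    "<data>"
    "<record type='A'><value>10.5</value></record>"
    "<record type='B'><value>20.0</value></record>"
    "</data>"
)

# The whole document is parsed into an in-memory tree up front.
dom = minidom.parseString(XML_DOC)
records = dom.getElementsByTagName("record")

# Random access: jump straight to the second record, then walk up to its parent.
second = records[1]
second_value = float(second.getElementsByTagName("value")[0].firstChild.data)
parent_tag = second.parentNode.tagName

# In-place modification: because the tree lives in memory, edits are simple.
second.setAttribute("type", "A")
types = [r.getAttribute("type") for r in records]

print(second_value, parent_tag, types)
```

The convenience comes from holding every node in memory at once, which is also exactly where the memory cost listed under the cons comes from.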
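SAX (Simple API for XML) parser:
Pros:
- Memory efficient: SAX is event-driven. It does not load the whole document into memory; it reads it sequentially as a stream and notifies you when it hits specific events in the XML structure (such as element start, element end, or text content). Its memory footprint is therefore very low, which makes it well suited to very large XML files.
- Fast: Because it does not build a complete in-memory tree, SAX is usually faster than DOM at parsing.
- Streaming: It fits data-stream scenarios, such as receiving XML over the network and processing it in real time.
Cons:
- Higher programming complexity: SAX requires us to write our own event handlers and maintain parsing state. For example, if you want to know the full path of the element currently being parsed, you have to track the parent-child relationships yourself. This makes the code more complex and more error-prone.
- Cannot modify the document: SAX is read-only. It can only parse and report events; it cannot be used to modify the XML document.
- No random access: Because processing is sequential, you cannot jump directly to a given part of the document. If the information you need is spread across different places, you may need multiple passes or more complex logic to tie it together.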
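The state tracking that SAX pushes onto the programmer might look like the following sketch, built on the standard library's `xml.sax`. `PathCollector` is a hypothetical handler that keeps its own element stack in order to reconstruct full paths; it assumes leaf elements contain plain text and does not handle mixed content.

```python
import xml.sax

class PathCollector(xml.sax.ContentHandler):
    """Collects (path, text) pairs while tracking the element stack by hand."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, outermost first
        self.buffer = []   # text chunks seen since the last start tag
        self.pairs = []

    def startElement(self, name, attrs):
        self.stack.append(name)
        self.buffer = []

    def characters(self, content):
        # SAX may deliver a single text node in several chunks.
        self.buffer.append(content)

    def endElement(self, name):
        text = "".join(self.buffer).strip()
        if text:
            self.pairs.append(("/" + "/".join(self.stack), text))
        self.buffer = []
        self.stack.pop()

handler = PathCollector()
xml.sax.parseString(
    b"<data><record><id>1</id><name>Item One</name></record></data>", handler
)
print(handler.pairs)
```

Nothing here holds more than the current stack and text buffer in memory, which is the upside; the downside is that even a simple "what path am I at" question needs hand-written bookkeeping.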
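In the context of machine learning preprocessing, if your XML files are small, or you need complex analysis and feature combination over the XML structure, DOM (or a library like ElementTree that balances ease of use and performance) may get you twice the result for half the effort. But if your data is TB-scale log files, or you only need to pull a particular kind of data stream out of it, then SAX or a stream-based parsing approach (such as lxml's iterparse) is the wiser choice. In the end, choosing a parser is like choosing a vehicle: it depends on the road you have to drive and the cargo you have to carry.
Practical methods for converting XML data to a Pandas DataFrame or NumPy array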
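Converting XML data into a Pandas DataFrame or NumPy array is the bridge between XML and a machine learning model. I have found that the key to this process is "flattening" XML's hierarchy into tabular form.
Converting to a Pandas DataFrame:
This is the most common and most flexible method. The core idea: traverse the XML, extract all the required features for each "sample", organize them into a dictionary, and then turn the list of dictionaries into a DataFrame.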
Suppose we have an XML file containing multiple <record> elements, each with <id>, <name>, and <value> tags (among others) plus a type attribute:
<data>
<record type="A">
<id>1</id>
<name>Item One</name>
<value>10.5</value>
<description>This is a description for item one.</description>
</record>
<record type="B">
<id>2</id>
<name>Item Two</name>
<value>20.0</value>
<tags>alpha, beta</tags>
</record>
<record type="A">
<id>3</id>
<name>Item Three</name>
<value>15.2</value>
<description>Another description.</description>
</record>
</data>
We can process it like this (using Python's lxml library, which performs well):
from lxml import etree
import pandas as pd
xml_string = """
<data>
<record type="A">
<id>1</id>
<name>Item One</name>
<value>10.5</value>
<description>This is a description for item one.</description>
</record>
<record type="B">
<id>2</id>
<name>Item Two</name>
<value>20.0</value>
<tags>alpha, beta</tags>
</record>
<record type="A">
<id>3</id>
<name>Item Three</name>
<value>15.2</value>
<description>Another description.</description>
</record>
</data>
"""
root = etree.fromstring(xml_string)
records_data = []
# Iterate over every <record> node
for record_elem in root.xpath('//record'):
    record_dict = {}
    # Extract the attribute
    record_dict['type'] = record_elem.get('type')
    # Extract the text of the child tags
    record_dict['id'] = record_elem.xpath('./id/text()')[0] if record_elem.xpath('./id/text()') else None
    record_dict['name'] = record_elem.xpath('./name/text()')[0] if record_elem.xpath('./name/text()') else None
    record_dict['value'] = float(record_elem.xpath('./value/text()')[0]) if record_elem.xpath('./value/text()') else None
    # Tags that may be missing fall back to None, which avoids an IndexError on the empty result list
    description = record_elem.xpath('./description/text()')
    record_dict['description'] = description[0] if description else None
    tags = record_elem.xpath('./tags/text()')
    record_dict['tags'] = tags[0] if tags else None
    records_data.append(record_dict)
df = pd.DataFrame(records_data)
# At this point df is already a structured table
# df['value'] = pd.to_numeric(df['value'])  # ensure a numeric type
# df['id'] = pd.to_numeric(df['id'])  # ensure a numeric type
print(df)
In this example, df contains the columns type, id, name, value, description, and tags. Text features such as description and tags may later need TF-IDF or word-embedding processing, with the results added to the DataFrame as new columns.
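For files too large to load in one go, the same dict-per-record pattern can be sketched with a streaming parser. The version below uses the standard library's `ElementTree.iterparse` so that it runs without lxml; `stream_records` is a hypothetical helper, and the NaN default for a missing `<value>` is an assumption of this sketch.

```python
import io
import xml.etree.ElementTree as ET

def stream_records(source):
    """Yield one dict per <record>, clearing each element once it is consumed."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "record":
            continue
        row = {
            "type": elem.get("type"),
            "id": elem.findtext("id"),
            "name": elem.findtext("name"),
            # Assumption: a missing <value> becomes NaN rather than an error.
            "value": float(elem.findtext("value", default="nan")),
        }
        elem.clear()  # drop the record's children so memory stays bounded
        yield row

xml_bytes = b"""<data>
<record type="A"><id>1</id><name>Item One</name><value>10.5</value></record>
<record type="B"><id>2</id><name>Item Two</name><value>20.0</value></record>
</data>"""

rows = list(stream_records(io.BytesIO(xml_bytes)))
print(rows)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` exactly as in the example above. With the standard library, cleared records still remain under the root as empty elements, so extremely long files may need extra cleanup.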
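Converting to a NumPy array:
NumPy arrays are usually the input format machine learning models accept directly, but models expect every value to be numeric. So we usually organize the data in a Pandas DataFrame first, make it numeric (for example one-hot encoding categorical features and vectorizing text features), and only then convert to a NumPy array.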
import numpy as np
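# Assume df is the DataFrame generated above
# First handle the non-numeric columns
# Example: one-hot encode 'type'
df_processed = pd.get_dummies(df, columns=['type'], prefix='type')
# Example: handle 'description' and 'tags' (simplified here to filling missing values; real use needs text vectorization)
df_processed['description'] = df_processed['description'].fillna('')
df_processed['tags'] = df_processed['tags'].fillna('')
# Assume we train the model only on the numeric columns and the one-hot encoded columns
# Exclude the text columns; id is kept here for illustration, but drop it if it is only an identifier
features_df = df_processed[['id', 'value', 'type_A', 'type_B']]
# Make sure every column is numeric
features_df = features_df.apply(pd.to_numeric, errors='coerce')
features_df = features_df.fillna(0)  # fill any NaN produced by coerce
# Request a float dtype so the boolean one-hot columns do not force an object array
X = features_df.to_numpy(dtype=float)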
print("\nNumPy Array:")
print(X)
Here we first one-hot encoded the type column, then selected id, value, and the encoded type_A and type_B columns, and finally converted them to the NumPy array X. The vectorized form of a text feature (for example a paragraph converted into a fixed-length embedding vector) can likewise be added to the DataFrame as one new column per dimension and converted to a NumPy array along with everything else.
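Personally, I think the most important thing in this conversion is flexibility. XML structures vary endlessly, and no single piece of code handles every case. You need to tailor your parsing and feature extraction logic to the specific XML schema and the needs of the machine learning task. It is like sculpting: you shape the work carefully according to the raw material and the piece you ultimately want.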