Python: extract matching elements from the DOM and build a list



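Is there a way to write a Python script that searches for something like "abc=" and makes a list of the matches? For example, I have markup containing a list of elements in a grid: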

<div data-componentid="fa-gamepad" class="x-component x-button x-has-icon mainTopMenuItem
widerIcon x-icon-align-left x-arrow-align-right x-layout-box-item x-layout-hbox-item" 
data-xid="237" data-exttouchaction="11" id="fa-gamepad" senchatest="mainMenu_game">
<div class="x-inner-el" id="ext-element-877">
<div class="x-body-el" id="ext-element-876" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-gamepad" id="ext-element-878">
</div>
<div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-875" data-componentid="fa-gamepad">
</button>
</div> 
<div data-componentid="plus" class="x-component x-button x-has-icon mainTopMenuItem x-icon-align-left x-arrow-align-right x-has-menu x-layout-box-item x-layout-hbox-item" data-xid="238" data-exttouchaction="11" id="plus" senchatest="mainMenu_plus">
<div class="x-inner-el" id="ext-element-881">
<div class="x-body-el" id="ext-element-880" style="padding: 0px !important;">
<div class="x-icon-el x-font-icon x-fa fa-plus" id="ext-element-882"></div><div class="x-text-el">
</div>
</div>
<div class="x-arrow-el x-font-icon">
</div>
</div>
<div class="x-badge-el">
</div>
<button class="x-button-el" type="button" id="ext-element-879" data-componentid="plus">
</button>
</div>

The markup above contains two elements with a "senchatest=" attribute. I want Python to find these elements and list them like this:

senchatest="mainMenu_game"
senchatest="mainMenu_plus"

My HTML contains more than 300 such elements, and I need to list them for testing.

We can use Beautiful Soup, a Python library for extracting data from HTML and XML files.

# Importing BeautifulSoup class from the bs4 module
from bs4 import BeautifulSoup

# Opening the html file (test.html contains the code snippet shared in the question)
with open("test.html", "r") as html_file:
    # Reading the file
    index = html_file.read()

# Creating a BeautifulSoup object and specifying the parser
S = BeautifulSoup(index, 'lxml')
# list to hold the values
l = []
# find all 'div' tags
tag_name = S.find_all('div')
for tag in tag_name:
    # search for 'senchatest' in the serialized div
    if 'senchatest' in str(tag):
        tag = str(tag)
        # split the tag at 'senchatest='
        x = tag.partition("senchatest=")[2]
        # extract the value between the first pair of double quotes
        x = x.split('"')[1]
        # append to list
        l.append(x)
# print the values in the format of the expected output
for i in l:
    print("senchatest=" + '"' + i + '"')

Output:

senchatest="mainMenu_game"
senchatest="mainMenu_plus"
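
The answer above finds the values by splitting the serialized tag text. As a cross-check, the same list can be built by reading the attribute from each parsed start tag instead. The following is only a minimal sketch using the standard library's html.parser module. The class name AttrCollector and the shortened sample_html string are hypothetical stand-ins and are not part of the original answer. Real input would be read from a file such as test.html.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the value of one attribute from every start tag that carries it."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser lowercases names
        for name, value in attrs:
            if name == self.attr_name and value is not None:
                self.values.append(value)


# Shortened stand-in for the markup in the question (assumption for illustration)
sample_html = """
<div id="fa-gamepad" senchatest="mainMenu_game"><div class="x-inner-el"></div></div>
<div id="plus" senchatest="mainMenu_plus"><div class="x-inner-el"></div></div>
"""

collector = AttrCollector("senchatest")
collector.feed(sample_html)
collector.close()

# Format the values the way the question asks for them
lines = ['senchatest="{}"'.format(value) for value in collector.values]
for line in lines:
    print(line)
```

Because this reads the attribute from the tag itself, a nested div is never counted twice and the element type does not matter. Within Beautiful Soup, a comparable filter should be S.find_all(attrs={"senchatest": True}), which selects every tag that has the attribute regardless of its value.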
