Python Selenium: extracting dynamic tables from nested classes

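I am trying to retrieve data from two tables whose contents update every 0.5 seconds. The tables sit inside a nested class structure. I used Selenium's driver.find_element_by_xpath and tried to reach the data through a sequence like div[contains(@class,one)]/div[contains(@class,two)]..., but without success.

I tried the other locator methods it offers and searched other resources, but could not get any closer.

Any leads would be appreciated! Here is the HTML: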

<body>
  <div class="one">
    <div class="two">
      <div class="three">
        <!-- First table -->
        <table class="Jan">
          <thead>...</thead>
          <tbody>
            <tr>
              <td class='cost'>100 </td>
            </tr>
            <tr>
              <td class='cost'>86 </td>
            </tr>
            .
            .
            .
            <tr>
              <td class='cost'>56 </td>
            </tr>
          </tbody>
          <tfoot>...</tfoot>
        </table>
        <!-- Second table -->
        <table class="Feb">
          <thead>...</thead>
          <tbody>
            <tr>
              <td class='cost'>-856 </td>
            </tr>
            <tr>
              <td class='cost'>-8986 </td>
            </tr>
            .
            .
            .
            <tr>
              <td class='cost'>-9856 </td>
            </tr>
          </tbody>
          <tfoot>...</tfoot>
        </table>
      </div>
    </div>
  </div>
</body>

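You can create a list of the table class names like this, then iterate over it, find the rows of each table, and read their text.

listclass=['Jan','Feb']

Code:

from selenium.webdriver.common.by import By

listclass = ['Jan', 'Feb']
for item in listclass:
    # Collect every body row of the table whose class matches the current name
    tablerows = driver.find_elements(By.XPATH, "//table[@class='" + item + "']/tbody/tr")
    # textContent keeps the markup's whitespace, so strip() it from each value
    data = [row.get_attribute("textContent").strip() for row in tablerows]
    print(data)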

List output:

['100', '86', '56']
['-856', '-8986', '-9856']
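
To check the row XPath without opening a browser, the same selection can be tried against a trimmed copy of the sample markup. This is only a sketch: it uses the standard library's xml.etree.ElementTree as a stand-in for the browser DOM (it is not Selenium), the SAMPLE string and the extract_costs helper are hypothetical names introduced here, and the elided rows from the question are simply left out.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the markup from the question (assumption:
# the elided rows and the thead/tfoot sections are omitted).
SAMPLE = """
<div class="one"><div class="two"><div class="three">
  <table class="Jan"><tbody>
    <tr><td class="cost">100 </td></tr>
    <tr><td class="cost">86 </td></tr>
    <tr><td class="cost">56 </td></tr>
  </tbody></table>
  <table class="Feb"><tbody>
    <tr><td class="cost">-856 </td></tr>
    <tr><td class="cost">-8986 </td></tr>
    <tr><td class="cost">-9856 </td></tr>
  </tbody></table>
</div></div></div>
""".strip()


def extract_costs(markup, table_classes):
    """Map each table class to the stripped text of its body rows."""
    root = ET.fromstring(markup)
    result = {}
    for name in table_classes:
        # Same path as the Selenium XPath, written relative to the root element
        rows = root.findall(".//table[@class='" + name + "']/tbody/tr")
        result[name] = ["".join(row.itertext()).strip() for row in rows]
    return result


costs = extract_costs(SAMPLE, ["Jan", "Feb"])
print(costs["Jan"])
print(costs["Feb"])
```

The nested div wrappers do not need to be spelled out in the path, since the table class alone identifies each table.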

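Another option is to use pandas and read_html() to load the tables into DataFrames.

Code:

import time
from io import StringIO
import pandas as pd

driver.get("url here")
time.sleep(3)  # give the page time to render the tables
page = driver.page_source
# read_html returns one DataFrame per <table> found in the markup
dfs = pd.read_html(StringIO(page))
for df in dfs:
    print(df.T)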

Console output:

0   1   2
0  100  86  56
0     1     2
0 -856 -8986 -9856
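
Since the tables refresh about every 0.5 seconds, rows located in one call may be replaced before their text is read in the next. One possible way around that, sketched here under assumptions, is to read all the row text in a single driver.execute_script call so the values come from one pass over the DOM. The SNAPSHOT_JS string and the snapshot_costs helper are hypothetical names, and the CSS selector matches a table that has the given class among its classes, which is slightly looser than the exact @class match in the XPath.

```python
# JavaScript run inside the page: collect the trimmed text of every body row
# of the table carrying the class passed as the first script argument.
SNAPSHOT_JS = """
const rows = document.querySelectorAll("table." + arguments[0] + " tbody tr");
return Array.from(rows, row => row.textContent.trim());
"""


def snapshot_costs(driver, table_class):
    """Return the row texts of one table, read in a single browser call."""
    return driver.execute_script(SNAPSHOT_JS, table_class)


# Usage with an existing Selenium driver (assumed to be on the target page):
# for name in ['Jan', 'Feb']:
#     print(snapshot_costs(driver, name))
```

Calling this in a loop with a short sleep would give one list per refresh, at the cost of re-reading rows that have not changed.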
