Unable to create a loop that prints items in a customized manner



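I'm trying to parse the content of a webpage using requests. The content looks like it is in a table, but there are actually no tr or td tags. However, since the content is dynamic, I'd like to use selenium to fetch it. The script I've written so far grabs the items, but prints each one on its own line. What I want to do now is print every five items on one row, and so on, the way they appear on that site.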

Website address: https://www.usdebtclock.org/world-debt-clock.html

This is what I've written so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = 'https://www.usdebtclock.org/world-debt-clock.html'

with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 10)
    driver.get(link)
    for elem in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^='layer'] > span"))):
        print(elem.text)

The output I'm getting:

330,151,536
$26,755,252,293,256
103.70%
$19,568,761,011,755
128.27%
1,438,097,306
$7,331,488,897,052
50.34%
$14,562,936,025,896
15.94%
126,619,535
$12,174,468,045,729
268.35%
$4,536,842,867,333

The output I'd like to get:

330,151,536 $26,755,252,293,256 103.70% $19,568,761,011,755 128.27%
1,438,097,306 $7,331,488,897,052 50.34% $14,562,936,025,896 15.94%
126,619,535 $12,174,468,045,729 268.35% $4,536,842,867,333 91.86%
83,793,324 $2,936,570,308,213 78.11% $3,759,189,613,048 166.88%

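How can I achieve the expected output?

This looks like a perfect fit for list chunking.

Split the list of elements you get into sublists of five elements each. Then loop over the sublists, join the text of each one, and print it.

Try this: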

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = 'https://www.usdebtclock.org/world-debt-clock.html'

def chunks(_list, chunk_size):
    # Yield consecutive slices of _list, each at most chunk_size long.
    for i in range(0, len(_list), chunk_size):
        yield _list[i:i + chunk_size]

with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 10)
    driver.get(link)
    elements = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^='layer'] > span")))
    for chunk in chunks(elements, 5):
        print(' '.join(e.text for e in chunk))

Output:

330,151,637 $26,755,352,639,988 103.70% $19,568,816,364,550 128.26%
1,438,097,743 $7,331,515,883,064 50.34% $14,562,956,376,740 15.94%
126,619,510 $12,174,512,856,639 268.35% $4,536,849,207,058 91.86%
83,793,352 $2,936,581,116,897 78.11% $3,759,192,266,049 166.88%
and so on...
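
The chunking step does not depend on Selenium, so it can be checked offline against plain strings. Below is a minimal sketch that uses the first few values from the question's output as sample data (the `sample` and `rows` names are only for illustration). Note that a trailing group with fewer than five items is still yielded as a shorter row.

```python
def chunks(items, chunk_size):
    """Yield consecutive slices of items, each at most chunk_size long."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]


# Sample values copied from the question's output (illustrative only).
sample = [
    "330,151,536", "$26,755,252,293,256", "103.70%", "$19,568,761,011,755", "128.27%",
    "1,438,097,306", "$7,331,488,897,052", "50.34%", "$14,562,936,025,896", "15.94%",
    "126,619,535", "$12,174,468,045,729",
]

# Join each group of up to five strings into one printable row.
rows = [" ".join(chunk) for chunk in chunks(sample, 5)]
for row in rows:
    print(row)
```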

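You can solve this by creating an array that holds all the results, with each result itself represented as an array.

I haven't tested this code, but the idea is to add elements to their own "row" until you encounter a "%". At that point, set found_first_percent to True. Once you find another one, which means it is the last element of the row, add the current row (which is now complete) to your results and reset current_row so the next row can go into it.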

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = 'https://www.usdebtclock.org/world-debt-clock.html'
results = []

with webdriver.Chrome() as driver:
    wait = WebDriverWait(driver, 10)
    driver.get(link)
    current_row = []
    found_first_percent = False
    for elem in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "[id^='layer'] > span"))):
        # Read the text while the browser is still open; a WebElement
        # itself does not support the "in" operator.
        text = elem.text
        current_row.append(text)
        if "%" in text:
            if found_first_percent:
                # Second percentage in this row: the row is complete.
                results.append(current_row)
                current_row = []
                found_first_percent = False
            else:
                found_first_percent = True

for row in results:
    print(' '.join(row))
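
The row-grouping idea can also be tried without a browser by running it on plain strings. This is only a sketch: it assumes every row ends at its second percentage value, as in the question's output. The names `group_rows`, `sample`, and `grouped` are hypothetical and used for illustration. Unlike the loop above, this version also keeps a trailing incomplete row instead of dropping it.

```python
def group_rows(texts):
    """Group strings into rows, closing a row at its second '%' value."""
    rows = []
    current_row = []
    seen_percent = False
    for text in texts:
        current_row.append(text)
        if "%" in text:
            if seen_percent:
                # Second percentage: the row is complete.
                rows.append(current_row)
                current_row = []
                seen_percent = False
            else:
                seen_percent = True
    if current_row:
        # Keep leftover items that never reached a second percentage.
        rows.append(current_row)
    return rows


# Sample values copied from the question's output (illustrative only).
sample = [
    "330,151,536", "$26,755,252,293,256", "103.70%", "$19,568,761,011,755", "128.27%",
    "1,438,097,306", "$7,331,488,897,052", "50.34%", "$14,562,936,025,896", "15.94%",
    "126,619,535", "$12,174,468,045,729",
]

grouped = group_rows(sample)
for row in grouped:
    print(" ".join(row))
```

Compared with fixed-size chunking, this approach does not hard-code the row length, but it depends on the percentage pattern staying the same for every row.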
