How to scrape table data that shares the same class name (including merged cells) with Python and Selenium



From the image:

2002, December, and 1262.516 ->> in the HTML markup, 2002 is td[1], December is td[2], and 1262.516 is td[3].

In the very next row, 2002 Q4 is td[1] and 4017.422 is td[2].

So how do I store 4017.422 under Domestic?

https://i.stack.imgur.com/45F3v.png

If anyone knows, please comment with your email ID.

Link for reference: https://www.transtats.bts.gov/freight.asp

You need to import the following libraries.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

To get all the elements, try the code below. It uses visibility_of_all_elements_located() with the XPath that follows.

allelements=WebDriverWait(driver,20).until(EC.visibility_of_all_elements_located((By.XPATH,"//td[@class='dataTD' and @colspan='2']/following::td[1]")))
for item in allelements:
    print(item.text)

Or use the following CSS selector.

allelements=WebDriverWait(driver,20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"td.dataTD[colspan='2']+td")))
for item in allelements:
    print(item.text)
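To see what these two locators target without opening a browser, here is a minimal offline sketch. It uses Python's standard xml.etree.ElementTree on a hand-written table fragment. The SAMPLE markup and the helper name values_after_merged_cells are assumptions made for illustration and are not copied from the real page. The sketch mirrors the CSS form (the td that directly follows a merged td.dataTD[colspan='2'] in the same row). The XPath form differs slightly: following::td[1] takes the first td later in document order, so it can cross into the next row if the merged cell is the last one in its row.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the rows described in the question.
SAMPLE = """
<table>
  <tr>
    <td class="dataTD">2002</td>
    <td class="dataTD">December</td>
    <td class="dataTD">1262.516</td>
  </tr>
  <tr>
    <td class="dataTD" colspan="2">2002 Q4</td>
    <td class="dataTD">4017.422</td>
  </tr>
</table>
"""


def values_after_merged_cells(table_html):
    """Return (label, value) pairs for each merged cell and the td right after it."""
    root = ET.fromstring(table_html)
    results = []
    for row in root.iter("tr"):
        cells = [cell for cell in row if cell.tag == "td"]
        # Stop one short of the end so cells[index + 1] always exists.
        for index, cell in enumerate(cells[:-1]):
            if cell.get("class") == "dataTD" and cell.get("colspan") == "2":
                label = (cell.text or "").strip()
                value = (cells[index + 1].text or "").strip()
                results.append((label, value))
    return results


print(values_after_merged_cells(SAMPLE))
```

Only the merged row produces a pair here, because the ordinary three-cell row has no td with colspan='2'.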

**Expected output**:

Year             Month      Domestic
2003             December   1,424.216
2003 Total       -          15,232.525
%Chg over 2002
Q4 Only          -          5.13%
2004             January    1,234.820

Edit: based on your expected output, here is the updated code. Create a function and pass in the string value.

def Get_details(strtext):
    # Assumes `driver` is an existing WebDriver that has already loaded the page.
    Year=[]
    Month=[]
    Domestic=[]
    allelements=WebDriverWait(driver,20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,"td.dataTD")))
    for element in allelements:
        # Match the merged cell (colspan=2) whose text equals the label passed in.
        if element.text==strtext and element.get_attribute("colspan")=='2':
            # Row just above the total row (for example, December).
            Year.append(element.find_element(By.XPATH,"./parent::tr/preceding-sibling::tr[1]/td[1]").text)
            Month.append(element.find_element(By.XPATH,"./parent::tr/preceding-sibling::tr[1]/td[2]").text)
            Domestic.append(element.find_element(By.XPATH,"./parent::tr/preceding-sibling::tr[1]/td[3]").text)
            # The total row itself: the merged cell is the label, the next td holds the value.
            Year.append(element.text)
            Month.append("-")
            Domestic.append(element.find_element(By.XPATH,"./following::td[1]").text)
            # The row below the total is also merged (the "%Chg over ..." row).
            if element.find_element(By.XPATH,"./parent::tr/following-sibling::tr[1]/td[1]").get_attribute("colspan")=='2':
                Year.append(element.find_element(By.XPATH,"./parent::tr/following-sibling::tr[1]/td[1]").text)
                Month.append("-")
                Domestic.append(element.find_element(By.XPATH,"./parent::tr/following-sibling::tr[1]/td[2]").text)
            # Two rows below the total: the first month of the next year.
            Year.append(element.find_element(By.XPATH,"./parent::tr/following-sibling::tr[2]/td[1]").text)
            Month.append(element.find_element(By.XPATH,"./parent::tr/following-sibling::tr[2]/td[2]").text)
            Domestic.append(element.find_element(By.XPATH,"./parent::tr/following-sibling::tr[2]/td[3]").text)

    print(Year)
    print(Month)
    print(Domestic)

If you call the function with 2003 Total:

Get_details("2003 Total")

Output:

['2003', '2003 Total', '%Chg over 2002\nQ4 Only', '2004']
['December', '-', '-', 'January']
['1,424.216', '15,232.525', '5.13%', '1,234.820']

If you call the function with 2004 Total:

Get_details("2004 Total")

Output:

['2004', '2004 Total', '%Chg over 2003', '2005']
['December', '-', '-', 'January']
['1,526.984', '16,452.807', '8.01%', '1,280.516']
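Since the question is about storing values such as 4017.422 under Domestic, the three printed lists can also be combined into one record per table row, with the text converted to numbers. The following is a minimal sketch that runs without Selenium. The helper names to_number and to_rows are hypothetical, and the input lists are the ones printed above for "2003 Total".

```python
def to_number(text):
    """Convert scraped text such as '1,424.216' or '5.13%' to a float, or None if it is not numeric."""
    cleaned = text.strip().replace(",", "")
    if cleaned.endswith("%"):
        cleaned = cleaned[:-1]
    try:
        return float(cleaned)
    except ValueError:
        return None


def to_rows(years, months, domestic):
    """Zip the three parallel lists into one dict per table row."""
    rows = []
    for year, month, value in zip(years, months, domestic):
        rows.append({
            # Collapse the line break inside labels like '%Chg over 2002\nQ4 Only'.
            "Year": " ".join(year.split()),
            "Month": month,
            "Domestic": to_number(value),
        })
    return rows


rows = to_rows(
    ['2003', '2003 Total', '%Chg over 2002\nQ4 Only', '2004'],
    ['December', '-', '-', 'January'],
    ['1,424.216', '15,232.525', '5.13%', '1,234.820'],
)
for row in rows:
    print(row)
```

Returning None for non-numeric text is a design choice for this sketch. It keeps placeholder cells such as "-" from raising an error part-way through a scrape.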
