Downloading a CSV file from a website using Python BeautifulSoup



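I am trying to download a CSV file from a website using Python BeautifulSoup. Before clicking the download button, I need to make some selections on the page. This is the website URL (https://apps.who.int/flumart/Default?ReportNo=16). I am new to Python. By googling, I was able to write the code below, but I am stuck on how to take it further. Any help would be appreciated: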

import requests
import html5lib
import bs4
import requests 
from bs4 import BeautifulSoup 
URL = "http://apps.who.int/flumart/Default?ReportNo=16"
#"http://www.values.com/inspirational-quotes"
#https://apps.who.int/flumart/Default?ReportNo=16
r = requests.get(URL) 
soup = BeautifulSoup(r.content, 'html5lib') # If this line causes an error, run 'pip install html5lib' or install html5lib
#soup.prettify()
quotes=[]  # a list to store quotes 
#Filter by:
optionFilterBy = soup.find('div', attrs = {'id':'ctl_ReportViewer_ctl04_ctl03'}) 
optionFilterBy1 = optionFilterBy.select('selected' option[value] = 1')

#Select by:
selectby = soup.find('div', attrs = {'id':'div id='ctl_ReportViewer_ctl04_ctl05'}) 
selectby.select('alt')= "Albania"

#From year 
Fromyear = soup.find('select', attrs = {'id': 'ctl_ReportViewer_ctl04_ctl07_ddValue'}) 
Fromyear.select('option[value]= '2020')

#To year:
Toyear = soup.find('select', attrs = {'id':'ctl_ReportViewer_ctl04_ctl09_ddValue'}) 
Toyear.select('option[value]= '2020')

#From week:
fromWeek = soup.find('select', attrs = {'id':'ctl_ReportViewer_ctl04_ctl11_ddValue'}) 
fromWeek.select('option[value]= '1')            
#To week:
ToWeek = soup.find('select', attrs = {'id':'ctl_ReportViewer_ctl04_ctl13_ddValue'}) 
ToWeek.select('option[value]= '52')                            

#Age group by:
Agegroup = soup.find('select', attrs = {'id':'ctl_ReportViewer_ctl04_ctl15_ddValue'}) 
Agegroup.select('option[value]= '1') 

# Click view report:

soup.find('id':'ctl_ReportViewer_ctl04_ctl00').click()              

#Download the CSV file from the drop-down and save it to a local folder
soup.find('a', attrs = {'title':'CSV (comma delimited)'}) .click()

Syntax errors in the code aside, BeautifulSoup only parses HTML; you will need Selenium to click buttons on the website.

from selenium import webdriver
from selenium.webdriver.common.by import By

# Chrome will be used here
driver = webdriver.Chrome()

# URL of the website
url = "https://www.mywebsite.com/"

# Open the website
driver.get(url)

# You can locate the button by ID or by class name; one lookup is enough.
# "my-button-id" is a placeholder, use the real ID from the page.
button = driver.find_element(By.ID, "my-button-id")
button = driver.find_element(By.CLASS_NAME, "slide-out-btn")

# Click the button
button.click()
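
Applied to the form in the question, the same approach could look roughly like the sketch below. It is only a sketch under assumptions: the element IDs and option values are copied from the question and have not been checked against the live page, the helper names (fill_dropdowns, view_report) are made up for illustration, and the page may need explicit waits between steps if each selection reloads the form. Selenium's Select class is used for the drop-downs.

```python
# Element IDs and values are copied from the question and are NOT verified
# against the live page; adjust them after inspecting the HTML.
SELECTIONS = {
    "ctl_ReportViewer_ctl04_ctl07_ddValue": "2020",  # from year
    "ctl_ReportViewer_ctl04_ctl09_ddValue": "2020",  # to year
    "ctl_ReportViewer_ctl04_ctl11_ddValue": "1",     # from week
    "ctl_ReportViewer_ctl04_ctl13_ddValue": "52",    # to week
    "ctl_ReportViewer_ctl04_ctl15_ddValue": "1",     # age group
}
VIEW_REPORT_ID = "ctl_ReportViewer_ctl04_ctl00"


def fill_dropdowns(driver, selections, select_cls):
    """Choose one option per <select>, looked up by element ID.

    select_cls is expected to be selenium's Select class. Passing it in is
    a choice made for this sketch so the helper can be tried without a browser.
    """
    for element_id, value in selections.items():
        # "id" is the value of selenium's By.ID locator constant
        dropdown = select_cls(driver.find_element("id", element_id))
        dropdown.select_by_value(value)


def view_report(url):
    """Open the page, set the drop-downs, and click the view-report button."""
    # Imported here so the helper above can be loaded without selenium installed
    from selenium import webdriver
    from selenium.webdriver.support.ui import Select

    driver = webdriver.Chrome()
    driver.get(url)
    fill_dropdowns(driver, SELECTIONS, Select)
    driver.find_element("id", VIEW_REPORT_ID).click()
    # The caller continues with the export step and calls driver.quit() when done
    return driver
```

Calling view_report("https://apps.who.int/flumart/Default?ReportNo=16") would open Chrome and submit the form. The CSV export link mentioned in the question (title 'CSV (comma delimited)') could then probably be clicked in the same way, once the report has finished loading.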
