How to save data scraped from a soup object to CSV



I want to save only the scraped data to a CSV file.
Here is the scraping code:

from bs4 import BeautifulSoup
import requests

url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"
data = requests.get(url).text
soup = BeautifulSoup(data, "html5lib")
table = soup.find('table')
for row in table.find_all('tr'):
    cols = row.find_all('td')
    programing_language = cols[1].getText()
    salary = cols[3].getText()
    print("{}--->{}".format(programing_language, salary))

Here is a solution.

import pandas as pd
from bs4 import BeautifulSoup
import requests

url = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html"
# Keep the page source in its own variable so it does not overwrite the row list.
html = requests.get(url).text
soup = BeautifulSoup(html, "html5lib")
table = soup.find('table')

data = []
# Skip the first <tr>, which holds the column titles rather than data.
for row in table.find_all('tr')[1:]:
    cols = row.find_all('td')
    programing_language = cols[1].getText()
    salary = cols[3].getText()
    data.append([programing_language, salary])

cols = ['programing_language', 'salary']
df = pd.DataFrame(data, columns=cols)
df.to_csv("data.csv", index=False)
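
If the salary strings contain commas (for example as thousands separators), it may be worth checking that they still end up in a single CSV field. The sketch below uses only the standard csv module and an in-memory buffer instead of a real file. The rows are hypothetical sample values, not data from the scraped page.

```python
import csv
import io

# Hypothetical sample rows; these are not values taken from the scraped page.
rows = [["LangA", "$100,000"], ["LangB", "$95,500"]]

# Write to an in-memory buffer so the example does not touch the file system.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerow(["programing_language", "salary"])
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back should give two fields per row again,
# because fields containing the delimiter were quoted on output.
parsed = list(csv.reader(io.StringIO(csv_text)))
print(parsed)
```

With minimal quoting, only fields that contain the delimiter, the quote character, or a line break are wrapped in quotes, so a value such as "$100,000" stays one column when the file is read back.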

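For a lightweight solution, you can just use csv. Use tr:nth-child(n+2) to skip the header row: this nth-child range selector matches rows from the second tr onward. Then, in a loop over the remaining rows, select the second and fourth columns, as follows: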

from bs4 import BeautifulSoup as bs
import requests, csv

response = requests.get('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DA0321EN-SkillsNetwork/labs/datasets/Programming_Languages.html',
                        headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(response.content, 'lxml')

with open("programming.csv", "w", encoding="utf-8-sig", newline='') as f:
    w = csv.writer(f, delimiter=",", quoting=csv.QUOTE_MINIMAL)
    w.writerow(["Language", "Average Annual Salary"])
    # tr:nth-child(n+2) skips the header row; take the 2nd and 4th cells of each remaining row.
    for item in soup.select('tr:nth-child(n+2)'):
        w.writerow([item.select_one('td:nth-child(2)').text,
                    item.select_one('td:nth-child(4)').text])
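
To see what the two selectors pick out without making a network request, here is a minimal sketch on a small inline table. The HTML and its values are a hypothetical stand-in for the real page, and the built-in html.parser is used here only so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: one header row and two data rows.
html = """
<table>
  <tr><td>No.</td><td>Language</td><td>Created By</td><td>Average Annual Salary</td></tr>
  <tr><td>1</td><td>LangA</td><td>Someone</td><td>$100,000</td></tr>
  <tr><td>2</td><td>LangB</td><td>Someone Else</td><td>$95,500</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# tr:nth-child(n+2) matches every row except the first one;
# td:nth-child(2) and td:nth-child(4) pick the 2nd and 4th cell of a row.
selected = [
    [row.select_one("td:nth-child(2)").text, row.select_one("td:nth-child(4)").text]
    for row in soup.select("tr:nth-child(n+2)")
]
print(selected)
```

The header row is the first child of the table, so it falls outside the n+2 range and only the two data rows are returned.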
