All the information ends up in a single cell in Excel (Python)



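Hello, I'm new to web scraping. I scraped a website, but after writing the result to a CSV, a single cell is filled with all of the information. I want the information split across rows; it doesn't matter if the entries all sit in one column, but they must be on different rows. Here is the code: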

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
from csv import writer
url = 'https://virtualhs.pwcs.edu/about/faculty'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
Title = soup.find('div', id='divContent')
if Title:
    for p in Title.select("p"):
        p.extract()
    for h2 in Title.select("h2"):
        h2.extract()
    Title =Title.text
    print(Title)
with open('now.csv','w',encoding='utf-8', newline='') as f:
    thewriter = writer(f)
    thewriter.writerow([Title])

It's not clear what you want to scrape. Is it the teachers and their departments?

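If you open the file with 'w' as the mode, it is truncated every time it is opened, so writing inside a loop would overwrite the previous iteration. You would need to open it with 'a' to append on each iteration, and you would also need to make sure you first write an initial "blank" CSV to append to.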
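As a rough sketch of the difference between the two modes: the header is written once with 'w', and each later row is added with 'a'. The file name `demo.csv` and the sample rows are made up for illustration and are not taken from the page in the question.

```python
import csv
import os
import tempfile

# Hypothetical sample data; not taken from the page being scraped.
sample_rows = [["Teacher A", "Math"], ["Teacher B", "Science"]]

# Write into a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# 'w' truncates the file, so use it once to create the file with a header.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerow(["teacher", "department"])

# 'a' appends, so each iteration adds a new row instead of replacing the file.
for row in sample_rows:
    with open(path, "a", encoding="utf-8", newline="") as f:
        csv.writer(f).writerow(row)

# Read the file back to check that every row landed on its own line.
with open(path, encoding="utf-8", newline="") as f:
    written = list(csv.reader(f))

print(written)
```

Reopening the file on every iteration is only done here to mirror the situation described above; opening it once and calling `writerow` inside the loop would avoid the need for 'a' altogether.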

Personally, I find it easier to build a DataFrame and then write it to a file:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://virtualhs.pwcs.edu/about/faculty'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')

# Each <h3> is treated as a department heading.
departments = soup.find_all('h3')
rows = []
for department in departments:
    # The first <ul> after the heading is assumed to list that department's teachers.
    for teacher in department.find_next('ul').find_all('li'):
        row = {
            'teacher': teacher.text,
            'department': department.text}
        rows.append(row)

# One dict per teacher becomes one row in the CSV.
df = pd.DataFrame(rows)
df.to_csv('now.csv', index=False)
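If the goal is only to get each line of the scraped text onto its own row, a sketch using just the standard `csv` module might look like the following. The `text` value is a made-up stand-in for `Title.text` from the question, and `text_to_rows` is a hypothetical helper, not part of any library.

```python
import csv
import io


def text_to_rows(text):
    """Split a block of text into one single-cell row per non-empty line."""
    return [[line.strip()] for line in text.splitlines() if line.strip()]


# Made-up stand-in for the text extracted from the page.
text = "Math\n  Teacher A\n\nScience\n  Teacher B\n"

rows = text_to_rows(text)

# Written to an in-memory buffer here; a real script would open 'now.csv' instead.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

print(rows)
```

The difference from the code in the question is that `writerow([Title])` passes the whole text as one cell, whereas `writerows` receives one list per line and so produces one row per line.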
