I'm new to Python and trying to learn by working on small projects. I'm currently trying to collect some information from various web pages, but whenever I output the scraped data to a CSV, it only contains the data from the last URL.
Ideally, I want it to overwrite the CSV rather than append to it, since I only want a single CSV containing just the most recent data.
I've looked through other questions on Stack Overflow similar to this one, but I either didn't understand them or they simply didn't work for me (probably the former).
Any help would be greatly appreciated.
import csv
import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = ['URL1','URL2']
for URL in URL:
    response = requests.get(URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    nameElement = soup.find('p', attrs={'class':'name'}).a
    nameText = nameElement.text.strip()

    priceElement = soup.find('span', attrs={'class':'price'})
    priceText = priceElement.text.strip()

    columns = [['Name','Price'], [nameText, priceText]]
    with open('index.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerows(columns)
You have to open the file before the for loop and then write each row inside the for loop:
URL = ['URL1','URL2']

with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['Name', 'Price'])

    for URL in URL:
        response = requests.get(URL)
        soup = BeautifulSoup(response.content, 'html.parser')

        nameElement = soup.find('p', attrs={'class':'name'}).a
        nameText = nameElement.text.strip()

        priceElement = soup.find('span', attrs={'class':'price'})
        priceText = priceElement.text.strip()

        writer.writerow([nameText, priceText])
Or you have to create the list before the for loop and append() the data to that list:
URL = ['URL1','URL2']

columns = [['Name', 'Price']]

for URL in URL:
    response = requests.get(URL)
    soup = BeautifulSoup(response.content, 'html.parser')

    nameElement = soup.find('p', attrs={'class':'name'}).a
    nameText = nameElement.text.strip()

    priceElement = soup.find('span', attrs={'class':'price'})
    priceText = priceElement.text.strip()

    columns.append([nameText, priceText])

with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerows(columns)
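As a side note: since pandas is already imported in your script but never used, the same "collect first, write once" idea can also be sketched with a DataFrame. The sample rows below are placeholders standing in for the values your scraping loop would produce (the names and prices here are made up, not from your pages):

```python
import pandas as pd

# Stand-in for the (nameText, priceText) pairs the scraping loop collects;
# in your script these would come from soup.find(...) per URL.
rows = []
for name, price in [('Widget', '9.99'), ('Gadget', '19.99')]:
    rows.append({'Name': name, 'Price': price})

# One write after the loop: to_csv overwrites index.csv each run,
# so the file only ever holds the latest scrape.
pd.DataFrame(rows).to_csv('index.csv', index=False)
```

Because the file is written once, after all rows are gathered, each run replaces index.csv wholesale, which matches your "only the latest data" requirement.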