How do I scrape data from a website and update a file with the new information every day, while keeping the old data?



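I originally planned to use a CSV file, but that would require me to manually open VS Code every day and run my script to add the data to the CSV file, and each run would replace the old data I had entered before.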

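If the scraped dataset is small, collect it into a list of dictionaries, one dictionary per row you want to save, with the structure [{<column1>: <data>, <column2>: <data>, ...}, ...]. Then append it to the CSV file with the function below by calling append_csv_dict(<path_to_your_csv>, <your_dictionary>). Because the file is opened in append mode, earlier rows are kept rather than overwritten.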

import csv


def append_csv_dict(path, data):
    '''
    Append rows to a csv file, using the dictionary keys as column headers
    Args:
        path (str): Path to the csv file
        data (dict or list): Dictionary or list(dict) with keys as
                             column headers and values as column data
    '''
    # newline='' lets the csv module control line endings (avoids blank rows on Windows)
    with open(path, 'a', newline='') as file:
        # set the field names to the keys of the dictionary or keys of the first item
        fieldnames = list(data.keys()) if isinstance(data, dict) else list(data[0].keys())
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        # write the header if the file is new (append mode starts at the end of the file)
        if file.tell() == 0:
            writer.writeheader()
        if isinstance(data, dict):
            # write the row
            writer.writerow(data)
        elif isinstance(data, list):
            # write the rows if it is a list
            writer.writerows(data)


# some example data, you can do one dictionary at a time if you only do one row per day
scraped_data = [
    {
        'first_name': 'John',
        'last_name': 'Do',
        'age': 31
    },
    {
        'first_name': 'Jane',
        'last_name': 'Do',
        'age': 33
    },
    {
        'first_name': 'Foo',
        'last_name': 'Bar',
        'age': 58
    }
]
append_csv_dict('./scrape.csv', scraped_data)

Output (scrape.csv):

first_name,last_name,age
John,Do,31
Jane,Do,33
Foo,Bar,58
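
Since the goal is to keep old data across daily runs, it may help to check that repeated appends really do accumulate. The following is a minimal sketch, not part of the original answer: it simulates two daily runs against a temporary file and reads the result back. The helper names append_rows and stamp, and the scraped_on column, are hypothetical and only added here so that rows from different days can be told apart.

```python
import csv
import os
import tempfile
from datetime import date


def append_rows(path, rows):
    """Append a list of dicts to a CSV file, writing the header only once."""
    with open(path, 'a', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=list(rows[0].keys()))
        if file.tell() == 0:  # new or empty file: no header yet
            writer.writeheader()
        writer.writerows(rows)


def stamp(rows, day):
    """Return copies of the rows with a hypothetical 'scraped_on' column added."""
    return [{'scraped_on': day.isoformat(), **row} for row in rows]


# Simulate two daily runs against a throwaway file.
path = os.path.join(tempfile.mkdtemp(), 'scrape.csv')
append_rows(path, stamp([{'first_name': 'John', 'age': 31}], date(2024, 1, 1)))
append_rows(path, stamp([{'first_name': 'Jane', 'age': 33}], date(2024, 1, 2)))

# Read everything back: rows from both runs should be present.
with open(path, newline='') as file:
    saved = list(csv.DictReader(file))

for row in saved:
    print(row)
```

In a real script, date.today() would presumably replace the fixed dates. Note that csv.DictReader returns every value as a string, so numeric columns such as age need converting after reading.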
