Python scraping: only one row of data gets stored in the CSV file



The CSV file only ends up with one row of data. If I add a range when writing to the CSV, it just writes that same single row over and over until the range is exhausted.

I haven't been able to fix this, and I've spent two days on it.


import csv
import requests
from bs4 import BeautifulSoup

for page in range(0, 10):
    url = "https://cryptonews.net/?page={page}".format(page=page)
    # print(url)

    # open the file in the write mode
    # f = open('file.csv', 'w', newline='')
    header = ['Title', 'Tag', 'UTC', 'Web_Address']

    # write a row to the csv file
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    lists = soup.find_all("main")
    for lis in lists:
        title = lis.find('a', class_="title").text
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_="datetime").text
        address = lis.find('div', class_="middle-xs").text
        img = lis.find('span', class_="src")
        data = ([title, tag, datetime, address, img])

    counter = range(100)
    with open('crypto.csv', 'a', newline='') as crypto:
        FileWriter = csv.writer(crypto)
        FileWriter.writerow(header)
        for x in counter:
            FileWriter.writerow(data)  # writer.writerows(data)



You aren't storing the data, and as noted above, it gets overwritten on every iteration of lists. Secondly, I would opt for pandas here to build a DataFrame and then write that to file.

Also, you collect 5 items per row but only have 4 column names.

import pandas as pd
import requests
from bs4 import BeautifulSoup

data = []
for page in range(0, 10):
    print(page)
    url = "https://cryptonews.net/?page={page}".format(page=page)
    # print(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    lists = soup.find_all("main")
    for lis in lists:
        title = lis.find('a', class_="title").text
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_="datetime").text
        address = lis.find('div', class_="middle-xs").text
        img = lis.find('span', class_="src")
        data.append([title, tag, datetime, address, img])

header = ['Title', 'Tag', 'UTC', 'Web_Address', 'Image']
df = pd.DataFrame(data, columns=header)
df.to_csv('crypto.csv', index=False)

Also, I'm not sure what you want as output (since you didn't say). Is this more accurate?

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

data = []
for page in range(0, 10):
    print(page)
    url = "https://cryptonews.net/?page={page}".format(page=page)
    # print(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    lists = soup.find_all("div", {'class': re.compile('^row news-item.*')})
    for lis in lists:
        title = lis['data-title']
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_=re.compile("^datetime")).text.strip()
        address = lis['data-domain']
        img = lis['data-image']
        data.append([title, tag, datetime, address, img])

header = ['Title', 'Tag', 'UTC', 'Web_Address', 'Image']
df = pd.DataFrame(data, columns=header)
df.to_csv('crypto.csv', index=False)

Output:

print(df)
Title  ...                                              Image
0    ETH Breaches $1,500 Level As Ethereum Adds Ove...  ...  https://cnews24.ru/uploads/e29/e29a5677e448f6e...
1    India Seeing Spike in Drug Smuggling Using Cry...  ...  https://cnews24.ru/uploads/65b/65b50302f65e12c...
2    Optimism (OP) Price Prediction: 87% Rally Is J...  ...  https://cnews24.ru/uploads/5e1/5e1189bbb2c1e2b...
3        Mysterious Whale Adds 3.94 Trillion Shiba Inu  ...  https://cnews24.ru/uploads/54a/54af6726248c29a...
4    Are the big fundraising efforts of blockchain ...  ...  https://cnews24.ru/uploads/5af/5afb066d81be4a6...
..                                                 ...  ...                                                ...
195  Terra Classic (LUNC) Chief Community Officer S...  ...  https://cnews24.ru/uploads/a53/a53fd4206ab5f95...
196  Reddit NFT Collection: How to Sell Your Avatar...  ...  https://cnews24.ru/uploads/ab6/ab6718f707c3428...
197  In Topsy Turvy Market Logic, Positive U.S. GDP...  ...  https://cnews24.ru/uploads/264/264ab9327f4774a...
198  XRP Wallets Spikes Above 4.34M, Gaining 29,883...  ...  https://cnews24.ru/uploads/2e5/2e56d092b7c253b...
199                     Are crypto trading bots legit?  ...  https://cnews24.ru/uploads/ccb/ccb73d9d9b79280...
[200 rows x 5 columns]

First, you set data = ([title, tag, datetime, address, img]) on every iteration of the inner loop but never save it anywhere. Each iteration replaces the value of data with the next row, so the full dataset is never kept.

Then, you pass the same thing ("data") to FileWriter.writerow() on every iteration of the counter loop without ever changing its value; you need to write the specific row for each iteration instead.

Fix those two problems and your code will work.
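For reference, here is a minimal sketch of that fix using the csv module, reusing your original URL and selectors (which I'm assuming match what you want): append each row to a list inside the inner loop, then open the file once, write the header once, and write all collected rows. Note the header gains a fifth column, since five items are collected per row.

import csv
import requests
from bs4 import BeautifulSoup

header = ['Title', 'Tag', 'UTC', 'Web_Address', 'Image']
rows = []

for page in range(0, 10):
    url = "https://cryptonews.net/?page={page}".format(page=page)
    resp = requests.get(url)
    soup = BeautifulSoup(resp.content, "html.parser")
    for lis in soup.find_all("main"):
        title = lis.find('a', class_="title").text
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_="datetime").text
        address = lis.find('div', class_="middle-xs").text
        img = lis.find('span', class_="src")
        # keep every row instead of overwriting a single `data` variable
        rows.append([title, tag, datetime, address, img])

# open the file once, write the header once, then write all rows
with open('crypto.csv', 'w', newline='') as crypto:
    writer = csv.writer(crypto)
    writer.writerow(header)
    writer.writerows(rows)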
