Scraping Twitter data with Beautiful Soup: I can fetch the data but cannot save it to a CSV file



I scraped usernames, tweets, replies, and retweets from Twitter, but I cannot save them to a CSV file.

Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
file = "5_twitterBBC.csv"
f = open(file, "w")
Headers = "tweet_user, tweet_text,  replies,  retweets\n"
f.write(Headers)
for page in range(0,5):
    url = "https://twitter.com/BBCWorld".format(page)
    html = urlopen(url)
    soup = BeautifulSoup(html,"html.parser")
    tweets = soup.find_all("div", {"class":"js-stream-item"})
    for tweet in tweets:
        try:
            if tweet.find('p',{"class":'tweet-text'}):
                tweet_user = tweet.find('span',{"class":'username'}).text.strip()
                tweet_text = tweet.find('p',{"class":'tweet-text'}).text.encode('utf8').strip()
                replies = tweet.find('span',{"class":"ProfileTweet-actionCount"}).text.strip()
                retweets = tweet.find('span', {"class" : "ProfileTweet-action--retweet"}).text.strip()
                print(tweet_user, tweet_text,  replies,  retweets)
                f.write("{}".format(tweet_user).replace(",","|")+ ",{}".format(tweet_text)+ ",{}".format( replies).replace(",", " ")+ ",{}".format(retweets) +  "\n")
        except: AttributeError
f.close()

I can fetch the data, but I cannot save it to a CSV file. Could someone explain how to save the data to a CSV file?

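As you can see, you made only one small mistake, in the line that looks up the tweets: tweets = soup.find_all("div", {"class":"js-stream-item"}). Pass the attribute filter under its keyword name, and note that the working code below matches li elements rather than div elements. It should look like this: tweets = soup.find_all("li", attrs={"class":"js-stream-item"})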
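To see which part of that find_all call changes the result, here is a small check on a made-up HTML snippet. SAMPLE_HTML is an invented stand-in for the timeline markup (it assumes each tweet sits in an li element, as the working code in this answer does), so treat it as a sketch rather than a description of the real page. In Beautiful Soup the second positional argument of find_all is attrs, so the positional and keyword spellings should select the same elements, while the tag name decides whether anything matches at all.

```python
from bs4 import BeautifulSoup

# Invented markup that imitates a timeline where each tweet is an <li>.
SAMPLE_HTML = """
<ol>
  <li class="js-stream-item stream-item"><p class="tweet-text">first</p></li>
  <li class="js-stream-item stream-item"><p class="tweet-text">second</p></li>
</ol>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attribute filter passed positionally and by keyword.
positional = soup.find_all("li", {"class": "js-stream-item"})
keyword = soup.find_all("li", attrs={"class": "js-stream-item"})

# Same filter, but looking for <div> elements, which the sample does not contain.
wrong_tag = soup.find_all("div", attrs={"class": "js-stream-item"})

print(len(positional), len(keyword), len(wrong_tag))
```

If the loop over tweets never runs, nothing is written after the header, so printing len(tweets) right after find_all is a quick way to tell a selector problem apart from a file-writing problem.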

Here is a working solution, but it only fetches the first 20 tweets:

from urllib.request import urlopen
from bs4 import BeautifulSoup

file = "5_twitterBBC.csv"
# Open as UTF-8 so non-ASCII tweet text can be written as plain text
f = open(file, "w", encoding="utf-8")
Headers = "tweet_user, tweet_text,  replies,  retweets\n"
f.write(Headers)
url = "https://twitter.com/BBCWorld"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")
# Gets the tweets
tweets = soup.find_all("li", attrs={"class":"js-stream-item"})
# Writes each fetched tweet to the file
for tweet in tweets:
    try:
        if tweet.find('p',{"class":'tweet-text'}):
            tweet_user = tweet.find('span',{"class":'username'}).text.strip()
            # Keep the text as str; encoding it to bytes would write b'...' into the file
            tweet_text = tweet.find('p',{"class":'tweet-text'}).text.strip()
            replies = tweet.find('span',{"class":"ProfileTweet-actionCount"}).text.strip()
            retweets = tweet.find('span', {"class" : "ProfileTweet-action--retweet"}).text.strip()
            # String interpolation technique (f-string)
            f.write(f'{tweet_user},/^{tweet_text}$/,{replies},{retweets}\n')
    except AttributeError:
        # Skip items that lack one of the expected elements
        continue
f.close()
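One detail in the question's code affects what ends up in the file: tweet_text is built with .text.encode('utf8'), which turns the string into a bytes object. Formatting bytes into a string inserts its repr (the b'...' form) rather than the text itself. The short sketch below illustrates this with an invented sample string; the variable names are only for illustration.

```python
# Invented sample text containing a non-ASCII character.
text = "caf\u00e9"

# Formatting a bytes object embeds its repr, including the b'...' wrapper.
formatted_bytes = "{}".format(text.encode("utf8"))

# Formatting the str directly keeps the readable text.
formatted_str = "{}".format(text)

print(formatted_bytes)
print(formatted_str)
```

Keeping the value as str and opening the file with encoding="utf-8" lets Python do the encoding once, at write time.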

filename = "output.csv"
f = open(filename, "w", encoding="utf-8")
headers = "tweet_user,tweet_text,replies,retweets\n"
f.write(headers)
# *** your code ***
# *** loop ***
# Inside the loop, write one comma-separated line per tweet (all four values are strings):
f.write(",".join([tweet_user, tweet_text, replies, retweets]) + "\n")
f.close()
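Joining fields with commas by hand breaks as soon as a tweet contains a comma, a quote, or a line break. As an alternative sketch, the standard library's csv module handles the quoting. The function name save_rows, the sample rows, and the output path below are assumptions made for illustration; in the scraper, each tuple would come from the values extracted inside the loop.

```python
import csv
import os
import tempfile


def save_rows(path, rows):
    """Write a header line plus one CSV row per (user, text, replies, retweets) tuple."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["tweet_user", "tweet_text", "replies", "retweets"])
        writer.writerows(rows)


# Invented sample values standing in for the scraped fields.
sample_rows = [
    ("@BBCWorld", 'Headline with a comma, and a "quote"', "12", "34"),
    ("@BBCWorld", "Second headline", "5", "6"),
]

# Hypothetical output location in the system temp directory.
sample_path = os.path.join(tempfile.gettempdir(), "tweets_sample.csv")
save_rows(sample_path, sample_rows)
```

Using a with block also closes the file even if an exception is raised partway through, so rows already written are not lost in a buffer.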
