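I'd like to know how to format a CSV file like the Twitter archive so that R can read it without trouble (I've run into a pile of problems and found no solution). The Twitter archive is a user timeline, whereas my CSV (on which I'll run sentiment analysis in R) contains search results with tweets.
Twitter archive sample: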
"tweet_id","in_reply_to_status_id","in_reply_to_user_id","timestamp","source","text","retweeted_status_id","retweeted_status_user_id","retweeted_status_timestamp","expanded_urls"
"81423594213695488","","","2016-12-29 14:18:08 +0000","<a href=""http://twitter.com/download/android"" rel=""nofollow"">Twitter for Android</a>","RT @SwiftOnSecurity: We're going to tell kids that laptops used to store data on tiny mirrors spinning @ 7200rpm and they're going to think…","814187405175570432","2436389418","2016-12-28 19:12:58 +0000",""
"876926582348550143","","","2016-12-22 13:29:16 +0000","<a href=""http://twitter.com/download/android"" rel=""nofollow"">Twitter for Android</a>","RT @MKBHD: Shout-out to everyone going home and becoming family tech support for the holidays","811910809521680384","29873662","2016-12-22 12:26:36 +0000",""
What I've managed to produce so far:
"text"
b'RT @notCORYGREGORY: when hillary uses a private email server asking how to print recipes vs when trump takes healthcare from 20+ million am\xe2\x80\xa6'
b'RT @Salon: Germany is giving up on President Trump'
How I did it in Python:
import csv
import tweepy

# `api` is assumed to be an authenticated tweepy.API object created elsewhere
csvFile = open('tweets.csv', 'a')
csvWriter = csv.writer(csvFile, delimiter=',')
for tweet in tweepy.Cursor(api.search,
                           q="trump",
                           rpp=100,
                           result_type="recent",
                           include_entities=True,
                           lang="en").items(5):
    print(tweet.text)
    csvWriter.writerow([tweet.text.encode('utf-8')])
csvFile.close()
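I'm open to solutions in R.
I don't fully understand your question, but you may want to look at the twitteR library in R, especially the function "twListToDF". Combined with write.csv, it lets you write the collected tweets to a properly formatted CSV file that R can also read.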
write.csv(twListToDF(your_tweets), file="your_tweets.csv")
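If you would rather stay in Python, the b'...' prefix in the output above most likely comes from passing bytes (the result of .encode('utf-8')) to csv.writer under Python 3, which writes their repr. Below is a minimal sketch of an alternative, assuming you want every field quoted as in the archive sample. The sample rows, the column subset, and the write_tweets helper are all hypothetical stand-ins for illustration; real values would come from your tweet objects.

```python
import csv
import io

# Hypothetical sample rows standing in for search results; in practice
# each value would be taken from a tweet object (for example tweet.text).
rows = [
    {"tweet_id": "1", "timestamp": "2016-12-29 14:18:08 +0000",
     "text": 'RT @user: she said "hi", then left…'},
    {"tweet_id": "2", "timestamp": "2016-12-22 13:29:16 +0000",
     "text": "line one\nline two"},
]


def write_tweets(fileobj, rows):
    """Write rows with every field quoted, as in the Twitter archive sample."""
    writer = csv.DictWriter(
        fileobj,
        fieldnames=["tweet_id", "timestamp", "text"],
        quoting=csv.QUOTE_ALL,  # wrap every field in double quotes
    )
    writer.writeheader()
    writer.writerows(rows)  # values are str, not .encode('utf-8') bytes


# In a real script, open the file with an explicit encoding and newline="":
#   with open("tweets.csv", "w", newline="", encoding="utf-8") as f:
#       write_tweets(f, rows)
buffer = io.StringIO(newline="")
write_tweets(buffer, rows)
output = buffer.getvalue()
print(output)
```

Because the text is written as str and the encoding is set when the file is opened, embedded quotes are doubled and commas or line breaks stay inside the quoted field. On the R side, something like read.csv("tweets.csv", stringsAsFactors = FALSE) should then be able to parse it, though you may need to set the encoding argument depending on your platform.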