Formatting tweets like the Twitter archive for use with R



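I would like to know how to format a CSV file the same way the Twitter archive does, so that R can read it without problems (I ran into a number of issues and found no solution). The Twitter archive is a user timeline, whereas my CSV (on which I will run sentiment analysis in R) contains tweets returned by a search.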

Twitter archive sample

"tweet_id","in_reply_to_status_id","in_reply_to_user_id","timestamp","source","text","retweeted_status_id","retweeted_status_user_id","retweeted_status_timestamp","expanded_urls"
"81423594213695488","","","2016-12-29 14:18:08 +0000","<a href=""http://twitter.com/download/android"" rel=""nofollow"">Twitter for Android</a>","RT @SwiftOnSecurity: We're going to tell kids that laptops used to store data on tiny mirrors spinning @ 7200rpm and they're going to think…","814187405175570432","2436389418","2016-12-28 19:12:58 +0000",""
"876926582348550143","","","2016-12-22 13:29:16 +0000","<a href=""http://twitter.com/download/android"" rel=""nofollow"">Twitter for Android</a>","RT @MKBHD: Shout-out to everyone going home and becoming family tech support for the holidays","811910809521680384","29873662","2016-12-22 12:26:36 +0000",""

What I have managed to produce so far

"text"
b'RT @notCORYGREGORY: when hillary uses a private email server asking how to print recipes vs when trump takes healthcare from 20+ million am\xe2\x80\xa6'
b'RT @Salon: Germany is giving up on President Trump'

How I did it in Python:

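import csv
import tweepy

# `api` is assumed to be an authenticated tweepy.API instance (setup not shown).
csvFile = open('tweets.csv', 'a')
csvWriter = csv.writer(csvFile, delimiter=',')
for tweet in tweepy.Cursor(api.search,
                           q="trump",
                           rpp=100,
                           result_type="recent",
                           include_entities=True,
                           lang="en").items(5):
    print(tweet.text)
    # Writing the encoded bytes is what yields the b'...' rows shown above.
    csvWriter.writerow([tweet.text.encode('utf-8')])
csvFile.close()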

I am also open to solutions in R.

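I don't fully understand your question, but you may want to look at the twitteR library in R, in particular the function "twListToDF". Combined with write.csv, it lets you save the collected tweets in a CSV format that R can read back.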

write.csv(twListToDF(your_tweets), file="your_tweets.csv")
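
If you would rather stay in Python, the following is a minimal sketch of one way to produce a file shaped like the archive sample. It assumes Python 3, where csv.writer turns bytes into their b'...' representation, so the text is written as a plain string instead of being encoded. The column names are copied from the archive sample in the question; the helper name write_archive_csv and the sample row (including its timestamp) are made up for illustration and stand in for real search results.

```python
import csv
import io

# Column names copied from the archive sample in the question.
ARCHIVE_COLUMNS = [
    "tweet_id", "in_reply_to_status_id", "in_reply_to_user_id", "timestamp",
    "source", "text", "retweeted_status_id", "retweeted_status_user_id",
    "retweeted_status_timestamp", "expanded_urls",
]


def write_archive_csv(rows, out):
    """Write dicts to `out` with every field quoted, as in the archive sample.

    Hypothetical helper: keys missing from a row are written as empty strings.
    """
    writer = csv.DictWriter(out, fieldnames=ARCHIVE_COLUMNS,
                            quoting=csv.QUOTE_ALL, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical stand-in for search results; the text stays a str, not bytes.
sample_rows = [
    {"tweet_id": "1",
     "timestamp": "2017-05-29 10:00:00 +0000",
     "text": "RT @Salon: Germany is giving up on President Trump"},
]

buffer = io.StringIO()
write_archive_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, open the file with open('tweets.csv', 'w', newline='', encoding='utf-8') and pass that file object as `out`. Filling the rows from tweepy results would mean mapping attributes of each status object onto these columns; which attributes are available depends on your tweepy version, so treat that mapping as something to verify rather than a given.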
