How can I extract a user's tweets for specific dates in Python?



I am trying to download the tweets from November 2019 from the Reuters (@reuters) Twitter account.

I'm using tweepy in Python, and this is my code:

# pip install tweepy
import tweepy as tw

# Keys
consumer_key = "..."
consumer_secret = "..."
access_token = "..."
access_token_secret = "..."

# Login
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

# Get user's tweets
tweets = tw.Cursor(api.user_timeline,
                   id="reuters",
                   lang="en",
                   since="2019-11-01",
                   until="2019-11-30").items()
all_tweets = [tweet.text for tweet in tweets]
all_tweets[:100]

The "until" parameter doesn't seem to work, because the tweets my code pulls include the most recent ones.

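The tweepy library currently only supports Twitter's older standard search API, and standard search only covers 7 days of history. To search as far back as November 2019, you need either the premium full-archive search API or the enterprise full-archive search. Both are commercial, but the premium API has a free tier called "sandbox" that you can use. In Python, you can use the search-tweets library.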

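The timeline approach mentioned in another answer is also an option, but it depends on whether the November tweets are still within reach of the timeline API, which returns at most the 3,200 most recent tweets counting back from today.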
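
If the timeline route is viable, note that the user timeline endpoint has no since/until parameters, so the date filtering has to happen on your side. Below is a minimal sketch of that filtering step. The helper name tweets_in_range and the stand-in tweet objects are my own, for illustration only; it assumes the tweets arrive newest-first and that created_at is a naive UTC datetime, as in tweepy 3.x.

```python
from datetime import datetime
from types import SimpleNamespace


def tweets_in_range(tweets, start, end):
    """Return tweets with start <= created_at < end.

    Assumes `tweets` is ordered newest-first, as a user timeline is,
    and that created_at is a naive UTC datetime (as in tweepy 3.x).
    """
    selected = []
    for tweet in tweets:
        if tweet.created_at < start:
            break  # everything after this point is older still
        if tweet.created_at < end:
            selected.append(tweet)
    return selected


# Stand-in objects instead of a live API call; with tweepy the iterable
# would be something like tw.Cursor(api.user_timeline, screen_name="reuters").items()
fake_timeline = [
    SimpleNamespace(created_at=datetime(2019, 12, 2, 9, 0), text="december"),
    SimpleNamespace(created_at=datetime(2019, 11, 30, 23, 59), text="late november"),
    SimpleNamespace(created_at=datetime(2019, 11, 1, 0, 0), text="early november"),
    SimpleNamespace(created_at=datetime(2019, 10, 31, 23, 59), text="october"),
]

november = tweets_in_range(fake_timeline, datetime(2019, 11, 1), datetime(2019, 12, 1))
print([t.text for t in november])
```

Stopping at the first tweet older than the start date avoids paging through the rest of the timeline. If the account has posted more than 3,200 tweets since November 2019, this will simply return nothing.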

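Here are two simple ways to extract tweets from a specific user for a specific period.

Solution 1: Using TwitterAPI. As andy_piper mentioned, you need premium or sandbox access, and premium accounts are too expensive. Unless you are extracting a huge corpus from Twitter, a free sandbox account is more than enough. You just need to enable a sandbox account at https://developer.twitter.com/en/pricing/aaa-all, which gives you access to the archive with a limited number of requests.

Create a dev environment label linked to your Twitter account: go to the dev environments page of your Twitter developer account and create a label for the sandbox. Once the label is configured, the code below will extract the corresponding tweets. (Change maxResults as needed.)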

from TwitterAPI import TwitterAPI

# Premium search product and the dev environment label configured above
Product = 'fullarchive'
label = 'Dev'

# consumer_key, consumer_secret, access_token and access_token_secret are your own app credentials
api = TwitterAPI(consumer_key, consumer_secret, access_token, access_token_secret)
tweets = api.request('tweets/search/%s/:%s' % (Product, label),
                     {'query': 'from:reuters', 'maxResults': '10',
                      'fromDate': '201911010000', 'toDate': '201911300000'})
for tweet in tweets:
    print(tweet['id'])
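
The fromDate and toDate values above are timestamps in YYYYMMDDHHmm form. As far as I understand the premium search documentation, toDate is exclusive, so 201911300000 would stop at the start of November 30; to cover the whole month you would pass the first minute of December instead. Here is a small sketch for building those strings. The helpers premium_date and month_window are hypothetical names of mine, not part of TwitterAPI.

```python
from datetime import date, timedelta


def premium_date(d):
    """Format a date as a YYYYMMDDHHmm string at midnight UTC."""
    return d.strftime("%Y%m%d") + "0000"


def month_window(year, month):
    """Return (fromDate, toDate) strings covering one calendar month."""
    first = date(year, month, 1)
    # Day 28 exists in every month; adding 4 days always lands in the
    # next month, and replace(day=1) snaps to its first day.
    next_first = (first.replace(day=28) + timedelta(days=4)).replace(day=1)
    return premium_date(first), premium_date(next_first)


from_date, to_date = month_window(2019, 11)
params = {
    "query": "from:reuters",
    "maxResults": "10",
    "fromDate": from_date,
    "toDate": to_date,
}
print(params)
```

The resulting params dictionary can be passed as the second argument of api.request in place of the hand-written one.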

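Solution 2: Using the GetOldTweets3 API. I'm not fond of this approach because I'm not sure about the license. It works like a charm even without a Twitter developer account, but it is somewhat questionable with respect to Twitter's privacy policy. Anyway, here is the code.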

import GetOldTweets3 as got

username = 'reuters'
count = 100

# Build the search criteria; the backslashes continue the method chain
tweetCriteria = got.manager.TweetCriteria().setUsername(username)\
                                           .setMaxTweets(count)\
                                           .setSince("2019-11-01")\
                                           .setUntil("2019-11-30")
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
for tweet in tweets:
    print(tweet.id, tweet.author_id, tweet.date)

References:
https://pypi.org/project/GetOldTweets3/
https://github.com/geduldig/TwitterAPI/blob/master/examples/premium_search.py

I have the answer. You can't do this without going premium.

import tweepy
import csv
import pandas as pd
####input your credentials here
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth,wait_on_rate_limit=True)
# Open/Create a file to append data
csvFile = open('tweets.csv', 'a')
#Use csv Writer
csvWriter = csv.writer(csvFile)
# tracklist = ["Womens Day", "internationalwomensday", "internationalwomensday2021", "internationalwomensday21","women's day", "international women's day", "IWD", "womensday", "WomensDay", "HappyInternationalWomensDay","Happy Women's Day", "HappyWomensDay", "happywomensday", "happyinternationalwomensday", "Women", "women"]
# tracklist = ''.join(str(e) for e in tracklist)
# import pdb; pdb.set_trace()
count = 0
# for tweet in tweepy.Cursor(api.search,q="Womens Day OR internationalwomensday OR internationalwomensday2021 OR internationalwomensday21 OR women's day OR international women's day OR IWD or womensday OR WomensDay OR HappyInternationalWomensDay OR Happy Women's Day OR HappyWomensDay OR happywomensday OR happyinternationalwomensday OR Women OR women",count=10000,
#                            lang="en",
#                            since="2021-03-06", 
#                            include_rts=False).items():
#     print (tweet.created_at, tweet.text)
#     csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])

for tweet in tweepy.Cursor(api.search,q="Womens Day OR internationalwomensday OR internationalwomensday2021 OR internationalwomensday21 OR women's day OR international women's day OR IWD OR HappyInternationalWomensDay OR Happy Women's Day OR HappyWomensDay OR happywomensday OR happyinternationalwomensday",
                           count=100000,
                           include_rts=False,
                           country_code=True,
                           coordinates=True,
                           lang="en",
                           since="2021-03-06",
                           until="2021-03-10"
                           ).items():
    print (tweet.created_at, tweet.text)
    csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])
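
One note on the CSV part: in Python 3, passing tweet.text.encode('utf-8') to the writer puts the bytes literal (b'...') into the file, and the file above is never closed. Here is a rough sketch of an alternative that writes plain strings. The write_tweets helper and the sample tweet are made up for illustration; an in-memory buffer stands in for the real file so it runs without API access.

```python
import csv
import io
from datetime import datetime
from types import SimpleNamespace


def write_tweets(tweets, handle):
    """Write (created_at, text) rows to an open text handle; return the row count."""
    writer = csv.writer(handle)
    writer.writerow(["created_at", "text"])
    count = 0
    for tweet in tweets:
        # Write the text as str; the file's encoding handles UTF-8
        writer.writerow([tweet.created_at.isoformat(), tweet.text])
        count += 1
    return count


# Stand-in for the objects yielded by the tweepy Cursor
sample = [
    SimpleNamespace(created_at=datetime(2021, 3, 8, 12, 0),
                    text="Happy Women's Day, everyone"),
]

buffer = io.StringIO()
written = write_tweets(sample, buffer)
print(buffer.getvalue())

# With a real file, the context manager closes it for you:
# with open("tweets.csv", "a", newline="", encoding="utf-8") as f:
#     write_tweets(tweets, f)
```

Opening the file with newline="" is what the csv module documentation recommends, and the with block makes sure the rows are flushed to disk.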