Tweepy error when collecting large amounts of data



I'm trying to get all the followers of a given Twitter user. Most of these users have more than 100K followers. My current code is as follows:

import time
import traceback

import tweepy
from ttictoc import tic, toc

key1 = ""  # consumer key
key2 = ""  # consumer secret
key3 = ""  # access token
key4 = ""  # access token secret
accountvar = ""  # screen name of the target account

auth = tweepy.OAuthHandler(key1, key2)
auth.set_access_token(key3, key4)
tic()
# First, make sure wait_on_rate_limit is True when connecting through Tweepy
# (wait_on_rate_limit_notify and followers_ids are Tweepy 3.x names)
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

# Request 5000 follower IDs per call; at 15 calls per 15-minute window
# this yields 75K IDs per window.
followerids = []
for user in tweepy.Cursor(api.followers_ids, screen_name=accountvar, count=5000).items():
    followerids.append(user)
print(len(followerids))

# Look up IDs 100 at a time, which allows 18K lookups in each 15-minute window.
def get_usernames(userids, api):
    fullusers = []
    u_count = len(userids)
    print(u_count)
    try:
        # Step by 100 so the last batch is never empty
        for start in range(0, u_count, 100):
            fullusers.extend(api.lookup_users(user_ids=userids[start:start + 100]))
    except Exception:
        traceback.print_exc()
        print('Something went wrong, quitting...')
    # Return whatever was fetched, even if a request failed part-way
    return fullusers

# Call the function with the list of follower IDs and the Tweepy API connection
fullusers = get_usernames(followerids, api)
print(toc())
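The lookup step splits the ID list into batches of 100 because each `lookup_users` call takes at most 100 IDs. Below is a minimal, network-free sketch of that batching; `chunk_ids` is a hypothetical helper name, not part of Tweepy. Stepping `range` by the batch size avoids the empty final batch that `range(int(u_count/100) + 1)` produces when the count is an exact multiple of 100 (e.g. 200 IDs would give a third, empty slice).

```python
from typing import Iterator, List, Sequence


def chunk_ids(ids: Sequence[int], size: int = 100) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` IDs (hypothetical helper)."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Show the batch sizes produced for a few list lengths.
for n in (250, 200, 0):
    print(n, [len(batch) for batch in chunk_ids(list(range(n)))])
```

Each batch could then be passed as `api.lookup_users(user_ids=batch)`.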

Unfortunately, I'm getting an error. I'm using Python 3.8 in a Jupyter notebook.

TweepError: Failed to send request: ('Connection aborted.', OSError("(10054, 'WSAECONNRESET')"))

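The error is caused by the long wait (15 minutes) after every 75,000 follower IDs. The connection probably times out while it sits idle. Thanks to @Tim Roberts for the help.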
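To see how often that idle wait occurs, here is a rough estimate, assuming the limit stated in the code comments (15 `followers/ids` requests of up to 5,000 IDs per 15-minute window). `estimate_waits` is a hypothetical name, and the count ignores any requests already made earlier in the current window.

```python
import math

IDS_PER_REQUEST = 5000     # count=5000 in the Cursor call
REQUESTS_PER_WINDOW = 15   # assumed followers/ids limit per 15-minute window


def estimate_waits(follower_count: int) -> int:
    """Rough number of rate-limit pauses a blocking Cursor loop would hit."""
    requests = math.ceil(follower_count / IDS_PER_REQUEST)
    windows = math.ceil(requests / REQUESTS_PER_WINDOW)
    # The first window needs no wait; each additional one costs one pause.
    return max(windows - 1, 0)


for n in (50_000, 100_000, 1_000_000):
    print(n, estimate_waits(n))
```

So an account with just over 75K followers already triggers at least one pause of up to 15 minutes, and larger accounts trigger several.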

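One workaround is to add a short sleep, time.sleep(.02), after collecting each follower ID and appending it to the list (as in the snippet below). That way the loop never has to wait a long time between requests, which prevents the connection from being dropped.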

for user in tweepy.Cursor(api.followers_ids, screen_name=accountvar, count=5000).items():
    followerids.append(user)
    time.sleep(.02)  # short pause per ID spaces out the page requests
print(len(followerids))
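Why such a small sleep helps: `Cursor.items()` yields IDs one at a time from pages of 5,000, so a 0.02 s sleep per ID adds roughly 0.02 × 5,000 = 100 s between page requests. Assuming the limit of 15 requests per 15 minutes (one request per 60 s on average), that pacing stays under the limit, so Tweepy never has to block for a long window reset. A sketch of the check, with the limit values as assumptions:

```python
SLEEP_PER_ID = 0.02         # seconds, as in the snippet above
IDS_PER_PAGE = 5000         # count=5000
WINDOW_SECONDS = 15 * 60
REQUESTS_PER_WINDOW = 15    # assumed followers/ids limit

# Pause introduced between consecutive page requests by the per-ID sleep
seconds_per_page = SLEEP_PER_ID * IDS_PER_PAGE
# Average spacing between requests that the rate limit allows
min_interval = WINDOW_SECONDS / REQUESTS_PER_WINDOW

print(seconds_per_page, min_interval, seconds_per_page >= min_interval)
```

The trade-off is total run time: the sleep alone costs 0.02 s per follower, i.e. about 2,000 s (roughly 33 minutes) for 100K followers.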
