Handling connection interruptions in a for loop: incorrect behavior



I have the following function, whose for loop uses Tweepy to fetch the follower IDs for a series of users:

def download_followers(user, api):
    all_followers = []
    try:
        for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
            all_followers.extend(map(str, page))
        return all_followers
    except tweepy.TweepError:
        print('Could not access user {}. Skipping...'.format(user))

The function is called as follows:

for username in lookup_users:
    user_followers = download_followers(username, main_api)
    if user_followers:
        new_followers = pd.DataFrame({
            "Handles": username,
            "Follower_ID": user_followers,
            "Start_Date": today})
        new_followers_df = new_followers_df.append(new_followers)

        print('Finished outputting: {} at {}'.format(username, datetime.now().strftime('%Y/%m/%d %H:%M:%S')))

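Depending on how many followers each user has, Twitter's API may have to be called two or three times to fetch all of that user's followers.

Because of this, there is a 15-minute wait before another call to the API can be made. This is handled by passing the following parameters to Tweepy: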

main_api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

The result looks like this:

Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07

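In this case, the API hit its rate limit twice. Each time it waited 15 minutes, and then it fetched all of @barackobama's followers.

However, sometimes the for loop fails and prints the message: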

'Could not access user @barackobama. Skipping...'

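This is mostly due to connection problems, the Twitter API not responding properly to the request, or the account having so many followers that Tweepy cannot handle it properly.

To account for possible connection failures, I tried wrapping the API call in a while True loop, as follows: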

def download_followers(user, api):
    all_followers = []
    while True:
        try:
            for page in tweepy.Cursor(api.followers_ids, screen_name=user).pages():
                all_followers.extend(map(str, page))
                return all_followers
        except tweepy.TweepError:
            print('Could not access user {}. Trying Again...'.format(user))
            continue
        break

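However, with the function wrapped this way, the for loop does not work properly. It iterates over each user only once, instead of fetching all of that user's followers and then moving on to the next user in the lookup_users list.

For example, instead of behaving like this: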

Rate limit reached. Sleeping for: 895
'Could not access user @barackobama. Trying again...'
Rate limit reached. Sleeping for: 895
Finished outputting: @barackobama at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Rate limit reached. Sleeping for: 895
Finished outputting: @donaldtrump at 2017/07/01 10:36:07

it behaves like this:

Finished outputting: @barackobama at 2017/07/01 10:36:07
Finished outputting: @donaldtrump at 2017/07/01 10:36:07
Finished outputting: @georgebush at 2017/07/01 10:36:07
Rate limit reached. Sleeping for: 895
Finished outputting: @richardnixon at 2017/07/01 10:41:08

So each user is iterated over only once.

Am I doing something wrong?

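The return statement is inside the for loop, so the function returns at the end of the loop's first iteration, after only the first page of followers has been collected. Move the return out of the for loop so that it runs only once every page has been consumed.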
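Here is a minimal sketch of that fix, not a drop-in replacement. It does not import Tweepy so that it can run on its own: `TransientError` is a hypothetical stand-in for `tweepy.TweepError`, and `make_pages` is an assumed zero-argument callable that returns a fresh page iterator, for example `lambda: tweepy.Cursor(api.followers_ids, screen_name=user).pages()`. It also assumes two things the question does not ask for: a bounded number of retries instead of `while True`, and resetting the list on each attempt so that a partially completed attempt does not leave duplicate IDs behind.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as tweepy.TweepError."""


def download_all(make_pages, max_retries=3, delay=0.0):
    """Collect the items of every page, restarting from scratch on failure.

    make_pages is an assumed zero-argument callable returning a fresh
    iterable of pages each time it is called.
    """
    for attempt in range(1, max_retries + 1):
        collected = []  # reset per attempt so a partial run adds no duplicates
        try:
            for page in make_pages():
                collected.extend(map(str, page))
            # Reached only after the for loop has consumed every page.
            return collected
        except TransientError:
            print('Attempt {} failed. Trying again...'.format(attempt))
            time.sleep(delay)
    return None  # every attempt failed; the caller's truthiness check skips it


def flaky_pages_factory():
    """Build a fake page source that fails partway through its first run."""
    calls = {"n": 0}

    def make_pages():
        calls["n"] += 1

        def pages():
            yield [1, 2]
            if calls["n"] == 1:
                raise TransientError("connection dropped")
            yield [3]

        return pages()

    return make_pages


result = download_all(flaky_pages_factory())
print(result)  # ['1', '2', '3']
```

The first attempt fails after one page, and the second attempt starts over and returns all three IDs exactly once. Note that restarting from the first page re-requests pages that were already fetched, which counts against the rate limit again; whether that trade-off is acceptable depends on how often the failures occur.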