How can I paginate efficiently through YouTube JSON responses?



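I am collecting information about the videos on a YouTube channel that has more than 50 videos.

That means I need to make several requests, because each JSON response returns at most 50 videos.

I found some solutions, and my code currently looks like this: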

import json
import urllib.request

videoMetadata = []  # declaring our list, where the results will be stored
# First request
url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&channelId='+CHANNEL_ID+'&maxResults=50&type=video&key='+API_KEY
response = urllib.request.urlopen(url)  # makes the call to YouTube
videos = json.load(response)  # decodes the response so we can work with it
nextPageToken = videos.get("nextPageToken")  # gets the token of next page
# Retrieve all the rest of the pages
while nextPageToken:
    url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&channelId='+CHANNEL_ID+'&maxResults=50&type=video&key='+API_KEY+"&pageToken="+nextPageToken
    response = urllib.request.urlopen(url)
    videos_next_page = json.load(response)
    nextPageToken = videos_next_page.get("nextPageToken")

# loops through results and appends it to videoMetadata list
# loop for the first page
for video in videos['items']:
    if video['id']['kind'] == 'youtube#video':
        videoMetadata.append(video['id']['videoId'])
# loop for the next page
for video in videos_next_page['items']:
    if video['id']['kind'] == 'youtube#video':
        videoMetadata.append(video['id']['videoId'])
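It works fine, but maybe there is a better solution. How can I store the results from multiple JSON responses in a list?

Any advice would be much appreciated.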

"It works fine,"

Actually, it doesn't, unless there is only one "next page". This loop:

while nextPageToken:
    url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&channelId='+CHANNEL_ID+'&maxResults=50&type=video&key='+API_KEY+"&pageToken="+nextPageToken
    response = urllib.request.urlopen(url)
    videos_next_page = json.load(response)
    nextPageToken = videos_next_page.get("nextPageToken")

overwrites videos_next_page on every iteration, so you only ever get the last page.
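To see the effect without calling the API, here is a minimal sketch that uses a hypothetical list of fake pages (`fake_pages` is invented for illustration and is not real API output). Rebinding one variable keeps only the last page, while collecting items as each page arrives keeps them all.

```python
# Hypothetical stand-ins for three decoded API responses (not real data).
fake_pages = [
    {"items": ["a", "b"], "nextPageToken": "t1"},
    {"items": ["c", "d"], "nextPageToken": "t2"},
    {"items": ["e"]},  # last page: no nextPageToken
]

# Pattern from the question: the variable is rebound on every iteration,
# so each earlier "next page" is lost.
last_page = None
for page in fake_pages[1:]:
    last_page = page

# Alternative: collect each page's items as soon as the page arrives.
all_items = []
for page in fake_pages:
    all_items.extend(page["items"])

print(last_page["items"])  # ['e']
print(all_items)           # ['a', 'b', 'c', 'd', 'e']
```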

"How can I store the results from multiple JSON responses in a list?"

Once deserialized, the "results from JSON responses" are plain Python objects (usually dicts). You can append them to a list just as you would anything else.
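As a small illustration, assuming a made-up response body (the `raw` string below is invented and far smaller than a real API reply), `json.loads` returns an ordinary dict that can be indexed and stored in a list like any other object:

```python
import json

# Invented, heavily trimmed response body; a real reply has many more fields.
raw = '{"items": [{"id": {"kind": "youtube#video", "videoId": "abc123"}}]}'

page = json.loads(raw)  # a plain dict
pages = []
pages.append(page)      # store the whole decoded page

# Or keep only the part you care about.
video_ids = [item["id"]["videoId"] for item in page["items"]]

print(type(page).__name__)  # dict
print(video_ids)            # ['abc123']
```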

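Here is a possible rewrite that handles this correctly (and also makes better use of memory). Warning: this is untested code, so I can't guarantee it is free of typos or other mistakes, but at least you get the idea.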

import json
import urllib.request

# CHANNEL_ID and API_KEY are assumed to be defined elsewhere, as in the question.

def load_page(page_token=None):
    url = "https://www.googleapis.com/youtube/v3/search?part=snippet&channelId={}&maxResults=50&type=video&key={}".format(CHANNEL_ID, API_KEY)
    if page_token:
        url += "&pageToken={}".format(page_token)
    response = urllib.request.urlopen(url)  # makes the call to YouTube
    return json.load(response)

def collect_videos_meta(page):
    return [video['id']['videoId'] for video in page['items'] if video['id']['kind'] == 'youtube#video']

def main():
    videoMetadata = []
    nextPageToken = None  # default initial value for the first page
    # using `while True` and `break` avoids having to repeat the same
    # code twice (once before the loop and once within the loop).
    # This is a very common pattern in Python, you just have to make
    # sure you will break out of the loop at some point...
    while True:
        page = load_page(nextPageToken)
        videoMetadata.extend(collect_videos_meta(page))
        nextPageToken = page.get("nextPageToken")
        if not nextPageToken:
            break
    # now do what you want with those data...
    print(videoMetadata)

if __name__ == "__main__":
    main()
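One possible variation on the same idea is a generator that yields video ids page by page, with the page-loading function passed in as a parameter so the loop can be tried without network access. The names `iter_video_ids`, `FAKE_PAGES` and `fake_load_page` are hypothetical, and the fake pages are invented data standing in for what `load_page` would return.

```python
def iter_video_ids(load_page):
    """Yield video ids from every page returned by load_page(token)."""
    token = None  # None requests the first page
    while True:
        page = load_page(token)
        for item in page.get("items", []):
            if item["id"]["kind"] == "youtube#video":
                yield item["id"]["videoId"]
        token = page.get("nextPageToken")
        if not token:
            break


# Invented pages, keyed by the token that requests them.
FAKE_PAGES = {
    None: {
        "items": [{"id": {"kind": "youtube#video", "videoId": "v1"}}],
        "nextPageToken": "p2",
    },
    "p2": {
        "items": [
            {"id": {"kind": "youtube#playlist", "playlistId": "x"}},
            {"id": {"kind": "youtube#video", "videoId": "v2"}},
        ],
    },
}


def fake_load_page(token=None):
    # Stand-in for a real loader; performs no network call.
    return FAKE_PAGES[token]


video_ids = list(iter_video_ids(fake_load_page))
print(video_ids)  # ['v1', 'v2']
```

Passing the real `load_page` in place of `fake_load_page` should issue the same sequence of requests as `main()`, while letting the caller stop early or process ids one at a time.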
