How to get all the comments on a video (more than 100) with the YouTube Data API v3



I am currently working on a project in which I need to collect all of the comments on some specific YouTube videos.
I can get at most 100 comments using the commentThreads().list method (more info). Is there any way to get all of the comments?

I am using the following function, which is provided by the Google YouTube Data API developer guide.

def get_comment_threads(youtube, video_id):
    # Fetch a single page of top-level comments (at most 100 per request)
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText"
    ).execute()
    for item in results["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        print("Comment by %s: %s" % (author, text))
    return results["items"]

As mentioned in the comments above, you can simply use the nextPageToken and keep calling the API in a while loop until no next-page token is returned. Be aware, though, that some videos have a very large number of comments, so this can take a long time to run.
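For illustration, here is a minimal sketch of that pagination loop, assuming a youtube client has already been built; pageToken is the commentThreads.list parameter that selects the page, and the helper name is my own:

def get_all_comment_threads(youtube, video_id):
    # Hypothetical helper: keeps requesting pages until the response
    # no longer includes a nextPageToken.
    items = []
    page_token = ""  # an empty token fetches the first page
    while True:
        results = youtube.commentThreads().list(
            part="snippet",
            maxResults=100,
            videoId=video_id,
            textFormat="plainText",
            pageToken=page_token
        ).execute()
        items.extend(results["items"])
        page_token = results.get("nextPageToken")
        if not page_token:
            break
    return items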

Additionally, I wrote the following to extend the code you mentioned above.

I also copied some parts of it from a GitHub repository that I can no longer recall.

Update the youtube and video_id variables just as you used them earlier in the get_comment_threads function; a sketch of initializing them follows.
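For reference, a minimal sketch of setting up those two variables with the google-api-python-client library (the key and video ID strings are placeholders):

import googleapiclient.discovery

# Build the API client; replace the key with your own from the Google Cloud console
youtube = googleapiclient.discovery.build(
    "youtube", "v3", developerKey="YOUR_API_KEY")
video_id = "YOUR_VIDEO_ID"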

def load_comments(match):
    # Print each top-level comment on this page, along with any replies
    for item in match["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        print("Comment by {}: {}".format(author, text))
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:
                rauthor = reply['snippet']['authorDisplayName']
                rtext = reply["snippet"]["textDisplay"]
                print("\n\tReply by {}: {}".format(rauthor, rtext), "\n")

def get_comment_threads(youtube, video_id):
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText"
    ).execute()
    return results

video_id = ""
youtube = ""

match = get_comment_threads(youtube, video_id)
next_page_token = match["nextPageToken"]
load_comments(match)
while next_page_token:
    match = get_comment_threads(youtube, video_id)
    next_page_token = match["nextPageToken"]
    load_comments(match)

To add to @minhaj's answer:

the while loop will keep running until the last commentThreads.list() response, but the last response does not contain a nextPageToken key, so it will raise a KeyError.

A simple try/except works around this:

try:
    while next_page_token:
        match = get_comment_threads(youtube, video_id)
        next_page_token = match["nextPageToken"]
        load_comments(match)
except KeyError:
    # The last page has no nextPageToken key, so load it and stop
    match = get_comment_threads(youtube, video_id)
    load_comments(match)
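As an aside, one way to avoid the exception entirely is Python's dict.get(), which returns None instead of raising KeyError when the key is missing; a sketch using the same functions as above:

match = get_comment_threads(youtube, video_id)
load_comments(match)
# .get() yields None on the last page, which ends the loop cleanly
next_page_token = match.get("nextPageToken")
while next_page_token:
    match = get_comment_threads(youtube, video_id)
    load_comments(match)
    next_page_token = match.get("nextPageToken")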

@Anthony Camarillo, you are right, exception handling is necessary here. Secondly, I made a few corrections to @minhaj's answer, because it kept requesting the same page of comments for the video, leaving us stuck in an infinite while loop. The key is to call get_comment_threads() with the nextPageToken parameter. I am using pandas to store the data in a DataFrame.

This is the code that worked for me:

import os
import pandas as pd
import googleapiclient.discovery

# Allow OAuth over plain HTTP (only appropriate for local testing)
os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

api_service_name = "youtube"
api_version = "v3"
DEVELOPER_KEY = "Your_API_KEY"
video_id = "Your_Video_id"

youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=DEVELOPER_KEY)

comments = []
authors = []

def load_comments(match):
    # Collect and print each top-level comment and its replies
    for item in match["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        comments.append(text)
        authors.append(author)
        print("Comment by {}: {}".format(author, text))
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:
                rauthor = reply['snippet']['authorDisplayName']
                rtext = reply["snippet"]["textDisplay"]
                print("\n\tReply by {}: {}".format(rauthor, rtext), "\n")

def get_comment_threads(youtube, video_id, nextPageToken):
    # Request one page of comment threads, starting at nextPageToken
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText",
        pageToken=nextPageToken
    ).execute()
    return results

match = get_comment_threads(youtube, video_id, '')
next_page_token = match["nextPageToken"]
load_comments(match)
try:
    while next_page_token:
        match = get_comment_threads(youtube, video_id, next_page_token)
        next_page_token = match["nextPageToken"]
        load_comments(match)
except KeyError:
    # The final page has no nextPageToken; build the DataFrame
    data = pd.DataFrame(comments, index=authors, columns=["Comments"])
    print(data)
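If you then want to persist the collected comments, the DataFrame can be written out with pandas' standard I/O methods, for example (the filename is arbitrary):

data.to_csv("comments.csv", encoding="utf-8")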
