I am currently working on a project for which I need to collect all of the comments on some specific YouTube videos.
I can get at most 100 comments using the commentThreads().list method (more info). Is there any way to get all of the comments?
I am using the following function, which comes from the Google YouTube Data API developer guide.
def get_comment_threads(youtube, video_id):
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText"
    ).execute()

    # Print the author and text of each top-level comment
    for item in results["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        print("Comment by %s: %s" % (author, text))

    return results["items"]
As said in the comments above, you can simply use next_page_token and loop with a while until you stop getting a next page token. But be aware that some videos have a really large number of comments, and loading them all can take a long time.
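For illustration, here is a minimal sketch of that loop, assuming the same youtube client and video_id as in the question (get_all_comment_threads is just an illustrative name; list_next() is the pagination helper that the google-api-python-client library generates for list methods):

def get_all_comment_threads(youtube, video_id):
    # Follow nextPageToken until the API reports no further pages
    items = []
    request = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText"
    )
    while request is not None:
        response = request.execute()
        items.extend(response["items"])
        # list_next() builds the follow-up request from the response's
        # nextPageToken, or returns None when there are no more pages
        request = youtube.commentThreads().list_next(request, response)
    return items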
Additionally, I'm writing this to extend the code mentioned above. I also copied parts of it from a GitHub repository that I can't remember now. Update the youtube and video_id variables just like they were used in the get_comment_threads function earlier.
def load_comments(match):
    # Print each top-level comment and any replies included in the response
    for item in match["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        print("Comment by {}: {}".format(author, text))
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:
                rauthor = reply['snippet']['authorDisplayName']
                rtext = reply["snippet"]["textDisplay"]
                print("\n\tReply by {}: {}".format(rauthor, rtext), "\n")

def get_comment_threads(youtube, video_id):
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText"
    ).execute()
    return results

video_id = ""
youtube = ""

match = get_comment_threads(youtube, video_id)
next_page_token = match["nextPageToken"]
load_comments(match)

while next_page_token:
    match = get_comment_threads(youtube, video_id)
    next_page_token = match["nextPageToken"]
    load_comments(match)
To add to @minhaj's answer, the while loop will run until the last commentThreads.list() response, but the last response does not have a nextPageToken key and will throw a KeyError.
A simple try/except solves this:
try:
    while next_page_token:
        match = get_comment_threads(youtube, video_id)
        next_page_token = match["nextPageToken"]
        load_comments(match)
except KeyError:
    # The last response has no nextPageToken; load its comments and stop
    match = get_comment_threads(youtube, video_id)
    load_comments(match)
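Alternatively, the KeyError can be avoided altogether by reading the token with dict.get(), which returns None when the key is missing and ends the loop cleanly. A minimal sketch of that variant (note it still fetches the same page on every call until the pageToken fix described in the next answer is applied):

match = get_comment_threads(youtube, video_id)
load_comments(match)
# .get() returns None on the last page instead of raising KeyError
next_page_token = match.get("nextPageToken")
while next_page_token:
    match = get_comment_threads(youtube, video_id)
    load_comments(match)
    next_page_token = match.get("nextPageToken")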
@Anthony Camarillo you are right, exception handling is necessary in this case. Secondly, I made a correction to @minhaj's answer, because it kept calling the same page of comments for the video, so we ended up in an infinite while loop. The key is to call get_comment_threads() with the nextPageToken parameter. I'm using pandas to store the data in a DataFrame.
Here is the code that worked for me:
import os
import pandas as pd
import googleapiclient.discovery

os.environ["OAUTHLIB_INSECURE_TRANSPORT"] = "1"

api_service_name = "youtube"
api_version = "v3"
DEVELOPER_KEY = "Your_API_KEY"
video_id = "Your_Video_id"

youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=DEVELOPER_KEY)

comments = []
authors = []

def load_comments(match):
    # Collect each top-level comment and print replies embedded in the response
    for item in match["items"]:
        comment = item["snippet"]["topLevelComment"]
        author = comment["snippet"]["authorDisplayName"]
        text = comment["snippet"]["textDisplay"]
        comments.append(text)
        authors.append(author)
        print("Comment by {}: {}".format(author, text))
        if 'replies' in item.keys():
            for reply in item['replies']['comments']:
                rauthor = reply['snippet']['authorDisplayName']
                rtext = reply["snippet"]["textDisplay"]
                print("\n\tReply by {}: {}".format(rauthor, rtext), "\n")

def get_comment_threads(youtube, video_id, nextPageToken):
    results = youtube.commentThreads().list(
        part="snippet",
        maxResults=100,
        videoId=video_id,
        textFormat="plainText",
        pageToken=nextPageToken
    ).execute()
    return results

try:
    match = get_comment_threads(youtube, video_id, '')
    load_comments(match)
    next_page_token = match["nextPageToken"]
    while next_page_token:
        # Pass the token so each call fetches the next page, not the same one
        match = get_comment_threads(youtube, video_id, next_page_token)
        load_comments(match)
        next_page_token = match["nextPageToken"]
except KeyError:
    # The last page has no nextPageToken key, which lands us here
    data = pd.DataFrame(comments, index=authors, columns=["Comments"])
    print(data)
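One caveat if the goal is truly all comments: the commentThreads response only embeds a limited number of replies per thread. To be sure of getting every reply, the API's comments().list() endpoint can be paged with the parentId of each top-level comment. A minimal sketch under that assumption (load_all_replies is just an illustrative name, reusing the youtube client built above):

def load_all_replies(youtube, parent_id):
    # comments().list() with parentId returns the full reply set for one
    # top-level comment; list_next() follows nextPageToken until exhausted
    replies = []
    request = youtube.comments().list(
        part="snippet",
        parentId=parent_id,
        maxResults=100,
        textFormat="plainText"
    )
    while request is not None:
        response = request.execute()
        for reply in response["items"]:
            replies.append(reply["snippet"]["textDisplay"])
        request = youtube.comments().list_next(request, response)
    return replies

It would be called with each thread's top-level comment id, e.g. load_all_replies(youtube, item["snippet"]["topLevelComment"]["id"]) inside the load_comments loop above.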