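Python: Get all YouTube video URLs of a channel

I want to get all the video URLs of a specific channel. I think JSON with Python or Java would be a good choice. I can get the newest video with the following code, but how can I get ALL the video links (>500)?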

import json
import urllib.request
author = 'Youtube_Username'
inp = urllib.request.urlopen(r'http://gdata.youtube.com/feeds/api/videos?max-results=1&alt=json&orderby=published&author=' + author)
resp = json.load(inp)
inp.close()
first = resp['feed']['entry'][0]
print(first['title'])  # video title
print(first['link'][0]['href'])  # url

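After the YouTube API change, max k.'s answer no longer works. As a replacement, the function below returns a list of the YouTube videos in a given channel. Please note that you need an API key for it to work.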

import json
import urllib.request
def get_all_video_in_channel(channel_id):
    api_key = 'YOUR_API_KEY'  # replace with your own API key
    base_video_url = 'https://www.youtube.com/watch?v='
    base_search_url = 'https://www.googleapis.com/youtube/v3/search?'
    first_url = base_search_url+'key={}&channelId={}&part=snippet,id&order=date&maxResults=25'.format(api_key, channel_id)
    video_links = []
    url = first_url
    while True:
        with urllib.request.urlopen(url) as inp:
            resp = json.load(inp)
        for i in resp['items']:
            # search results can also contain channels and playlists
            if i['id']['kind'] == "youtube#video":
                video_links.append(base_video_url + i['id']['videoId'])
        # a missing nextPageToken means this was the last page
        next_page_token = resp.get('nextPageToken')
        if next_page_token is None:
            break
        url = first_url + '&pageToken={}'.format(next_page_token)
    return video_links
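
The query string in the function above is assembled by hand. As a small, hedged variation, the same URL can be built with urllib.parse.urlencode so that every parameter is escaped consistently. The helper name build_search_url and its page_token and max_results arguments are hypothetical; the parameter names themselves are the ones already used above.

```python
from urllib.parse import urlencode

BASE_SEARCH_URL = 'https://www.googleapis.com/youtube/v3/search'


def build_search_url(api_key, channel_id, page_token=None, max_results=25):
    """Return the search URL for one page of a channel's results (hypothetical helper)."""
    params = {
        'key': api_key,
        'channelId': channel_id,
        'part': 'snippet,id',
        'order': 'date',
        'maxResults': max_results,
    }
    # Only the second and later pages carry a pageToken.
    if page_token is not None:
        params['pageToken'] = page_token
    return BASE_SEARCH_URL + '?' + urlencode(params)
```

Inside the loop, each new URL would then come from build_search_url(api_key, channel_id, next_page_token) instead of string concatenation.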

Short answer:

Here is a library that may help with that.

pip install scrapetube

import scrapetube
videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
for video in videos:
    print(video['videoId'])

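Long answer:

The module mentioned above was created by me due to the lack of any other solution. Here is what I tried:

Selenium. It worked, but it had three big drawbacks: 1. It requires a web browser and driver to be installed. 2. It has big CPU and memory requirements. 3. It can't handle big channels.
youtube-dl. Like this: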

import youtube_dl
channel_id = "UC9-y-6csu5WGm29I7JiwpnA"  # example channel ID, same as in the short answer
youtube_dl_options = {
    'skip_download': True,
    'ignoreerrors': True
}
with youtube_dl.YoutubeDL(youtube_dl_options) as ydl:
    videos = ydl.extract_info(f'https://www.youtube.com/channel/{channel_id}/videos')

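This also works for small channels, but for big ones I would get blocked by YouTube for making so many requests in such a short time (because youtube-dl downloads more info for every video in the channel).

So I made the library scrapetube, which uses the web API to get all the videos.

Increase max-results from 1 to however many you want, but be aware that they don't advise grabbing too many in one call and will limit you to 50 (https://developers.google.com/youtube/2.0/developers_guide_protocol_api_query_parameters).

Instead, you could consider grabbing the data in batches of 25, say, by changing the start-index until nothing comes back.

EDIT: Here's the code for how I would do it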

import json
import urllib.request
author = 'Youtube_Username'
foundAll = False
ind = 1
videos = []
while not foundAll:
    inp = urllib.request.urlopen(r'http://gdata.youtube.com/feeds/api/videos?start-index={0}&max-results=50&alt=json&orderby=published&author={1}'.format(ind, author))
    try:
        resp = json.load(inp)
        inp.close()
        returnedVideos = resp['feed']['entry']
        for video in returnedVideos:
            videos.append(video)
        ind += 50
        print(len(videos))
        if len(returnedVideos) < 50:
            foundAll = True
    except Exception:
        # catch the case where the number of videos in the channel is a multiple of 50
        print("error")
        foundAll = True
for video in videos:
    print(video['title'])  # video title
    print(video['link'][0]['href'])  # url

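Based on the code found here and in some other places, I've written a small script that does this. My script uses v3 of YouTube's API and does not run into the 500 results limit that Google has set for searches.

The code is available on GitHub: https://github.com/dsebastien/youtubeChannelVideosFinder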
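
For comparison, one widely used way to stay clear of the search limit is to page through the channel's "uploads" playlist instead of using search. This is not necessarily what the linked script does; it is a minimal sketch that assumes the channels and playlistItems endpoints of API v3 return the fields read below. The names list_upload_urls and fetch_json are hypothetical, and the fetch parameter is there only so the paging logic can be exercised without a network call.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_ROOT = 'https://www.googleapis.com/youtube/v3/'


def fetch_json(endpoint, params):
    """Default fetcher: GET one API endpoint and decode the JSON body."""
    with urllib.request.urlopen(API_ROOT + endpoint + '?' + urlencode(params)) as resp:
        return json.load(resp)


def list_upload_urls(api_key, channel_id, fetch=fetch_json):
    """Return watch URLs for every item in the channel's uploads playlist."""
    # Step 1: look up the ID of the playlist that holds the channel's uploads.
    channel = fetch('channels', {'part': 'contentDetails', 'id': channel_id, 'key': api_key})
    uploads_id = channel['items'][0]['contentDetails']['relatedPlaylists']['uploads']

    # Step 2: walk the playlist page by page until no nextPageToken is returned.
    urls = []
    page_token = None
    while True:
        params = {'part': 'contentDetails', 'playlistId': uploads_id,
                  'maxResults': 50, 'key': api_key}
        if page_token:
            params['pageToken'] = page_token
        page = fetch('playlistItems', params)
        for item in page.get('items', []):
            urls.append('https://www.youtube.com/watch?v=' + item['contentDetails']['videoId'])
        page_token = page.get('nextPageToken')
        if not page_token:
            return urls
```

Calling list_upload_urls('YOUR_API_KEY', channel_id) would use the real API; an API key is still required.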

An independent way of doing things. No API, no rate limit.

import requests
username = "marquesbrownlee"
url = "https://www.youtube.com/user/{}/videos".format(username)  # insert the username into the URL
page = requests.get(url).content
data = str(page).split(' ')
item = 'href="/watch?'
vids = [line.replace('href="', 'youtube.com') for line in data if item in line] # list of all videos listed twice
print(vids[0]) # index the latest video

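The above code will only scrape a limited number of video URLs, up to a maximum of 60. How can I grab all the video URLs that are present in the channel? Can you please suggest something?

The above code snippet will only display a list of the videos, each listed twice. It does not return all the video URLs in the channel.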
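
If the duplicates are the only concern, a minimal sketch for dropping them while keeping the page order is shown here. The helper name unique_in_order is hypothetical; it does nothing about the limit on how many videos the page lists.

```python
def unique_in_order(urls):
    """Drop repeated URLs while keeping the first occurrence of each."""
    # dict keys keep insertion order, so this preserves the original ordering.
    return list(dict.fromkeys(urls))
```

With the snippet above, this would be applied as vids = unique_in_order(vids).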

Using Selenium Chrome Driver:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
import time
driverPath = ChromeDriverManager().install()
driver = webdriver.Chrome(service=Service(driverPath))  # Selenium 4 style
url = 'https://www.youtube.com/howitshouldhaveended/videos'
driver.get(url)
# keep scrolling until the page height stops growing
height = driver.execute_script("return document.documentElement.scrollHeight")
previousHeight = -1
while previousHeight < height:
    previousHeight = height
    driver.execute_script(f'window.scrollTo(0,{height + 10000})')
    time.sleep(1)
    height = driver.execute_script("return document.documentElement.scrollHeight")
vidElements = driver.find_elements(By.ID, 'thumbnail')  # find_elements_by_id was removed in Selenium 4
vid_urls = []
for v in vidElements:
    vid_urls.append(v.get_attribute('href'))

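This code has worked the few times I have tried it; however, you may need to tweak the sleep time, or add a way to recognize when the browser is still loading extra information. It easily worked for me on a channel with 300+ videos, but it had trouble with one that had 7000+ videos, because the time needed to load new videos in the browser became inconsistent.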
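
One possible way to make the loop more tolerant of slow loads is to stop only after the page height has stayed the same for several checks in a row, rather than after a single unchanged reading. The sketch below is an assumption about how that could look, not something tested against large channels; scroll_until_stable and its parameters are hypothetical names, and the two callables wrap the same execute_script calls used above.

```python
import time


def scroll_until_stable(get_height, scroll_to, pause=1.0, stable_checks=3,
                        max_rounds=1000, sleep=time.sleep):
    """Scroll until the height is unchanged for `stable_checks` checks in a row."""
    height = get_height()
    unchanged = 0
    for _ in range(max_rounds):
        scroll_to(height + 10000)
        sleep(pause)
        new_height = get_height()
        if new_height > height:
            # New content arrived, so reset the counter and keep going.
            height = new_height
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= stable_checks:
                break
    return height

# Possible use with the Selenium driver from the answer:
# scroll_until_stable(
#     lambda: driver.execute_script("return document.documentElement.scrollHeight"),
#     lambda y: driver.execute_script(f"window.scrollTo(0, {y})"),
# )
```

Raising stable_checks or pause trades speed for a lower chance of stopping while videos are still loading.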

I modified the script originally posted by dermasmid to fit my needs. This is the result:

import scrapetube
path = '_list.txt'
videos = scrapetube.get_channel("UC9-y-6csu5WGm29I7JiwpnA")
# write one watch URL per line; the with-block closes the file when done
with open(path, 'w') as f:
    for video in videos:
        print("https://www.youtube.com/watch?v="+str(video['videoId']), file=f)
#        print(video['videoId'])

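Basically, it saves all the URLs from the playlist into a "_list.txt" file. I am using this "_list.txt" file to download all the videos with yt-dlp.exe. All the downloaded files have the .mp4 extension.

Now I need to create another file, "_playlist.txt", that contains the file names corresponding to each URL in "_List.txt".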

For example, for "https://www.youtube.com/watch?v=yG1m7oGZC48", have "Apple M1 Ultra & NUMA - Computerphile.mp4" as the output in "_playlist.txt".
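
A rough sketch of one way this might be approached is to ask yt-dlp, through its Python module, which file name it would use for each URL without downloading anything. The function names, the output template '%(title)s.%(ext)s', and the get_filename parameter are all assumptions made for illustration. The reported extension depends on the format yt-dlp selects, so it may not be .mp4, and the name may differ from the file on disk if the download used other options.

```python
def filename_from_yt_dlp(url):
    """Ask yt-dlp which file name it would use, without downloading."""
    import yt_dlp  # imported lazily so the writer below also works without it
    options = {'quiet': True, 'outtmpl': '%(title)s.%(ext)s'}  # assumed template
    with yt_dlp.YoutubeDL(options) as ydl:
        info = ydl.extract_info(url, download=False)
        return ydl.prepare_filename(info)


def write_playlist_file(list_path, playlist_path, get_filename=filename_from_yt_dlp):
    """Write one expected file name per URL in list_path to playlist_path."""
    with open(list_path, encoding='utf-8') as src:
        urls = [line.strip() for line in src if line.strip()]
    with open(playlist_path, 'w', encoding='utf-8') as dst:
        for url in urls:
            dst.write(get_filename(url) + '\n')
    return len(urls)
```

It would be called as write_playlist_file('_list.txt', '_playlist.txt'), which makes one request per URL.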

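I did make some further improvements so that the channel URL can be entered in the console, and the result is printed on screen and also written to an external file called "_list.txt".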

import scrapetube
import sys
path = '_list.txt'
print('**********************\n')
print("The result will be saved in '_list.txt' file.")
print("Enter Channel ID:")
# Prints the output in the console and into the '_list.txt' file.
class Logger:

    def __init__(self, filename):
        self.console = sys.stdout
        self.file = open(filename, 'w')

    def write(self, message):
        self.console.write(message)
        self.file.write(message)

    def flush(self):
        self.console.flush()
        self.file.flush()
sys.stdout = Logger(path)
# Remove a leading "https://www.youtube.com/channel/" if a full URL was entered.
# str.strip() would treat its argument as a set of characters, not as a prefix.
prefix = "https://www.youtube.com/channel/"
channel_id_input = input().strip()
channel_id = channel_id_input[len(prefix):] if channel_id_input.startswith(prefix) else channel_id_input
videos = scrapetube.get_channel(channel_id)
for video in videos:
    print("https://www.youtube.com/watch?v="+str(video['videoId']))
#    print(video['videoId'])
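
A note on turning the pasted URL into a channel ID: str.strip removes any of the given characters from both ends of the string rather than removing a prefix, so an ID that happens to end in one of those characters gets shortened. A minimal sketch of a prefix-aware helper follows. The name extract_channel_id is hypothetical, and it only handles URLs of the /channel/ form used here.

```python
CHANNEL_PREFIX = "https://www.youtube.com/channel/"


def extract_channel_id(text):
    """Return the channel ID from either a bare ID or a /channel/ URL."""
    text = text.strip()  # plain whitespace trimming only
    if text.startswith(CHANNEL_PREFIX):
        text = text[len(CHANNEL_PREFIX):]
    # Drop anything after the ID, such as "/videos" or a query string.
    return text.split('/')[0].split('?')[0]
```

The result can be passed straight to scrapetube.get_channel.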
