Get all YouTube videos from a channel (some videos are missing)



I am using the v3 Google API for YouTube:

$url = 'https://www.googleapis.com/youtube/v3/search?part=id&channelId=' . $channelID . '&maxResults=50&order=date&key=' . $API_key;

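I have set up a script that should give me all the videos for a given channel ID. For some channels I get all the videos; for others (compared with the number of videos shown directly on YouTube) the most I get is 488 videos, although there are more.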

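The pageToken is a strange thing. For example, one channel has 955 videos. I get 18 pages of 50 items each (which would be 900 videos). Some of those items are playlists, but even if I subtract the 23 playlists I still have 877 videos. If I remove duplicates, I have only 488 results! totalResults in the JSON output shows 975 results!?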

This is my recursive function:

function fetchAllVideos($parsed_json){
    $foundIds = array();
    if($parsed_json != ''){
        $foundIds = getVideoIds($parsed_json);
        $nextPageToken = getNextPageToken($parsed_json);
        $prevPageToken = getPrevPageToken($parsed_json);
        if($nextPageToken != ''){
            $new_parsed_json = getNextPage($nextPageToken);
            $foundIds = array_merge($foundIds, fetchAllVideos($new_parsed_json));
        }
        if($prevPageToken != ''){
            $new_parsed_json = getNextPage($prevPageToken);
            $foundIds = array_merge($foundIds, fetchAllVideos($new_parsed_json));
        }
    }
    return $foundIds;
}

I call it with $videoIds = fetchAllVideos($parsed_json);, where $parsed_json is the result of the first URL I retrieve. Can you see an error here?

Does anyone know how the video count shown directly on YouTube is calculated? Has anyone managed to get a complete list that matches the number on YouTube?
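One thing that stands out in the recursive function above is that it follows prevPageToken as well as nextPageToken, so pages that were already fetched get requested again. That would account for at least some of the duplicates. A minimal sketch of a forward-only loop is shown here, assuming a hypothetical callable $fetchPage that takes a page token (null for the first page) and returns the decoded response as an associative array, or false on failure. The name fetchAllVideoIds and the $maxPages safety cap are illustrative and not from the original script.

```php
<?php
// Sketch only: $fetchPage is a hypothetical callable standing in for the HTTP
// request. It receives a page token (null for the first page) and returns the
// decoded JSON response as an associative array, or false on failure.
function fetchAllVideoIds(callable $fetchPage, int $maxPages = 100): array
{
    $seen = array();   // video ID => true, used to detect duplicates
    $ids = array();    // unique video IDs in the order they were found
    $pageToken = null; // null requests the first page

    for ($page = 0; $page < $maxPages; $page++) {
        $response = $fetchPage($pageToken);
        if (!is_array($response)) {
            break; // request failed or the body was not valid JSON
        }
        $items = isset($response['items']) ? $response['items'] : array();
        foreach ($items as $item) {
            // playlist and channel results carry no videoId, so they are skipped
            if (!isset($item['id']['videoId'])) {
                continue;
            }
            $id = $item['id']['videoId'];
            if (!isset($seen[$id])) {
                $seen[$id] = true;
                $ids[] = $id;
            }
        }
        if (empty($response['nextPageToken'])) {
            break; // no further page
        }
        // only ever move forward; prevPageToken is ignored on purpose
        $pageToken = $response['nextPageToken'];
    }
    return $ids;
}
```

With the search URL from the question, $fetchPage could append '&pageToken=' . urlencode($token) to $url whenever the token is not null and pass the body through json_decode($body, true). This removes the duplicates but does not by itself explain why fewer videos come back than totalResults reports.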

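This script takes one 60-day period at a time, retrieves its results, and adds them to the existing data array. Doing it this way means there is no limit on the number of videos, although on larger YouTube channels it may take some time to search through several thousand videos. Make sure you set the API_KEY, the time zone, the username, the start date (which should be before the first video on the channel), and the period (set by default to 60*60*24*60, which is 60 days in seconds, i.e. 5184000 seconds). If the channel publishes more than about 50 videos within 60 days, the period needs to be shorter.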

*All of this is commented in the script.

date_default_timezone_set("TIMEZONE"); 
//youtube api key
$API_KEY = "YOUR API KEY";
function search($searchTerm,$url){
    $url = $url . urlencode($searchTerm);
    $result = file_get_contents($url);
    if($result !== false){
        return json_decode($result, true);
    }
    return false;
}
function get_user_channel_id($user){
    global $API_KEY;
    $url = 'https://www.googleapis.com/youtube/v3/channels?key=' . $API_KEY . '&part=id&forUsername=';
    return search($user,$url)['items'][0]['id'];
}
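function push_data($searchResults){
    global $data;
    //make sure the accumulator exists before the first append
    if(!is_array($data)){
        $data = array();
    }
    //skip failed requests (search() returns false) and responses without items
    if(is_array($searchResults) && isset($searchResults['items'])){
        foreach($searchResults['items'] as $item){
            $data[] = $item;
        }
    }
    return $data;
}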
function get_url_for_utc_period($channelId, $utc){
    //get the API_KEY
    global $API_KEY;
    //youtube specifies the DateTime to be formatted as RFC 3339 formatted date-time value (1970-01-01T00:00:00Z)
    //the T is escaped so that date() prints a literal "T" instead of the timezone abbreviation
    $publishedAfter = date('Y-m-d\TH:i:sP', (int) $utc);
    //within a 60 day period
    $publishedBefore_ = $utc + (60 * 60 * 24 * 60);
    $publishedBefore = date('Y-m-d\TH:i:sP', $publishedBefore_);
    //develop the URL with the API_KEY, channelId, and the time period specified by publishedBefore & publishedAfter
    $url = 'https://www.googleapis.com/youtube/v3/search?part=snippet&type=video&key=' . $API_KEY . '&maxResults=50&channelId=' . $channelId . '&publishedAfter=' . urlencode($publishedAfter) . '&publishedBefore=' . urlencode($publishedBefore);
    return array("url"=>$url,"utc"=>$publishedBefore_);
}
//the date that the loop will begin with, have this just before the first videos on the channel.
//this is just an example date
$start_date = "2013-1-1";
$utc = strtotime($start_date);
$username = "CHANNEL USERNAME NOT CHANNEL ID";
//get the channel id for the username
$channelId = get_user_channel_id($username);
while($utc < time()){
    $url_utc = get_url_for_utc_period($channelId, $utc);
    $searchResults = search("", $url_utc['url']);
    $data = push_data($searchResults);
    $utc += 60 * 60 * 24 * 60;
}
print "<pre>";
print_r($data);
print "</pre>";
//check that all of the videos have been accounted for (cross-reference this with what it says on their youtube channel)
print count($data);
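The script above hard-codes the 60-day period in two places, and neighbouring periods share a boundary timestamp, so a video published exactly on a boundary could in principle be returned twice. The following is a rough sketch of how the periods and the de-duplication might be factored out. buildPublishedWindows and uniqueByVideoId are hypothetical helper names, not part of the original script, and the timestamps are formatted in UTC with gmdate rather than in the configured time zone.

```php
<?php
// Sketch only: both helpers are hypothetical and not part of the original script.

// Split [$startUtc, $endUtc) into consecutive windows of $periodSeconds each and
// format the boundaries as RFC 3339 timestamps in UTC.
function buildPublishedWindows(int $startUtc, int $endUtc, int $periodSeconds): array
{
    if ($periodSeconds <= 0) {
        throw new InvalidArgumentException('periodSeconds must be positive');
    }
    $windows = array();
    for ($from = $startUtc; $from < $endUtc; $from += $periodSeconds) {
        $to = min($from + $periodSeconds, $endUtc); // clamp the final window
        $windows[] = array(
            'publishedAfter'  => gmdate('Y-m-d\TH:i:s\Z', $from),
            'publishedBefore' => gmdate('Y-m-d\TH:i:s\Z', $to),
        );
    }
    return $windows;
}

// Keep the first occurrence of each video ID; items without a videoId are dropped.
function uniqueByVideoId(array $items): array
{
    $seen = array();
    $unique = array();
    foreach ($items as $item) {
        if (!isset($item['id']['videoId'])) {
            continue;
        }
        $id = $item['id']['videoId'];
        if (!isset($seen[$id])) {
            $seen[$id] = true;
            $unique[] = $item;
        }
    }
    return $unique;
}
```

Each window's two values would be passed through urlencode() into the publishedAfter and publishedBefore parameters, and uniqueByVideoId($data) would be applied once after the loop. A window that comes back with a full 50 items is a hint that the period is too long for that stretch of the channel.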

https://gdata.youtube.com/feeds/api/users/USERNAME_HERE/uploads?max-results=50&alt=json&start-index=1 did it. It is a JSON feed that you have to loop over until you get fewer than 50 results.

Edit:

This should be the script I used:

ini_set('max_execution_time', 900);
function getVideos($channel){
    $ids = array();
    $start_index = 1;
    $still_have_results = true;
    if($channel == ""){
        return false;   
    }
    $url = 'https://gdata.youtube.com/feeds/api/users/' . $channel . '/uploads?max-results=50&alt=json&start-index=' . $start_index;
    $json = file_get_contents($url);
    $obj = json_decode($json);
    while($still_have_results){
        foreach($obj->feed->entry as $video){
            $video_url = $video->id->{'$t'};
            $last_pos = strrpos($video_url, '/');
            $video_id = substr($video_url, $last_pos+1, strlen($video_url) - $last_pos);
            array_push($ids, $video_id);
        }
        $number_of_items = count($obj->feed->entry);
        $start_index += count($obj->feed->entry);
        if($number_of_items < 50) {
            $still_have_results = false;
        }
        $url = 'https://gdata.youtube.com/feeds/api/users/' . $channel . '/uploads?max-results=50&alt=json&start-index=' . $start_index;
        $json = file_get_contents($url);
        $obj = json_decode($json);
    }
    return $ids;    
}
$videoIds = getVideos('youtube');
echo '<pre>';
print_r($videoIds);
echo '</pre>';

I have now run a test, and I did not collect 100% of the videos. Even so, this is the best option I have come up with.
