I have a target artist and want to get the corresponding video id, as shown below:
import re

target = 'Portishead'
videos = ['Portishead - Roads (Vg1jyL3cr60)', 'Portishead - Roads - (WQYsGWh_vpE)', 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)', 'Lawson - Roads (I-SOaSU0ieA)', 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)']

for item in videos:
    artist = item.split('-')[0]
    # here I get whats inside parenthesis, not always an id
    video_id = re.findall(r'\(([^)]+)', item)
    # and here the id, which is always the last split item
    id_ = (video_id[-1])
    if artist == target:
        print(id_)
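But my if condition does not work for the target artist. Nothing gets printed.
Given that the actual list is very large, what is the best way to do this, with a for loop or otherwise?
I want to get the 'Vg1jyL3cr60' above.
EDIT: @Alexandre Cécile. I am posting here the whole function that calls the YouTube API, in case you are interested in refining the part that narrows the video search down to the artist once you pass in a track title and an artist name. You will need a key, though.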
import os
import re

from google.oauth2 import service_account
# build() is not imported in the original post; it comes from google-api-python-client
from googleapiclient.discovery import build


def youtube_id(track_name, target_artist):
    GET_CREDENTIALS = os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    PASS_CREDENTIALS = service_account.Credentials.from_service_account_file(GET_CREDENTIALS)

    YOUTUBE_API_SERVICE_NAME = "youtube"
    YOUTUBE_API_VERSION = "v3"
    DEVELOPER_KEY = "mykey"

    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, credentials=PASS_CREDENTIALS,
                    developerKey=None)

    # Call the search.list method to retrieve results matching the specified
    # query term.
    search_response = youtube.search().list(
        q=track_name,
        part="id,snippet",
        #maxResults=track_name.max_results
    ).execute()

    videos = []
    videos_ids = []
    channels = []
    playlists = []

    # Add each result to the appropriate list, and then display the lists of
    # matching videos, channels, and playlists.
    for search_result in search_response.get("items", []):
        if search_result["id"]["kind"] == "youtube#video":
            videos.append("%s (%s)" % (search_result["snippet"]["title"],
                                       search_result["id"]["videoId"]))
            videos_ids.append("%s" % (search_result["id"]["videoId"]))
        elif search_result["id"]["kind"] == "youtube#channel":
            channels.append("%s (%s)" % (search_result["snippet"]["title"],
                                         search_result["id"]["channelId"]))
        elif search_result["id"]["kind"] == "youtube#playlist":
            playlists.append("%s (%s)" % (search_result["snippet"]["title"],
                                          search_result["id"]["playlistId"]))

    print("Videos:\n", "\n".join(videos), "\n")
    print("Channels:\n", "\n".join(channels), "\n")
    print("Playlists:\n", "\n".join(playlists), "\n")

    ids=[]
    # Keep the ids of the videos whose leading artist matches target_artist
    for video in videos:
        artist = re.split(r'\s*-\s*', video)[0]
        id = re.search(r'.*\(([^)]+)', video)[1]
        if id and artist == target_artist:
            videos_ids.append(id)

    print('VIDEOS IDS', videos_ids)
    return videos_ids[-1]
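When you split the artist from the track, you split on '-'. If you look at the actual strings, you will see that there is whitespace around the hyphen, and that whitespace ends up in the split result.
The solution is to remove the whitespace from the artist variable with .strip().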
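A minimal sketch of that fix, using one title from the question. The names raw_artist and title are only illustrative and do not appear in the original code:

```python
target = 'Portishead'
title = 'Portishead - Roads (Vg1jyL3cr60)'

# Splitting on '-' alone keeps the space that preceded the hyphen
raw_artist = title.split('-')[0]
# strip() removes leading and trailing whitespace
artist = raw_artist.strip()

print(repr(raw_artist), raw_artist == target)
print(repr(artist), artist == target)
```

The repr() calls make the trailing space visible, which is why the unstripped comparison fails.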
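The issue you are having is mainly due to the space left at the end of the artist name (you split on - and the surrounding space is left behind). The code below should work for you. It uses re.split to split on \s*-\s* (any number of whitespace characters, followed by -, followed by any number of whitespace characters).
I also cleaned up other parts of the code. I added .* at the start of the second regex so that only the last instance is captured (and changed [0] to [1] to get the captured content rather than the whole match).
The last part checks that id exists and that artist == target before printing.
The code in use: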
import re

target = 'Portishead'
videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Lawson - Roads (I-SOaSU0ieA)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)'
]

for video in videos:
    # Split on the hyphen together with any whitespace around it
    artist = re.split(r'\s*-\s*', video)[0]
    # The greedy .* makes group 1 the last parenthesized part of the title
    id = re.search(r'.*\(([^)]+)', video)[1]
    if id and artist == target:
        print(id)
Result:
Vg1jyL3cr60
WQYsGWh_vpE
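Explanation of the regex patterns:
\s*-\s*
This pattern matches - and any whitespace around it.
    \s* matches any whitespace character any number of times
    - matches that character literally
    \s* matches any whitespace character any number of times
.*\(([^)]+)
This pattern matches the last instance of an opening parenthesis in the string.
    .* matches any character any number of times (this is how we make sure we match the last parenthesis: it is greedy and matches as many characters as possible)
    \( matches ( literally
    ([^)]+) captures the following
        [^)]+ matches one or more of any character except )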
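To see the effect of the leading .*, the sketch below runs the pattern with and without it on the one title from the question that contains two parenthesized parts. The variable names are illustrative only:

```python
import re

title = 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)'

# Without the leading .*, search stops at the first opening parenthesis
first_group = re.search(r'\(([^)]+)', title).group(1)
# With the greedy .*, the engine backtracks from the end, so group 1 is the last one
last_group = re.search(r'.*\(([^)]+)', title).group(1)

# \s*-\s* swallows whatever whitespace surrounds the hyphen
parts = re.split(r'\s*-\s*', 'Portishead   -Roads')

print(first_group)
print(last_group)
print(parts)
```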
You can change your code to the following, which fixes the split issue and gets the ID (or whatever is between the parentheses):
import re

target = 'Portishead'
videos = ['Portishead - Roads (Vg1jyL3cr60)', 'Portishead - Roads - (WQYsGWh_vpE)', 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)', 'Lawson - Roads (I-SOaSU0ieA)', 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)']

for item in videos:
    # Split on ' - ' so that no whitespace is left on the artist name
    artist = item.split(' - ')[0]
    # Take the last parenthesized part and remove the parentheses themselves
    video_id = re.sub(r'\(|\)', '', re.findall(r'\(.*?\)', item)[-1])
    if artist == target:
        print(video_id)
Output:
Vg1jyL3cr60
WQYsGWh_vpE
If the output you want is only Vg1jyL3cr60, as stated in the OP, you will want to break out of the loop after printing the first ID.
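One way to sketch that early exit is to wrap the loop in a function and return on the first hit, which ends the loop just as break would. The helper name first_video_id is hypothetical and not part of the original answer:

```python
import re

def first_video_id(videos, target):
    """Return the id of the first title whose leading artist equals target, else None."""
    for item in videos:
        artist = item.split(' - ')[0]
        if artist != target:
            continue
        # Each captured group is the text between one pair of parentheses
        groups = re.findall(r'\(([^)]+)\)', item)
        if groups:
            return groups[-1]  # stop at the first matching title
    return None

videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Lawson - Roads (I-SOaSU0ieA)',
]
print(first_video_id(videos, 'Portishead'))
```

Because the function returns as soon as a match is found, a very large list is only scanned up to the first matching title.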
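Looking closely at the data, it is not always clear where the artist's name appears (for example in the Linkin Park and Lagola titles), so the current approach is flawed, and none of the answers address that.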
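OK, here is a complete example that uses the new regex. It extracts the video's id and its name/title, and that is all. I wanted to avoid making a bunch of assumptions about the format of the video titles, since they do not seem to follow a specific pattern or format.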
import re

# Everything before the final "(...)" is the name; the final group is the id
vid_extract_re = re.compile(r"^(?P<video_name>.*)\((?P<video_id>\S+)\)$")

vid_str_list = ['Portishead - Roads (Vg1jyL3cr60)', 'i am a string which does not fit the pattern',
                'Portishead - Roads - (WQYsGWh_vpE)',
                'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
                'Lawson - Roads (I-SOaSU0ieA)',
                'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)', 'i am also a string which does not fit the pattern']

vid_info_lst = []
for curr_vid_str in vid_str_list:
    curr_match = vid_extract_re.fullmatch(curr_vid_str)
    if curr_match is not None:
        curr_vid_name, curr_vid_id = curr_match.groups()
        vid_info_lst.append((curr_vid_name.strip(), curr_vid_id))
    else:
        print(f'Regex failed on video str: {curr_vid_str}')

print(vid_info_lst)
Let me know if you have any other questions! :)
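Since the pattern above defines named groups, the captured parts can also be read by name instead of by position. This is only a small sketch of that option; the variable names info and no_match are illustrative:

```python
import re

vid_extract_re = re.compile(r"^(?P<video_name>.*)\((?P<video_id>\S+)\)$")

match = vid_extract_re.fullmatch('Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)')
# groupdict() maps each named group to the text it captured
info = match.groupdict()
video_name = info['video_name'].strip()
video_id = info['video_id']
print(video_name, '|', video_id)

# A string with no trailing "(id)" does not match at all
no_match = vid_extract_re.fullmatch('i am a string which does not fit the pattern')
print(no_match)
```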
Method 1
Perhaps the following comes a bit closer:
import re

target = 'Portishead'
videos = ['Portishead - Roads (Vg1jyL3cr60)', 'Portishead - Roads - (WQYsGWh_vpE)', 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
          'Lawson - Roads (I-SOaSU0ieA)', 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)']

for item in videos:
    artist = item.split('-')[0]
    # here I get whats inside parenthesis, not always an id
    video_id = re.findall(r'(?<=\()[^)]+(?=\))', item)
    # and here the id, which is always the last split item
    id_ = video_id
    # strip() removes the whitespace left over from the split
    if artist.strip() == target:
        print(video_id)
Output
['Vg1jyL3cr60']
['WQYsGWh_vpE']
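If you want to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you like, you can also see there how it matches against some sample inputs.
Method 2
In case there might be an unknown number of spaces, we can make use of re.split():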
import re

target = 'Portishead'
videos = ['Portishead - Roads (Vg1jyL3cr60)', 'Portishead - Roads - (WQYsGWh_vpE)', 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
          'Lawson - Roads (I-SOaSU0ieA)', 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)']

for item in videos:
    artist = re.split(r'\s*-\s*', item)[0]
    # here I get whats inside parenthesis, not always an id
    video_id = re.findall(r'(?<=\()[^)]+(?=\))', item)
    # and here the id, which is always the last split item
    if artist == target:
        print(video_id[-1])
Output
Vg1jyL3cr60
WQYsGWh_vpE