Python - split, regex and condition



I have a target artist and want to get its corresponding id, like this:

import re

target = 'Portishead'
videos = ['Portishead - Roads (Vg1jyL3cr60)', 'Portishead - Roads - (WQYsGWh_vpE)', 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)', 'Lawson - Roads (I-SOaSU0ieA)', 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)']

for item in videos:
    artist = item.split('-')[0]
    # here I get what's inside the parentheses, not always an id
    video_id = re.findall(r'\(([^)]+)', item)
    # and here the id, which is always the last split item
    id_ = (video_id[-1])
    if artist == target:
        print(id_)

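But my if condition does not work for the target artist. Nothing gets printed.

Given that the real list is very large, what is the best way to do this, with a for loop or otherwise?

I want to get "Vg1jyL3cr60" from the list above.


EDIT: @Alexandre Cécile, I am posting here the whole function that calls the YouTube API, in case you are interested in refining the part that narrows the video search down to the artist once a track title and an artist name are passed in. You will need an API key, though.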

import os
import re

from google.oauth2 import service_account
# build() comes from the google-api-python-client package
from googleapiclient.discovery import build


def youtube_id(track_name, target_artist):
    GET_CREDENTIALS = os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    PASS_CREDENTIALS = service_account.Credentials.from_service_account_file(GET_CREDENTIALS)
    YOUTUBE_API_SERVICE_NAME = "youtube"
    YOUTUBE_API_VERSION = "v3"
    DEVELOPER_KEY = "mykey"
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, credentials=PASS_CREDENTIALS,
                    developerKey=None)
    # Call the search.list method to retrieve results matching the specified
    # query term.
    search_response = youtube.search().list(
        q=track_name,
        part="id,snippet",
        #maxResults=track_name.max_results
    ).execute()
    videos = []
    videos_ids = []
    channels = []
    playlists = []
    # Add each result to the appropriate list, and then display the lists of
    # matching videos, channels, and playlists.
    for search_result in search_response.get("items", []):
        if search_result["id"]["kind"] == "youtube#video":
            videos.append("%s (%s)" % (search_result["snippet"]["title"],
                                       search_result["id"]["videoId"]))
            videos_ids.append("%s" % (search_result["id"]["videoId"]))
        elif search_result["id"]["kind"] == "youtube#channel":
            channels.append("%s (%s)" % (search_result["snippet"]["title"],
                                         search_result["id"]["channelId"]))
        elif search_result["id"]["kind"] == "youtube#playlist":
            playlists.append("%s (%s)" % (search_result["snippet"]["title"],
                                          search_result["id"]["playlistId"]))
    print("Videos:\n", "\n".join(videos), "\n")
    print("Channels:\n", "\n".join(channels), "\n")
    print("Playlists:\n", "\n".join(playlists), "\n")
    ids = []
    for video in videos:
        # artist: everything before the first hyphen, surrounding whitespace dropped
        artist = re.split(r'\s*-\s*', video)[0]
        # id: contents of the last pair of parentheses
        id = re.search(r'.*\(([^)]+)', video)[1]
        if id and artist == target_artist:
            videos_ids.append(id)
    print('VIDEOS IDS', videos_ids)
    return videos_ids[-1]

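When you split the artist from the track, you are splitting on '-'. If you look at the actual strings, you will see that there is whitespace around the hyphen, and that whitespace ends up in the split result.

The fix is to remove the whitespace from the artist variable with .strip().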
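A minimal sketch of that fix, using one title from the question. The names raw_artist and artist are illustrative and not part of the original code.

```python
title = 'Portishead - Roads (Vg1jyL3cr60)'
target = 'Portishead'

# Splitting on the bare hyphen keeps the space that preceded it.
raw_artist = title.split('-')[0]
# strip() removes leading and trailing whitespace before the comparison.
artist = raw_artist.strip()

print(repr(raw_artist), raw_artist == target)  # 'Portishead ' False
print(repr(artist), artist == target)          # 'Portishead' True
```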

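The problem you are having is mainly caused by the whitespace at the end of the match (you split on - and the space is left behind). The code below should work for you. It uses re.split to split on \s*-\s* (any amount of whitespace, followed by -, followed by any amount of whitespace).

I also cleaned up other parts of the code. I added .* to the start of the second regex so that it captures only the last instance (and changed [0] to [1] to get the captured group rather than the whole match).

The last part checks that id exists and that artist == target before printing.

The code in use: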

import re

target = 'Portishead'
videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Lawson - Roads (I-SOaSU0ieA)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)'
]

for video in videos:
    # artist: text before the first hyphen, without surrounding whitespace
    artist = re.split(r'\s*-\s*', video)[0]
    # id: contents of the last pair of parentheses
    id = re.search(r'.*\(([^)]+)', video)[1]
    if id and artist == target:
        print(id)

Result:

Vg1jyL3cr60
WQYsGWh_vpE

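Explanation of the regex patterns:

  • \s*-\s* matches - together with any whitespace around it
    • \s* matches any whitespace character zero or more times
    • - matches that character literally
    • \s* matches any whitespace character zero or more times
  • .*\(([^)]+) matches the last opening parenthesis in the string and captures what follows it
    • .* matches any character any number of times (this is how we make sure the last parenthesis is matched: it is greedy and matches as many characters as possible)
    • \( matches ( literally
    • ([^)]+) captures the following
      • [^)]+ matches one or more of any character except )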
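The effect of the two patterns can be checked on a title that contains more than one pair of parentheses. This is a small sketch using a string from the question. The names parts, first and last are illustrative.

```python
import re

title = 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)'

# Splitting on a hyphen plus any surrounding whitespace leaves clean pieces.
parts = re.split(r'\s*-\s*', title)
print(parts)  # ['Vargas & Lagola', 'Roads (Audio) (Kd3s20GmPVE)']

# Without the leading .*, the search stops at the first "(".
first = re.search(r'\(([^)]+)', title)[1]
# With the greedy .*, the engine backtracks only as far as the last "(".
last = re.search(r'.*\(([^)]+)', title)[1]
print(first, last)  # Audio Kd3s20GmPVE
```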

You can change your code to the following, which fixes the split problem and gets the ID (or whatever is between the parentheses):

import re

target = 'Portishead'
videos = ['Portishead - Roads (Vg1jyL3cr60)', 'Portishead - Roads - (WQYsGWh_vpE)', 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)', 'Lawson - Roads (I-SOaSU0ieA)', 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)']

for item in videos:
    artist = item.split(' - ')[0]
    # take the last "(...)" chunk, then remove the parentheses themselves
    video_id = re.sub(r'\(|\)', '', re.findall(r'\(.*?\)', item)[-1])
    if artist == target:
        print(video_id)

Output:

Vg1jyL3cr60
WQYsGWh_vpE

If the output you want is only Vg1jyL3cr60, as stated in the OP, then you want to break out of the loop after printing the first ID.
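A sketch of that early exit, assuming the first title whose artist matches is the one wanted. The list is shortened and first_id is an illustrative name.

```python
import re

target = 'Portishead'
videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Lawson - Roads (I-SOaSU0ieA)',
]

first_id = None
for item in videos:
    artist = item.split(' - ')[0]
    if artist == target:
        # Take whatever sits in the last pair of parentheses.
        first_id = re.findall(r'\(([^)]+)\)', item)[-1]
        break  # stop scanning; later titles are never examined

print(first_id)  # Vg1jyL3cr60
```

With a very large list this also avoids scanning the remaining titles once a match has been found.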

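Looking closely at the data, it is not always clear where the artist's name appears (for example Linkin Park and Lagola), so the current approach is flawed, and none of the answers address this.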
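The flaw is easy to reproduce with two titles from the question. The sketch below only shows what a hyphen split returns as the artist; guesses is an illustrative name.

```python
titles = [
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)',
]

# Everything before the first " - " is treated as the artist.
guesses = [title.split(' - ')[0] for title in titles]
for guess in guesses:
    print(repr(guess))
# 'Need For Speed (Linkin Park'
# 'Vargas & Lagola'
```

A target of 'Linkin Park' or 'Lagola' would therefore never compare equal to the extracted value.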

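Okay, here is a complete example using a new regex. It extracts the video's id and its name/title, and that is all. I wanted to avoid making a bunch of assumptions about the format of the video titles, since they do not seem to follow a specific pattern or format.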


import re

# video_name: everything before the final "(...)"; video_id: its contents
vid_extract_re = re.compile(r"^(?P<video_name>.*)\((?P<video_id>\S+)\)$")

vid_str_list = ['Portishead - Roads (Vg1jyL3cr60)', 'i am a string which does not fit the pattern',
                'Portishead - Roads - (WQYsGWh_vpE)',
                'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
                'Lawson - Roads (I-SOaSU0ieA)',
                'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)', 'i am also a string which does not fit the pattern']

vid_info_lst = []
for curr_vid_str in vid_str_list:
    curr_match = vid_extract_re.fullmatch(curr_vid_str)
    if curr_match is not None:
        curr_vid_name, curr_vid_id = curr_match.groups()
        vid_info_lst.append((curr_vid_name.strip(), curr_vid_id))
    else:
        print(f'Regex failed on video str: {curr_vid_str}')

print(vid_info_lst)

Let me know if you have any further questions! :)

Method 1

Maybe the following comes closer:

import re

target = 'Portishead'
videos = ['Portishead - Roads (Vg1jyL3cr60)', 'Portishead - Roads - (WQYsGWh_vpE)', 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
          'Lawson - Roads (I-SOaSU0ieA)', 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)']

for item in videos:
    artist = item.split('-')[0]
    # here I get what's inside the parentheses, not always an id
    video_id = re.findall(r'(?<=\()[^)]+(?=\))', item)
    # video_id is a list holding every parenthesized chunk of the title
    id_ = video_id
    if artist.strip() == target:
        print(video_id)

Output

['Vg1jyL3cr60']
['WQYsGWh_vpE']

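If you want to simplify, modify, or explore the expression, regex101.com explains it in the panel at the top right. There you can also see how it matches against some sample inputs.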
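To make the lookaround pattern concrete, the sketch below compares it with a plain capturing group on one title from the question. The names lookaround and captured are illustrative. Both forms return every parenthesized chunk, so when a title contains several, the ID is the last element rather than the first.

```python
import re

item = 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)'

# (?<=\() requires a "(" just before the match and (?=\)) a ")" just after,
# without making either parenthesis part of the matched text.
lookaround = re.findall(r'(?<=\()[^)]+(?=\))', item)
# A capturing group yields the same strings.
captured = re.findall(r'\(([^)]+)\)', item)

print(lookaround)      # ['Audio', 'Kd3s20GmPVE']
print(lookaround[-1])  # Kd3s20GmPVE
```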


Method 2

In case there may be an unknown number of spaces, we can use re.split():

import re

target = 'Portishead'
videos = ['Portishead - Roads (Vg1jyL3cr60)', 'Portishead - Roads - (WQYsGWh_vpE)', 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
          'Lawson - Roads (I-SOaSU0ieA)', 'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)']

for item in videos:
    artist = re.split(r'\s*-\s*', item)[0]
    # here I get what's inside the parentheses, not always an id
    video_id = re.findall(r'(?<=\()[^)]+(?=\))', item)
    # the id is always the last parenthesized item
    if artist == target:
        print(video_id[-1])

Output

Vg1jyL3cr60
WQYsGWh_vpE
