Python - split, regex and condition

I have a target artist and want to get its corresponding ID, like this:

import re

target = 'Portishead'
videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Lawson - Roads (I-SOaSU0ieA)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)',
]
for item in videos:
    artist = item.split('-')[0]
    # here I get what's inside parentheses, not always an id
    video_id = re.findall(r'\(([^)]+)', item)
    # and here the id, which is always the last split item
    id_ = video_id[-1]
    if artist == target:
        print(id_)

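But my if condition (artist == target) does not work for the target artist. Nothing gets printed.

Given that the actual list is very large, what is the best way to achieve this, with a for loop or otherwise?

I want to get "Vg1jyL3cr60" from the list above.

Edit: @Alexandre Cécile. I have posted here the whole function that calls the YouTube API, in case you are interested in refining the function that narrows down the search for an artist's video once you pass in the track title and the artist name. You will need an API key, though.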

import os
import re

from google.oauth2 import service_account
from googleapiclient.discovery import build  # from google-api-python-client


def youtube_id(track_name, target_artist):
    GET_CREDENTIALS = os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')
    PASS_CREDENTIALS = service_account.Credentials.from_service_account_file(GET_CREDENTIALS)
    YOUTUBE_API_SERVICE_NAME = "youtube"
    YOUTUBE_API_VERSION = "v3"
    DEVELOPER_KEY = "mykey"
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, credentials=PASS_CREDENTIALS,
                    developerKey=None)
    # Call the search.list method to retrieve results matching the specified
    # query term.
    search_response = youtube.search().list(
        q=track_name,
        part="id,snippet",
        #maxResults=track_name.max_results
    ).execute()
    videos = []
    videos_ids = []
    channels = []
    playlists = []
    # Add each result to the appropriate list, and then display the lists of
    # matching videos, channels, and playlists.
    for search_result in search_response.get("items", []):
        if search_result["id"]["kind"] == "youtube#video":
            videos.append("%s (%s)" % (search_result["snippet"]["title"],
                                       search_result["id"]["videoId"]))
            videos_ids.append("%s" % (search_result["id"]["videoId"]))
        elif search_result["id"]["kind"] == "youtube#channel":
            channels.append("%s (%s)" % (search_result["snippet"]["title"],
                                         search_result["id"]["channelId"]))
        elif search_result["id"]["kind"] == "youtube#playlist":
            playlists.append("%s (%s)" % (search_result["snippet"]["title"],
                                          search_result["id"]["playlistId"]))
    print("Videos:\n", "\n".join(videos), "\n")
    print("Channels:\n", "\n".join(channels), "\n")
    print("Playlists:\n", "\n".join(playlists), "\n")
    ids = []
    for video in videos:
        # artist is whatever precedes the first hyphen, without surrounding whitespace
        artist = re.split(r'\s*-\s*', video)[0]
        # the greedy .* makes the capture land on the last parenthesised group
        id = re.search(r'.*\(([^)]+)', video)[1]
        if id and artist == target_artist:
            videos_ids.append(id)
    print('VIDEOS IDS', videos_ids)
    return videos_ids[-1]

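When you split the artist from the track, you are splitting on '-'. If you look at the actual strings, you will see that there is whitespace around the hyphen, and that whitespace ends up in the split results.

The solution is to remove the whitespace from the artist variable with .strip().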
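To make the whitespace issue concrete, here is a minimal sketch using one of the titles from the question. The variable names title and raw_artist are only illustrative.

```python
title = 'Portishead - Roads (Vg1jyL3cr60)'
target = 'Portishead'

# Splitting on the bare hyphen keeps the space that preceded it.
raw_artist = title.split('-')[0]

print(repr(raw_artist))              # 'Portishead '
print(raw_artist == target)          # False
print(raw_artist.strip() == target)  # True
```

Printing with repr() is a handy way to spot stray whitespace, because the quotes show exactly where the string ends.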

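The problem you are running into is mainly caused by the whitespace at the end of the match (because you split on - and leave the space behind). The code below should work for you. It uses re.split to split on \s*-\s* (any amount of whitespace, followed by -, followed by any amount of whitespace).

I also cleaned up other parts of the code. I added .* at the start of the second regex so that it captures only the last instance (and changed [0] to [1] to get the captured content rather than the whole match).

The last part checks that id exists and that artist == target before printing.

The code in use: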

import re

target = 'Portishead'
videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Lawson - Roads (I-SOaSU0ieA)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)'
]
for video in videos:
    artist = re.split(r'\s*-\s*', video)[0]
    id = re.search(r'.*\(([^)]+)', video)[1]
    if id and artist == target:
        print(id)

Result:

Vg1jyL3cr60
WQYsGWh_vpE

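Explanation of the regex patterns:

\s*-\s* This pattern matches - and any whitespace around it
- \s* matches any whitespace character any number of times
- - matches this character literally
- \s* matches any whitespace character any number of times
.*\(([^)]+) This pattern matches the last instance of an opening parenthesis in the string
- .* matches any character any number of times (this is how we make sure we match the last parenthesis, because it is greedy and will match as many characters as possible)
- \( matches ( literally
- ([^)]+) captures the following
  - [^)]+ matches one or more of any character except )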
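The effect of the greedy .* is easiest to see on a title that contains more than one parenthesised group. The sketch below compares the pattern with and without the leading .* on one of the titles from the question; the variable names are illustrative only.

```python
import re

title = 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)'

# Without the leading .*, the search stops at the first "(" it finds.
first_group = re.search(r'\(([^)]+)', title)[1]

# With the greedy .*, the engine consumes the whole string and backtracks,
# so \( ends up matching the last "(" in the string.
last_group = re.search(r'.*\(([^)]+)', title)[1]

print(first_group)  # Linkin Park - Roads Untraveled
print(last_group)   # 7Lkq7bf6kU8

# \s*-\s* swallows whatever whitespace surrounds the hyphen.
print(re.split(r'\s*-\s*', 'Portishead   -  Roads'))  # ['Portishead', 'Roads']
```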

You can change your code to the following, which fixes the split problem and gets the ID (or whatever is between the parentheses):

import re

target = 'Portishead'
videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Lawson - Roads (I-SOaSU0ieA)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)',
]
for item in videos:
    artist = item.split(' - ')[0]
    # take the last "(...)" group and remove the parentheses themselves
    video_id = re.sub(r'\(|\)', '', re.findall(r'\(.*?\)', item)[-1])
    if artist == target:
        print(video_id)

Output:

Vg1jyL3cr60
WQYsGWh_vpE

If the output you want is only Vg1jyL3cr60, as stated in the OP, then you want to break out of the loop after printing the first ID.
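One possible way to stop at the first hit is to wrap the loop in a small function and return as soon as a title matches, which ends the loop just like break would. This is only a sketch: the function name first_id_for is hypothetical, and it assumes every title ends with a parenthesised ID.

```python
import re

def first_id_for(artist_name, titles):
    """Return the ID of the first title whose artist matches, or None."""
    for title in titles:
        artist = title.split(' - ')[0]
        if artist == artist_name:
            # Last "(...)" group with the parentheses removed.
            return re.sub(r'\(|\)', '', re.findall(r'\(.*?\)', title)[-1])
    return None

videos = [
    'Lawson - Roads (I-SOaSU0ieA)',
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
]

print(first_id_for('Portishead', videos))  # Vg1jyL3cr60
print(first_id_for('Massive Attack', videos))  # None
```

Because the function returns on the first match, the rest of a very large list is never scanned.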

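Looking closely at the data, it is not always clear where the artist's name appears (for example Linkin Park and Lagola), so the current approach is flawed, and none of the answers addresses that.

OK, here is a complete example that uses a new regex. It extracts the video's ID and its name/title, and that's it. I wanted to avoid making a bunch of assumptions about the format of the video titles, since they do not seem to follow a specific pattern or format.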

import re

# video_name: everything before the final "(...)"; video_id: its non-whitespace contents
vid_extract_re = re.compile(r"^(?P<video_name>.*)\((?P<video_id>\S+)\)$")
vid_str_list = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'i am a string which does not fit the pattern',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Lawson - Roads (I-SOaSU0ieA)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)',
    'i am also a string which does not fit the pattern',
]
vid_info_lst = []
for curr_vid_str in vid_str_list:
    curr_match = vid_extract_re.fullmatch(curr_vid_str)
    if curr_match is not None:
        curr_vid_name, curr_vid_id = curr_match.groups()
        vid_info_lst.append((curr_vid_name.strip(), curr_vid_id))
    else:
        print(f'Regex failed on video str: {curr_vid_str}')
print(vid_info_lst)

Let me know if you have any more questions! :)
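As a possible follow-up, the extracted title could then be checked against the target artist. The sketch below is only one way to do it: the helper name ids_for_artist is hypothetical, and it assumes the artist name leads the title, which the data does not always guarantee (see the Linkin Park entry). It is written as a generator so a very large list is processed lazily.

```python
import re

vid_extract_re = re.compile(r"^(?P<video_name>.*)\((?P<video_id>\S+)\)$")

def ids_for_artist(titles, artist):
    """Yield IDs of titles that start with `artist` (case-insensitive)."""
    wanted = artist.casefold()
    for title in titles:
        match = vid_extract_re.fullmatch(title)
        if match is None:
            continue  # the title does not end in "(id)"
        # Assumption: the artist name is the first thing in the title.
        if match.group('video_name').casefold().startswith(wanted):
            yield match.group('video_id')

titles = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'i am a string which does not fit the pattern',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
]

print(list(ids_for_artist(titles, 'Portishead')))  # ['Vg1jyL3cr60', 'WQYsGWh_vpE']
```

Calling next(ids_for_artist(titles, 'Portishead'), None) instead would stop at the first matching ID.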

Method 1

Perhaps the following may be closer:

import re

target = 'Portishead'
videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Lawson - Roads (I-SOaSU0ieA)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)',
]
for item in videos:
    artist = item.split('-')[0]
    # here I get what's inside parentheses, not always an id
    video_id = re.findall(r'(?<=\()[^)]+(?=\))', item)
    if artist.strip() == target:
        print(video_id)

Output

['Vg1jyL3cr60']
['WQYsGWh_vpE']

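If you want to simplify/modify/explore the expression, it is explained on the top-right panel of regex101.com. If you like, you can also watch in this link how it matches against some sample inputs.

Method 2

In case you might have an unknown number of spaces, we can make use of re.split():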

import re

target = 'Portishead'
videos = [
    'Portishead - Roads (Vg1jyL3cr60)',
    'Portishead - Roads - (WQYsGWh_vpE)',
    'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)',
    'Lawson - Roads (I-SOaSU0ieA)',
    'Vargas & Lagola - Roads (Audio) (Kd3s20GmPVE)',
]
for item in videos:
    artist = re.split(r'\s*-\s*', item)[0]
    # here I get what's inside parentheses, not always an id
    video_id = re.findall(r'(?<=\()[^)]+(?=\))', item)
    if artist == target:
        print(video_id[0])

Output

Vg1jyL3cr60
WQYsGWh_vpE
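Since the question notes that the ID is always the last parenthesised item, indexing the findall result with [-1] rather than [0] may be the safer choice when a title contains earlier parentheses. A minimal sketch, with illustrative variable names:

```python
import re

# Lookbehind for "(" and lookahead for ")" keep the parentheses out of the match.
paren_contents = re.compile(r'(?<=\()[^)]+(?=\))')

title = 'Need For Speed (Linkin Park - Roads Untraveled) Music Video (7Lkq7bf6kU8)'
groups = paren_contents.findall(title)

print(groups)      # ['Linkin Park - Roads Untraveled', '7Lkq7bf6kU8']
print(groups[0])   # Linkin Park - Roads Untraveled
print(groups[-1])  # 7Lkq7bf6kU8
```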
