How to get the characters from an expression in Python



>I have this expression:

<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>

I need to find the 10 characters that come right after "/dp/" (here, B01J5FGW66).

How can I write a function that does this?

Use a regular expression:

import re
s = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
print(re.search(r"dp/([A-Za-z0-9]{10})/", s)[1])

Output: B01J5FGW66

Explanation:

The pattern starts with the literal "dp/":

dp/ 

The capture group, delimited by ( ), matches exactly 10 (the {10}) lowercase letters (a-z), uppercase letters (A-Z), or digits (0-9):

([A-Za-z0-9]{10})

It ends with a "/":

/

With re.search we can search your string s for this expression, and [1] gives access to the result of the first capture group.
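To make the three parts described above easier to see, here is a small sketch of the same pattern written with re.VERBOSE, which lets each piece carry its own comment. The names DP_PATTERN, sample, and code are made up for this illustration, and sample is a shortened version of the URL from the question.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# In verbose mode, whitespace in the pattern is ignored and "#" starts a comment.
DP_PATTERN = re.compile(
    r"""
    dp/                 # literal "dp/" that precedes the code
    ([A-Za-z0-9]{10})   # capture group 1: exactly 10 letters or digits
    /                   # literal "/" that follows the code
    """,
    re.VERBOSE,
)

# Shortened URL taken from the question, used only as sample input.
sample = "https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img"

match = DP_PATTERN.search(sample)
code = match[1] if match else None
print(code)  # B01J5FGW66
```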

Note that you may want to add some extra code in case no match is found:

m = re.search(r"dp/([A-Za-z0-9]{10})/", s)
if m is not None:
    print(m[1])
else:
    # re.search returns None when nothing is found
    print("No match")
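Since the question asks for a function, the search and the None check can be wrapped together. This is only a minimal sketch: the name find_code_after_dp is made up here, and returning None when nothing matches is one possible choice (raising an exception would be another).

```python
import re
from typing import Optional


def find_code_after_dp(text: str) -> Optional[str]:
    """Return the 10 alphanumeric characters after "dp/", or None if absent."""
    m = re.search(r"dp/([A-Za-z0-9]{10})/", text)
    # re.search returns None when the pattern does not occur in text
    return m[1] if m else None


print(find_code_after_dp("https://www.amazon.it/x/dp/B01J5FGW66/ref=abc"))  # B01J5FGW66
print(find_code_after_dp("https://www.amazon.it/x/"))                       # None
```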

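I'm assuming you always want whatever sits between the slashes right after dp (the next path segment), and that the length of 10 characters is incidental. It's a bit clunky, but it works: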

>>> x = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
>>> splits = x.split("/")
>>> dp_index = splits.index('dp')
>>> result = splits[dp_index+1] # Get the next one over
>>> result
'B01J5FGW66'

To put this into a function, you could do:

def get_route_next_to_dp(html_str):
    splits = html_str.split("/")
    dp_index = splits.index('dp')
    result = splits[dp_index+1] # Get the next one over
    return result

Usage might look like this:

html_str = '<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
route_next_to_dp = get_route_next_to_dp(html_str)
print(route_next_to_dp)

Output:

B01J5FGW66

As desired.
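One caveat with the split approach: list.index raises ValueError when there is no 'dp' segment, and indexing one past it fails if 'dp' is the last piece. A hedged sketch of a more defensive variant follows; the name get_segment_after_dp and the choice to return None in those cases are assumptions made for this example.

```python
from typing import Optional


def get_segment_after_dp(html_str: str) -> Optional[str]:
    """Return the path segment right after "dp", or None if there is none."""
    parts = html_str.split("/")
    try:
        dp_index = parts.index("dp")
    except ValueError:
        # No segment is exactly "dp"
        return None
    if dp_index + 1 >= len(parts):
        # "dp" was the last segment, so nothing follows it
        return None
    return parts[dp_index + 1]


print(get_segment_after_dp("https://www.amazon.it/x/dp/B01J5FGW66/ref=abc"))  # B01J5FGW66
print(get_segment_after_dp("https://www.amazon.it/x/y"))                      # None
```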

Try this: it uses a regular expression to capture the 10 characters after "dp/" and checks whether a match was found before printing it.

import re
my_string='<a class="a-link-normal" href="https://www.amazon.it/Philips-GC8735-PerfectCare-Generatore-Vapore/dp/B01J5FGW66/ref=gbph_img_s-3_7347_c3de3e94?smid=A11IL2PNWYJU7H&amp;pf_rd_p=82ae57d3-a26a-4d56-b221-3155eb797347&amp;pf_rd_s=slot-3&amp;pf_rd_t=701&amp;pf_rd_i=gb_main&amp;pf_rd_m=A11IL2PNWYJU7H&amp;pf_rd_r=MDQJBKEMGBX38XMPSHXB" id="dealImage"></a>'
m = re.search(r"dp/([A-Za-z0-9]{10})/", my_string)
if m:  # re.search returns None when there is no match, so test m itself
    print(m.group(1))
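If the input contains several such links rather than the single tag shown in the question (an assumption, since only one tag is given), re.findall with the same pattern returns every captured code at once. In this sketch the function name find_all_codes_after_dp is made up, and the second URL in page, including the code B000000001, is invented sample data.

```python
import re
from typing import List


def find_all_codes_after_dp(text: str) -> List[str]:
    """Return every 10-character code that follows "dp/" in text."""
    # With exactly one capture group, findall returns a list of that group's matches
    return re.findall(r"dp/([A-Za-z0-9]{10})/", text)


# The first link is shortened from the question; the second is invented for illustration.
page = (
    '<a href="https://www.amazon.it/a/dp/B01J5FGW66/ref=x"></a>'
    '<a href="https://www.amazon.it/b/dp/B000000001/ref=y"></a>'
)
print(find_all_codes_after_dp(page))  # ['B01J5FGW66', 'B000000001']
```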
