Extract a substring, including the search characters in the result

I found this piece of code in another post, and although I managed to get the correct output, I'm fairly sure there is a cleaner way.

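There are many posts with similar questions, but I haven't found one where the characters used to locate the index are also included in the extracted substring.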

Essentially, I want to extract B08BY4V3NW.

Can anyone share a cleaner way to do this?

s1 = "sp - disc - auto - B08BY4V3NW - 18cb mold"
s2 = "B0"
print (s1[s1.index(s2) - len(""):-12])

You can try this:

from nltk.tokenize import word_tokenize
tokens = word_tokenize(s1)
str_match =  [w for w in tokens if s2 in w]

Edit:

s1 = "sp - disc - auto - B08BY4V3NW - 18cb mold"
s2 = "B0"
tokens = str.split(s1)

str.split(s1), which is equivalent to s1.split(), returns:

['sp', '-', 'disc', '-', 'auto', '-', 'B08BY4V3NW', '-', '18cb', 'mold']
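Splitting on whitespace keeps the bare '-' separators as tokens. As a rough sketch, assuming the fields are always separated by exactly " - " (space, hyphen, space), splitting on that delimiter instead gives cleaner fields:

```python
s1 = "sp - disc - auto - B08BY4V3NW - 18cb mold"

# Assumption: every field is separated by exactly " - ".
fields = s1.split(" - ")
print(fields)
# ['sp', 'disc', 'auto', 'B08BY4V3NW', '18cb mold']
```

Note that this keeps '18cb mold' together as one field, which may or may not be what you want.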

Then you can find the string in the list:

1. Using a list comprehension

str_match =  [w for w in tokens if s2 in w]
print(str_match)
['B08BY4V3NW']

2. Using filter

str_match = list(filter(lambda x: s2 in x, tokens))
print(str_match)
['B08BY4V3NW']

3. Using re

import re
str_match = [x for x in tokens if re.search(s2, x)]
print(str_match)
['B08BY4V3NW']
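One caveat with the re variant: re.search(s2, x) treats s2 as a regular-expression pattern. That is harmless for "B0", but a marker containing characters such as "." or "+" would not be matched literally. A minimal sketch, assuming the wanted code runs from the marker up to the next whitespace, escapes the marker and searches s1 directly without tokenizing first:

```python
import re

s1 = "sp - disc - auto - B08BY4V3NW - 18cb mold"
s2 = "B0"

# re.escape makes s2 match literally, even if it contains regex metacharacters.
# \S* then extends the match up to (not including) the next whitespace character.
match = re.search(re.escape(s2) + r"\S*", s1)

# re.search returns None when nothing matches, so guard before calling group().
result = match.group(0) if match else None
print(result)  # B08BY4V3NW
```

Because the match starts at the marker itself, the "B0" prefix is included in the result.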

If you want B08BY4V3NW as a string rather than a list, you can use:

str_ = ''.join(str_match)
type(str_)
str
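Keep in mind that ''.join concatenates every match, so two matching tokens would be glued together. If only the first match is wanted, one possible sketch uses next() with a generator expression and a default value (the name first_match is just illustrative):

```python
s1 = "sp - disc - auto - B08BY4V3NW - 18cb mold"
s2 = "B0"
tokens = s1.split()

# next() stops at the first token containing s2;
# the second argument (None) is returned if no token matches.
first_match = next((w for w in tokens if s2 in w), None)
print(first_match)  # B08BY4V3NW
```

This also avoids building an intermediate list when the input is long.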
