Splitting strings in a list



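Consider several lists whose strings, and even lengths, are inconsistent (they are processed in a loop). Each list is the output from the body of an email (.eml) message.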

Example list 1

['Request 1',
'String example',
'Service:xyz Request Date Time: 4/7/2022 8:20:54 PMService: Sub Service:']

Example list 2

['Request 2',
'String example 1',
'String example 2',
'Service : xyzabc   Requested by : example   Request Date : 4/8/2022 7:31:17 AM   Service :   abcdefg   Sub Service :   abcdefg       Current Owner']

Example list 3

['Request 3',
'string example',
'Service : abcdefg     Requested by : example   Request Date : Thursday, 7 April 2022, 3:29:55 PM  Service :   abcdefg  Sub Service :   abcdefg        Current Owner',
'SSC :    abcdefg',
'Jam']

The strings need to be parsed and sorted into separate DataFrame columns:

  • Request
  • String example
  • Request date (and time)
  • Service
  • Sub Service
  • Current owner
  • SSC

The problem is that there is not even one exact string pattern that could be used as the argument for splitting the strings.

This is the code I use to read the email files, but the problem is that it produces a nested list because of the if condition.

import re
from email import policy
from email.parser import BytesParser

import pandas as pd

matches = ["Service", "Requested by", "Request Date"]
file_names, texts = [], []
# eml_files is the list of .eml paths, defined elsewhere
for file in eml_files:
    with open(file, 'rb') as fp:  # the with block closes the file, so no fp.close()
        name = fp.name
        msg = BytesParser(policy=policy.default).parse(fp)
    text = msg.get_body(preferencelist=('plain',)).get_content()
    file_names.append(name)
    texts.append(text)
    # Split into lines and drop the empty ones; strip() also removes \r and \t
    text = [j.strip() for j in text.split("\n") if j.strip()]
    for idx, te in enumerate(text):
        if any(x in te for x in matches):
            # re.split returns a list, so this assignment is what nests the list
            text[idx] = re.split('Service :|Requested by : |Request Date : |Service : |Sub Service : | Current Owner|SSC : ', te)

df = pd.DataFrame(text).T
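
The nesting comes from assigning the list returned by re.split back into a single slot of text. A minimal sketch of one way to keep the result flat is shown here, reusing the pattern from the code above. The function name flatten_split and the sample data are made up for illustration, and dropping empty pieces is an assumption about what is wanted.

```python
import re

MATCHES = ["Service", "Requested by", "Request Date"]
# Same pattern as in the question's re.split call
PATTERN = ('Service :|Requested by : |Request Date : |Service : |'
           'Sub Service : | Current Owner|SSC : ')


def flatten_split(lines):
    """Split the labelled lines and return one flat list of strings."""
    flat = []
    for line in lines:
        if any(m in line for m in MATCHES):
            pieces = re.split(PATTERN, line)
            # extend() adds the pieces one by one instead of nesting a list
            flat.extend(p.strip() for p in pieces if p.strip())
        else:
            flat.append(line)
    return flat


sample = ['Request 2', 'String example 1',
          'Service : xyzabc   Requested by : example']
print(flatten_split(sample))
```

The flat list loses the information about which label each value came from, so this only helps when the position of each value is enough.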

As a general gist:

for string in items:
    # Do stuff to the string; each element of the list is bound to "string" in turn
    pass

Depending on the nature of the list, you could use something like the following:

if "Service " in string:
    # Do something
    pass
else:
    # Do something else, such as storing it as None or NULL
    pass
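
As a rough illustration of that if/else, the sketch below records the text after "Service" for each string, or None when the word is absent. The function name service_text is hypothetical. It tests for "Service" without the trailing space, because example list 1 writes "Service:xyz".

```python
def service_text(strings):
    """For each string, return the text after the first "Service", or None."""
    results = []
    for string in strings:
        if "Service" in string:
            # partition() splits on the first occurrence only;
            # lstrip(" :") drops the separator that follows the label
            results.append(string.partition("Service")[2].lstrip(" :"))
        else:
            results.append(None)
    return results


print(service_text(['Request 1', 'String example', 'Service:xyz']))
```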

That said, just using the loop on its own would be fine.
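
If the labels themselves are known, even though the spacing and colons around them are not, another option is to locate the labels and take whatever lies between two of them as the value. This is only a sketch under assumptions: the label list is read off the three example lists, parse_fields is a made-up name, a repeated label keeps its first non-empty value, and an empty value is stored as None.

```python
import re

# Labels as they appear in the example lists. Longer labels come first so that
# "Request Date Time" is tried before "Request Date".
LABELS = ["Sub Service", "Service", "Requested by", "Request Date Time",
          "Request Date", "Current Owner", "SSC"]
# The colon is optional because "Current Owner" appears without one
LABEL_RE = re.compile('(' + '|'.join(map(re.escape, LABELS)) + r')\s*:?')


def parse_fields(line):
    """Map each label found in line to the text that follows it."""
    fields = {}
    hits = list(LABEL_RE.finditer(line))
    for hit, nxt in zip(hits, hits[1:] + [None]):
        # The value runs up to the next label, or to the end of the line
        end = nxt.start() if nxt is not None else len(line)
        value = line[hit.end():end].strip() or None
        label = hit.group(1)
        # Keep the first non-empty value when a label repeats
        if fields.get(label) is None:
            fields[label] = value
    return fields


line1 = 'Service:xyz Request Date Time: 4/7/2022 8:20:54 PMService: Sub Service:'
print(parse_fields(line1))
```

One dict per email can then be collected into a list and passed to pd.DataFrame, which turns the labels into columns and fills absent ones with NaN. A label word that also occurs inside a value would be misread by this approach.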
