Consider several lists (produced in a loop) whose strings, and even lengths, are inconsistent. Each list is the output of an email (.eml) message body.
Example list 1
['Request 1',
'String example',
'Service:xyz Request Date Time: 4/7/2022 8:20:54 PMService: Sub Service:']
Example list 2
['Request 2',
'String example 1',
'String example 2',
'Service : xyzabc Requested by : example Request Date : 4/8/2022 7:31:17 AM Service : abcdefg Sub Service : abcdefg Current Owner']
Example list 3
['Request 3',
'string example',
'Service : abcdefg Requested by : example Request Date : Thursday, 7 April 2022, 3:29:55 PM Service : abcdefg Sub Service : abcdefg Current Owner','SSC : abcdefg',
'Jam']
The strings need to be parsed and sorted into separate DataFrame columns:
- Request
- String example
- Request date (*and time)
- Service
- Sub Service
- Current Owner
- SSC
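The problem is that there is no single consistent string pattern that can be passed as an argument to split the strings.
This is the code I use to read the email files, but it produces a nested list because of the if condition.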
import re
from email import policy
from email.parser import BytesParser

import pandas as pd

matches = ["Service", "Requested by", "Request Date"]
file_names, texts = [], []
for file in eml_files:  # eml_files: paths to the .eml files, defined elsewhere
    with open(file, 'rb') as fp:  # the with block closes the file on exit
        name = fp.name
        msg = BytesParser(policy=policy.default).parse(fp)
    text = msg.get_body(preferencelist=('plain',)).get_content()
    file_names.append(name)
    texts.append(text)
    # Split the body into lines, trim whitespace (including \r and \t), drop empty lines
    text = [j.strip() for j in text.split("\n") if j.strip()]
    for idx, te in enumerate(text):
        if any(x in te for x in matches):
            # re.split returns a list, so this element becomes a nested list
            text[idx] = re.split('Service :|Requested by : |Request Date : |Service : |Sub Service : | Current Owner|SSC : ', te)
    df = pd.DataFrame(text).T
As a general point:
for string in strings:
    pass  # Do stuff to the current item, which is stored in "string"
Depending on the nature of the list, you could use something like this:
if "Service" in string:  # no trailing space, so "Service:xyz" matches too
    pass  # Do something
else:
    pass  # Do something else, such as storing it as None or NULL
That said, just using the loop should be enough.
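To make the loop-and-check idea concrete, here is a minimal sketch run against the three sample lists from the question. It rests on several assumptions that are not in the original post: the label list is taken from the samples, the first item of each list is treated as the request identifier, "Request Date Time" is treated as the same field as "Request Date", and when a label such as "Service" appears twice only the first non-empty value is kept. The names `LABELS`, `ALIASES`, and `parse_lines` are hypothetical helpers for illustration.

```python
import re

import pandas as pd

# Labels seen in the sample lists. "Request Date Time" is listed before
# "Request Date" so that the longer label wins at the same position.
LABELS = ["Sub Service", "Service", "Requested by", "Request Date Time",
          "Request Date", "Current Owner", "SSC"]
ALIASES = {"Request Date Time": "Request Date"}  # assumed to be the same field

# The colon is optional and spacing is flexible, which covers both
# "Service:xyz" and "Service : xyz" as well as a bare "Current Owner".
LABEL_RE = re.compile(r"\s*(" + "|".join(map(re.escape, LABELS)) + r")\s*:?\s*")


def parse_lines(lines):
    """Turn one list of strings into a flat dict (one future DataFrame row)."""
    row = {"Request": lines[0] if lines else None}
    free_text = []
    for string in lines[1:]:
        parts = LABEL_RE.split(string)
        if len(parts) == 1:
            # No label found: keep the string as free text
            free_text.append(string.strip())
            continue
        # parts = [text before first label, label, value, label, value, ...]
        if parts[0].strip():
            free_text.append(parts[0].strip())
        for label, value in zip(parts[1::2], parts[2::2]):
            label = ALIASES.get(label, label)
            value = value.strip()
            # Assumption: keep the first non-empty value for a repeated label
            if value and label not in row:
                row[label] = value
    row["Text"] = "; ".join(free_text)
    return row


samples = [
    ['Request 1', 'String example',
     'Service:xyz Request Date Time: 4/7/2022 8:20:54 PMService: Sub Service:'],
    ['Request 2', 'String example 1', 'String example 2',
     'Service : xyzabc Requested by : example Request Date : 4/8/2022 7:31:17 AM '
     'Service : abcdefg Sub Service : abcdefg Current Owner'],
    ['Request 3', 'string example',
     'Service : abcdefg Requested by : example Request Date : Thursday, 7 April 2022, '
     '3:29:55 PM Service : abcdefg Sub Service : abcdefg Current Owner',
     'SSC : abcdefg', 'Jam'],
]

rows = [parse_lines(sample) for sample in samples]
df = pd.DataFrame(rows)  # labels missing from a list become NaN in that row
print(df)
```

Because each list is reduced to a flat dict before the DataFrame is built, no nested lists arise, and lists of different lengths still line up by column name. Since the colon is optional, a label word appearing inside ordinary text would also be treated as a label, so the pattern may need tightening for real messages.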