How do I remove special characters from a list using Python?



I have a list like this:

z=[']'What type of humans arrived on the Indian subcontinent from Africa?', 'When did humans first arrive on the Indian subcontinent?', 'What subcontinent did humans first arrive on?', 'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?',kingdoms were established in Southeast Asia?Indianized']']

I want to convert it into a simple 2D list:

z= [['What type of humans arrived on the Indian subcontinent from Africa?', 'When did humans first arrive on the Indian subcontinent?', 'What subcontinent did humans first arrive on?', 'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?','kingdoms were established in Southeast Asia?Indianized']]

So how can I convert this list into a 2D list?

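The logic is not entirely clear. I would use a regular expression to split each string on runs of two or more characters that are not letters, digits, or question marks: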

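import re

# Split each string on runs of 2+ characters other than letters, digits, or "?", dropping empty pieces
[[x for x in re.split(r'[^a-z0-9?]{2,}', s, flags=re.I) if x] for s in z]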

Output:

[['What type of humans arrived on the Indian subcontinent from Africa?',
'When did humans first arrive on the Indian subcontinent?',
'What subcontinent did humans first arrive on?',
'Between 73000 and what year ago did humans first arrive on the Indian subcontinent?',
'kingdoms were established in Southeast Asia?Indianized']]
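The list in the question is not valid Python as posted, so the following is only a minimal sketch under an assumption: that z holds a single string that looks like a printed list of questions. The helper name split_questions and the shortened sample questions are illustrative, not from the question.

```python
import re

# Assumed input: one string that looks like a printed list of questions.
z = ["['What type of humans arrived?', 'When did humans first arrive?', "
     "'What subcontinent did humans first arrive on?']"]

def split_questions(strings):
    """Split each string on runs of 2+ characters that are not letters, digits, or '?'."""
    return [
        # re.split leaves empty strings at the edges; "if piece" drops them
        [piece for piece in re.split(r'[^a-z0-9?]{2,}', s, flags=re.I) if piece]
        for s in strings
    ]

result = split_questions(z)
print(result)
```

Single spaces inside a question survive because the pattern only matches runs of at least two such characters, for example the quote, comma, space, and quote between two items.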

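You can use the re library. re.sub replaces every character that matches the regular expression, here with an empty string. The space at the end of the character class (after the 9) keeps spaces; remove it if you do not want to keep them. Note that this pattern also strips the question marks.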

import re
# mystring is the string to clean; everything except letters, digits, and spaces is removed
re.sub(r'[^A-Za-z0-9 ]+', '', mystring)
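Since the question is about a list rather than a single string, the same substitution can be applied to each element. This is a minimal sketch; the function name strip_special, the keep_spaces parameter, and the sample strings are assumptions made for illustration.

```python
import re

def strip_special(strings, keep_spaces=True):
    """Remove every character except letters and digits (and, optionally, spaces)."""
    pattern = r'[^A-Za-z0-9 ]+' if keep_spaces else r'[^A-Za-z0-9]+'
    return [re.sub(pattern, '', s) for s in strings]

questions = ['What subcontinent did humans first arrive on?', 'Hello, World!']
cleaned = strip_special(questions)
print(cleaned)
```

Because "?" is not in the character class, the question marks are removed as well; add it to the class if they should be kept.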