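I am working in Python 2.7. I want to strip the non-alphanumeric characters from every string in a list of lists, without changing the structure of the list.
Starting example list: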
csvarticles = [['[Beta-blockers]', 'Magic!', '1980', 'Presse medicale'],['Hypertension in the pregnant woman].', '', '2010', 'Medical'],['Arterial hypertension.', '', '1920', 'La Nouvelle']]
print (csvarticles[0])
Desired output:
[['beta blockers', 'magic', '1980', 'presse medicale'], ['hypertension in the pregnant woman', '', '2010', 'medical'], ['arterial hypertension', '', '1920', 'la nouvelle']]
Code 1:
csvarticles = [[word.lower().split() for word in nodeList] for nodeList in csvarticles]
print (csvarticles[0])
Code 1 output:
['[beta-Blockers]','Magic!','1980','Presse Medicale'] (actual)
Code 2:
csvarticles = [[word.lower().split() for word in nodeList if word.isalpha()] for nodeList in csvarticles]
Code 2 output:
[]
Code 3:
articleTitle = []
for x, y in enumerate(csvarticles):
    # simpleWords is a helper that is not shown here
    myString = simpleWords(csvarticles[x][0])
    if myString != '':
        myString = myString.lower()
        # collapse every run of non-word characters or underscores into one space
        myString = re.sub(r'[\W_]+', ' ', myString, flags=re.UNICODE)
        myList = [word for word in myString.split() if len(word) > 3]
        articleTitle = ' '.join(myList)
Code 3 output:
['beta blockers', 'magic', '1980', 'presse medicale', 'hypertension pregnant woman', '2010', 'medical', 'arterial hypertension', '1920', 'nouvelle', 'nouvelle']
Code 3 is close, but it loses the nested list structure.
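You want to replace every character that is not a space or a word character, then strip and lowercase the result. A regular expression handles that replacement efficiently, and it chains cleanly with str.strip and str.lower.
Rebuild the nested lists with a double list comprehension: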
import re
csvarticles = [['[Beta-blockers]', 'Magic!', '1980', 'Presse medicale'],['Hypertension in the pregnant woman].', '', '2010', 'Medical'],['Arterial hypertension.', '', '1920', 'La Nouvelle']]
result = [[re.sub(r"[^ \w]", " ", x).strip().lower() for x in y] for y in csvarticles]
print(result)
Prints:
[['beta blockers', 'magic', '1980', 'presse medicale'], ['hypertension in the pregnant woman', '', '2010', 'medical'], ['arterial hypertension', '', '1920', 'la nouvelle']]
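One case the sample data does not exercise is punctuation in the middle of a string that is already followed by a space (for example 'a, b'), where a one-for-one replacement leaves two spaces behind. The following is a minimal sketch of the same regex approach with an extra whitespace-collapsing step, assuming single-space separators are wanted. The names NON_WORD and clean_rows are hypothetical and not part of the answer above.

```python
import re

# Hypothetical names for illustration. The pattern matches any single
# character that is neither a space nor a word character (\w).
NON_WORD = re.compile(r"[^ \w]")

def clean_rows(rows):
    """Return a new list of lists with the same shape as rows."""
    cleaned = []
    for row in rows:
        new_row = []
        for cell in row:
            spaced = NON_WORD.sub(" ", cell)
            # split() with no arguments trims the ends and collapses
            # runs of whitespace, so "a  b" becomes "a b"
            new_row.append(" ".join(spaced.split()).lower())
        cleaned.append(new_row)
    return cleaned

sample = [["a, b", "[Beta-blockers]"], ["", "1980"]]
print(clean_rows(sample))  # [['a b', 'beta blockers'], ['', '1980']]
```

Empty strings stay in place as '', so each inner list keeps its original length.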
If you are using Python 3, replace lower with casefold to handle special locale-specific characters.
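str.casefold exists only in Python 3 (3.3 and later), so it is not an option on Python 2.7. A short sketch of where it differs from lower, using the German sharp s as an illustrative input that does not appear in the question's data:

```python
# Python 3 only: str.casefold() is not available on Python 2.7
words = ["Straße", "STRASSE"]

lowered = [w.lower() for w in words]
folded = [w.casefold() for w in words]

print(lowered)  # ['straße', 'strasse']
print(folded)   # ['strasse', 'strasse']
```

After casefold the two spellings compare equal, which lower alone does not achieve.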
If you want to do this in a single line:
Input:
output = [[k.lower() for k in [' '.join(re.findall(r'[A-Za-z0-9]+', j)) for j in i]] for i in csvarticles]
Output:
[['beta blockers', 'magic', '1980', 'presse medicale'], ['hypertension in the pregnant woman', '', '2010', 'medical'], ['arterial hypertension', '', '1920', 'la nouvelle']]
Regex:
[A-Za-z0-9]+
Each match is one run of ASCII letters or digits; joining the runs with a single space drops the punctuation.
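Character ranges in a pattern like this are worth a quick check. The range A-z covers ASCII codes 65 to 122, which includes the six punctuation characters that sit between Z and a, whereas A-Za-z covers letters only. A small sketch comparing the two on a probe string chosen for illustration:

```python
import re

# Probe string: A Z [ \ ] ^ _ ` a z
probe = "AZ[\\]^_`az"

loose = re.findall(r"[A-z]", probe)      # matches all ten characters
strict = re.findall(r"[A-Za-z]", probe)  # matches the letters only

print(loose)
print(strict)  # ['A', 'Z', 'a', 'z']
```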
Use the str.isalnum() method to check whether a character is a letter or a digit.
Demo:
csvarticles = [['[Beta-blockers]', 'Magic!', '1980', 'Presse medicale'],['Hypertension in the pregnant woman].', '', '2010', 'Medical'],['Arterial hypertension.', '', '1920', 'La Nouvelle']]
res = []
for i in csvarticles:
    r = []
    for j in i:
        r.append("".join([k for k in j if (k.isalnum() or k.isspace())]).lower())
    res.append(r)
print(res)
Output:
[['betablockers', 'magic', '1980', 'presse medicale'], ['hypertension in the pregnant woman', '', '2010', 'medical'], ['arterial hypertension', '', '1920', 'la nouvelle']]
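Note that dropping the hyphen outright fuses 'Beta-blockers' into 'betablockers', while the desired output has 'beta blockers'. A minimal sketch of a variant that still relies on isalnum and isspace but substitutes a space for each rejected character, assuming that a single space between words is what is wanted. The helper name clean is hypothetical.

```python
csvarticles = [
    ['[Beta-blockers]', 'Magic!', '1980', 'Presse medicale'],
    ['Hypertension in the pregnant woman].', '', '2010', 'Medical'],
    ['Arterial hypertension.', '', '1920', 'La Nouvelle'],
]

def clean(text):
    # Hypothetical helper: keep letters, digits and whitespace,
    # and turn every other character into a space
    spaced = "".join(ch if (ch.isalnum() or ch.isspace()) else " " for ch in text)
    # split() with no arguments trims the ends and collapses repeated spaces
    return " ".join(spaced.split()).lower()

res = [[clean(cell) for cell in row] for row in csvarticles]
print(res)
```

The first entry then comes out as 'beta blockers', and the nested structure is unchanged.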