I want to remove the strings from a file (a code file in which every string is in quotes), like this:
text = "Hello,"+Tom+"have a nice day!"
text2 = "Thank"+"you."
I want this (dropping not just the quotes, but everything inside them as well):
['text', 'Tom','text2']
I can get every word with a regular expression, reading the file line by line:
readLine = re.findall("[a-zA-Z0-9]*", line)
# there is some trimming I didn't show
But the result is:
['text','Hello','Tom','have', 'a', 'nice', 'day', 'text2', 'Thank', 'you']
If regex is not a good fit, what other approach is there? Thanks for your help.
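You can use a positive lookahead in the regex, like this: [a-zA-Z0-9]+(?= = )
Note that this only matches names that are followed by " = ", so it finds the assignment targets but not names such as Tom on the right-hand side.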
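A minimal sketch of the lookahead idea, assuming the sample lines from the question are held in a variable (the names `source` and `assigned_names` are made up for illustration). The lookahead is written without a capturing group, because `re.findall` returns group contents whenever the pattern contains a group:

```python
import re

# Sample input from the question (variable name is illustrative).
source = '''text = "Hello,"+Tom+"have a nice day!"
text2 = "Thank"+"you."'''

# Match a run of letters/digits only when it is followed by " = ".
# The lookahead (?= = ) checks for the text but does not consume it.
assigned_names = re.findall(r'[a-zA-Z0-9]+(?= = )', source)

print(assigned_names)  # ['text', 'text2']
```

Since `Tom` is never followed by " = ", it is not returned, so this covers only part of what the question asks for.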
Use
import re
expr = r'"(?:[^"\\]|\\[\s\S])*"|(\w+)'
text = r'''text = "Hello,"+Tom+"have a nice day!"
text2 = "Thank"+"you."'''
print(list(filter(None,re.findall(expr, text))))
See the Python proof.
Result: ['text', 'Tom', 'text2']
Regex explanation
--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^"\\]                   any character except: '"', '\\'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    [\s\S]                   any character of: whitespace (\n, \r,
                             \t, \f, and " "), non-whitespace (all
                             but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  (                        group and capture to 1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of 1
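To make the explanation concrete, here is a small sketch of what `re.findall` returns before the empty entries are filtered out. The first alternative consumes a whole quoted string without capturing anything, so each string shows up as an empty match. The second sample line (with escaped quotes) and the variable names are made up for illustration:

```python
import re

expr = r'"(?:[^"\\]|\\[\s\S])*"|(\w+)'

line = 'text = "Hello,"+Tom+"have a nice day!"'

# With one capturing group, findall returns group 1 for every match.
# Quoted strings match the first alternative, so group 1 is '' for them.
raw_matches = re.findall(expr, line)
print(raw_matches)  # ['text', '', 'Tom', '']

# Dropping the empty entries leaves only the names outside the strings.
names = [m for m in raw_matches if m]
print(names)  # ['text', 'Tom']

# Hypothetical line with escaped quotes: \\[\s\S] lets \" stay inside the string.
escaped_line = r'msg = "say \"hi\" to " + name'
escaped_names = [m for m in re.findall(expr, escaped_line) if m]
print(escaped_names)  # ['msg', 'name']
```

The list comprehension does the same job as `filter(None, ...)` in the answer above.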
I tried:
re.findall(r'".*?"',line)
You can then simply trim the extra quotes at the start and end.
Edit: to trim them, you can use
[ match[1:-1] for match in re.findall(r'".*?"',line) ]
Here you go, this is all you need:
re.findall(r'"(.*?)"', sentence)
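A short comparison of the greedy `.*` and the lazy `.*?` for this kind of pattern, assuming `sentence` holds the first sample line from the question. Note that both variants return the contents of the quoted strings, not the names outside them, so they would still need a further step to produce the list the question asks for:

```python
import re

# First sample line from the question.
sentence = 'text = "Hello,"+Tom+"have a nice day!"'

# Greedy: .* runs from the first quote to the last quote on the line.
greedy = re.findall(r'"(.*)"', sentence)
print(greedy)  # ['Hello,"+Tom+"have a nice day!']

# Lazy: .*? stops at the nearest closing quote, one match per string.
lazy = re.findall(r'"(.*?)"', sentence)
print(lazy)  # ['Hello,', 'have a nice day!']
```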