(Regex) How to remove quotes together with the content inside them in Python



I want to remove the strings from a file (a code file in which every string is enclosed in quotes), like the following:

text = "Hello,"+Tom+"have a nice day!"
text2 = "Thank"+"you."

I want this (removing not just the quotes, but everything inside them as well):

['text', 'Tom','text2']

I can get every word with a regex, reading the file line by line:

readLine = re.findall("[a-zA-Z0-9]*", line)
# there is some trimming I didn't show

But the result is:

['text','Hello','Tom','have', 'a', 'nice', 'day', 'text2', 'Thank', 'you']

If regex is not the right tool, what other approaches are there? Thanks for your help.

You can use a positive lookahead in the regex, like this:
[a-zA-Z0-9]+(?=( = ))
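
As a minimal sketch of how the lookahead behaves on the question's sample input: with `re.findall`, a capturing group inside the lookahead makes Python return the group (`' = '`) rather than the name, so the version below drops the inner parentheses. The variable names `source` and `assigned_names` are illustrative only. Note that this approach finds only names that are followed by ` = `, so `Tom` is not included.

```python
import re

source = '''text = "Hello,"+Tom+"have a nice day!"
text2 = "Thank"+"you."'''

# Lookahead with no inner capturing group, so findall returns the whole match.
pattern = re.compile(r'[a-zA-Z0-9]+(?= = )')

# Only identifiers immediately followed by " = " are matched.
assigned_names = pattern.findall(source)
print(assigned_names)  # ['text', 'text2']
```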

Use

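import re

# Match a double-quoted string (including backslash escapes) without capturing it,
# or capture a run of word characters in group 1.
expr = r'"(?:[^"\\]|\\[\s\S])*"|(\w+)'
text = r'''text = "Hello,"+Tom+"have a nice day!"
text2 = "Thank"+"you."'''
# findall returns '' for each quoted-string match; filter(None, ...) drops those.
print(list(filter(None, re.findall(expr, text))))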

See Python proof

Result: ['text', 'Tom', 'text2']

Regex explanation

--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    [^"\\]                   any character except: '"', '\\'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    [\s\S]                   any character of: whitespace (\n, \r,
                             \t, \f, and " "), non-whitespace (all
                             but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
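
To illustrate why the `\\[\s\S]` branch is there, here is a small sketch with a string literal that contains escaped quotes. The input line and the variable names (`line`, `names`) are hypothetical and not taken from the question. The escaped `\"` pairs are consumed as part of the string, so the match does not end early and the words inside the literal are not reported.

```python
import re

expr = r'"(?:[^"\\]|\\[\s\S])*"|(\w+)'

# Hypothetical input: the string literal contains escaped double quotes.
line = r'msg = "She said \"hi\" to " + name'

# Quoted strings match the first alternative and leave group 1 empty,
# so only identifiers outside the quotes survive the filter.
names = [m for m in re.findall(expr, line) if m]
print(names)  # ['msg', 'name']
```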

I tried

re.findall(r'".*"',line)

You can simply trim the extra quotes at the start and end.

Edit: to trim them, you can use

[ match[1:-1] for match in re.findall(r'".*"',line) ]
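
One thing to watch with `".*"`: the `.*` is greedy, so on a line with several string literals it matches from the first quote to the last one. The sketch below compares it with the lazy form `".*?"` on the question's first line, and then uses `re.sub` to delete the literals and collect the remaining names, which is what the question asks for. This is an illustration under the assumption that strings contain no escaped quotes; the variable names are made up for the example.

```python
import re

line = 'text = "Hello,"+Tom+"have a nice day!"'

# Greedy: a single match spanning from the first quote to the last quote.
greedy = re.findall(r'".*"', line)

# Lazy: one match per string literal (escaped quotes are not handled).
lazy = re.findall(r'".*?"', line)

# Delete each literal, then collect the identifiers that remain.
stripped = re.sub(r'".*?"', '', line)
names = re.findall(r'\w+', stripped)

print(greedy)  # ['"Hello,"+Tom+"have a nice day!"']
print(lazy)    # ['"Hello,"', '"have a nice day!"']
print(names)   # ['text', 'Tom']
```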

Here you go, this is all you need:

re.findall('"(.*)"', sentence)
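
On the question's "other approaches" point: if the file is valid Python source (an assumption; the question only says it is a code file), the standard-library `tokenize` module can separate names from string literals without a hand-written regex. The following is a rough sketch rather than a definitive solution; the helper name `identifiers` is hypothetical.

```python
import io
import keyword
import tokenize

source = '''text = "Hello,"+Tom+"have a nice day!"
text2 = "Thank"+"you."
'''

def identifiers(code):
    """Return NAME tokens (excluding keywords) in order of appearance."""
    names = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        # String literals arrive as STRING tokens and are skipped entirely.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            names.append(tok.string)
    return names

print(identifiers(source))  # ['text', 'Tom', 'text2']
```

Because the tokenizer understands escapes, triple-quoted strings, and comments, it avoids most of the edge cases a regex has to handle, but it will raise an error on input that is not tokenizable as Python.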
