Python adds unwanted tabs to a string

My script basically captures HTML elements from an HTML file and sends them to a MySQL database. I use

title = line.replace("<!--h1-->",'').replace("<h1>",'').replace("</h1>",'')

to capture the H1. Now, if I run

print title

everything looks fine. But if I run

print 'post_title = %(title)s'%locals()

Python always seems to add two tabs at the start of title.

Does anyone know what causes this, and how I can prevent it?

Call strip() on the title string:

title = line.replace("<!--h1-->",'').replace("<h1>",'').replace("</h1>",'').strip()
print 'post_title = %(title)s' % locals()

There is no need to use locals() like this; you already have the variable you need, so:

print 'post_title = %s' % title

or

print 'post_title = {}'.format(title)
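As for the cause: the tabs are most likely the indentation of the line in the HTML file, so they are already in title before any formatting happens. They are simply hard to notice when title is printed on its own. A minimal sketch, assuming a hypothetical input line indented with two tabs; repr() makes the whitespace visible:

```python
# Hypothetical input: an <h1> line indented with two tabs,
# as it might appear in the HTML file being parsed.
line = "\t\t<!--h1--><h1>My Post</h1>\n"

# Same chain of replace() calls as in the question; the indentation survives.
raw_title = line.replace("<!--h1-->", "").replace("<h1>", "").replace("</h1>", "")

# strip() removes leading and trailing whitespace (tabs, spaces, newlines).
clean_title = raw_title.strip()

# repr() shows characters that are invisible in normal output.
print(repr(raw_title))    # '\t\tMy Post\n'
print(repr(clean_title))  # 'My Post'
```

If the repr() of your own title shows leading \t characters, the tabs come from the input, not from the string formatting.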

The way to remove whitespace is the string method strip().

title = line.replace("<!--h1-->",'').replace("<h1>",'').replace("</h1>",'')
print 'post_title = %s' % title.strip()

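Alternatively, if you know the string always starts with two unwanted tabs, slice them off. The code below replaces title with everything except its first two characters.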

title = title[2:]
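Slicing removes the first two characters whatever they are, so it is only safe when the tabs are guaranteed to be there. A small sketch of the difference, using a hypothetical helper drop_leading_tabs built on str.lstrip and made-up sample strings:

```python
def drop_leading_tabs(text):
    """Hypothetical helper: remove leading tab characters only."""
    return text.lstrip("\t")

indented = "\t\tMy Post"  # sample value with two leading tabs
plain = "My Post"         # sample value without any tabs

# Slicing works when the tabs are present...
print(repr(indented[2:]))                 # 'My Post'
# ...but cuts into the real text when they are not.
print(repr(plain[2:]))                    # ' Post'

# lstrip("\t") removes however many leading tabs exist, including none.
print(repr(drop_leading_tabs(indented)))  # 'My Post'
print(repr(drop_leading_tabs(plain)))     # 'My Post'
```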

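Edit

Another approach is a regular expression. Like the string replace method, the regex substitution function re.sub can replace a double tab (\t\t) with the empty string ('').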

import re
title = line.replace("<!--h1-->",'').replace("<h1>",'').replace("</h1>",'')
# Replace two consecutive tabs.
title = re.sub(r'\t\t', '', title)

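What makes the re module so powerful is that you can even use the ^ (or $) character to restrict the search to the start (or end) of the string in question.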

title = re.sub(r'^\t\t', '', title)
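To make the effect of the anchors concrete, here is a short sketch on a made-up string that has double tabs at the start, in the middle, and at the end. The sample value and variable names are illustrative only:

```python
import re

# Made-up sample: double tabs at the start, in the middle, and at the end.
title = "\t\tTabs\t\tinside\t\t"

# Unanchored: every double tab is removed, including the one in the middle.
anywhere = re.sub(r"\t\t", "", title)

# Anchored with ^: only a double tab at the very start is removed.
start_only = re.sub(r"^\t\t", "", title)

# Anchored with $: only a double tab at the very end is removed.
end_only = re.sub(r"\t\t$", "", title)

print(repr(anywhere))    # 'Tabsinside'
print(repr(start_only))  # 'Tabs\t\tinside\t\t'
print(repr(end_only))    # '\t\tTabs\t\tinside'
```

For plain leading or trailing whitespace, strip() remains the simpler choice; the regex version is mainly useful when the pattern is more specific than "any whitespace".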
