How do I remove certain characters from a string? .replace() doesn't work

I need to get rid of Polish characters in a string I get from an XML file. I use .replace(), but in this case it doesn't work. Why? Code:

# -*- coding: utf-8
from prestapyt import PrestaShopWebService
from xml.etree import ElementTree
prestashop = PrestaShopWebService('http://localhost/prestashop/api', 'key')
prestashop.debug = True
name = ElementTree.tostring(
    prestashop.search('products', options={'display': '[name]', 'filter[id]': '[2]'}),
    encoding='cp852', method='text')
print name
print name.replace('ł', 'l')

Output:

Naturalne mydło odświeżające
Naturalne mydło odświeżające

But when I try to replace non-Polish characters, it works fine.

print name
print name.replace('a', 'o')

Result:

Naturalne mydło odświeżające
Noturolne mydło odświeżojące

This also works fine:

name = "Naturalne mydło odświeżające"
print name.replace('ł', 'l')

Any suggestions?

If I understand your question correctly, you can use unidecode:

>>> from unidecode import unidecode
>>> unidecode("Naturalne mydło odświeżające")
'Naturalne mydlo odswiezajace'

You may have to decode the cp852-encoded string to Unicode first, with name.decode('cp852').
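If you would rather not depend on unidecode, here is a minimal stdlib-only sketch of the same idea. It assumes Python 3 and that the raw value arrives as cp852-encoded bytes: decode first, then map each Polish letter to an ASCII look-alike with str.translate. The table POLISH_TO_ASCII and the function strip_polish are hypothetical helpers written for illustration, not part of unidecode or prestapyt.

```python
# Hypothetical hand-written table covering the Polish diacritic letters.
POLISH_TO_ASCII = str.maketrans(
    "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "acelnoszzACELNOSZZ",
)

def strip_polish(raw: bytes, encoding: str = "cp852") -> str:
    """Decode raw bytes to str, then replace Polish letters with ASCII ones."""
    text = raw.decode(encoding)  # work on Unicode text, not on encoded bytes
    return text.translate(POLISH_TO_ASCII)

# Simulate what ElementTree.tostring(..., encoding='cp852') would return.
raw = "Naturalne mydło odświeżające".encode("cp852")
print(strip_polish(raw))  # Naturalne mydlo odswiezajace
```

The explicit table only covers Polish; unidecode handles far more scripts, so the table approach fits best when the input language is known.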

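You are mixing byte strings in different encodings. Here is a short, self-contained example that reproduces the problem. I assume you are running in a Windows console, whose default encoding is cp852: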

#!python2
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = u'Naturalne mydło odświeżające'
name = et.tostring(name_element,encoding='cp852', method='text')
print name
print name.replace('ł', 'l')

Output (no replacement):

Naturalne mydło odświeżające
Naturalne mydło odświeżające

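The reason is that the name string is encoded in cp852, but the byte string constant 'ł' is encoded in the source file's encoding, UTF-8, so the two byte sequences never match.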

print repr(name)
print repr('ł')

Output:

'Naturalne myd\x88o od\x98wie\xbeaj\xa5ce'
'\xc5\x82'
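The same mismatch can be reproduced in Python 3 with explicit bytes objects. This is a sketch: Python 3 string literals are already Unicode, so the byte strings are built with .encode() to mimic the Python 2 situation. A bytes pattern only matches when it was encoded with the same codec as the data:

```python
text = "Naturalne mydło odświeżające"
name = text.encode("cp852")          # what the XML call returned: cp852 bytes
pattern_utf8 = "ł".encode("utf-8")   # b'\xc5\x82', like the literal in a UTF-8 source file
pattern_cp852 = "ł".encode("cp852")  # b'\x88', the byte actually present in name

print(pattern_utf8 in name)   # False: the UTF-8 bytes never occur in the cp852 data
print(pattern_cp852 in name)  # True
print(name.replace(pattern_cp852, b"l").decode("cp852"))  # Naturalne mydlo odświeżające
```

Encoding the pattern with the data's codec works, but it is fragile; decoding to Unicode once and working on text avoids the problem entirely.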

The best solution is to use Unicode strings:

#!python2
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = u'Naturalne mydło odświeżające'
name = et.tostring(name_element,encoding='cp852', method='text').decode('cp852')
print name
print name.replace(u'ł', u'l')
print repr(name)
print repr(u'ł')

Output (replacement made):

Naturalne mydło odświeżające
Naturalne mydlo odświeżające
u'Naturalne myd\u0142o od\u015bwie\u017caj\u0105ce'
u'\u0142'

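Note that in Python 3, et.tostring accepts encoding='unicode', and string literals are Unicode by default. The repr() of a string is also more readable, while ascii() reproduces the old behavior. You will also find that Python 3.6 can print even to consoles that don't use a Polish code page, so perhaps you don't need to replace the characters at all.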

#!python3
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = 'Naturalne mydło odświeżające'
name = et.tostring(name_element,encoding='unicode', method='text')
print(name)
print(name.replace('ł','l'))
print(repr(name),repr('ł'))
print(ascii(name),ascii('ł'))

Output:

Naturalne mydło odświeżające
Naturalne mydlo odświeżające
'Naturalne mydło odświeżające' 'ł'
'Naturalne myd\u0142o od\u015bwie\u017caj\u0105ce' '\u0142'
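If ASCII-only output is still wanted in Python 3 without a third-party package, one common stdlib alternative (a different technique from the answers above, shown only as a sketch) is Unicode NFKD decomposition followed by dropping the combining marks. It does not handle ł on its own, because U+0142 has no Unicode decomposition, so that letter still needs an explicit replacement. The function name strip_diacritics is hypothetical.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    # NFKD splits letters such as 'ś' into 's' plus a combining accent...
    decomposed = unicodedata.normalize("NFKD", text)
    # ...and the combining marks are then dropped.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'ł' and 'Ł' have no decomposition, so map them explicitly.
    return stripped.replace("ł", "l").replace("Ł", "L")

print(strip_diacritics("Naturalne mydło odświeżające"))  # Naturalne mydlo odswiezajace
```

Without the final replace calls, the result would still contain 'ł', which is exactly the letter the question is about.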
