Unicode character comparison not working properly

I am currently working on a program in which one line needs to compare the character i with the Unicode character ”. It looks like this:

    i != "”"

My whole code looks like this:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 

f = open('text.txt', "r")
g = open('write.txt', "w")

for word in f:
  for i in word:
    if all( [i != " ", i != "," ,i != "!", i != "?", i != ";",  
       i !=".", i != ":", i != "”", i != "”" ]):
      g.write(i.lower())
    else:
      g.write('\n')

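The idea is that a text is parsed and all punctuation characters (periods, commas, question marks, and so on) are taken out. The only problem is that the Unicode characters are not removed from the text. Can anyone help me out? Thanks!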

For your information, I am using Python 2.7.11.

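In the expression i != "”", neither i nor "”" is a Unicode string. In Python 2, iterating over a line read from a file opened with open() yields one byte at a time, while "”" in a UTF-8 source file is a three-byte string, so the two never compare equal. If you want to compare Unicode characters, and you know that text.txt is encoded in UTF-8, try the following: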

for i in word.decode('utf-8'):
    if i != u"”":
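To make the byte-versus-character distinction concrete, here is a minimal sketch. It assumes the curly quote in question is U+201D (RIGHT DOUBLE QUOTATION MARK); the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: why a byte-by-byte comparison never matches the curly quote.
quote = u"\u201d"                  # assumed to be the character from the question
raw = quote.encode("utf-8")        # the bytes a UTF-8 file actually contains

byte_values = list(bytearray(raw))  # the individual bytes, as integers
decoded = raw.decode("utf-8")       # back to a one-character Unicode string

print(len(raw))          # 3: the quote occupies three bytes
print(len(decoded))      # 1: after decoding it is a single character
print(decoded == quote)  # True
```

A loop over the undecoded line sees those three bytes one at a time, so none of them can equal the whole quote.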

Not directly related to your question, but using in may be easier than all():

if i not in u" ,!?;.:”":
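The membership test can also be wrapped in a small helper that works on an already decoded string. This is only a sketch: replace_punctuation and PUNCTUATION are hypothetical names, and the set of characters (including both curly quotes) is an assumption about what should be stripped.

```python
# -*- coding: utf-8 -*-
# Assumed punctuation set; \u201c and \u201d are the left and right curly quotes.
PUNCTUATION = u" ,!?;.:\u201c\u201d"

def replace_punctuation(text):
    """Lower-case text and turn each punctuation character into a newline."""
    return u"".join(u"\n" if ch in PUNCTUATION else ch.lower() for ch in text)

result = replace_punctuation(u"Hello.Zippy\u201d")
```

Here result is the two words "hello" and "zippy", each followed by a newline.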

Here is a tested sample program:

#!/usr/bin/env python
# -*- coding: utf-8 -*- 

f = open('text.txt', "r")
g = open('write.txt', "w")

for word in f:
  for i in word.decode('utf-8'):
    if i not in u" ,!?;.:”":
      g.write(i.lower())
    else:
      g.write('\n')

Input text.txt:

hello.zippy”
goodbye

Output write.txt:

hello
zippy
goodbye

Rob's answer was not quite complete. I had to put this at the beginning of the file:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Now everything works like a charm! :D
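As a possible alternative to sys.setdefaultencoding, which changes implicit string conversions for the whole process and is generally discouraged, the files can be opened with io.open and an explicit encoding. The workaround was presumably needed because writing a decoded non-ASCII character to a file opened with plain open() triggers an implicit ASCII encode. The sketch below assumes both files are UTF-8; strip_punctuation and PUNCTUATION are hypothetical names.

```python
# -*- coding: utf-8 -*-
import io

PUNCTUATION = u" ,!?;.:\u201d"   # same characters as in the answer above

def strip_punctuation(src_path, dst_path):
    # io.open decodes on read and encodes on write, so every value handled
    # inside the loop is a Unicode string and no implicit ASCII conversion occurs.
    with io.open(src_path, "r", encoding="utf-8") as src, \
         io.open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            for ch in line:
                dst.write(u"\n" if ch in PUNCTUATION else ch.lower())
```

Calling strip_punctuation('text.txt', 'write.txt') would then process the same files as the program above. io.open is available in Python 2.6+ and is the built-in open in Python 3.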
