Unicode and urllib.open



I'm writing a Python application that parses weather data from yr.no. It works fine with plain ASCII strings, but fails when I pass it Unicode.

def GetYRNOWeatherData(country, province, place):
    #Parse the XML file
    wtree = ET.parse(urllib.urlopen("http://www.yr.no/place/" + string.replace(country, ' ', '_').encode('utf-8') + "/" + string.replace(province, ' ', '_').encode('utf-8') + "/" + string.replace(place, ' ', '_').encode('utf-8') + "/forecast.xml"))

For example, when I call

GetYRNOWeatherData("France", "Île-de-France", "Paris")

I get this error:

'charmap' codec can't encode character u'\xce' in position 0: character maps to <undefined>

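Is it true that urllib doesn't handle Unicode well? I'm using Tkinter as the front end for this function, so could that be the source of the problem (does the Tkinter Entry widget handle Unicode well)?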

You can handle this by keeping every string as unicode until you actually make the urllib.urlopen request, and encoding to UTF-8 only at that point:

#!/usr/bin/python
# -*- coding: utf-8 -*-
# This import makes all literal strings in the file default to
# type 'unicode' rather than type 'str'. You don't need to use this,
# but you'd need to do u"France" instead of just "France" below, and
# everywhere else you have a string literal.
from __future__ import unicode_literals
import urllib
import xml.etree.ElementTree as ET
def do_format(*args):
    ret = []
    for arg in args:
        ret.append(arg.replace(" ", "_"))
    return ret 

def GetYRNOWeatherData(country, province, place):
    country, province, place = do_format(country, province, place)
    url = "http://www.yr.no/place/{}/{}/{}/forecast.xml".format(country, province, place)
    wtree = ET.parse(urllib.urlopen(url.encode('utf-8')))
    return wtree

if __name__ == "__main__":
    GetYRNOWeatherData("France", "Île-de-France", "Paris")
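
The code above targets Python 2. For readers on Python 3, where every str is already Unicode text, the same "encode only at the boundary" idea can be sketched as follows. This is a minimal sketch, not part of the original answer: the helper name build_forecast_url is hypothetical, and it assumes the yr.no URL layout shown above. It only builds the URL and makes no network request.

```python
from urllib.parse import quote


def build_forecast_url(country, province, place):
    """Build the forecast URL from three text (str) path segments.

    Hypothetical helper for illustration; the URL layout is copied
    from the answer above and is an assumption about yr.no.
    """
    segments = []
    for part in (country, province, place):
        # Same convention as do_format above: spaces become underscores.
        part = part.replace(" ", "_")
        # quote() encodes the text as UTF-8 and percent-escapes every
        # byte outside the unreserved ASCII set; safe="" also escapes "/".
        segments.append(quote(part, safe=""))
    return "http://www.yr.no/place/{}/{}/{}/forecast.xml".format(*segments)


print(build_forecast_url("France", "Île-de-France", "Paris"))
# prints http://www.yr.no/place/France/%C3%8Ele-de-France/Paris/forecast.xml
```

The resulting string is pure ASCII, so it can be handed to urllib.request.urlopen and the response passed to ET.parse as in the answer. Whether the server still serves this path is not something the original post confirms.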
