I am trying to extract locations from text using the geograpy3 library in Python.
import geograpy
address = 'Jersey City New Jersey 07306'
places = geograpy.get_place_context(text = address)
This gives me the following UnicodeDecodeError:
~\Anaconda\lib\site-packages\geograpy\places.py in populate_db(self)
     28         with open(cur_dir + "/data/GeoLite2-City-Locations.csv") as info:
     29             reader = csv.reader(info)
---> 30             for row in reader:
     31                 print(row)
     32                 cur.execute("INSERT INTO cities VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?);", row)

~\Anaconda\lib\encodings\cp1252.py in decode(self, input, final)
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
     22     def decode(self, input, final=False):
---> 23         return codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 276: character maps to <undefined>
After some investigation, I tried modifying places.py and adding encoding="utf-8" at line 30:
with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:
But it still gives me the same error. I also tried saving GeoLite2-City-Locations.csv to my desktop and reading it with the same code:
import csv

with open("GeoLite2-City-Locations.csv", encoding="utf-8") as info:
    reader = csv.reader(info)
    for row in reader:
        print(row)
That works absolutely fine and prints every row of GeoLite2-City-Locations.csv. I don't understand what the problem is!
As a committer of geograpy3, I added a test to the latest geograpy3 to reproduce your issue, see https://github.com/somnathrakshit/geograpy3/blob/master/tests/test_extractor.py:
def testStackoverflow54077973(self):
    '''
    see https://stackoverflow.com/questions/54077973/geograpy3-library-for-extracting-the-locations-in-the-text-gives-unicodedecodee
    '''
    address = 'Jersey City New Jersey 07306'
    e = Extractor(text=address)
    e.find_entities()
    self.check(e.places, ['Jersey', 'City'])
The result is:
['Jersey', 'City']
So you can simply switch to the latest version.
Specify encoding='utf-8' as before, but in the correct_country_mispelling(self, s) method of places.py (line 49).
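A minimal sketch of that pattern, assuming the method reads its CSV through a plain open() call. The read_rows helper is hypothetical and not part of geograpy; it only shows the explicit encoding being passed wherever a data file is opened.

```python
import csv


def read_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file, decoded with an explicit encoding.

    Hypothetical helper for illustration only; not part of geograpy.
    """
    # Passing encoding explicitly avoids the platform default (cp1252 in the
    # traceback above). newline="" is what the csv module docs recommend.
    with open(path, encoding=encoding, newline="") as info:
        return list(csv.reader(info))
```

Every open() call that reads one of the bundled data files would need the same argument, since a single call left on the default encoding is enough to raise the error again.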
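After some investigation, this appears to be a Windows vs Linux bug in some cases. Even with
with open(cur_dir + "/data/GeoLite2-City-Locations.csv", encoding="utf-8") as info:
I could not resolve the error on my Windows machine. However, the exact same code ran fine on the Linux machine I use. I looked at the City-Locations.csv file on Linux, and it seems LibreOffice automatically encoded and/or parsed all the characters. Viewing the same file in Excel, I still see all the funky characters that cause the error. For some reason, Excel insists on keeping the odd characters.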
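One likely explanation for the platform difference, offered as an assumption rather than something confirmed in this thread: when open() is called without an encoding argument, Python falls back to the locale's preferred encoding. That is typically cp1252 on Western Windows installs (which matches the cp1252.py frame in the traceback) and usually UTF-8 on Linux. Byte 0x8d has no mapping in cp1252, so any UTF-8 character containing that byte fails to decode there. The sketch below reproduces this with the letter č, picked only as an example of such a character.

```python
import locale

# The encoding open() falls back to when none is given
# (unless Python's UTF-8 mode is enabled).
print(locale.getpreferredencoding(False))

# "č" (U+010D) is two bytes in UTF-8, and the second one is 0x8d.
raw = "č".encode("utf-8")
assert raw == b"\xc4\x8d"

# The bytes round-trip as UTF-8, but cp1252 has no character for 0x8d.
decoded_utf8 = raw.decode("utf-8")
try:
    raw.decode("cp1252")
    cp1252_error = None
except UnicodeDecodeError as exc:
    cp1252_error = exc

print(decoded_utf8)
print(cp1252_error)
```

If the first line prints cp1252 on the failing machine, any open() call in the library that still lacks an explicit encoding would hit the same error.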