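Problem: I want to extract country information from user descriptions. So far I am trying the geograpy package. I like how it behaves when the input is ambiguous, for example for Evesham or Rochdale. However, when the user clearly says the location is in Spain, the package interprets strings such as Zaragoza, Spain as two mentions. I also don't understand why amsterdam does not produce Holland (the Netherlands) as output. How can I improve the output? Am I missing something important? Is there a better package for this?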
Data: a sample of my data:
user_location
2 Socialist Republic of Alachua
3 Hérault, France
4 Gwalior, India
5 Zaragoza,España
7 amsterdam
8 Evesham
9 Rochdale
I would like to get something like this:
user_location country
2 Socialist Republic of Alachua ['USSR', 'United States']
3 Hérault, France ['France']
4 Gwalior, India ['India']
5 Zaragoza,España ['Spain']
7 amsterdam ['Holland']
8 Evesham ['United Kingdom']
9 Rochdale ['United Kingdom', 'United States']
Reprex:
import pandas as pd
import geograpy  # installed with `pip install geograpy3`, but the module is imported as geograpy
df = pd.DataFrame.from_dict({'user_location': {2: 'Socialist Republic of Alachua', 3: 'Hérault, France', 4: 'Gwalior, India', 5: 'Zaragoza,España', 7: 'amsterdam ', 8: 'Evesham', 9: 'Rochdale'}})
df['country'] = df['user_location'].apply(lambda x: geograpy.get_place_context(text=x).countries if pd.notnull(x) else x)
print(df)
#> user_location country
#> 2 Socialist Republic of Alachua [USSR, Union of Soviet Socialist Republics, Al...
#> 3 Hérault, France [France, Hérault]
#> 4 Gwalior, India [British Indian Ocean Territory, Gwalior, India]
#> 5 Zaragoza,España [Zaragoza, España, Spain, El Salvador]
#> 7 amsterdam []
#> 8 Evesham [Evesham, United Kingdom]
#> 9 Rochdale [Rochdale, United Kingdom, United States]
Created on 2020-06-02 by the reprexpy package
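An untested guess about the amsterdam case: the reprex passes 'amsterdam ' in lowercase with a trailing space, and 'Zaragoza,España' has no space after the comma. Cleaning the input before calling geograpy might help its entity recognition, but whether it actually fixes amsterdam is an assumption, not something shown by the output above. `normalize_location` is a hypothetical helper, not part of geograpy.

```python
import re
import string


def normalize_location(text: str) -> str:
    """Light cleanup of a free-text location before geocoding (hypothetical helper)."""
    text = text.strip()                    # drop stray whitespace, as in 'amsterdam '
    text = re.sub(r"\s*,\s*", ", ", text)  # 'Zaragoza,España' -> 'Zaragoza, España'
    if text.islower():                     # only re-case all-lowercase input,
        text = string.capwords(text)       # so mixed-case 'Republic of' keeps its lowercase 'of'
    return text


for raw in ["amsterdam ", "Zaragoza,España", "Socialist Republic of Alachua"]:
    print(repr(normalize_location(raw)))
```

With pandas this could be applied as `df['user_location'].map(normalize_location)` before the geograpy call; missing values would need the same `pd.notnull` guard the reprex already uses.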
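geograpy3 was no longer correct in its country lookup because it did not check whether pycountry returned None. As a committer, I have just fixed this. I added your example, slightly modified to avoid the pandas import, as a unit test case: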
import unittest
import geograpy  # installed with `pip install geograpy3`

class TestStackoverflow(unittest.TestCase):  # hypothetical wrapper class so the test runs standalone
    def testStackoverflow62152428(self):
        '''
        see https://stackoverflow.com/questions/62152428/extracting-country-information-from-description-using-geograpy?noredirect=1#comment112899776_62152428
        '''
        examples={2: 'Socialist Republic of Alachua', 3: 'Hérault, France', 4: 'Gwalior, India', 5: 'Zaragoza,España', 7: 'amsterdam ', 8: 'Evesham', 9: 'Rochdale'}
        for index,text in examples.items():
            places=geograpy.get_geoPlace_context(text=text)
            print("example %d: %s" % (index,places.countries))

if __name__ == "__main__":
    unittest.main()
The result is now:
example 2: ['United States']
example 3: ['France']
example 4: ['British Indian Ocean Territory', 'India']
example 5: ['Spain', 'El Salvador']
example 7: []
example 8: ['United Kingdom']
example 9: ['United Kingdom', 'United States']
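The general pattern behind that fix, treating a None lookup result as "not a country" instead of letting it through, can be sketched in plain Python. This is not the actual geograpy3 code: `resolve_countries`, `toy_lookup` and the `KNOWN` table are hypothetical stand-ins for the package internals and for pycountry.

```python
from typing import Callable, Iterable, List, Optional


def resolve_countries(
    candidates: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> List[str]:
    """Map candidate place names to country names, skipping failed lookups."""
    countries: List[str] = []
    for name in candidates:
        country = lookup(name)
        if country is None:           # the missing check: ignore names that are not countries
            continue
        if country not in countries:  # de-duplicate while keeping first-seen order
            countries.append(country)
    return countries


# Hypothetical stand-in for a real country database such as pycountry.
KNOWN = {"france": "France", "españa": "Spain", "spain": "Spain"}


def toy_lookup(name: str) -> Optional[str]:
    """Return a canonical country name, or None when the name is unknown."""
    return KNOWN.get(name.strip().lower())


print(resolve_countries(["Zaragoza", "España", "Spain"], toy_lookup))
print(resolve_countries(["France", "Hérault"], toy_lookup))
```

Without the None check, region and city names such as Hérault or Gwalior can end up in the country list, which matches what the original reprex output shows.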
For example 5 there is indeed room for improvement. I have opened an issue for it at https://github.com/somnathrakshit/geograpy3/issues/7; stay tuned...
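Until that issue is resolved, one possible post-processing heuristic for cases like example 5 (offered only as a sketch, not what the issue proposes) is: if the text itself names one of the candidate countries, directly or through an alias such as España, keep only those candidates; otherwise keep them all, so ambiguous inputs like Rochdale stay ambiguous. `prefer_explicit_countries` and the `aliases` mapping are hypothetical; you would have to supply the alias table yourself.

```python
import re
from typing import Dict, List


def mentions(text: str, name: str) -> bool:
    """True if `name` occurs in `text` as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def prefer_explicit_countries(text: str, countries: List[str], aliases: Dict[str, str]) -> List[str]:
    """Keep only countries named in `text`; fall back to all candidates if none are."""
    explicit = []
    for country in countries:
        # The country's own name plus any alias that maps to it, e.g. 'España' -> 'Spain'.
        names = [country] + [alias for alias, canonical in aliases.items() if canonical == country]
        if any(mentions(text, name) for name in names):
            explicit.append(country)
    return explicit or list(countries)


print(prefer_explicit_countries("Zaragoza,España", ["Spain", "El Salvador"], {"España": "Spain"}))
print(prefer_explicit_countries("Rochdale", ["United Kingdom", "United States"], {}))
```

Whole-word matching keeps "India" from matching inside "Indiana", but the heuristic can still misfire when a country name is also used for a smaller place.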