Calculating the mean absolute percentage error between values in two dictionaries in Python



I have a dictionary of locations, each mapped to property-value pairs, like so:

{"Russia": 
{"/location/statistical_region/size_of_armed_forces": 65700.0,
"/location/statistical_region/gni_per_capita_in_ppp_dollars": 42530.0, 
"/location/statistical_region/gdp_nominal": 1736050505050.0,
"/location/statistical_region/foreign_direct_investment_net_inflows": 8683048195.0, 
"/location/statistical_region/life_expectancy": 80.929, ...

The same structure is repeated for every country.

Then I have a dictionary containing a single array, where each element of the array is a dictionary with 3 keys:

{
"sentences": [
{
"location-value-pair": {
"Russia": 6.1
}, 
"parsedSentence": "On Tuesday , the Federal State Statistics Service -LRB- Rosstat -RRB- reported that consumer price inflation in LOCATION_SLOT hit a historic post-Soviet period low of NUMBER_SLOT percent in 2011 , citing final data .", 
"sentence": "On Tuesday , the Federal State Statistics Service -LRB- Rosstat -RRB- reported that consumer price inflation in Russia hit a historic post-Soviet period low of 6.1 percent in 2011 , citing final data ."
}, 
{
"location-value-pair": {
"Russia": 8.8
}, 
"parsedSentence": "In 2010 , annual inflation in LOCATION_SLOT hit NUMBER_SLOT percent due to the summer drought , exceeding forecasts and equalling the figure for 2009 , the year of the global financial meltdown .", 
"sentence": "In 2010 , annual inflation in Russia hit 8.8 percent due to the summer drought , exceeding forecasts and equalling the figure for 2009 , the year of the global financial meltdown ."
}, ...

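What I want to do is go through each sentence and, for each location and value in that sentence, find the closest matching value for that location in the first dictionary, then take the statistical property that value belongs to and add it as a new key in the sentence's dictionary.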

For example:

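For sentence 1, we are looking at Russia and a value of 6.1. I want to index into the first dictionary, find "Russia", and go through all the values that exist there, e.g. 65700.0, 42530.0, 1736050505050.0, 8683048195.0. Then I want to compute the mean absolute error for each property, e.g. 23% for the size_of_armed_forces value, 10% for the gni_per_capita property, and so on. Finally, I want to find the smallest one and add it as a key to the second dictionary, so: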

{
"location-value-pair": {
"Russia": 6.1
}, 
"predictedRegion": "/location/statistical_region/gni_in_ppp_dollars",
"meanabserror": "2%",
"parsedSentence": "On Tuesday , the Federal State Statistics Service -LRB- Rosstat -RRB- reported that consumer price inflation in LOCATION_SLOT hit a historic post-Soviet period low of NUMBER_SLOT percent in 2011 , citing final data .", 
"sentence": "On Tuesday , the Federal State Statistics Service -LRB- Rosstat -RRB- reported that consumer price inflation in Russia hit a historic post-Soviet period low of 6.1 percent in 2011 , citing final data ."
}, 
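
To make the per-property calculation described above concrete, here is a minimal sketch. It assumes the error for one property is |sentence value - true value| / |true value|, and it uses only the five Russia values visible in the excerpt (the elided ones are left out). The names abs_percentage_error, errors, and best_property are illustrative and do not come from the original code. With a single value per sentence there is nothing to average, so the "mean" is just this one error.

```python
# Values copied from the "Russia" entry shown earlier; the properties
# hidden behind "..." in the excerpt are not included in this sketch.
russia = {
    "/location/statistical_region/size_of_armed_forces": 65700.0,
    "/location/statistical_region/gni_per_capita_in_ppp_dollars": 42530.0,
    "/location/statistical_region/gdp_nominal": 1736050505050.0,
    "/location/statistical_region/foreign_direct_investment_net_inflows": 8683048195.0,
    "/location/statistical_region/life_expectancy": 80.929,
}


def abs_percentage_error(predicted, true_value):
    """Return |predicted - true| / |true| as a fraction (0.25 means 25%)."""
    return abs(predicted - true_value) / abs(true_value)


value = 6.1  # the number extracted from sentence 1

# One error per property, then pick the property with the smallest error.
errors = {prop: abs_percentage_error(value, true) for prop, true in russia.items()}
best_property = min(errors, key=errors.get)

# Print the properties from best to worst match.
for prop, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print("{}: {:.2%}".format(prop, err))
```

The 23%, 10%, and 2% figures in the example above are placeholders, so the numbers printed by this sketch will differ from them.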

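When I tried to write this, my confusion was simply about how to use the keys and values of one dictionary as a condition when looping over another dictionary. My current attempt is: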

def predictRegion(sentenceArray,trueDict):
    absPercentageErrors = {}
    for location, property2value in trueDict.items():
        print location
        absPercentageErrors['location'] = {}
        for property,trueValue in property2value.iteritems():
            print property
            absError = abs(sentenceArray['sentences']['location-value-pair'].key() - trueValue)
            absPercentageErrors['location']['property'] = absError/numpy.abs(trueValue)
    for index, dataTriples in enumerate(sentenceArray["sentences"]):
        for location, trueValue in dataTriples['location-value-pair'].items():
            print location

However, I obviously can't access sentenceArray['sentences']['location-value-pair'].key() in the line absError = abs(sentenceArray['sentences']['location-value-pair'].key() - trueValue), because it is outside the loop over the sentences.

How can I access this key from a loop that iterates over a completely different variable?

In the future, please read up on how to write a good question: https://stackoverflow.com/help/mcve

Minimal, Complete, and Verifiable


I think this is what you want.

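# Reference statistics: country -> {property: true value}
countries = {'Canada': {'a': 10, 'b': 150, 'c': 1000},
             'Russia': {'d': 9, 'e': 5, 'f': 1e5}}
sentences = [
    {"location-value-pair": {"Russia": 6.1},
     "parsedSentence": "bob loblaw",
     "sentence": "lobs law bomb"
    },
    {"location-value-pair": {"Russia": 8.8},
     "parsedSentence": "some sentence",
     "sentence": "lorem ipsum test"
    }]

def absError(numer, denom):
    # Absolute percentage error as a fraction: |numer - denom| / |denom|
    # (abs() on the denominator keeps the error non-negative for negative true values)
    return abs(numer - denom) / abs(float(denom))

def findMatch(target, country):
    # Key of the property whose value is closest to target in relative terms
    return min(country, key=lambda x: absError(target, country.get(x)))

def update(sentence):
    # Assumes each sentence holds exactly one location-value pair
    (c, target), = sentence.get("location-value-pair").items()
    country = countries[c]
    matched = findMatch(target, country)
    error = absError(target, country.get(matched))
    # Work on a copy so the original sentence dictionary is left untouched
    res = sentence.copy()
    res.update({'predictedRegion': matched, 'meanabserror': "{:.2f}%".format(100*error)})
    return res

updated = [update(sentence) for sentence in sentences]
updated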

Output:

[{'location-value-pair': {'Russia': 6.1},
'meanabserror': '22.00%',
'parsedSentence': 'bob loblaw',
'predictedRegion': 'e',
'sentence': 'lobs law bomb'},
{'location-value-pair': {'Russia': 8.8},
'meanabserror': '2.22%',
'parsedSentence': 'some sentence',
'predictedRegion': 'd',
'sentence': 'lorem ipsum test'}]
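
The snippet above assumes a flat list of sentences with exactly one location-value pair each. As a rough sketch of how the same idea might be applied to the structure from the question (a dictionary with a "sentences" array), the version below loops over the sentences first and looks up each location in the reference dictionary inside that loop. The function name predict_properties and the "predictions" key are assumptions made for illustration, as is the choice to skip unknown locations and zero true values.

```python
def abs_percentage_error(predicted, true_value):
    """Return |predicted - true| / |true| as a fraction."""
    return abs(predicted - true_value) / abs(true_value)


def predict_properties(sentence_data, true_dict):
    """Return a copy of sentence_data with a 'predictions' entry per sentence.

    'predictions' (a hypothetical key) maps each location to its best
    matching property and the corresponding error.
    """
    annotated = []
    for sentence in sentence_data["sentences"]:
        result = dict(sentence)  # shallow copy; the input is not modified
        predictions = {}
        for location, value in sentence["location-value-pair"].items():
            # Skip locations missing from true_dict and true values of 0,
            # which would cause a division by zero.
            candidates = {p: v for p, v in true_dict.get(location, {}).items() if v != 0}
            if not candidates:
                continue
            best = min(candidates, key=lambda p: abs_percentage_error(value, candidates[p]))
            predictions[location] = {
                "predictedRegion": best,
                "meanabserror": abs_percentage_error(value, candidates[best]),
            }
        result["predictions"] = predictions
        annotated.append(result)
    return {"sentences": annotated}


# Toy data reusing the values from the answer, plus one unknown location.
countries = {"Canada": {"a": 10, "b": 150, "c": 1000},
             "Russia": {"d": 9, "e": 5, "f": 1e5}}
data = {"sentences": [
    {"location-value-pair": {"Russia": 6.1}, "sentence": "lobs law bomb"},
    {"location-value-pair": {"Atlantis": 3.0}, "sentence": "lorem ipsum test"},
]}

result = predict_properties(data, countries)
print(result["sentences"][0]["predictions"])
```

Storing the predictions per location, rather than as flat keys on the sentence, is one way to keep the result unambiguous when a sentence mentions more than one location.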
