I currently have a dictionary, dicts, that looks like this (snippet):
{'Axa': w x y z
0 9.597307 8.533429 43.4 Axa
6 0.000000 4.631714 32.0 Axa
17 0.662168 6.271585 37.7 Axa
.. ... ... ... ...
171 4.023485 9.104185 28.2 Axa
172 0.846931 5.703871 38.8 Axa
174 20.063263 6.436114 27.7 Axa
[66 rows x 4 columns]}
{'Bxa': w x y z
1 0.454497 5.443401 43.6 Bxa
3 0.086371 4.869583 42.3 Bxa
4 2.264084 7.330367 36.6 Bxa
5 7.312782 12.418908 38.0 Bxa
8 10.935617 1.474324 43.5 Bxa
[29 rows x 4 columns]}
It is a dictionary with keys = {Axa, Bxa, Cxa, Dxa}, and each value holds the columns w through z.
Printing type(dicts) gives:
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
I want to end up with a single DataFrame that looks like this:
|     | w         | x        | y    | z   |
| --- | --------- | -------- | ---- | --- |
| 0   | 9.597307  | 8.533429 | 43.4 | Axa |
| 1   | 0.000000  | 4.631714 | 32.0 | Axa |
| 2   | 0.662168  | 6.271585 | 37.7 | Axa |
| ... | ...       | ...      | ...  | ... |
| 63  | 4.023485  | 9.104185 | 28.2 | Axa |
| 64  | 0.846931  | 5.703871 | 38.8 | Axa |
| 65  | 20.063263 | 6.436114 | 27.7 | Axa |
| 67  | 0.454497  | 5.443401 | 43.6 | Bxa |
| 68  | 0.086371  | 4.869583 | 42.3 | Bxa |
| 69  | 2.264084  | 7.330367 | 36.6 | Bxa |
I have tried:
df = pd.DataFrame(list(dicts.values()), columns = ['w', 'x', 'y', 'z'])
But I get this:
ValueError: Must pass 2-d input. shape=(1, 66, 4)
Normally the keys are used as column names and the values as column values, but in this case I want my values to supply both. How can I do that?
Here is my full code:
for ph in data.model.unique():
    dicts = {}
    "This loop aims to extract outliers from the dataset using Gaussian mixture models for each phone model and create new df"
    data = data[data.model==ph]
    data = data[['r_var', 'b_var', 'SPAD', 'model']]
    data = data[['r_var', 'b_var']].values
    probs = gmm.score_samples(data)
    probs_mean, probs_sd = mean(probs), std(probs)
    cut_off = probs_sd * 2
    lower, upper = probs_mean - cut_off, probs_mean + cut_off
    not_outliers = data[probs > lower]
    # append to dicts
    dicts[ph] = not_outliers
    df = pd.concat(dicts).reset_index(drop=True)
    print(df)
If the values of the dictionary are DataFrames, you can use concat:
df = pd.concat(list(dicts.values()),ignore_index=True)
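As a minimal sketch of that call, assuming a toy dictionary whose values are small DataFrames (the frames and numbers below are made up for illustration and are not the asker's data):

```python
import pandas as pd

# Hypothetical stand-in for the asker's dictionary: every value is a DataFrame
# that keeps the (non-contiguous) row labels of the original data.
dicts = {
    "Axa": pd.DataFrame(
        {"w": [9.6, 0.0], "x": [8.5, 4.6], "y": [43.4, 32.0], "z": ["Axa", "Axa"]},
        index=[0, 6],
    ),
    "Bxa": pd.DataFrame(
        {"w": [0.45, 0.09, 2.26], "x": [5.4, 4.9, 7.3], "y": [43.6, 42.3, 36.6], "z": ["Bxa"] * 3},
        index=[1, 3, 4],
    ),
}

# ignore_index=True drops the old row labels and renumbers the rows from 0.
df = pd.concat(list(dicts.values()), ignore_index=True)
print(df)
```

The dictionary keys are not used here; the z column already carries the model name in each row.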
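Alternatively, pass the dictionary itself to concat and then call DataFrame.reset_index. The keys become the outer level of the index, and reset_index(drop=True) discards that index again: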
df = pd.concat(dicts).reset_index(drop=True)
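A small sketch of what happens in between, again with a made-up two-entry dictionary. The last step, which keeps the key as a column named "model", is an optional variation and not part of the original answer; the level names "model" and "row" are chosen here for illustration only.

```python
import pandas as pd

# Hypothetical toy dictionary of DataFrames (values invented for illustration)
dicts = {
    "Axa": pd.DataFrame({"w": [9.6, 0.0], "x": [8.5, 4.6]}, index=[0, 6]),
    "Bxa": pd.DataFrame({"w": [0.45, 0.09], "x": [5.4, 4.9]}, index=[1, 3]),
}

# Passing the dict directly makes the keys the outer level of a MultiIndex.
stacked = pd.concat(dicts)
print(stacked.index.nlevels)

# drop=True throws the whole MultiIndex away and leaves a plain 0..n-1 index.
flat = stacked.reset_index(drop=True)
print(flat)

# Optional variation: name the index levels and move only the key level
# into a regular column instead of discarding it.
with_key = pd.concat(dicts, names=["model", "row"]).reset_index(level="model")
print(with_key)
```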
EDIT:
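In your code, dicts = {} and the concat call need to move outside the loop. Otherwise the dictionary is reset on every iteration and only ever holds one model: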
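import pandas as pd
from numpy import mean, std  # assumption: mean/std in the question come from numpy

# `data` (the full DataFrame) and `gmm` (a fitted model) are assumed to exist already.
dicts = {}
for ph in data.model.unique():
    # Extract outliers for each phone model using the Gaussian mixture model.
    # Use a separate name so the full `data` is not overwritten inside the loop.
    dataf = data.loc[data.model == ph, ['r_var', 'b_var', 'SPAD', 'model']]
    X = dataf[['r_var', 'b_var']].values
    probs = gmm.score_samples(X)
    probs_mean, probs_sd = mean(probs), std(probs)
    cut_off = probs_sd * 2
    lower, upper = probs_mean - cut_off, probs_mean + cut_off
    # filter the DataFrame (not the NumPy array) so the columns are kept
    not_outliers = dataf[probs > lower]
    # store the filtered rows under the model name
    dicts[ph] = not_outliers

# concatenate once, after the loop has filled the dictionary
df = pd.concat(dicts).reset_index(drop=True)
print(len(df))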
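To see the pattern run end to end without the asker's data, here is a self-contained sketch. Everything in it is an assumption made for illustration: the data is synthetic, score_rows is a hypothetical stand-in for gmm.score_samples (a simple standardized-distance score, not a Gaussian mixture model), and groupby replaces the manual filtering per model.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic data with the same column names as in the question (values invented).
data = pd.DataFrame({
    "r_var": rng.normal(size=60),
    "b_var": rng.normal(size=60),
    "SPAD": rng.uniform(20, 50, size=60),
    "model": np.repeat(["Axa", "Bxa", "Cxa"], 20),
})


def score_rows(X):
    """Hypothetical stand-in for gmm.score_samples: higher means more typical."""
    z = (X - X.mean(axis=0)) / X.std(axis=0)
    return -0.5 * (z ** 2).sum(axis=1)


dicts = {}  # created once, before the loop
for ph, dataf in data.groupby("model"):
    probs = score_rows(dataf[["r_var", "b_var"]].to_numpy())
    lower = probs.mean() - 2 * probs.std()
    # Boolean mask on the DataFrame keeps all four columns for the kept rows.
    dicts[ph] = dataf[probs > lower]

# Concatenate once, after the loop.
df = pd.concat(dicts).reset_index(drop=True)
print(len(df), sorted(df["model"].unique()))
```

Iterating over data.groupby("model") yields each model name together with its sub-DataFrame, so the original data is never reassigned inside the loop.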