How do I convert my dictionary values into columns of a DataFrame?



I currently have a dictionary dicts that looks like this (snippet):

{'Axa':          w      x     y     z
0     9.597307   8.533429  43.4  Axa
6     0.000000   4.631714  32.0  Axa
17    0.662168   6.271585  37.7  Axa
..         ...        ...   ...        ...
171   4.023485   9.104185  28.2  Axa
172   0.846931   5.703871  38.8  Axa
174  20.063263   6.436114  27.7  Axa
[66 rows x 4 columns]}
{'Bxa':         w      x    y         z
1     0.454497   5.443401  43.6  Bxa
3     0.086371   4.869583  42.3  Bxa
4     2.264084   7.330367  36.6  Bxa
5     7.312782  12.418908  38.0  Bxa
8    10.935617   1.474324  43.5  Bxa
[29 rows x 4 columns]} 

It is a dictionary with keys = {Axa, Bxa, Cxa, Dxa}, and the values are the columns w to z.

type(dicts) gives

<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>
<class 'dict'>

I would like to get a DataFrame that looks like this:

|     | w         | x        | y    | z   |
| --- | --------- | -------- | ---- | --- |
| 0   | 9.597307  | 8.533429 | 43.4 | Axa |
| 1   | 0.000000  | 4.631714 | 32.0 | Axa |
| 2   | 0.662168  | 6.271585 | 37.7 | Axa |
| ... | ...       | ...      | ...  | ... |
| 63  | 4.023485  | 9.104185 | 28.2 | Axa |
| 64  | 0.846931  | 5.703871 | 38.8 | Axa |
| 65  | 20.063263 | 6.436114 | 27.7 | Axa |
| 66  | 0.454497  | 5.443401 | 43.6 | Bxa |
| 67  | 0.086371  | 4.869583 | 42.3 | Bxa |
| 68  | 2.264084  | 7.330367 | 36.6 | Bxa |

I have tried this:

df = pd.DataFrame(list(dicts.values()), columns = ['w', 'x', 'y', 'z'])

But I get this:

ValueError: Must pass 2-d input. shape=(1, 66, 4)

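Usually the keys are used as column names and the values as column values, but in this case I want the values to supply both. How can I do that?

Here is my full code: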

for ph in data.model.unique():

    dicts = {}
    "This loop aims to extract outliers from the dataset using Gaussian mixture models for each phone model and create new df"
    data = data[data.model==ph]
    data = data[['r_var',  'b_var', 'SPAD', 'model']]
    data = data[['r_var', 'b_var']].values
    probs = gmm.score_samples(data)
    probs_mean, probs_sd = mean(probs), std(probs)
    cut_off = probs_sd * 2
    lower, upper = probs_mean - cut_off, probs_mean + cut_off
    not_outliers = data[probs > lower]
    # append to dicts
    dicts[ph] = not_outliers
    df = pd.concat(dicts).reset_index(drop=True)
    print(df)

If the values of the dictionary are DataFrames, you can use concat:

df = pd.concat(list(dicts.values()),ignore_index=True)
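To make this concrete, here is a minimal sketch with two small made-up frames standing in for the per-model frames from the question (the numbers are illustrative only). The original attempt appears to fail because the DataFrame constructor expects a single 2-D table, while a list of frames is effectively 3-D, which matches the shape=(1, 66, 4) in the error; concat instead stacks the frames row-wise.

```python
import pandas as pd

# Toy stand-ins for the frames in the question; the values are made up.
dicts = {
    "Axa": pd.DataFrame(
        {"w": [9.6, 0.0], "x": [8.5, 4.6], "y": [43.4, 32.0], "z": ["Axa", "Axa"]},
        index=[0, 6],
    ),
    "Bxa": pd.DataFrame(
        {"w": [0.5, 0.1, 2.3], "x": [5.4, 4.9, 7.3], "y": [43.6, 42.3, 36.6], "z": ["Bxa"] * 3},
        index=[1, 3, 4],
    ),
}

# Stack the frames row-wise. ignore_index=True replaces the original
# row labels (0, 6, 1, 3, 4) with a fresh 0..n-1 range.
df = pd.concat(list(dicts.values()), ignore_index=True)
print(df)
```

The columns stay w, x, y, z because every frame already carries them, so no columns argument is needed.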

Or pass the dictionary itself to concat and then call DataFrame.reset_index:

df = pd.concat(dicts).reset_index(drop=True)
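When concat receives the dictionary itself, the keys become the outer level of a MultiIndex on the result, and reset_index(drop=True) then discards that index. The sketch below, again with made-up frames, prints the intermediate index and checks that both variants produce the same table:

```python
import pandas as pd

# Toy frames; the values are made up for illustration.
dicts = {
    "Axa": pd.DataFrame({"w": [9.6, 0.0], "z": ["Axa", "Axa"]}, index=[0, 6]),
    "Bxa": pd.DataFrame({"w": [0.5, 0.1], "z": ["Bxa", "Bxa"]}, index=[1, 3]),
}

# Passing the dict uses its keys as the outer index level.
stacked = pd.concat(dicts)
print(stacked.index.tolist())

# Dropping the MultiIndex gives a plain 0..n-1 index.
df_a = stacked.reset_index(drop=True)

# Same result via the list-of-values form.
df_b = pd.concat(list(dicts.values()), ignore_index=True)
print(df_a.equals(df_b))
```

If the key should be kept rather than discarded, calling reset_index without drop=True would turn the index levels into ordinary columns instead.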

EDIT:

In your code, dicts = {} and the concat call need to be moved outside the loop:

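import pandas as pd
from numpy import mean, std

# `data` (the full DataFrame) and `gmm` (the fitted Gaussian mixture model)
# are assumed to exist already, as in the question.
dicts = {}
for ph in data.model.unique():
    # Extract the non-outliers for each phone model using the Gaussian mixture model.
    # A separate name is used so the full `data` frame is not overwritten.
    dataf = data[data.model == ph]
    dataf = dataf[['r_var', 'b_var', 'SPAD', 'model']]
    X = dataf[['r_var', 'b_var']].values
    probs = gmm.score_samples(X)
    probs_mean, probs_sd = mean(probs), std(probs)
    cut_off = probs_sd * 2
    lower, upper = probs_mean - cut_off, probs_mean + cut_off
    # keep the rows whose score lies above the lower bound
    not_outliers = dataf[probs > lower]
    # store the result under the model name
    dicts[ph] = not_outliers

# concatenate once, after the loop has filled the dictionary
df = pd.concat(dicts).reset_index(drop=True)
print(len(df))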
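The pattern of creating the dictionary once before the loop and concatenating once after it can be tried end to end without the real data. The sketch below is only an illustration under stated assumptions: the data is synthetic, score_samples is a hypothetical stand-in for gmm.score_samples (not scikit-learn's implementation), and groupby is swapped in for the loop over data.model.unique() purely for brevity.

```python
import numpy as np
import pandas as pd

# Synthetic data; column names follow the question, values are made up.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "r_var": rng.normal(size=40),
    "b_var": rng.normal(size=40),
    "SPAD": rng.uniform(20, 45, size=40),
    "model": np.repeat(["Axa", "Bxa"], 20),
})


def score_samples(X):
    """Hypothetical stand-in for gmm.score_samples: higher means more typical."""
    return -((X - X.mean(axis=0)) ** 2).sum(axis=1)


dicts = {}  # created once, before the loop
for ph, dataf in data.groupby("model"):
    probs = score_samples(dataf[["r_var", "b_var"]].to_numpy())
    lower = probs.mean() - 2 * probs.std()
    # The boolean mask selects rows and keeps all four columns.
    dicts[ph] = dataf[probs > lower]

# Concatenated once, after the loop.
df = pd.concat(dicts).reset_index(drop=True)
print(df["model"].value_counts())
```

Because the mask is applied to the per-model frame rather than to the NumPy array, the SPAD and model columns survive into the final table.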
