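I am using the pandas and numpy libraries to compute the Pearson correlation between three simple arrays. The code below produces the correlation matrix: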
import numpy as np
import pandas as pd
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
z = np.array([5, 3, 2, 1, 0, -2, -8, -11, -15, -16])
x, y, z = pd.Series(x), pd.Series(y), pd.Series(z)
xyz = pd.DataFrame({'dist-values': x, 'uptime-values': y, 'speed-values': z})
matrix = xyz.corr(method="pearson")
Calling .unstack() and .to_dict() on the output gives a dictionary, and following the answer in this post, the output can be converted to a list of dictionaries:
result = (
    matrix.unstack()
    .rename_axis(['f1', 'f2'])
    .reset_index(name='value')
    .to_dict('records')
)
# the output format after printing
[{'f1': 'dist-values', 'f2': 'dist-values', 'value': 1.0},
{'f1': 'dist-values', 'f2': 'uptime-values', 'value': 0.7586402890911869},
{'f1': 'dist-values', 'f2': 'speed-values', 'value': -0.9680724198337364},
{'f1': 'uptime-values', 'f2': 'dist-values', 'value': 0.7586402890911869},
{'f1': 'uptime-values', 'f2': 'uptime-values', 'value': 1.0},
{'f1': 'uptime-values', 'f2': 'speed-values', 'value': -0.8340792243486527},
{'f1': 'speed-values', 'f2': 'dist-values', 'value': -0.9680724198337364},
{'f1': 'speed-values', 'f2': 'uptime-values', 'value': -0.8340792243486527},
{'f1': 'speed-values', 'f2': 'speed-values', 'value': 1.0}]
But I need a more complex format. The output should look like this:
[
{'name': 'dist-values', 'data': [{'x': 'dist-values', 'y': 1.0}, {'x': 'uptime-values', 'y': 0.7586402890911869}, {'x': 'speed-values', 'y': -0.9680724198337364}]},
{'name': 'uptime-values', 'data': [{'x': 'dist-values', 'y': 0.7586402890911869}, {'x': 'uptime-values', 'y': 1.0}, {'x': 'speed-values', 'y': -0.8340792243486527}]},
{'name': 'speed-values', 'data': [{'x': 'dist-values', 'y': -0.9680724198337364}, {'x': 'uptime-values', 'y': -0.8340792243486527}, {'x': 'speed-values', 'y': 1.0}]},
]
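This code has only three features, so the correlation matrix has only 9 elements. How can this conversion be done for a larger matrix? Is there an efficient way? Thanks.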
You can try a list comprehension to get your output:
out = [
    {"name": i, "data": [{"x": c, "y": row[c]} for c in row.index]}
    for i, row in matrix.iterrows()
]
print(out)
Prints:
[
{
"name": "dist-values",
"data": [
{"x": "dist-values", "y": 1.0},
{"x": "uptime-values", "y": 0.7586402890911869},
{"x": "speed-values", "y": -0.9680724198337364},
],
},
{
"name": "uptime-values",
"data": [
{"x": "dist-values", "y": 0.7586402890911869},
{"x": "uptime-values", "y": 1.0},
{"x": "speed-values", "y": -0.8340792243486527},
],
},
{
"name": "speed-values",
"data": [
{"x": "dist-values", "y": -0.9680724198337364},
{"x": "uptime-values", "y": -0.8340792243486527},
{"x": "speed-values", "y": 1.0},
],
},
]
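For a larger matrix, one variation is to let pandas build the nested dictionary once with `to_dict(orient="index")` and reshape that, instead of creating a Series per row with `iterrows()`. This is only a sketch: the helper name `corr_to_series` and the random test data are made up for illustration, and whether it is actually faster on your data is worth timing.

```python
import numpy as np
import pandas as pd


def corr_to_series(matrix: pd.DataFrame) -> list:
    """Hypothetical helper: reshape a correlation matrix into
    [{'name': row_label, 'data': [{'x': col_label, 'y': value}, ...]}, ...]."""
    # {row_label: {col_label: value, ...}, ...}; row labels must be unique
    nested = matrix.to_dict(orient="index")
    return [
        {"name": name, "data": [{"x": col, "y": val} for col, val in row.items()]}
        for name, row in nested.items()
    ]


# Illustrative data only: 6 random features instead of the 3 in the question
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(50, 6)),
    columns=[f"feature-{i}" for i in range(6)],
)
out = corr_to_series(df.corr(method="pearson"))
print(len(out), len(out[0]["data"]))
```

The result has one entry per feature, and each entry's `data` list has one point per feature, so it scales with the size of the matrix without any change to the code.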
The first answer is better, but as an alternative you can group the list of records with a defaultdict:
from collections import defaultdict
lst1 = [
{'f1': 'dist-values', 'f2': 'dist-values', 'value': 1.0},
{'f1': 'dist-values', 'f2': 'uptime-values', 'value': 0.7586402890911869},
{'f1': 'dist-values', 'f2': 'speed-values', 'value': -0.9680724198337364},
{'f1': 'uptime-values', 'f2': 'dist-values', 'value': 0.7586402890911869},
{'f1': 'uptime-values', 'f2': 'uptime-values', 'value': 1.0},
{'f1': 'uptime-values', 'f2': 'speed-values', 'value': -0.8340792243486527},
{'f1': 'speed-values', 'f2': 'dist-values', 'value': -0.9680724198337364},
{'f1': 'speed-values', 'f2': 'uptime-values', 'value': -0.8340792243486527},
{'f1': 'speed-values', 'f2': 'speed-values', 'value': 1.0}
]
dct2 = defaultdict(list)
# group the records by 'f1', collecting one {'x', 'y'} point per 'f2'
for row in lst1:
    dct2[row['f1']].append({'x': row['f2'], 'y': row['value']})
lst2 = [{'name':k, 'data':v} for k, v in dct2.items()]
print(lst2)
Output:
[
{'name': 'dist-values', 'data': [
{'x': 'dist-values', 'y': 1.0},
{'x': 'uptime-values', 'y': 0.7586402890911869},
{'x': 'speed-values', 'y': -0.9680724198337364}]
},
{'name': 'uptime-values', 'data': [
{'x': 'dist-values', 'y': 0.7586402890911869},
{'x': 'uptime-values', 'y': 1.0},
{'x': 'speed-values', 'y': -0.8340792243486527}]
},
{'name': 'speed-values', 'data': [
{'x': 'dist-values', 'y': -0.9680724198337364},
{'x': 'uptime-values', 'y': -0.8340792243486527},
{'x': 'speed-values', 'y': 1.0}]
}
]
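The hardcoded `lst1` above does not help with a bigger matrix, so here is a sketch of the same defaultdict grouping fed directly from `matrix.unstack()`. It rebuilds `matrix` from the question's data so that it runs on its own; the variable name `grouped` is mine.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Same data as in the question
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([2, 1, 4, 5, 8, 12, 18, 25, 96, 48])
z = np.array([5, 3, 2, 1, 0, -2, -8, -11, -15, -16])
xyz = pd.DataFrame({'dist-values': x, 'uptime-values': y, 'speed-values': z})
matrix = xyz.corr(method="pearson")

# unstack() gives a Series indexed by (column label, row label) pairs
grouped = defaultdict(list)
for (f1, f2), value in matrix.unstack().items():
    grouped[f1].append({'x': f2, 'y': float(value)})

lst2 = [{'name': k, 'data': v} for k, v in grouped.items()]
print(lst2)
```

This skips the intermediate list of records, and the loop works the same way for any number of features.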