How to order clusters by the smallest average value

I have a dataframe like this:

x  y    cluster
0  112  4
0  113  4
1  111  4

I get the locations from this code:

for n in range(0,9): 
...
location = np.array(cluseter_location )

I want to order the 'cluster' values by the average of column 'y', smallest first, so I tried:

for n in range(0,9):
    cluster_ = data2[data2['cluster_id']== n]
    ...

Modify the code to sort a list of tuples

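In your code, instead of appending cluster_int, append the tuple (n, cluster_int). Then, when sorting, use a lambda as the key so that the tuples are sorted by their second value (the average).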

lst = []                                                     # start with an empty list of (cluster_id, average) tuples
for n in range(0,9):
    cluster_ = data2[data2['cluster_id']== n]
    cluster_list = cluster_['y'].tolist()
    cluster_avg = sum(cluster_list)/len(cluster_list)
    cluster_int = int(cluster_avg)
    print("cluster_id : %d" %n ,"average : %d" %cluster_int)
    lst.append((n,cluster_int))                              #<-------

# sort once, after the loop, by the second element (the average)
a = sorted(lst, key = lambda x:x[1])                         #<-------

print(a)                                                     #<-------
ordered_average = [average for cluster, average in a]        #<-------
ordered_clusters = [cluster for cluster, average in a]       #<-------
print(ordered_average)                                       #<-------
print(ordered_clusters)                                      #<-------

Output:

#cluster and average together
[(4,112),(8,121),(1,127),(6,139),(5,149)]
#averages sorted
[112, 121, 127, 139, 149]
#clusters sorted
[4,8,1,6,5]
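The tuple-sorting step can be tried in isolation, without the dataframe. This is a minimal sketch that assumes the loop has already produced a list of (cluster_id, average) pairs; the pairs below are a reduced set taken from the printed output above, listed in ascending cluster-id order as the loop would build them. It also shows `operator.itemgetter(1)`, a standard-library key function equivalent to `lambda x: x[1]`.

```python
from operator import itemgetter

# (cluster_id, average) pairs in the order the loop appends them (ascending cluster id).
# Values are taken from the output shown above; the other clusters are omitted.
pairs = [(1, 127), (4, 112), (5, 149), (6, 139), (8, 121)]

# Sort by the second element of each tuple, i.e. by the average.
by_average = sorted(pairs, key=lambda pair: pair[1])

# itemgetter(1) is an equivalent key function from the standard library.
assert by_average == sorted(pairs, key=itemgetter(1))

# Split the sorted pairs into two parallel lists.
ordered_clusters = [cluster for cluster, _ in by_average]
ordered_average = [average for _, average in by_average]

print(by_average)        # [(4, 112), (8, 121), (1, 127), (6, 139), (5, 149)]
print(ordered_clusters)  # [4, 8, 1, 6, 5]
print(ordered_average)   # [112, 121, 127, 139, 149]
```

To get the largest average first instead, pass `reverse=True` to `sorted`.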

Alternative approach using pandas

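A shorter approach, and typically a faster one because it avoids the Python-level loop, is to use groupby on the pandas DataFrame to compute the mean of 'y' per cluster and then sort that result directly.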

print(df.groupby('cluster')['y'].mean().reset_index().sort_values('y'))
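Below is a self-contained sketch of the pandas approach on a small hypothetical DataFrame: the first three rows are the sample rows from the question, and the remaining rows are invented for illustration. Unlike the loop version, which truncates each average with `int()`, pandas keeps the means as floats. The second part is a variation, not part of the original answer: it uses `groupby(...).transform("mean")` to sort the rows themselves, first by their cluster's mean 'y' and then by 'y' within each cluster.

```python
import pandas as pd

# Hypothetical data: the first three rows come from the question, the others are made up.
df = pd.DataFrame({
    "x": [0, 0, 1, 2, 3, 4, 5],
    "y": [112, 113, 111, 130, 128, 100, 102],
    "cluster": [4, 4, 4, 1, 1, 7, 7],
})

# Mean of 'y' per cluster, sorted from the smallest mean to the largest.
means = df.groupby("cluster")["y"].mean().sort_values()
ordered_clusters = means.index.tolist()  # [7, 4, 1]
ordered_average = means.tolist()         # [101.0, 112.0, 129.0]

# Variation: sort the rows themselves, first by their cluster's mean y, then by y.
# transform("mean") returns each group's mean aligned to every row of that group.
rows = (
    df.assign(cluster_mean=df.groupby("cluster")["y"].transform("mean"))
      .sort_values(["cluster_mean", "y"])
)

print(ordered_clusters)
print(ordered_average)
print(rows)
```

If two clusters had exactly the same mean, their rows would be interleaved by 'y' in the variation above; adding "cluster" as a second sort key (`["cluster_mean", "cluster", "y"]`) keeps each cluster's rows together.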
