Duplicates when appending strings to a list from a DataFrame with a common column value



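Beginner here. I'm trying to isolate the neighborhood names from a DataFrame of Toronto, based on the cluster values I assigned to them. Instead of a list of 3 unique items, I get a list of 2363 items.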

Neigh_List = []
for n in toronto_merged['Cluster Labels']:
    if n == 7:
        x = toronto_merged['Neighborhood']
        Neigh_List.append(x) if x not in Neigh_List else None



Neigh_List
[0                                                                                                Parkwoods
1                                                                                                Parkwoods
2                                                                                         Victoria Village
3                                                                                         Victoria Village
4                                                                                         Victoria Village
...                                                
2359    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
2360    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
2361    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
2362    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
2363    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
Name: Neighborhood, Length: 2364, dtype: object]

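In general, for larger datasets (~1000+ rows), you should avoid looping over a pandas DataFrame, because pandas' built-in vectorized functions are usually faster (see another Stack Overflow post).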

You could try something like this:

neigh_list = list(toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, 'Neighborhood'].unique())
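For context, the loop in the question assigns the entire 'Neighborhood' column to x, so the list ends up holding the whole Series rather than individual names. Below is a minimal sketch of the boolean-mask approach on a toy DataFrame. Only the two column names come from the question; the rows are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; only the column names match the question.
toronto_merged = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village",
                     "Mimico NW", "Mimico NW"],
    "Cluster Labels": [7, 7, 7, 2, 2],
})

# Boolean mask: True for every row whose cluster label is 7.
mask = toronto_merged["Cluster Labels"] == 7

# Pick the Neighborhood values of those rows, then drop the repeats.
neigh_list = toronto_merged.loc[mask, "Neighborhood"].unique().tolist()

print(neigh_list)  # ['Parkwoods', 'Victoria Village']
```

Filtering on the label column and reading from the name column in a single .loc call avoids the row-by-row loop entirely.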

Also, if you want to avoid duplicates in a list, you can use a Python set (see section 5.4 at the time of writing).

unique_elements = set()
for x in some_iterable:
    unique_elements.add(x)

Or, using a set comprehension:

unique_elements = {unique_item for unique_item in some_iterable}
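One thing to keep in mind is that a set does not promise any particular order. If the first-seen order of the names matters, a dict can do the same job, since dict keys are unique and keep insertion order in Python 3.7+. A small sketch, with some_iterable filled with made-up values:

```python
# Hypothetical sample values standing in for some_iterable.
some_iterable = ["Parkwoods", "Victoria Village", "Parkwoods", "Mimico NW"]

# A set removes duplicates but makes no promise about ordering.
unique_elements = set(some_iterable)

# dict.fromkeys keeps one key per distinct item, in first-seen order.
unique_in_order = list(dict.fromkeys(some_iterable))

print(unique_in_order)  # ['Parkwoods', 'Victoria Village', 'Mimico NW']
```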

Have you tried using the power of pandas itself? Select all rows where 'Cluster Labels' equals 7, then get the unique neighborhoods:


...
Neigh_List = toronto_merged.loc[lambda d: d['Cluster Labels'].eq(7)]['Neighborhood'].unique().tolist()
# instead of .unique(), you can also use .drop_duplicates(), which returns a Series; time both on your data if speed matters
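To make the difference between the two calls concrete, here is a small sketch on a made-up Series. The values are hypothetical; the point is only that both calls yield the same distinct names and differ in what kind of object they return.

```python
import pandas as pd

# Hypothetical sample column.
s = pd.Series(["Parkwoods", "Parkwoods", "Victoria Village"],
              name="Neighborhood")

# .unique() returns an array of the distinct values, in order of appearance.
as_array = s.unique()

# .drop_duplicates() returns a Series and keeps the index label
# of the first occurrence of each value.
as_series = s.drop_duplicates()

print(list(as_array))            # ['Parkwoods', 'Victoria Village']
print(as_series.index.tolist())  # [0, 2]
```

If the result is only going to be turned into a plain list, either call works; drop_duplicates() is mainly useful when the original index labels are still needed.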
