name_region
bahia [10, 11, 12, 1, 2, 3, 4]
distrito_federal [9, 10, 11, 12, 1, 2, 3, 4]
goias [9, 10, 11, 12, 1, 2, 3, 4]
maranhao [10, 11, 12, 1, 2, 3, 4]
mato_grosso [9, 10, 11, 12, 1, 2, 3, 4]
mato_grosso_do_sul [8, 9, 10, 11, 12, 1, 2, 3]
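I have the pandas Series above, obtained from a groupby operation. The second column holds months of the year. How can I build the superset of the months, i.e. [8, 9, 10, 11, 12, 1, 2, 3, 4], since that is what represents the whole dataset?
--Note: I do want to preserve the order.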
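A minimal sketch, assuming the Series looks exactly like the listing above (the reconstruction `s` is hypothetical). Deduplicating in order of first appearance does not produce the requested list by itself, because 9 and 8 first show up in later rows. The requested list looks like months counted from the start of a season, so the sketch also sorts by offset from an assumed start month; `SEASON_START = 8` is an assumption, not something the question states.

```python
import pandas as pd

# Hypothetical reconstruction of the grouped Series from the question;
# each value is a Python list of month numbers.
s = pd.Series({
    "bahia": [10, 11, 12, 1, 2, 3, 4],
    "distrito_federal": [9, 10, 11, 12, 1, 2, 3, 4],
    "goias": [9, 10, 11, 12, 1, 2, 3, 4],
    "maranhao": [10, 11, 12, 1, 2, 3, 4],
    "mato_grosso": [9, 10, 11, 12, 1, 2, 3, 4],
    "mato_grosso_do_sul": [8, 9, 10, 11, 12, 1, 2, 3],
})
s.index.name = "name_region"

# Order of first appearance: dict keys keep insertion order (Python 3.7+).
first_seen = list(dict.fromkeys(m for months in s for m in months))
print(first_seen)  # [10, 11, 12, 1, 2, 3, 4, 9, 8]

# Season order, assuming (hypothetically) that the season starts in August:
# sort each month by how many months it lies after SEASON_START, wrapping at 12.
SEASON_START = 8
season_order = sorted(set(first_seen), key=lambda m: (m - SEASON_START) % 12)
print(season_order)  # [8, 9, 10, 11, 12, 1, 2, 3, 4]
```

If the season start varies per dataset, the constant would have to come from the data or from domain knowledge; the question does not say which.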
Use the itertools recipe unique_everseen (it preserves order), as below. This assumes a DataFrame df with a 'months' column holding lists:
>>> list(unique_everseen(z for x, y in df.iterrows() for z in y['months']))
[9, 10, 11, 12, 1, 2, 3, 4]
The definition of unique_everseen:
from itertools import filterfalse  # Python 3 name of Python 2's ifilterfalse

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        # Skip elements already in `seen`; record and yield the rest.
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        # Deduplicate on key(element), but yield the original element.
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
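I seem to have misunderstood the data structure in the question, but since this may be useful in similar situations, I'll leave the answer here for future reference.
You can use numpy's unique function.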
import pandas as pd
import numpy as np
df = pd.DataFrame({"x": [1,3,5], "y": [3,4,5]})
print(np.unique(df))  # prints [1 3 4 5]
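Note that `np.unique` returns its values sorted, so on its own it loses the order the question asks for. A sketch that keeps first-appearance order uses `return_index=True`, which returns the index of each value's first occurrence; `month_lists` is a hypothetical stand-in for the Series values in the question.

```python
import numpy as np

# Hypothetical stand-in for the month lists in the question's Series.
month_lists = [
    [10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [8, 9, 10, 11, 12, 1, 2, 3],
]
flat = np.concatenate(month_lists)  # one 1-D array, original order kept

print(np.unique(flat).tolist())  # sorted: [1, 2, 3, 4, 8, 9, 10, 11, 12]

# return_index=True gives the position of each value's first occurrence;
# sorting those positions restores the order of first appearance.
_, first_idx = np.unique(flat, return_index=True)
ordered = flat[np.sort(first_idx)]
print(ordered.tolist())  # [10, 11, 12, 1, 2, 3, 4, 9, 8]
```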
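There may be a cleaner way to do this in pandas, so if anyone knows one, please answer... Judging by the types, this looks like a job for a fold over that column.
I don't see a fold operation in pandas, so perhaps just an accumulating for loop, i.e.: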
all_months = []
for _, row in df.iterrows():  # iterrows() yields (index, row) pairs
    months = row['months']
    all_months += [e for e in months if e not in all_months]
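Python itself does have a fold: `functools.reduce`. A sketch of the same accumulation written as a fold over the month lists; `month_lists` and `merge_unique` are hypothetical names, and the data is copied from the question.

```python
from functools import reduce

# Hypothetical stand-in for the month lists in the question's Series.
month_lists = [
    [10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [8, 9, 10, 11, 12, 1, 2, 3],
]

def merge_unique(acc, months):
    """Fold step: append the months not already in the accumulator, in order."""
    return acc + [m for m in months if m not in acc]

all_months = reduce(merge_unique, month_lists, [])
print(all_months)  # [10, 11, 12, 1, 2, 3, 4, 9, 8]
```

Like the loop, this keeps first-appearance order; the membership test is linear in the accumulator, which is fine for at most twelve months.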
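On second thought... I'd use a set instead of the complicated comprehension (note that a set does not keep the order the question asks for):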
all_months = set()
for _, row in df.iterrows():  # iterrows() yields (index, row) pairs
    all_months.update(row['months'])  # in-place union with this row's months
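`set.union` accepts any number of iterables, so the loop can collapse into a single call. A sketch using the question's lists directly (`month_lists` is a hypothetical name); the result is still unordered, and `sorted` only recovers calendar order, not the season order in the question.

```python
# Hypothetical stand-in for the month lists in the question's Series.
month_lists = [
    [10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [10, 11, 12, 1, 2, 3, 4],
    [9, 10, 11, 12, 1, 2, 3, 4],
    [8, 9, 10, 11, 12, 1, 2, 3],
]

# One call unions every list into a fresh set.
all_months = set().union(*month_lists)
print(sorted(all_months))  # [1, 2, 3, 4, 8, 9, 10, 11, 12]
```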
Hmm, I just saw the other answer; I haven't tested it yet, but it looks better! Pick that one :). Posting this in case it helps someone...
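On the request above for a cleaner pandas-only route: since pandas 0.25, `Series.explode` turns each list element into its own row, and `drop_duplicates` keeps the first occurrence by default. A sketch assuming the question's Series, reconstructed here as the hypothetical `s`; it yields first-appearance order, not the season order.

```python
import pandas as pd

# Hypothetical reconstruction of the question's Series of month lists.
s = pd.Series({
    "bahia": [10, 11, 12, 1, 2, 3, 4],
    "distrito_federal": [9, 10, 11, 12, 1, 2, 3, 4],
    "goias": [9, 10, 11, 12, 1, 2, 3, 4],
    "maranhao": [10, 11, 12, 1, 2, 3, 4],
    "mato_grosso": [9, 10, 11, 12, 1, 2, 3, 4],
    "mato_grosso_do_sul": [8, 9, 10, 11, 12, 1, 2, 3],
}, name="months")

# explode(): one row per month; drop_duplicates(): keep each first occurrence.
superset = s.explode().drop_duplicates().tolist()
print(superset)  # [10, 11, 12, 1, 2, 3, 4, 9, 8]
```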