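I wrote some code to extract indexes from a DataFrame, but I don't know how to use those indexes to create another DataFrame from the original one.
Is it possible to shorten my current code? It is quite long.
EDITED ==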
import pandas as pd
a = pd.DataFrame({"a":["I have something", "I have nothing", "she has something", "she is nice", "she is not nice","Me", "He"],
"b":[["man"], ["man", "eating"], ["cat"], ["man"], ["cat"], ["man"], ["cat"]]})
a = a[a.b.apply(lambda x:len(x)) == 1] # is it possible to shorten the code from here
c = a.explode("b").groupby("b")
k = ["man", "cat"]
bb = a
for x in k:
    bb = c.get_group(x).head(2).index  # to here? This part is supposed to take the first 2 indexes of each element in k
Current results:
                 a      b
4  she is not nice  [cat]
Expected results:
                   a      b
0   I have something  [man]
2  she has something  [cat]
3        she is nice  [man]
4    she is not nice  [cat]
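First filter by list length with Series.str.len, then convert the one-element lists to plain strings with .str[0] so that duplicates can be tested with Series.duplicated. Invert the boolean mask with ~ and filter by boolean indexing: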
a = a[a.b.str.len() == 1]
b = a[~a['b'].str[0].duplicated()]
print (b)
                 a      b
3      she is nice  [man]
4  she is not nice  [cat]
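
To make the intermediate steps easier to follow, here is a minimal sketch of the same idea split into named pieces. The frame `df` and the names `single`, `first_item`, `mask` and `unique_rows` are made up for illustration and are not the asker's data; the point is only to show what each step of the mask does.

```python
import pandas as pd

# Hypothetical toy frame (not the asker's data): column "b" holds lists.
df = pd.DataFrame({
    "a": ["x1", "x2", "x3", "x4"],
    "b": [["man"], ["man", "eating"], ["cat"], ["man"]],
})

# Keep only the rows whose list has exactly one element.
single = df[df["b"].str.len() == 1]

# .str[0] pulls the sole element out of each list, giving a hashable string.
first_item = single["b"].str[0]

# duplicated() marks every repeat of a value after its first occurrence.
mask = first_item.duplicated()

# ~mask therefore keeps the first row seen for each distinct value.
unique_rows = single[~mask]
print(unique_rows)
```

Converting with .str[0] matters because lists are not hashable, so the list column itself is not a good target for duplicated().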
EDIT: To keep more than one row per value, use GroupBy.head:
b1 = a.groupby(a['b'].str[0]).head(2)
print (b1)
                   a      b
0   I have something  [man]
2  she has something  [cat]
3        she is nice  [man]
4    she is not nice  [cat]
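
If you still want to go through the indexes, as in the question, something along these lines should work. It is only a sketch under assumptions: restricting to the keys in `k` with isin is my reading of what the loop was meant to do, and `single`, `wanted`, `idx` and `result` are illustrative names. The collected index is passed to .loc on the original frame to build the new DataFrame.

```python
import pandas as pd

a = pd.DataFrame({
    "a": ["I have something", "I have nothing", "she has something",
          "she is nice", "she is not nice", "Me", "He"],
    "b": [["man"], ["man", "eating"], ["cat"], ["man"], ["cat"], ["man"], ["cat"]],
})
k = ["man", "cat"]

# Rows whose list has exactly one element.
single = a[a["b"].str.len() == 1]

# Assumption: only the keys listed in k are of interest.
wanted = single[single["b"].str[0].isin(k)]

# First two index labels per key, in the original row order.
idx = wanted.groupby(wanted["b"].str[0]).head(2).index

# Use the index labels to select rows from the original frame.
result = a.loc[idx]
print(result)
```

Because GroupBy.head keeps the original row order, no loop over `k` and no explode is needed.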