I have a dataset with three columns that looks like this:
X1 X5 X1
X2 X9 X2
X3 X3 X5
X4 X8 X3
X5 X1 X4
I want to find the values that appear in every column. In this case the output would be
X1
X3
X5
Can anyone help me do this in Python?
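To reproduce the answers below, the sample data first has to become a DataFrame. A minimal sketch, assuming the data is plain whitespace-separated text with no header row; the column names a, b, c are chosen here only to match the labels the answers use.

```python
import io

import pandas as pd

# Sample data copied from the question: three whitespace-separated columns, no header
raw = """X1 X5 X1
X2 X9 X2
X3 X3 X5
X4 X8 X3
X5 X1 X4"""

# sep=r"\s+" splits on any run of whitespace; header=None because the
# first line is data, and names= supplies the (assumed) column labels
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=["a", "b", "c"])
print(df)
```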
If you .apply value_counts to the columns, you get the following:
In [25]: df
Out[25]:
a b c
0 X1 X5 X1
1 X2 X9 X2
2 X3 X3 X5
3 X4 X8 X3
4 X5 X1 X4
In [26]: df.apply(pd.Series.value_counts)
Out[26]:
a b c
X1 1.0 1.0 1.0
X2 1.0 NaN 1.0
X3 1.0 1.0 1.0
X4 1.0 NaN 1.0
X5 1.0 1.0 1.0
X8 NaN 1.0 NaN
X9 NaN 1.0 NaN
So you want every row that has no NaN, i.e. the values that were counted in all columns:
In [28]: result = df.apply(pd.Series.value_counts).notnull().all(axis=1)
In [29]: result
Out[29]:
X1 True
X2 False
X3 True
X4 False
X5 True
X8 False
X9 False
dtype: bool
Then you can get a list of the values that are True:
In [30]: [i for i, x in result.items() if x]
Out[30]: ['X1', 'X3', 'X5']
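The same value_counts approach as one self-contained script might look like the sketch below. It is written against current pandas, where Series.iteritems no longer exists, and it swaps the list comprehension for boolean indexing on the result; the DataFrame is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["X1", "X2", "X3", "X4", "X5"],
    "b": ["X5", "X9", "X3", "X8", "X1"],
    "c": ["X1", "X2", "X5", "X3", "X4"],
})

# One row per distinct value, one column per original column;
# NaN marks a value that does not occur in that column
counts = df.apply(pd.Series.value_counts)

# True only for values counted in every column
in_all = counts.notna().all(axis=1)

# Keep the True rows; their index labels are the values themselves
common = in_all[in_all].index.tolist()
print(common)
```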
And a slightly different approach:
In [50]: df
Out[50]:
a b c
0 X1 X5 X1
1 X2 X9 X2
2 X3 X3 X5
3 X4 X8 X3
4 X5 X1 X4
In [51]: uniq = pd.Series(np.unique(df.values))
In [52]: uniq
Out[52]:
0 X1
1 X2
2 X3
3 X4
4 X5
5 X8
6 X9
dtype: object
In [53]: result = df.apply(uniq.isin).all(axis=1)
In [54]: result.index = uniq
In [55]: result
Out[55]:
X1 True
X2 False
X3 True
X4 False
X5 True
X8 False
X9 False
dtype: bool
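The session above stops at a boolean Series. A self-contained sketch that also turns it into a list: because the mask shares uniq's 0..n-1 index, it can select from uniq directly, so the step that relabels the index is not needed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["X1", "X2", "X3", "X4", "X5"],
    "b": ["X5", "X9", "X3", "X8", "X1"],
    "c": ["X1", "X2", "X5", "X3", "X4"],
})

# Every distinct value in the whole frame; np.unique flattens and sorts
uniq = pd.Series(np.unique(df.values))

# apply passes each column to uniq.isin, giving one boolean column per
# original column; all(axis=1) keeps values present in every column
mask = df.apply(uniq.isin).all(axis=1)

# mask is aligned with uniq's index, so it can filter uniq directly
common = uniq[mask].tolist()
print(common)  # ['X1', 'X3', 'X5']
```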
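The simplest solution I can think of:
1. Build a set of the values in each column.
2. Take the intersection of all the sets from the previous step.
import pandas as pd
df = pd.DataFrame({'a': ['X1', 'X2', 'X3', 'X4', 'X5'],
                   'b': ['X5', 'X9', 'X3', 'X8', 'X1'],
                   'c': ['X1', 'X2', 'X5', 'X3', 'X4']})
sets = [set(df[column]) for column in df.columns]  # one set per column
# Sets are unordered, so sort the intersection for a stable result
result = sorted(set.intersection(*sets))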
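A variant of the same intersection idea, swapping Python sets for NumPy: np.intersect1d returns the sorted unique values common to two arrays, and functools.reduce folds it across all the columns. This is a sketch rather than a drop-in replacement; it returns a NumPy array, so tolist() converts it back to a plain list.

```python
from functools import reduce

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": ["X1", "X2", "X3", "X4", "X5"],
    "b": ["X5", "X9", "X3", "X8", "X1"],
    "c": ["X1", "X2", "X5", "X3", "X4"],
})

# Computes intersect1d(intersect1d(a, b), c); the output is already
# sorted and deduplicated, so no extra sorted() call is needed
common = reduce(np.intersect1d, (df[col].to_numpy() for col in df.columns))
print(common.tolist())  # ['X1', 'X3', 'X5']
```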