I have a multi-level column DataFrame that looks like this:
df
Solid             Liquid              Gas
pen  paper  pipe  water  juice  milk  oxygen  nitrogen  helium
  5      2     1      4      3     1       7         8      10
  5      2     1      4      3     1       7         8      10
  5      2     1      4      3     1       7         8      10
  4      4     7      3      2     0       6         7       9
  3      7     9      4      6     5       3         3       4
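To experiment with the answers below, the frame can be rebuilt from the table above. This is a minimal sketch, assuming the five rows shown are the whole frame; it uses `pd.MultiIndex.from_tuples` to pair each top-level group with its sub-columns, and the variable names are only illustrative.

```python
import pandas as pd

# Each tuple is (top-level group, sub-column), in the order of the table above.
columns = pd.MultiIndex.from_tuples([
    ("Solid", "pen"), ("Solid", "paper"), ("Solid", "pipe"),
    ("Liquid", "water"), ("Liquid", "juice"), ("Liquid", "milk"),
    ("Gas", "oxygen"), ("Gas", "nitrogen"), ("Gas", "helium"),
])

# The five rows from the question.
rows = [
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [5, 2, 1, 4, 3, 1, 7, 8, 10],
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]

df = pd.DataFrame(rows, columns=columns)
print(df)
```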
What I want is to randomly select 2 of the top-level columns "Solid", "Liquid" and "Gas"; each of them has 3 sub-columns.
For example, if Solid and Gas are randomly selected, the expected result should be:
Solid             Gas
pen  paper  pipe  oxygen  nitrogen  helium
  5      2     1       7         8      10
  5      2     1       7         8      10
  5      2     1       7         8      10
  4      4     7       6         7       9
  3      7     9       3         3       4
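The expected result is simply the original frame restricted to two top-level labels. A short sketch, assuming the frame from the question (rebuilt here) and using the fixed pair Solid and Gas in place of a random draw:

```python
import pandas as pd

# Rebuild the question's frame.
layout = {
    "Solid": ["pen", "paper", "pipe"],
    "Liquid": ["water", "juice", "milk"],
    "Gas": ["oxygen", "nitrogen", "helium"],
}
columns = pd.MultiIndex.from_tuples(
    [(group, sub) for group, subs in layout.items() for sub in subs]
)
rows = [[5, 2, 1, 4, 3, 1, 7, 8, 10]] * 3 + [
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]
df = pd.DataFrame(rows, columns=columns)

# A list of level-0 labels keeps every sub-column under each selected group.
expected = df[["Solid", "Gas"]]
print(expected)
```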
I have tried this code, but it doesn't give me the result I expected:
result = df.sample(n=5, axis=1)
result
Output:
Solid Gas
pipe  oxygen
   1       7
   1       7
   1       7
   7       6
   9       3
Can anyone help me figure this out? Thank you :)
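The attempt splits the groups because `DataFrame.sample(axis=1)` draws from the leaf columns, each one a (group, sub-column) pair, rather than from the top-level groups. The sketch below assumes the question's frame; the fixed `random_state` is there only so the run is repeatable.

```python
import pandas as pd

# Rebuild the question's frame.
layout = {
    "Solid": ["pen", "paper", "pipe"],
    "Liquid": ["water", "juice", "milk"],
    "Gas": ["oxygen", "nitrogen", "helium"],
}
columns = pd.MultiIndex.from_tuples(
    [(group, sub) for group, subs in layout.items() for sub in subs]
)
rows = [[5, 2, 1, 4, 3, 1, 7, 8, 10]] * 3 + [
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]
df = pd.DataFrame(rows, columns=columns)

# sample(axis=1) picks among the 9 leaf columns, not among the 3 groups,
# so a group can come back with only some of its sub-columns.
leaf_sample = df.sample(n=2, axis=1, random_state=0)
print(leaf_sample.columns.tolist())  # two (group, sub-column) tuples
print(len(df.columns), df.columns.get_level_values(0).nunique())  # 9 3
```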
You can sample the first level of the columns and then select the sampled columns:
df[pd.Series(df.columns.levels[0]).sample(2)]
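A variant of this approach, as a sketch: it draws the labels from `get_level_values(0).unique()` (the labels actually present in the columns) and passes `random_state`, a parameter of `Series.sample`, so the draw can be reproduced. The seed value 42 is arbitrary, and the frame is the one from the question.

```python
import pandas as pd

# Rebuild the question's frame.
layout = {
    "Solid": ["pen", "paper", "pipe"],
    "Liquid": ["water", "juice", "milk"],
    "Gas": ["oxygen", "nitrogen", "helium"],
}
columns = pd.MultiIndex.from_tuples(
    [(group, sub) for group, subs in layout.items() for sub in subs]
)
rows = [[5, 2, 1, 4, 3, 1, 7, 8, 10]] * 3 + [
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]
df = pd.DataFrame(rows, columns=columns)

# Draw 2 distinct top-level labels (sampling is without replacement).
groups = pd.Series(df.columns.get_level_values(0).unique()).sample(2, random_state=42)

# Selecting by those labels keeps all 3 sub-columns of each group.
result = df[groups.tolist()]
print(result)
```

Dropping `random_state` gives a fresh random pair on every call.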
Or use the random.sample function:
import random
df[random.sample(df.columns.levels[0].tolist(),2)]
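One caveat with `df.columns.levels[0]`: a MultiIndex keeps its defined level labels after slicing, so on a column subset it can still list groups that no longer appear, and sampling from it may then pick a label that raises `KeyError`. A hedged sketch of two ways to restrict the pool to labels that still occur, assuming the question's frame and a subset `df[["Solid", "Gas"]]` chosen only for illustration:

```python
import random

import pandas as pd

# Rebuild the question's frame.
layout = {
    "Solid": ["pen", "paper", "pipe"],
    "Liquid": ["water", "juice", "milk"],
    "Gas": ["oxygen", "nitrogen", "helium"],
}
columns = pd.MultiIndex.from_tuples(
    [(group, sub) for group, subs in layout.items() for sub in subs]
)
rows = [[5, 2, 1, 4, 3, 1, 7, 8, 10]] * 3 + [
    [4, 4, 7, 3, 2, 0, 6, 7, 9],
    [3, 7, 9, 4, 6, 5, 3, 3, 4],
]
df = pd.DataFrame(rows, columns=columns)

subset = df[["Solid", "Gas"]]

# levels[0] may still list "Liquid" here, even though no column uses it.
print(subset.columns.levels[0].tolist())

# Labels that actually occur in the columns, computed two ways.
present = subset.columns.get_level_values(0).unique().tolist()
trimmed = subset.columns.remove_unused_levels().levels[0].tolist()

# Sampling from the cleaned pool cannot produce a missing label.
picked = random.sample(present, 2)
print(subset[picked])
```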
Another answer builds the (top-level, sub-column) pairs explicitly and indexes the DataFrame with them:
import numpy as np
import pandas as pd
arrays = [np.array(['Liquid', 'Liquid','Liquid', 'Solid', 'Solid','Solid', 'Gas', 'Gas', 'Gas']),
np.array(['water', 'nitrogen', 'juice', 'pen', 'paper', 'nitrogen', 'oxygen', 'helium','nitrogen'])]
df = pd.DataFrame(np.random.randn(3, 9), columns=arrays)
print(df.to_string())
"""
Liquid Solid Gas
water nitrogen juice pen paper nitrogen oxygen helium nitrogen
0 0.778774 0.243654 0.823253 -0.608256 -0.415255 1.472267 1.474572 -0.002190 0.712878
1 -0.648450 -0.801950 -2.100596 -0.627754 -0.060161 -0.691433 1.170950 0.023768 -0.613677
2 0.901922 0.069219 1.919909 -1.460708 -0.216709 -1.922276 1.045664 0.528569 0.779230
"""
l0 = ['Liquid','Solid','Gas']
l1 = [['water','juice'],['pen'],['helium','nitrogen']]
aa = [pd.DataFrame({'a': a, 'b': b}) for a, b in zip(l0, l1)]
print(aa)
"""
[ a b
0 Liquid water
1 Liquid juice, a b
0 Solid pen, a b
0 Gas helium
1 Gas nitrogen]
"""
bb = pd.concat(aa)
print(bb)
"""
a b
0 Liquid water
1 Liquid juice
0 Solid pen
0 Gas helium
1 Gas nitrogen
"""
cc = pd.concat(aa).values
print(cc)
"""
[['Liquid' 'water']
['Liquid' 'juice']
['Solid' 'pen']
['Gas' 'helium']
['Gas' 'nitrogen']]
"""
dd = df[cc]
print(dd)
"""
Liquid Solid Gas
water juice pen helium nitrogen
0 -1.484977 -1.202752 0.048415 -0.054465 -0.355568
1 0.906612 1.355189 1.653327 1.184810 -0.934969
2 0.091918 -0.737838 0.610323 -2.164317 -1.529826
"""
"""
In a similar way, to keep only 2 top-level columns,
select 2 items each from Liquid and Gas:
"""
l2 = ['Liquid','Gas']
l3 = [['water','juice'],['helium','nitrogen']]
p = pd.concat([pd.DataFrame({'a': a, 'b': b}) for a, b in zip(l2, l3)]).values
print(p)
p1 = df[p]
print(p1)
"""
Liquid Gas
water juice helium nitrogen
0 -1.484977 -1.202752 -0.054465 -0.355568
1 0.906612 1.355189 1.184810 -0.934969
2 0.091918 -0.737838 -2.164317 -1.529826
"""
"""
If you only want the nitrogen sub-columns from every group:
"""
aa = df.iloc[:, df.columns.get_level_values(1) == 'nitrogen']
print(aa)
"""
Liquid Solid Gas
nitrogen nitrogen nitrogen
0 0.369143 1.762105 -0.887656
1 2.035025 0.317349 -0.896609
2 -1.570745 0.208936 0.979549
"""
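The same selections can be written without building a 2-D array through `pd.concat(...).values`. This sketch assumes the random frame from this answer; `wanted` and `pairs` are illustrative names. It indexes with a plain list of (level-0, level-1) tuples and uses `DataFrame.xs` with `level=1` for the nitrogen case.

```python
import numpy as np
import pandas as pd

# Same column layout as this answer, built explicitly as a MultiIndex.
arrays = [
    ["Liquid", "Liquid", "Liquid", "Solid", "Solid", "Solid", "Gas", "Gas", "Gas"],
    ["water", "nitrogen", "juice", "pen", "paper", "nitrogen", "oxygen", "helium", "nitrogen"],
]
df = pd.DataFrame(np.random.randn(3, 9), columns=pd.MultiIndex.from_arrays(arrays))

# A list of (level-0, level-1) tuples selects specific sub-columns directly.
wanted = {"Liquid": ["water", "juice"], "Gas": ["helium", "nitrogen"]}
pairs = [(group, sub) for group, subs in wanted.items() for sub in subs]
p1 = df[pairs]
print(p1)

# xs picks one level-1 label across every group; drop_level=False keeps
# both header rows, like the iloc/boolean-mask version above.
nitrogen = df.xs("nitrogen", axis=1, level=1, drop_level=False)
print(nitrogen)
```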