I have a DataFrame dx:
City Val1 Val2
x1 N1 0.0000 1.0
x2 N1 790.9980 6.0
x3 N2 110.3755 4.0
x4 N1 191.8150 2.0
x5 N2 574.3100 3.0
x6 N1 725.4000 4.0
x7 N3 362.7000 2.0
x8 N1 196.4800 1.0
x9 N3 230.8200 2.0
x10 N1 259.8830 1.0
Reproducible example:
import pandas as pd

dx = pd.DataFrame({'City': {'x1': 'N1',
'x2': 'N1',
'x3': 'N2',
'x4': 'N1',
'x5': 'N2',
'x6': 'N1',
'x7': 'N3',
'x8': 'N1',
'x9': 'N3',
'x10': 'N1'},
'Val1': {'x1': 0.0,
'x2': 790.998,
'x3': 110.3755,
'x4': 191.81500000000003,
'x5': 574.31,
'x6': 725.4,
'x7': 362.7,
'x8': 196.48,
'x9': 230.82,
'x10': 259.883},
'Val2': {'x1': 1.0,
'x2': 6.0,
'x3': 4.0,
'x4': 2.0,
'x5': 3.0,
'x6': 4.0,
'x7': 2.0,
'x8': 1.0,
'x9': 2.0,
'x10': 1.0}})
dx
I want to group the rows by City and count how many Val1 values fall into each of the ranges defined by the edges [0, 200, 400, 600, 800, 1000].
Expected output:
(0, 200] (200, 400] (400, 600] (600, 800] (800, 1000]
City
N1 2 1 0 2 0
N2 1 0 1 0 0
N3 0 2 0 0 0
Use crosstab together with cut, and add the missing categories with DataFrame.reindex and cat.categories:
bins = [0,200,400,600,800,1000]
s = pd.cut(dx['Val1'], bins=bins)
df = pd.crosstab(dx['City'], s).reindex(s.cat.categories, axis=1, fill_value=0)
print (df)
Val1 (0, 200] (200, 400] (400, 600] (600, 800] (800, 1000]
City
N1 2 1 0 2 0
N2 1 0 1 0 0
N3 0 2 0 0 0
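To see why the reindex step is there, here is a minimal sketch on a smaller, made-up frame (demo is hypothetical, not the data from the question). It compares the crosstab result with and without reindex. It also shows that cut builds right-closed intervals by default, so a value of exactly 0 falls outside (0, 200] and is not counted, which matches the expected output for x1.

```python
import pandas as pd

# Hypothetical demo data, not the frame from the question
demo = pd.DataFrame({'City': ['N1', 'N1', 'N2'],
                     'Val1': [0.0, 150.0, 450.0]})
bins = [0, 200, 400, 600]

# Intervals are right-closed: 0.0 is outside (0, 200] and becomes NaN
s = pd.cut(demo['Val1'], bins=bins)

# crosstab alone only keeps bins that occur in the data
plain = pd.crosstab(demo['City'], s)

# reindex over all categories restores the empty (200, 400] bin as zeros
full = plain.reindex(s.cat.categories, axis=1, fill_value=0)

print(plain)
print(full)
```

If the lower edge should be counted as well, cut accepts include_lowest=True, which makes the first interval include its left edge.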
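If you use DataFrame.pivot_table, or GroupBy.size with Series.unstack, all categories are included without reindexing, provided unobserved categories are kept (observed=False):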
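# Option 1: pivot_table, counting rows per cell
df = dx.pivot_table(index='City', columns=pd.cut(dx['Val1'], bins=bins), aggfunc='size', observed=False)
# Option 2 (alternative): groupby + size + unstack
df = dx.groupby(['City', pd.cut(dx['Val1'], bins=bins)], observed=False).size().unstack()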
print (df)
Val1 (0, 200] (200, 400] (400, 600] (600, 800] (800, 1000]
City
N1 2 1 0 2 0
N2 1 0 1 0 0
N3 0 2 0 0 0
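Whether empty bins show up in the groupby result depends on the observed argument. As far as I know, newer pandas releases warn about relying on its default, so passing it explicitly seems safer. A small sketch on made-up data (the frame demo is hypothetical):

```python
import pandas as pd

# Hypothetical demo data, not the frame from the question
demo = pd.DataFrame({'City': ['N1', 'N1', 'N2'],
                     'Val1': [50.0, 150.0, 450.0]})
binned = pd.cut(demo['Val1'], bins=[0, 200, 400, 600])

# observed=False keeps every category, including bins with no rows
all_bins = demo.groupby(['City', binned], observed=False).size().unstack()

# observed=True keeps only the bins that occur in the data
seen_bins = (demo.groupby(['City', binned], observed=True)
                 .size()
                 .unstack(fill_value=0))

print(all_bins)
print(seen_bins)
```

With observed=True the result would need the same reindex over cat.categories as the crosstab approach to get the empty bins back.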
Alternatively, you could try:
df = dx.assign(bins=pd.cut(dx.Val1, bins=[0, 200, 400, 600, 800, 1000])).pivot_table(
    index='City', columns='bins', values='Val2', aggfunc='count')
Output:
bins (0, 200] (200, 400] (400, 600] (600, 800] (800, 1000]
City
N1 2 1 0 2 0
N2 1 0 1 0 0
N3 0 2 0 0 0
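One caveat with this variant: count on Val2 counts the non-missing Val2 values in each cell, while size counts rows. The two agree for the data in the question, but they would differ if Val2 contained NaN. The sketch below uses groupby directly on made-up data (demo is hypothetical) to isolate that difference:

```python
import numpy as np
import pandas as pd

# Hypothetical demo data with one missing Val2
demo = pd.DataFrame({'City': ['N1', 'N1', 'N2'],
                     'Val1': [50.0, 150.0, 450.0],
                     'Val2': [1.0, np.nan, 3.0]})
demo = demo.assign(bins=pd.cut(demo['Val1'], bins=[0, 200, 400, 600]))

grouped = demo.groupby(['City', 'bins'], observed=True)

by_count = grouped['Val2'].count()  # skips the NaN in Val2
by_size = grouped.size()            # counts every row

print(by_count)
print(by_size)
```

If the goal is strictly "how many Val1 values per bin", size (or count on Val1 itself) is probably the closer match.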