How to calculate the frequency of values in a column consisting of integers and blanks



I have a CSV file like the one below; it has more than 10,000 rows:

    ID      Ref_r
    235R23  3
    56982B  3
    62C879  blank
    625478  11
    9S4284  11
    985U12  11
    524555  58
    99L852  60
    1024T4  58
    102W49  3
    258q34  blank
    ...


I'd like to calculate the frequency of each value in the column Ref_r, which consists of integers (1 to 99) and blanks:

    df = pd.DataFrame(data)
    df1 = df['Ref_r'].value_counts()

However, it doesn't work.
Expected result would be:
```none
Ref_r    Frequency
1           0
2           0
3           3
...
11          3
...
58          2
59          0
60          1
99          0
blank       2
```

IIUC, you can use pandas with `value_counts` and `reindex`:

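    import pandas as pd

    # Read whitespace-separated columns as strings so 'blank' and the numbers share one dtype
    (pd.read_csv('input.csv', sep=r'\s+', dtype='str')['Ref_r']
       .value_counts(sort=False)
       # Force every label from '1' to '99' plus 'blank' to appear, with 0 for missing ones
       .reindex(list(map(str, range(1, 100))) + ['blank'], fill_value=0)
       .rename_axis('Ref_r')
       .reset_index(name='Frequency')
       # Write a tab-separated result file
       .to_csv('out.csv', index=False, sep='\t')
    )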

Example output:

Ref_r   Frequency
1   0
2   0
3   3
4   0
5   0
6   0
7   0
8   0
9   0
10  0
11  3
12  0
...
98  0
99  0
blank   2
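
To try the approach above without an input file, here is a minimal sketch that runs the same chain on the sample rows from the question. It assumes the data is whitespace-separated and that missing values appear as the literal word `blank`; `SAMPLE` and `ref_r_frequency` are illustrative names, not part of the original answer.

```python
import io

import pandas as pd

# Sample rows copied from the question; "blank" is assumed to be a literal token.
SAMPLE = """\
ID Ref_r
235R23 3
56982B 3
62C879 blank
625478 11
9S4284 11
985U12 11
524555 58
99L852 60
1024T4 58
102W49 3
258q34 blank
"""


def ref_r_frequency(source):
    """Count each Ref_r value, reporting 0 for labels that never occur."""
    # Labels are strings because the column is read with dtype=str.
    labels = [str(n) for n in range(1, 100)] + ["blank"]
    return (
        pd.read_csv(source, sep=r"\s+", dtype=str)["Ref_r"]
        .value_counts(sort=False)
        .reindex(labels, fill_value=0)
        .rename_axis("Ref_r")
        .reset_index(name="Frequency")
    )


freq = ref_r_frequency(io.StringIO(SAMPLE))
# Show only the labels that actually occur in the sample.
print(freq[freq["Frequency"] > 0].to_string(index=False))
```

If the blanks are really empty cells in a comma-separated file, pandas would read them as NaN rather than as the string `blank`, and `value_counts` drops NaN unless `dropna=False` is passed, so the `reindex` labels would need adjusting in that case.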
