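I have selected the rows that mention and do not mention "Korona" and counted them by date. Some dates have no row where Korona is True. The dataframe looks like this: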
Table 1
| | Published_date | Korona | Count |
|---|---|---|---|
| 242 | 2020-06-01 | False | 13 |
| 243 | 2020-06-01 | True | 3 |
| 244 | 2020-06-02 | False | 7 |
| 245 | 2020-06-02 | True | 1 |
| 246 | 2020-06-03 | False | 11 |
| 247 | 2020-06-04 | False | 8 |
| 248 | 2020-06-04 | True | 1 |
| 249 | 2020-06-05 | False | 10 |
| 250 | 2020-06-06 | False | 5 |
| 251 | 2020-06-07 | False | 5 |
| 252 | 2020-06-08 | False | 14 |
Try using a pivot table:
import io

import pandas as pd

# sample data copied from the question
d = ''' Published_date Korona Count
242 2020-06-01 False 13
243 2020-06-01 True 3
244 2020-06-02 False 7
245 2020-06-02 True 1
246 2020-06-03 False 11
247 2020-06-04 False 8
248 2020-06-04 True 1
249 2020-06-05 False 10
250 2020-06-06 False 5
251 2020-06-07 False 5
252 2020-06-08 False 14'''
# split on runs of whitespace; the extra leading field on each row becomes the index
df = pd.read_csv(io.StringIO(d), sep=r'\s+', engine='python')
# pivot the data and reset the index; fill_value=0 covers dates with no True row
df1 = pd.pivot_table(df, values='Count', index=['Published_date'],
                     columns=['Korona'], aggfunc='sum', fill_value=0).reset_index()
# rename the columns to what you want (the pivoted columns come out as False, then True)
df1.columns = ['Published_date', 'Count-NoKorona', 'Count-Korona']
# sum the values into a new column
df1['Count-All'] = df1[['Count-NoKorona', 'Count-Korona']].sum(axis=1)
Output:
Published_date Count-NoKorona Count-Korona Count-All
0 2020-06-01 13 3 16
1 2020-06-02 7 1 8
2 2020-06-03 11 0 11
3 2020-06-04 8 1 9
4 2020-06-05 10 0 10
5 2020-06-06 5 0 5
6 2020-06-07 5 0 5
7 2020-06-08 14 0 14
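If each (date, Korona) pair appears at most once, as it does in the sample data, the same reshaping can probably also be done without an aggregation step by unstacking the Korona level. The following is only a sketch under that assumption; the name `wide` and the dict-built frame are mine, not from the question.

```python
import pandas as pd

# Same counts as in the question, built directly so the sketch is self-contained.
df = pd.DataFrame({
    "Published_date": ["2020-06-01", "2020-06-01", "2020-06-02", "2020-06-02",
                       "2020-06-03", "2020-06-04", "2020-06-04", "2020-06-05",
                       "2020-06-06", "2020-06-07", "2020-06-08"],
    "Korona": [False, True, False, True, False, False, True,
               False, False, False, False],
    "Count": [13, 3, 7, 1, 11, 8, 1, 10, 5, 5, 14],
})

wide = (
    df.set_index(["Published_date", "Korona"])["Count"]
      .unstack(fill_value=0)            # Korona values become columns; missing pairs get 0
      .rename(columns={False: "Count-NoKorona", True: "Count-Korona"})
      .rename_axis(columns=None)        # drop the leftover "Korona" columns name
      .reset_index()
)
# total per date
wide["Count-All"] = wide["Count-NoKorona"] + wide["Count-Korona"]
print(wide)
```

Note that `unstack` raises an error when a (date, Korona) pair is duplicated, so `pivot_table` with an aggregation function is the safer choice if the counts have not been grouped yet.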