How to get a list of total cases per month for each country



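I have been thinking about this for an hour and have tried many different approaches, but this CSV file is too complicated for me (I am a beginner) and I could not get it to work. I need a multidimensional array of the total cases for each country in each month. Once this is solved, I will use these arrays as the values for a heatmap.

To be clearer: for France, for example, I need something like total_cases_France = (total cases in December, total cases in January, ..., total cases in November). I need to do this for every country and end up with a multidimensional array. The CSV file can be accessed from here.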

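I'm not sure whether you mean the sum for each month separately

  • the sum of all cases in December only
  • the sum of all cases in January only
  • the sum of all cases in February only, and so on

or the cumulative sum

  • the sum of all cases in December
  • the sum of all cases in December + January
  • the sum of all cases in December + January + February, and so on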
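
The difference between the two interpretations can be sketched on a tiny series. The three numbers below are made up for illustration, and the names monthly, per_month and cumulative are not from the CSV.

```python
import pandas as pd

# Made-up monthly new-case counts, for illustration only.
monthly = pd.Series([0, 6, 51], index=['2019-12', '2020-01', '2020-02'])

# Interpretation 1: each month counted on its own.
per_month = monthly.tolist()

# Interpretation 2: running total up to and including each month.
cumulative = monthly.cumsum().tolist()

print(per_month)   # [0, 6, 51]
print(cumulative)  # [0, 6, 57]
```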

In both cases I would use pandas, the new_cases column, and groupby(['location', 'year-month']).

First I need to create year-month from the year-month-day strings in the date column

df['year-month'] = df['date'].str[:7]
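
Slicing the first seven characters works as long as every date is a string in YYYY-MM-DD form. As a hedged alternative, assuming the same format, the dates can be parsed with pd.to_datetime and formatted back to year-month. The three sample dates below are made up.

```python
import pandas as pd

# Made-up sample dates in the YYYY-MM-DD form the slicing relies on.
df = pd.DataFrame({'date': ['2019-12-31', '2020-01-01', '2020-01-15']})

# String slicing: keep the first 7 characters ("YYYY-MM").
by_slice = df['date'].str[:7]

# Alternative: parse to datetime, then format as "YYYY-MM".
by_parse = pd.to_datetime(df['date']).dt.strftime('%Y-%m')

print(by_slice.tolist())
print(by_parse.tolist())
```

Parsing is slower but fails loudly on malformed dates, whereas slicing silently keeps whatever the first seven characters happen to be.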

Next I can group the rows

groups = df.groupby(['location', 'year-month'])

and use sum() to get the sum of the new cases only

# select new_cases before summing so that non-numeric columns are not summed
df_sum = groups['new_cases'].sum().reset_index()

Result

         location year-month  new_cases
0     Afghanistan    2019-12        0.0
1     Afghanistan    2020-01        0.0
2     Afghanistan    2020-02        1.0
3     Afghanistan    2020-03      140.0
4     Afghanistan    2020-04     1808.0
...           ...        ...        ...
2112     Zimbabwe    2020-07     2518.0
2113     Zimbabwe    2020-08     3320.0
2114     Zimbabwe    2020-09     1425.0
2115     Zimbabwe    2020-10      525.0
2116     Zimbabwe    2020-11      858.0
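
The grouping step can be tried in isolation on a few made-up daily rows (countries "A" and "B" are placeholders, not data from the CSV). This sketch selects new_cases before summing and passes as_index=False, which keeps the group keys as ordinary columns instead of calling reset_index() afterwards.

```python
import pandas as pd

# Made-up daily rows, for illustration only (not taken from the CSV).
df = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B'],
    'date': ['2020-01-30', '2020-01-31', '2020-02-01', '2020-01-31', '2020-02-01'],
    'new_cases': [1.0, 2.0, 4.0, 10.0, 20.0],
})
df['year-month'] = df['date'].str[:7]

# One row per (location, year-month) with the summed new cases.
df_sum = df.groupby(['location', 'year-month'], as_index=False)['new_cases'].sum()
print(df_sum)
```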

I can then apply cumsum() within each country to get the total (cumulative) sum

df_sum['total_cases'] = df_sum.groupby('location')['new_cases'].cumsum()

Result

         location year-month  new_cases  total_cases
0     Afghanistan    2019-12        0.0          0.0
1     Afghanistan    2020-01        0.0          0.0
2     Afghanistan    2020-02        1.0          1.0
3     Afghanistan    2020-03      140.0        141.0
4     Afghanistan    2020-04     1808.0       1949.0
...           ...        ...        ...          ...
2112     Zimbabwe    2020-07     2518.0       3092.0
2113     Zimbabwe    2020-08     3320.0       6412.0
2114     Zimbabwe    2020-09     1425.0       7837.0
2115     Zimbabwe    2020-10      525.0       8362.0
2116     Zimbabwe    2020-11      858.0       9220.0

I can also select the rows for a single country

df_sum[ df_sum['location'] == 'France' ]
df_sum[ df_sum['location'] == 'Germany' ]

Result

    location year-month  new_cases  total_cases
671   France    2019-12        0.0          0.0
672   France    2020-01        6.0          6.0
673   France    2020-02       51.0         57.0
674   France    2020-03    44493.0      44550.0
675   France    2020-04    83892.0     128442.0
676   France    2020-05    23054.0     151496.0
677   France    2020-06    12764.0     164260.0
678   France    2020-07    22313.0     186573.0
679   France    2020-08    91370.0     277943.0
680   France    2020-09   272747.0     550690.0
681   France    2020-10   781294.0    1331984.0
682   France    2020-11   808224.0    2140208.0
    location year-month  new_cases  total_cases
722  Germany    2019-12        0.0          0.0
723  Germany    2020-01        5.0          5.0
724  Germany    2020-02       52.0         57.0
725  Germany    2020-03    61856.0      61913.0
726  Germany    2020-04    97206.0     159119.0
727  Germany    2020-05    22363.0     181482.0
728  Germany    2020-06    12777.0     194259.0
729  Germany    2020-07    14439.0     208698.0
730  Germany    2020-08    33683.0     242381.0
731  Germany    2020-09    46838.0     289219.0
732  Germany    2020-10   229534.0     518753.0
733  Germany    2020-11   410380.0     929133.0
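
To get the tuple-like form the question asks for (one value per month for a single country), the selected column can be turned into a plain list. This is a sketch: the small df_sum below is a stand-in built from the first three France and Germany rows of the output above, and total_cases_france is a name chosen for illustration.

```python
import pandas as pd

# Stand-in for df_sum, using the first three monthly rows shown above.
df_sum = pd.DataFrame({
    'location': ['France'] * 3 + ['Germany'] * 3,
    'year-month': ['2019-12', '2020-01', '2020-02'] * 2,
    'new_cases': [0.0, 6.0, 51.0, 0.0, 5.0, 52.0],
    'total_cases': [0.0, 6.0, 57.0, 0.0, 5.0, 57.0],
})

# Filter one country, then take the column as a Python list.
france = df_sum[df_sum['location'] == 'France']
total_cases_france = france['total_cases'].tolist()

print(total_cases_france)  # [0.0, 6.0, 57.0]
```

Use the new_cases column instead of total_cases if the per-month sums are wanted rather than the running total.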

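Alternatively, I can use groupby('location') to split the data into separate DataFrames and build a dictionary {"France": df_france, "Germany": df_germany, ...}

data = {}
# group by location only, so that each key is a country name
for country, values in df_sum.groupby('location'):
    data[country] = values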
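
Since the stated goal is a multidimensional array for a heatmap, one possible final step is to pivot the monthly table into a 2D array with one row per country and one column per month. This is a sketch and not part of the original answer: the small df_sum is a stand-in built from a few France and Germany values shown earlier, and the names table, matrix, countries and months are chosen for illustration.

```python
import pandas as pd

# Stand-in for df_sum, using the first three monthly rows per country.
df_sum = pd.DataFrame({
    'location': ['France'] * 3 + ['Germany'] * 3,
    'year-month': ['2019-12', '2020-01', '2020-02'] * 2,
    'new_cases': [0.0, 6.0, 51.0, 0.0, 5.0, 52.0],
})
df_sum['total_cases'] = df_sum.groupby('location')['new_cases'].cumsum()

# Rows become countries, columns become months, cells hold total_cases.
table = df_sum.pivot(index='location', columns='year-month', values='total_cases')

matrix = table.to_numpy()         # 2D NumPy array for the heatmap values
countries = table.index.tolist()  # row labels
months = table.columns.tolist()   # column labels

print(countries, months)
print(matrix)
```

If a country has no rows for some month, pivot leaves NaN in that cell, so it may need filling before plotting.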

Full code

import pandas as pd

df = pd.read_csv('ex1.csv')
print(df.columns)

# build a year-month key from the year-month-day strings in 'date'
df['year-month'] = df['date'].str[:7]
#print(df['year-month'].head())

# monthly sum of new cases per country
groups = df.groupby(['location', 'year-month'])
df_sum = groups['new_cases'].sum().reset_index()
#print(df_sum)

# cumulative sum within each country
df_sum['total_cases'] = df_sum.groupby('location')['new_cases'].cumsum()
print(df_sum)

print(df_sum[ df_sum['location'] == 'France' ])
print(df_sum[ df_sum['location'] == 'Germany' ])

# one DataFrame per country, keyed by country name
data = {}
for country, values in df_sum.groupby('location'):
    data[country] = values
print(data)
