如何计算季度差异并在python中添加缺少的季度计数



我有一个这样的数据帧,我必须得到缺失的季度值,并在它们之间计数与季度缺失计数相同,填充数据框为

year    Data    Id       number
2019Q4   57170   A     1
2019Q3   55150   A     1
2019Q2   51109   A     1
2019Q1   51109   A     1
2018Q1   57170   B     5
2018Q4   55150   B     5
2017Q4   51109   C     7
2017Q2   51109   C     7
2017Q1   51109   C     7 

Id开始年度结束年度计数

B  2018Q2    2018Q3        2  5
B  2017Q3    2018Q3        1  5

我如何使用python panda 实现这一点

使用:

#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by start of months with asfreq
df1 = (df.set_index('year')
.groupby('Id')['Id']
.resample('Q')
.asfreq()
.rename('val')
.reset_index())
print (df1)
Id   year  val
0   A 2019Q1    A
1   A 2019Q2    A
2   A 2019Q3    A
3   A 2019Q4    A
4   B 2018Q1    B
5   B 2018Q2  NaN
6   B 2018Q3  NaN
7   B 2018Q4    B
8   C 2017Q1    C
9   C 2017Q2    C
10  C 2017Q3  NaN
11  C 2017Q4    C

m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()
#filter only NaNs row and aggregate first, last and count.
df2 = (df1[~m.values].groupby(['Id', 'g'])['year']
.agg(['first','last','size'])
.reset_index(level=1, drop=True)
.reset_index())
print (df2)
Id  first   last  size
0  B 2018Q2 2018Q3     2
1  C 2017Q3 2017Q3     1

编辑:

对于具有相同值的新列,将其添加到groupby:

#convert datetimes to quarter period
df['year'] = pd.to_datetime(df['year']).dt.to_period('Q')
#resample by start of months with asfreq
df1 = (df.set_index('year')
.groupby(['Id','number'])['Id'] <- added number columns
.resample('Q')
.asfreq()
.rename('val')
.reset_index())
print (df1)
Id  number   year  val
0   A       1 2019Q1    A
1   A       1 2019Q2    A
2   A       1 2019Q3    A
3   A       1 2019Q4    A
4   B       5 2018Q1    B
5   B       5 2018Q2  NaN
6   B       5 2018Q3  NaN
7   B       5 2018Q4    B
8   C       7 2017Q1    C
9   C       7 2017Q2    C
10  C       7 2017Q3  NaN
11  C       7 2017Q4    C

m = df1['val'].notnull().rename('g')
#create index by cumulative sum for unique groups for consecutive NaNs
df1.index = m.cumsum()
#filter only NaNs row and aggregate first, last and count.
df2 = (df1[~m.values].groupby(['Id', 'number', 'g'])['year']
.agg(['first','last','size'])  <- added number columns
.reset_index(level=1, drop=True)
.reset_index())
print (df2)
Id  g  first   last  size
0  B  5 2018Q2 2018Q3     2
1  C  8 2017Q3 2017Q3     1

最新更新