How to use Pandas value_counts to count how often events occur (column a), grouped by the year given in (column b)



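I have preprocessed a df containing historical information on emergencies and disasters in the United States. It now contains the columns [location, disaster type, start date, end date, disaster length, year] for 1960-2017.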

Now I want to create 2 new dfs:

  1. The number of disasters that occurred each year
  2. The number of disasters of each type that occurred each year

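This is my current attempt to count the disasters per year and create a new df, but I can't work out how to get the actual number of disasters for each year.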

#Number of each Disaster each year
df_yearly_dcount=df_time.groupby(df_time['Start_year']).count()

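Second, I am not sure how to count how many times each disaster type occurred each year, because I need to solve the first part before I can break the counts down further.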

Here is the full code:

import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd 
import seaborn as sns 
from scipy.stats import zscore
#Import Dataset
df = pd.read_csv('database.csv')
df_time = (df[['County','Disaster Type','Start Date', 'End Date']][0: :])
#Preprocessing      

#Number of NaN values          
df_nan = df[['County','Disaster Type','Start Date', 'End Date']].isna().sum()
#NaN values as a percentage as total 
df_nan_number = [(df_nan.sum(axis=0)), str((((539/45330)*100))) +'%']
#Remove NaN values
df_time.dropna(subset = ["County", 'End Date'], inplace=True)
#Set Date Format
df_time['Start_Date_A'] = pd.to_datetime(df['Start Date'], format='%m/%d/%Y')
df_time['End_Date_A'] = pd.to_datetime(df['End Date'], format='%m/%d/%Y')
#Create new column == Disaster Length
df_time['Disaster_Length'] = (df_time.Start_Date_A - df_time.End_Date_A).dt.days
#Create new column == start year
df_time['Start_year'] = df_time['Start_Date_A'].dt.year
#Dropped  Old Date Formats from df
df_time = df_time.drop(columns=['Start Date', 'End Date'], axis=1)
#Replace 0 day values with 1 to indicate a Disaster length of 1 Day
df_time['Disaster_Length'] = df_time['Disaster_Length'].replace({0:1})
#Replace all values with absolute values so all days are represented as positive numeric values
df_time['Disaster_Length'] = df_time['Disaster_Length'].abs()

# Locating man-made and non-'natural' disasters, sorting Disaster types, and analyzing value counts
df_DTypes= df_time['Disaster Type'].values
df_DTypes=pd.DataFrame(df_DTypes)

df_DType_VCounts=(df_DTypes.value_counts()).sort_values(ascending=True)

Df_DType_Natural=(df_DType_VCounts.drop(['Human Cause', 'Chemical', 'Dam/Levee Break', 'Terrorism','Other'],axis=0)).sort_values(ascending=True)
df_time = df_time.rename(columns={'Disaster Type': 'Disaster_Type'})
#Removing non-natural disasters from main df_time
df_time = df_time[(df_time.Disaster_Type != 'Human Cause') & (df_time.Disaster_Type != 'Chemical') & (df_time.Disaster_Type != 'Dam/Levee Break') & (df_time.Disaster_Type != 'Terrorism') & (df_time.Disaster_Type != 'Other') ]
#Analysis 
#Dataframe with mean disaster length for each year
df_yearly_mean = df_time.groupby(['Start_year']).mean()

#Number of Disasters per year
df_yearly_dcount=df_time.groupby(df_time['Start_year']).count().reset_index(name='Disaster_Type')

#Number of each Disaster each year

Here is a reproducible sample of the df:


,County,Disaster_Type,Start_Date_A,End_Date_A,Disaster_Length,Start_year
89,Clay County,Flood,1959-01-29,1959-01-29,1,1959
181,Alpine County,Flood,1964-12-24,1964-12-24,1,1964
182,Amador County,Flood,1964-12-24,1964-12-24,1,1964
183,Butte County,Flood,1964-12-24,1964-12-24,1,1964
184,Colusa County,Flood,1964-12-24,1964-12-24,1,1964
185,Del Norte County,Flood,1964-12-24,1964-12-24,1,1964
186,El Dorado County,Flood,1964-12-24,1964-12-24,1,1964
187,Glenn County,Flood,1964-12-24,1964-12-24,1,1964
188,Humboldt County,Flood,1964-12-24,1964-12-24,1,1964
189,Lake County,Flood,1964-12-24,1964-12-24,1,1964
190,Lassen County,Flood,1964-12-24,1964-12-24,1,1964
191,Marin County,Flood,1964-12-24,1964-12-24,1,1964
192,Mendocino County,Flood,1964-12-24,1964-12-24,1,1964
193,Modoc County,Flood,1964-12-24,1964-12-24,1,1964
194,Napa County,Flood,1964-12-24,1964-12-24,1,1964
195,Nevada County,Flood,1964-12-24,1964-12-24,1,1964
196,Placer County,Flood,1964-12-24,1964-12-24,1,1964
197,Plumas County,Flood,1964-12-24,1964-12-24,1,1964
198,Sacramento County,Flood,1964-12-24,1964-12-24,1,1964
199,San Joaquin County,Flood,1964-12-24,1964-12-24,1,1964
200,Shasta County,Flood,1964-12-24,1964-12-24,1,1964
201,Sierra County,Flood,1964-12-24,1964-12-24,1,1964
202,Siskiyou County,Flood,1964-12-24,1964-12-24,1,1964
203,Solano County,Flood,1964-12-24,1964-12-24,1,1964
204,Sonoma County,Flood,1964-12-24,1964-12-24,1,1964
205,Stanislaus County,Flood,1964-12-24,1964-12-24,1,1964
206,Sutter County,Flood,1964-12-24,1964-12-24,1,1964
207,Tehama County,Flood,1964-12-24,1964-12-24,1,1964
208,Trinity County,Flood,1964-12-24,1964-12-24,1,1964
209,Tuolumne County,Flood,1964-12-24,1964-12-24,1,1964
210,Yolo County,Flood,1964-12-24,1964-12-24,1,1964
211,Yuba County,Flood,1964-12-24,1964-12-24,1,1964
212,Baker County,Flood,1964-12-24,1964-12-24,1,1964
213,Benton County,Flood,1964-12-24,1964-12-24,1,1964
214,Clackamas County,Flood,1964-12-24,1964-12-24,1,1964
215,Clatsop County,Flood,1964-12-24,1964-12-24,1,1964
216,Columbia County,Flood,1964-12-24,1964-12-24,1,1964
217,Coos County,Flood,1964-12-24,1964-12-24,1,1964
218,Crook County,Flood,1964-12-24,1964-12-24,1,1964
219,Curry County,Flood,1964-12-24,1964-12-24,1,1964
220,Deschutes County,Flood,1964-12-24,1964-12-24,1,1964
221,Douglas County,Flood,1964-12-24,1964-12-24,1,1964
222,Gilliam County,Flood,1964-12-24,1964-12-24,1,1964
223,Grant County,Flood,1964-12-24,1964-12-24,1,1964
224,Harney County,Flood,1964-12-24,1964-12-24,1,1964
225,Hood River County,Flood,1964-12-24,1964-12-24,1,1964
226,Jackson County,Flood,1964-12-24,1964-12-24,1,1964
227,Jefferson County,Flood,1964-12-24,1964-12-24,1,1964
228,Josephine County,Flood,1964-12-24,1964-12-24,1,1964
229,Klamath County,Flood,1964-12-24,1964-12-24,1,1964
230,Lake County,Flood,1964-12-24,1964-12-24,1,1964
231,Lane County,Flood,1964-12-24,1964-12-24,1,1964
232,Lincoln County,Flood,1964-12-24,1964-12-24,1,1964
233,Linn County,Flood,1964-12-24,1964-12-24,1,1964
234,Malheur County,Flood,1964-12-24,1964-12-24,1,1964
235,Marion County,Flood,1964-12-24,1964-12-24,1,1964
236,Morrow County,Flood,1964-12-24,1964-12-24,1,1964
237,Multnomah County,Flood,1964-12-24,1964-12-24,1,1964
238,Polk County,Flood,1964-12-24,1964-12-24,1,1964
239,Sherman County,Flood,1964-12-24,1964-12-24,1,1964
240,Tillamook County,Flood,1964-12-24,1964-12-24,1,1964
241,Umatilla County,Flood,1964-12-24,1964-12-24,1,1964
242,Union County,Flood,1964-12-24,1964-12-24,1,1964
243,Wallowa County,Flood,1964-12-24,1964-12-24,1,1964
244,Wasco County,Flood,1964-12-24,1964-12-24,1,1964
245,Washington County,Flood,1964-12-24,1964-12-24,1,1964
246,Wheeler County,Flood,1964-12-24,1964-12-24,1,1964
247,Yamhill County,Flood,1964-12-24,1964-12-24,1,1964
248,Asotin County,Flood,1964-12-29,1964-12-29,1,1964
249,Benton County,Flood,1964-12-29,1964-12-29,1,1964
250,Clark County,Flood,1964-12-29,1964-12-29,1,1964
251,Columbia County,Flood,1964-12-29,1964-12-29,1,1964
252,Cowlitz County,Flood,1964-12-29,1964-12-29,1,1964
253,Garfield County,Flood,1964-12-29,1964-12-29,1,1964
254,Grays Harbor County,Flood,1964-12-29,1964-12-29,1,1964
255,King County,Flood,1964-12-29,1964-12-29,1,1964
256,Kittitas County,Flood,1964-12-29,1964-12-29,1,1964
257,Klickitat County,Flood,1964-12-29,1964-12-29,1,1964
258,Lewis County,Flood,1964-12-29,1964-12-29,1,1964
259,Mason County,Flood,1964-12-29,1964-12-29,1,1964
260,Pacific County,Flood,1964-12-29,1964-12-29,1,1964
261,Pierce County,Flood,1964-12-29,1964-12-29,1,1964
262,Skamania County,Flood,1964-12-29,1964-12-29,1,1964
263,Snohomish County,Flood,1964-12-29,1964-12-29,1,1964
264,Spokane County,Flood,1964-12-29,1964-12-29,1,1964
265,Wahkiakum County,Flood,1964-12-29,1964-12-29,1,1964
266,Walla Walla County,Flood,1964-12-29,1964-12-29,1,1964
267,Whitman County,Flood,1964-12-29,1964-12-29,1,1964
268,Yakima County,Flood,1964-12-29,1964-12-29,1,1964
269,Ada County,Flood,1964-12-31,1964-12-31,1,1964
270,Bannock County,Flood,1964-12-31,1964-12-31,1,1964
271,Benewah County,Flood,1964-12-31,1964-12-31,1,1964
272,Blaine County,Flood,1964-12-31,1964-12-31,1,1964
273,Boise County,Flood,1964-12-31,1964-12-31,1,1964
274,Bonneville County,Flood,1964-12-31,1964-12-31,1,1964
275,Butte County,Flood,1964-12-31,1964-12-31,1,1964
276,Camas County,Flood,1964-12-31,1964-12-31,1,1964
277,Caribou County,Flood,1964-12-31,1964-12-31,1,1964
278,Cassia County,Flood,1964-12-31,1964-12-31,1,1964
279,Clearwater County,Flood,1964-12-31,1964-12-31,1,1964

You can call size() on the groupby to get the counts.

#Number of Disasters each year.
df.groupby('Start_year').size()
Start_year
1959     1
1964    99
dtype: int64
#Number of each disasters for each year.
df.groupby(['Start_year', 'Disaster_Type']).size()
Start_year  Disaster_Type
1959        Flood             1
1964        Flood            99
dtype: int64
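
Since the question asks for two new dfs, and the title mentions value_counts, here is a minimal sketch of how the results above could be turned into DataFrames. The tiny sample frame and the column name Disaster_Count are made up for illustration; only County, Disaster_Type and Start_year come from the question.

```python
import pandas as pd

# Made-up sample that reuses the question's column names (not real data).
df_time = pd.DataFrame({
    "County": ["Clay County", "Alpine County", "Amador County", "Butte County"],
    "Disaster_Type": ["Flood", "Flood", "Fire", "Flood"],
    "Start_year": [1959, 1964, 1964, 1964],
})

# 1. Number of disasters per year, as a DataFrame.
#    size() returns a Series, so reset_index(name=...) is available here.
#    "Disaster_Count" is a hypothetical column name.
df_yearly_dcount = (
    df_time.groupby("Start_year").size().reset_index(name="Disaster_Count")
)

# 2. Number of disasters of each type per year, as a DataFrame.
df_yearly_type_dcount = (
    df_time.groupby(["Start_year", "Disaster_Type"])
    .size()
    .reset_index(name="Disaster_Count")
)

# The same per-type counts via value_counts on the grouped column.
# The result is a Series indexed by (Start_year, Disaster_Type).
type_counts = df_time.groupby("Start_year")["Disaster_Type"].value_counts()

print(df_yearly_dcount)
print(df_yearly_type_dcount)
print(type_counts)
```

Note that size() counts rows per group, while count() counts non-null values per column, which is why the count() attempt in the question returns one count for every column instead of a single number per year.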
