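I'm trying to fill a DataFrame with the results of calculations performed on a different DataFrame. The calculations should run on one Series whenever conditions on two other Series are met.
Here is what I've tried. I built one DataFrame, rswcapacity, on which the calculations should run, and then created another DataFrame, annualcapacity, in which I want to store the results of the conditional calculations.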
import numpy as np
import pandas as pd

#First DataFrame
d = {'technology': ['EAF', 'EAF', 'EAF', 'BOF', 'BOF', 'BOF'], 'equip_detail1': [150, 130, 100, 200, 200, 150], 'equip_number' : [1, 2, 3, 1, 2, 3], 'capacity_actual': [2400, 2080, 1600, 3200, 3200, 2400], 'start_year': [1992, 1993, 1994, 1989, 1990, 1991], 'closure_year': ['', 2002, '', '', 2001, 2011] }
rswcapacity = pd.DataFrame(data = d)
# turn the empty strings into NaN and make the column numeric (assign back instead of inplace on a column)
rswcapacity['closure_year'] = rswcapacity['closure_year'].replace('', np.nan).astype(float)
#Second DataFrame
annualcapacity = pd.DataFrame(columns=['years', 'capacity'])
annualcapacity ['years'] = [1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
#Neither of the attempts below yields the desired results:
for y in years:
    annualcapacity['capacity'].append(rswcapacity['capacity_actual'].apply(lambda x : x['capacity_actual'].sum() (x['start_year'] >= y & (x['closure_year'] <= y | x['closure_year'].isnull()))).sum())
annualcapacity
#other attempt:
for y in years:
    if (rswcapacity['start_year'] >= y).any() & ((rswcapacity['closure_year'].isnull()).any() | (rswcapacity['closure_year'] <= y).any()):
        annualcapacity['capacity'].append(rswcapacity['capacity_actual'].sum())
annualcapacity
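The result I want is the sum computed for each year. For example:
- 1985 should return NaN, because 1985 is smaller than every start_year
- 1992 should return 14880, because 1992 is larger than every start_year and smaller than every closure_year
- 2001 should return 7200, because it is larger than all start_years and larger than all closure_years

Instead, all three of my attempts just return NaN for every year in the list.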
Something is wrong with how I set up the conditions, but I haven't managed to figure out what.
Any insight is much appreciated!
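Two things stand out in the attempts above. First, in Python the operators & and | bind more tightly than comparisons such as >= and <=, so x['start_year'] >= y & (...) is evaluated as x['start_year'] >= (y & (...)); every comparison needs its own parentheses. Second, .any() reduces each boolean Series to a single True or False, so the condition is no longer checked row by row, and Series.append returns a new object rather than modifying annualcapacity in place, so its result is discarded. A minimal sketch of a row-wise mask for one year, assuming closure_year is already float with NaN for plants that are still open, and using the inclusive start_year <= y, closure_year >= y convention from the answer below:

```python
import numpy as np
import pandas as pd

# The question's data (only the columns needed here); NaN marks a plant that has not closed
rswcapacity = pd.DataFrame({
    'capacity_actual': [2400, 2080, 1600, 3200, 3200, 2400],
    'start_year': [1992, 1993, 1994, 1989, 1990, 1991],
    'closure_year': [np.nan, 2002, np.nan, np.nan, 2001, 2011],
})

y = 1992
# Each comparison is wrapped in parentheses so & and | combine
# whole boolean Series element-wise, one value per row
mask = (rswcapacity['start_year'] <= y) & (
    rswcapacity['closure_year'].isnull() | (rswcapacity['closure_year'] >= y)
)

print(mask.tolist())
print(rswcapacity.loc[mask, 'capacity_actual'].sum())
```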
You can do it as follows:
import numpy as np
import pandas as pd

# use your list of years (1980 to 2020 inclusive)
years = list(range(1980, 2021))

rows = []
for y in years:
    # a unit counts in year y if it started in or before y and is either
    # still open (closure_year is NaN) or closed in or after y
    indexer = (rswcapacity['start_year'] <= y) & ((rswcapacity['closure_year'].isnull()) | (rswcapacity['closure_year'] >= y))
    # create a sum for each year
    capa = rswcapacity.loc[indexer, 'capacity_actual'].sum()
    # collect it as a row (DataFrame.append was removed in pandas 2.0)
    rows.append(dict(years=y, capacity=capa))

# build the result frame once, with int32 years and float32 capacity
annualcapacity = pd.DataFrame(rows).astype({'years': 'int32', 'capacity': 'float32'})
annualcapacity
The result looks like this:
years capacity
0 1980 0.0
1 1981 0.0
2 1982 0.0
3 1983 0.0
4 1984 0.0
5 1985 0.0
6 1986 0.0
7 1987 0.0
8 1988 0.0
9 1989 3200.0
10 1990 6400.0
11 1991 8800.0
12 1992 11200.0
13 1993 13280.0
14 1994 14880.0
15 1995 14880.0
16 1996 14880.0
17 1997 14880.0
18 1998 14880.0
19 1999 14880.0
20 2000 14880.0
21 2001 14880.0
22 2002 11680.0
23 2003 9600.0
24 2004 9600.0
25 2005 9600.0
26 2006 9600.0
27 2007 9600.0
28 2008 9600.0
29 2009 9600.0
30 2010 9600.0
31 2011 9600.0
32 2012 7200.0
33 2013 7200.0
34 2014 7200.0
35 2015 7200.0
36 2016 7200.0
37 2017 7200.0
38 2018 7200.0
39 2019 7200.0
40 2020 7200.0
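The loop above builds one boolean mask per year. As an alternative (a swapped-in technique, not the method used in this answer), NumPy broadcasting can compare every year against every plant in a single step. A minimal sketch, assuming the question's data with closure_year as float (NaN for plants still open), where a missing closure year is treated as +infinity; like the loop, it counts a plant in its closure year:

```python
import numpy as np
import pandas as pd

rswcapacity = pd.DataFrame({
    'capacity_actual': [2400, 2080, 1600, 3200, 3200, 2400],
    'start_year': [1992, 1993, 1994, 1989, 1990, 1991],
    'closure_year': [np.nan, 2002, np.nan, np.nan, 2001, 2011],
})
years = np.arange(1980, 2021)

start = rswcapacity['start_year'].to_numpy()
# a missing closure year means "never closed", so use +infinity
closure = rswcapacity['closure_year'].astype(float).fillna(np.inf).to_numpy()
cap = rswcapacity['capacity_actual'].to_numpy()

# active[i, j] is True when plant j is running in years[i]
active = (start[None, :] <= years[:, None]) & (years[:, None] <= closure[None, :])

annualcapacity = pd.DataFrame({
    'years': years,
    # zero out inactive plants, then add up each row (one row per year)
    'capacity': (active * cap).sum(axis=1),
})
print(annualcapacity)
```

For a few dozen years and plants either version is fast; the broadcast version builds a years-by-plants boolean matrix, so its memory use grows with the product of the two.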
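Note: the sum is always a number, so a year with no capacity gets 0.0 rather than NaN. If you need NaN for some reason, you can replace the zeros with the line below.
The second point is that I changed your condition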
(rswcapacity['start_year'] >= y) & ((rswcapacity['closure_year'].isnull()) | (rswcapacity['closure_year'] <= y))
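by flipping both comparisons: start_year >= y became start_year <= y, and closure_year <= y became closure_year >= y. I assumed you want to add up all the capacity that is available in a given year, right?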
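With this condition a plant still counts in the year it closes (closure_year >= y). Whether that is what you want depends on how closures are recorded in your data. A minimal sketch that makes the choice explicit; capacity_in_year and count_closure_year are hypothetical names, and closure_year is assumed to be numeric with NaN for plants that are still open:

```python
import numpy as np
import pandas as pd

rswcapacity = pd.DataFrame({
    'capacity_actual': [2400, 2080, 1600, 3200, 3200, 2400],
    'start_year': [1992, 1993, 1994, 1989, 1990, 1991],
    'closure_year': [np.nan, 2002, np.nan, np.nan, 2001, 2011],
})

def capacity_in_year(df, y, count_closure_year=True):
    """Hypothetical helper: total capacity of the plants running in year y."""
    # a missing closure year means the plant is still running
    closure = df['closure_year'].astype(float).fillna(np.inf)
    # >= keeps a plant in the year it closes; > drops it from that year
    still_open = closure >= y if count_closure_year else closure > y
    started = df['start_year'] <= y
    return df.loc[started & still_open, 'capacity_actual'].sum()

print(capacity_in_year(rswcapacity, 2001))                            # closure year counted
print(capacity_in_year(rswcapacity, 2001, count_closure_year=False))  # closure year excluded
```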
So, if you need NaN entries instead of 0.0 (for years with no available capacity at all), you can do the following:
annualcapacity.loc[annualcapacity['capacity'] == 0, 'capacity'] = np.nan
For this you need import numpy as np at the top of your script.
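Instead of replacing zeros afterwards, Series.sum accepts a min_count argument: if fewer than min_count non-NA values are present, the result is NaN. With min_count=1, a year in which no plant matches gets NaN directly. One difference from the replacement line: that line would also turn a genuine zero total (for example, a plant recorded with capacity 0) into NaN, whereas min_count=1 only yields NaN when no row matches at all. A minimal sketch, assuming closure_year is float with NaN for plants still open:

```python
import numpy as np
import pandas as pd

rswcapacity = pd.DataFrame({
    'capacity_actual': [2400, 2080, 1600, 3200, 3200, 2400],
    'start_year': [1992, 1993, 1994, 1989, 1990, 1991],
    'closure_year': [np.nan, 2002, np.nan, np.nan, 2001, 2011],
})

rows = []
for y in range(1980, 2021):
    indexer = (rswcapacity['start_year'] <= y) & (
        rswcapacity['closure_year'].isnull() | (rswcapacity['closure_year'] >= y)
    )
    # min_count=1: a sum over zero matching rows is NaN instead of 0
    capa = rswcapacity.loc[indexer, 'capacity_actual'].sum(min_count=1)
    rows.append({'years': y, 'capacity': capa})

annualcapacity = pd.DataFrame(rows)
print(annualcapacity.head(12))
```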