如何填充数据框与另一个数据框的条件和总和,当它在同一单元格?



如下图所示,有两个数据帧df1和df2。

df1说明每个列名的数量数据。因此,每一列将包含从Floor到Under的数量。

Column name    Floor    Under  ConcType   ConcTotal     RebarTotal     SteelTotal
0   CB1            ACP      GF     C80        21.6          7954.8         2265.0
1   CB1A           B4F      LGF    C60        1.2           318.8          0.0
...   
201 CT12B          GF       8F     C60        4.5           757.6         840.0
202 CT13A          GF       8F     C80        1.4           429.3         420.0

df2为隔层充填量数据表。但是df2在fl上不同,name_h (column) mean将包含df1中承载该Floor的所有列的数量。

H表示承载该层的列的高度。按标高楼层-前一楼层标高计算。

例如F8层H = 48400 - 42800 = 5600

Fl.Name_Elev    Elevation    Fl.Name_h   H         C60_Conc  C80_Conc   Rebar   Steel
0   8F              48400        8F          5600.0    NaN       NaN        NaN     NaN 
1   7F              42800        7F          6000.0    NaN       NaN        NaN     NaN
2   6F              36800        6F          6000.0    NaN       NaN        NaN     NaN
3   5F              30800        5F          5600.0    NaN       NaN        NaN     NaN
4   4F              25200        4F          5600.0    NaN       NaN        NaN     NaN
5   3F              19600        3F          5600.0    NaN       NaN        NaN     NaN
6   2F              14000        2F          6000.0    NaN       NaN        NaN     NaN
7   1F              8000         1F          6000.0    NaN       NaN        NaN     NaN
8   GF              2000         GF          3300.0    NaN       NaN        NaN     NaN
9   LGF MZ          -1300        LGF MZ      3150.0    NaN       NaN        NaN     NaN 
10  LGF             -4450        LGF         4650.0    NaN       NaN        NaN     NaN
11  B1F             -9100        B1F         3000.0    NaN       NaN        NaN     NaN
12  B2F             -12100       B2F         3000.0    NaN       NaN        NaN     NaN
13  B3F             -15100       B3F         4650.0    NaN       NaN        NaN     NaN 
14  B4F             -19750       B4F         8250.0    NaN       NaN        NaN     NaN
15  ACP             -28000       NaN         NaN       NaN       NaN        NaN     NaN

我想从df1填充到NaN的"C60_Conc", "C80_Conc","钢筋","钢"在df2。

结果如下:

Fl.Name_Elev    Elevation    Fl.Name_h       H         C60_Conc C80_Conc    Rebar    Steel
0   8F              48400        8F          5600.0    4.5      1.4         1186.9   1260   
1   7F              42800        7F          6000.0    4.5      1.4         1186.9   1260
2   6F              36800        6F          6000.0    4.5      1.4         1186.9   1260
3   5F              30800        5F          5600.0    4.5      1.4         1186.9   1260
4   4F              25200        4F          5600.0    4.5      1.4         1186.9   1260
5   3F              19600        3F          5600.0    4.5      1.4         1186.9   1260
6   2F              14000        2F          6000.0    4.5      1.4         1186.9   1260
7   1F              8000         1F          6000.0    4.5      1.4         1186.9   1260
8   GF              2000         GF          3300.0    0        21.6        7954.8   2265
9   LGF MZ          -1300        LGF MZ      3150.0    0        21.6        7954.8   2265   
10  LGF             -4450        LGF         4650.0    1.2      21.6        8276.6   2265
11  B1F             -9100        B1F         3000.0    1.2      21.6        8276.6   2265
12  B2F             -12100       B2F         3000.0    1.2      21.6        8276.6   2265
13  B3F             -15100       B3F         4650.0    1.2      21.6        8276.6   2265   
14  B4F             -19750       B4F         8250.0    0        21.6        7954.8   2265
15  ACP             -28000       NaN         NaN       NaN      NaN         NaN      NaN

注:抱歉我的沟通。英语不是我的母语。

解决方案:

floors = df2["Fl.Name_Elev"].tolist()[::-1]
df1["Floors"] = df1.apply(lambda x: floors[floors.index(x["Floor"]):floors.index(x["Under"])+1], axis=1)
df1 = df1.explode(column=["Floors"])
mapper = df1.pivot(index=df1.columns.difference(["ConcType", "ConcTotal"]), 
columns="ConcType", 
values="ConcTotal").reset_index().groupby("Floors").sum()
output = df2.merge(mapper, left_on="Fl.Name_Elev", right_index=True, how="left")
output = output.rename(columns={"C60": "C60_Conc", "C80": "C80_Conc"})
>>> output
Fl.Name_Elev  Elevation Fl.Name_h  ...  SteelTotal  C60_Conc  C80_Conc
0            8F      48400        8F  ...        1260       4.5       1.4
1            7F      42800        7F  ...        1260       4.5       1.4
2            6F      36800        6F  ...        1260       4.5       1.4
3            5F      30800        5F  ...        1260       4.5       1.4
4            4F      25200        4F  ...        1260       4.5       1.4
5            3F      19600        3F  ...        1260       4.5       1.4
6            2F      14000        2F  ...        1260       4.5       1.4
7            1F       8000        1F  ...        1260       4.5       1.4
8            GF       2000        GF  ...        3525       4.5      23.0
9        LGF MZ      -1300    LGF MZ  ...        2265       0.0      21.6
10          LGF      -4450       LGF  ...        2265       1.2      21.6
11          B1F      -9100       B1F  ...        2265       1.2      21.6
12          B2F     -12100       B2F  ...        2265       1.2      21.6
13          B3F     -15100       B3F  ...        2265       1.2      21.6
14          B4F     -19750       B4F  ...        2265       1.2      21.6
15          ACP     -28000       NaN  ...        2265       0.0      21.6

解释首先,我们使用pandas.explode包含df1中从Floor到Under的所有楼层:

floors = df2["Fl.Name_Elev"].tolist()[::-1]
df1["Floors"] = df1.apply(lambda x: floors[floors.index(x["Floor"]):floors.index(x["Under"])+1], axis=1)
df1 = df1.explode(column=["Floors"])
>>> df1.head()
Column name Floor Under ConcType  ConcTotal  RebarTotal  SteelTotal  Floors
0         CB1   ACP    GF      C80       21.6      7954.8        2265     ACP
0         CB1   ACP    GF      C80       21.6      7954.8        2265     B4F
0         CB1   ACP    GF      C80       21.6      7954.8        2265     B3F
0         CB1   ACP    GF      C80       21.6      7954.8        2265     B2F
0         CB1   ACP    GF      C80       21.6      7954.8        2265     B1F
0         CB1   ACP    GF      C80       21.6      7954.8        2265     LGF
0         CB1   ACP    GF      C80       21.6      7954.8        2265  LGF MZ
0         CB1   ACP    GF      C80       21.6      7954.8        2265      GF
1        CB1A   B4F   LGF      C60        1.2       318.8           0     B4F
1        CB1A   B4F   LGF      C60        1.2       318.8           0     B3F

然后,我们通过使用pandas.pivot来创建一个映射器,以获得C60和C80的单独列,并使用pandas.groupbysum来获得每个楼层的总数:

mapper = df1.pivot(index=df1.columns.difference(["ConcType", "ConcTotal"]),
columns="ConcType",
values="ConcTotal") 
.reset_index() 
.groupby("Floor").sum()
>>> mapper
ConcType  RebarTotal  SteelTotal  C60   C80
Floors                                     
1F            1186.9        1260  4.5   1.4
2F            1186.9        1260  4.5   1.4
3F            1186.9        1260  4.5   1.4
4F            1186.9        1260  4.5   1.4
5F            1186.9        1260  4.5   1.4
6F            1186.9        1260  4.5   1.4
7F            1186.9        1260  4.5   1.4
8F            1186.9        1260  4.5   1.4
ACP           7954.8        2265  0.0  21.6
B1F           8273.6        2265  1.2  21.6
B2F           8273.6        2265  1.2  21.6
B3F           8273.6        2265  1.2  21.6
B4F           8273.6        2265  1.2  21.6
GF            9141.7        3525  4.5  23.0
LGF           8273.6        2265  1.2  21.6
LGF MZ        7954.8        2265  0.0  21.6

最后,我们在df2和mapper上使用pandas.merge得到输出:

output = df2.merge(mapper, left_on="Fl.Name_Elev", right_index=True, how="left")
output = output.rename(columns={"C60": "C60_Conc", "C80": "C80_Conc"})

相关内容

最新更新