如何将新列添加到数据框与值从另一个数据框?



我有两个数据框架:df1.head ():

nazwa województwa   gmina nazwa gminy  rodzaj gminy
0  Zachodniopomorskie  320101   Białogard             1
1  Zachodniopomorskie  320101   Białogard             1
2  Zachodniopomorskie  320101   Białogard             1
3  Zachodniopomorskie  320101   Białogard             1
4  Zachodniopomorskie  320101   Białogard             1

和kts_df.head ():

name         type        KTS_code TERYT_code
0            Polska      COUNTRY  10000000000000       None
1           Bochnia  RURAL_GMINA  10011212001022    1201022
2           Drwinia  RURAL_GMINA  10011212001032    1201032
3         Iwanowice  RURAL_GMINA  10011212006032    1206032
4  Lipnica Murowana  RURAL_GMINA  10011212001042    1201042

当前要添加新列到df1,我使用

df['kts'] = df.apply(lambda row: self.get_kts_code(row, kts_df), axis=1)其中

def get_kts_code(self, row, kts_df: DataFrame) -> str:
"""Get the KTS code of each miasto/wieś."""
gmina_types = {
'1': AdministrativeUnitType.URBAN_GMINA,
'2': AdministrativeUnitType.RURAL_GMINA,
'4': AdministrativeUnitType.MIXED_GMINA,
'5': AdministrativeUnitType.RURAL_AREA,
'8': AdministrativeUnitType.DISTRICT,
'9': AdministrativeUnitType.DELEGATION,
}
nazwa_gminy = row['nazwa gminy']
gmina_type = gmina_types[str(row['rodzaj gminy'])]
teryt = kts_df['TERYT_code'].str.contains(str(row['gmina']))
kts_code = kts_df.loc[
(kts_df['name'] == nazwa_gminy) & (kts_df['type'] == gmina_type) & (teryt)
]
kts_code = kts_code['KTS_code'].values[0]
return kts_code

这段代码工作得很好,但是处理大约200k行的df1需要大约一个小时,太慢了。也许可以有另一种方法来快速找到正确的kts_code从kts_df为每一行的df1?

我不确定我是否满足了您的需求,但您可以尝试以下方法:

  • 用对应的gmina_type映射创建df
  • 连接kts_df和gmina_types_df来获取gmina id
  • 将df1与富集的kts_df连接

代码例子:

gmina_types = {
'id': [1, 2, 3, 4, 5],
'type': ['URBAN_GMINA', 'RURAL_GMINA', 'MIXED_GMINA', 'RURAL_AREA', 'DISTRICT']
}
gmina_types_df = pd.DataFrame.from_dict(gmina_types)
kts_df = kts_df.join(gmina_types_df.set_index('type'), on='type')
df1 = df1.join(kts_df.set_index('id'), on='rodzaj gminy')

最新更新