I have two DataFrames. df1.head():
nazwa województwa gmina nazwa gminy rodzaj gminy
0 Zachodniopomorskie 320101 Białogard 1
1 Zachodniopomorskie 320101 Białogard 1
2 Zachodniopomorskie 320101 Białogard 1
3 Zachodniopomorskie 320101 Białogard 1
4 Zachodniopomorskie 320101 Białogard 1
and kts_df.head():
name type KTS_code TERYT_code
0 Polska COUNTRY 10000000000000 None
1 Bochnia RURAL_GMINA 10011212001022 1201022
2 Drwinia RURAL_GMINA 10011212001032 1201032
3 Iwanowice RURAL_GMINA 10011212006032 1206032
4 Lipnica Murowana RURAL_GMINA 10011212001042 1201042
Currently, to add a new column to df1, I use:
df1['kts'] = df1.apply(lambda row: self.get_kts_code(row, kts_df), axis=1)
where:
from pandas import DataFrame

def get_kts_code(self, row, kts_df: DataFrame) -> str:
    """Get the KTS code of each miasto/wieś."""
    # 'rodzaj gminy' code -> unit type (AdministrativeUnitType is an enum defined elsewhere, not shown)
    gmina_types = {
        '1': AdministrativeUnitType.URBAN_GMINA,
        '2': AdministrativeUnitType.RURAL_GMINA,
        '4': AdministrativeUnitType.MIXED_GMINA,
        '5': AdministrativeUnitType.RURAL_AREA,
        '8': AdministrativeUnitType.DISTRICT,
        '9': AdministrativeUnitType.DELEGATION,
    }
    nazwa_gminy = row['nazwa gminy']
    gmina_type = gmina_types[str(row['rodzaj gminy'])]
    # Boolean mask: TERYT codes that contain this row's gmina code
    teryt = kts_df['TERYT_code'].str.contains(str(row['gmina']))
    # This filter scans the whole of kts_df once for every row of df1
    kts_code = kts_df.loc[
        (kts_df['name'] == nazwa_gminy) & (kts_df['type'] == gmina_type) & (teryt)
    ]
    # Take the first match
    kts_code = kts_code['KTS_code'].values[0]
    return kts_code
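This code works, but processing df1 with about 200k rows takes roughly an hour, which is far too slow. Is there a faster way to find the correct kts_code in kts_df for each row of df1?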
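I'm not sure this is exactly what you need, but you could try the following:
- Create a DataFrame that maps each gmina_type to its numeric id
- Join kts_df with gmina_types_df to get the gmina type id
- Join df1 with the enriched kts_df
Example code: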
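import pandas as pd

# Numeric 'rodzaj gminy' codes and the matching type names in kts_df
# (same mapping as gmina_types in the question)
gmina_types = {
    'id': [1, 2, 4, 5, 8, 9],
    'type': ['URBAN_GMINA', 'RURAL_GMINA', 'MIXED_GMINA', 'RURAL_AREA', 'DISTRICT', 'DELEGATION']
}
gmina_types_df = pd.DataFrame.from_dict(gmina_types)
# Inner merge attaches the id to kts_df and drops rows (e.g. COUNTRY) that have no code
kts_df = kts_df.merge(gmina_types_df, on='type')
# Join on both the type id and the name; joining on the id alone would match
# every df1 row with every kts_df row of the same type
df1 = df1.join(kts_df.set_index(['id', 'name']), on=['rodzaj gminy', 'nazwa gminy'])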
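If the TERYT check from the question also has to be kept (several gminas can share a name and type), one option is a minimal sketch like the one below: a single merge on (name, type) to get candidate rows, then the same substring test as `str.contains`, applied only to those candidates. The function `add_kts_codes` and the helper columns `row_id`, `kts_pos` and `kts_type` are hypothetical names for illustration. It assumes `kts_df['type']` holds plain strings such as `'RURAL_GMINA'`; if it holds `AdministrativeUnitType` members, map to those instead. Unlike the original, a row with no match gets NaN rather than raising an `IndexError`.

```python
import pandas as pd

# Assumed: the same code -> type mapping as get_kts_code, with kts_df['type']
# holding plain strings such as 'RURAL_GMINA' (not AdministrativeUnitType members)
GMINA_TYPES = {1: 'URBAN_GMINA', 2: 'RURAL_GMINA', 4: 'MIXED_GMINA',
               5: 'RURAL_AREA', 8: 'DISTRICT', 9: 'DELEGATION'}


def add_kts_codes(df1: pd.DataFrame, kts_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical vectorized stand-in for df1.apply(get_kts_code, axis=1)."""
    left = df1.copy()
    left['row_id'] = range(len(left))                    # position of each df1 row
    left['kts_type'] = left['rodzaj gminy'].astype(int).map(GMINA_TYPES)
    right = kts_df[['name', 'type', 'KTS_code', 'TERYT_code']].copy()
    right['kts_pos'] = range(len(right))                 # original kts_df order

    # One join on (name, type) replaces a full scan of kts_df for every row
    cand = left[['row_id', 'nazwa gminy', 'kts_type', 'gmina']].merge(
        right, left_on=['nazwa gminy', 'kts_type'], right_on=['name', 'type'])

    # Same substring test as str.contains in the question; missing TERYT codes never match
    hit = pd.Series([pd.notna(t) and str(g) in str(t)
                     for g, t in zip(cand['gmina'], cand['TERYT_code'])],
                    index=cand.index, dtype=bool)

    # Keep the first matching kts_df row per df1 row, mirroring .values[0]
    best = (cand[hit].sort_values(['row_id', 'kts_pos'])
            .drop_duplicates('row_id')
            .set_index('row_id')['KTS_code'])
    left['kts'] = left['row_id'].map(best)               # NaN where nothing matched
    return left.drop(columns=['row_id', 'kts_type'])
```

Usage would be `df1 = add_kts_codes(df1, kts_df)`. The Python-level loop only runs over the candidate rows produced by the merge, not over all of kts_df for each df1 row, so it should be much faster than the per-row `apply`, although I have not measured it on the real data.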