Can a dictionary be applied to a column that holds multiple values per cell?
Unit Code
0 243 A
1 106 B
2 678 C
3 852 A, B
4 239 A, C
5 217 A, B, C
6 642 D
7 932 E, F, G, H
8 213 A, C, E, H
Suppose my dictionary is as follows:
dictex = {
    'A' : 'Alpha',
    'B' : 'Bravo',
    'C' : 'Charlie',
    'D' : 'Delta',
    'E' : 'Echo',
    'F' : 'Foxtrot',
    'G' : 'Golf',
    'H' : 'Hotel',
    None : 'NULL'
}
After I apply it:
df['Phonetic'] = df['Code'].map(dictex)
I get the following (which is not what I want).
Unit Code Phonetic
0 243 A Alpha
1 106 B Bravo
2 678 C Charlie
3 852 A, B NaN
4 239 A, C NaN
5 217 A, B, C NaN
6 642 D Delta
7 932 E, F, G, H NaN
8 213 A, C, E, H NaN
What I want is the following.
Unit Code Phonetic
0 243 A Alpha
1 106 B Bravo
2 678 C Charlie
3 852 A, B Alpha, Bravo
4 239 A, C Alpha, Charlie
5 217 A, B, C Alpha, Bravo, Charlie
6 642 D Delta
7 932 E, F, G, H Echo, Foxtrot, Golf, Hotel
8 213 A, C, E, H Alpha, Charlie, Echo, Hotel
How can I map the dictionary to each value within a cell of the column?
Try:
df["Phonetic"] = df["Code"].apply(
    lambda x: ", ".join(dictex.get(v) for v in map(str.strip, x.split(",")))
)
print(df)
Prints:
Unit Code Phonetic
0 243 A Alpha
1 106 B Bravo
2 678 C Charlie
3 852 A, B Alpha, Bravo
4 239 A, C Alpha, Charlie
5 217 A, B, C Alpha, Bravo, Charlie
6 642 D Delta
7 932 E, F, G, H Echo, Foxtrot, Golf, Hotel
8 213 A, C, E, H Alpha, Charlie, Echo, Hotel
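The lambda above can also be written as a named helper, which makes it easier to handle cells that are missing or that contain a code not present in the dictionary. `dictex.get(v)` returns `None` for an unknown code, and `", ".join` then raises a `TypeError`. The sketch below is only one way to guard against that. The name `to_phonetic` and the decision to leave unknown codes and non-string cells unchanged are assumptions, not part of the question.

```python
dictex = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta",
    "E": "Echo", "F": "Foxtrot", "G": "Golf", "H": "Hotel",
}

def to_phonetic(cell, mapping=dictex, sep=","):
    """Translate each code in a delimited string (hypothetical helper)."""
    # Assumption: non-string cells (None, NaN) are passed through unchanged.
    if not isinstance(cell, str):
        return cell
    # Split on the separator and drop surrounding whitespace from each code.
    codes = [part.strip() for part in cell.split(sep)]
    # Assumption: a code without a dictionary entry is kept as-is.
    return ", ".join(mapping.get(code, code) for code in codes)

print(to_phonetic("A, C, E, H"))
print(to_phonetic("A,Z"))
```

With a DataFrame it could be used as `df["Phonetic"] = df["Code"].apply(to_phonetic)`.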
You need to split the data before mapping:
df['Phonetic'] = (df['Code'].str.split(', ')
                  .explode().map(dictex)
                  .groupby(level=0).agg(', '.join)
                 )
Or, similarly, with str.extractall:
df['Phonetic'] = (df['Code'].str.extractall(r'([^\s,]+)')
                  .replace(dictex)
                  .groupby(level=0).agg(', '.join)
                 )
Output:
Unit Code Phonetic
0 243 A Alpha
1 106 B Bravo
2 678 C Charlie
3 852 A, B Alpha, Bravo
4 239 A, C Alpha, Charlie
5 217 A, B, C Alpha, Bravo, Charlie
6 642 D Delta
7 932 E, F, G, H Echo, Foxtrot, Golf, Hotel
8 213 A, C, E, H Alpha, Charlie, Echo, Hotel
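To make the split/explode chain easier to follow, here is a sketch that runs it one step at a time on a small frame. The three-row sample and the intermediate variable names are assumptions for illustration, and it assumes the separator is always a comma followed by a single space.

```python
import pandas as pd

# Small illustrative frame (assumption, not the data from the question).
df = pd.DataFrame({"Code": ["A", "A, B", "E, F, G, H"]})
dictex = {"A": "Alpha", "B": "Bravo", "E": "Echo",
          "F": "Foxtrot", "G": "Golf", "H": "Hotel"}

parts = df["Code"].str.split(", ")   # each cell becomes a list of codes
exploded = parts.explode()           # one row per code, index label repeated
mapped = exploded.map(dictex)        # a plain dictionary lookup now matches

# Rows sharing an index label came from the same original cell,
# so grouping on the index and joining rebuilds one string per row.
df["Phonetic"] = mapped.groupby(level=0).agg(", ".join)

print(df)
```

The assignment aligns on the index, so each joined string lands back on the row it came from.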
Let's use get_dummies:
s = df.Code.str.get_dummies(', ').rename(dictex,axis=1)
df['Phonetic'] = s.dot(s.columns+',').str[:-1]
df
Out[68]:
Code Phonetic
0 A Alpha
1 B Bravo
2 C Charlie
3 A, B Alpha,Bravo
4 A, B, C Alpha,Bravo,Charlie