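There are plenty of answers about merging and filling columns, but I can't find a more efficient approach. My situation:
Current versions of python, pandas and numpy; the file format is parquet.
Put simply: if col1 == x, then col10 = 1, col11 = 2, col... and so on.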
look1 = 'EMPLOYEE'
look2 = 'CHESTER'
look3 = "TONY'S"
look4 = "VICTOR'S"
tgt1 = 'inv_group'
tgt2 = 'acc_num'
for x in range(len(df['ph_name'])):
    if df['ph_name'][x] == look1:
        df[tgt1][x] = 'MEMORIAL'
        df[tgt2][x] = 12345
    elif df['ph_name'][x] == look2:
        df[tgt1][x] = 'WALMART'
        df[tgt2][x] = 45678
    elif df['ph_name'][x] == look3:
        df[tgt1][x] = 'TONYS'
        df[tgt2][x] = 27359
    elif df['ph_name'][x] == look4:
        df[tgt1][x] = 'VICTOR'
        df[tgt2][x] = 45378
basic sample:
  unit_name tgt1 tgt2
0  EMPLOYEE  NaN  NaN
1  EMPLOYEE  NaN  NaN
2    TONY'S  NaN  NaN
3   CHESTER  NaN  NaN
4  VICTOR'S  NaN  NaN
5  EMPLOYEE  NaN  NaN
GOAL:
  unit_name      tgt1   tgt2
0  EMPLOYEE  MEMORIAL  12345
1  EMPLOYEE  MEMORIAL  12345
2    TONY'S     TONYS  27359
3   CHESTER   WALMART  45678
4  VICTOR'S    VICTOR  45378
5  EMPLOYEE  MEMORIAL  12345
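So this works... I fill in the values of the custom columns. It isn't the fastest, but it works.
Timing for 28896 rows: 6.2429744. I'm worried it will start to drag once I put it into production.
The other downside is the warning, which annoys me... yes, I can silence it, but I suspect it comes from a bad practice that I ought to learn how to avoid.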
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
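Basically...
- Is there a way to optimize this?
- Is the warning due to a bad habit of mine, my ignorance, or is it something I should simply silence?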
Given: (having columns full of NaN is silly)
  unit_name
0  EMPLOYEE
1  EMPLOYEE
2    TONY'S
3   CHESTER
4  VICTOR'S
5  EMPLOYEE
import pandas as pd
df = pd.DataFrame({'unit_name': {0: 'EMPLOYEE', 1: 'EMPLOYEE', 2: "TONY'S", 3: 'CHESTER', 4: "VICTOR'S", 5: 'EMPLOYEE'}})
Do: (let's use pd.Series.map and build a dictionary so it's easier to modify later)
looks = ['EMPLOYEE', 'CHESTER', "TONY'S", "VICTOR'S"]
new_cols = {
    'inv_group': ["MEMORIAL", "WALMART", "TONYS", "VICTOR"],
    'acc_num': [12345, 45678, 27359, 45378]
}
for col, values in new_cols.items():
    df[col] = df['unit_name'].map(dict(zip(looks, values)))
print(df)
Output: (I assumed the column names you typed were wrong)
  unit_name inv_group  acc_num
0  EMPLOYEE  MEMORIAL    12345
1  EMPLOYEE  MEMORIAL    12345
2    TONY'S     TONYS    27359
3   CHESTER   WALMART    45678
4  VICTOR'S    VICTOR    45378
5  EMPLOYEE  MEMORIAL    12345
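On the warning part of the question: `df[tgt1][x] = ...` is chained indexing. `df[tgt1]` first returns a Series, then `[x] = ...` assigns into it, and pandas cannot guarantee that this Series is not a copy. A minimal sketch of the usual way around it, one `.loc` call that selects rows and column together, assuming the `unit_name` sample above (the `rules` dict is a name made up for this illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"unit_name": ["EMPLOYEE", "EMPLOYEE", "TONY'S",
                                 "CHESTER", "VICTOR'S", "EMPLOYEE"]})

# Hypothetical lookup table: name -> (inv_group, acc_num), values from the question
rules = {
    "EMPLOYEE": ("MEMORIAL", 12345),
    "CHESTER": ("WALMART", 45678),
    "TONY'S": ("TONYS", 27359),
    "VICTOR'S": ("VICTOR", 45378),
}

# Create the target columns up front so every .loc call writes into an existing column
df["inv_group"] = None
df["acc_num"] = np.nan

# The loop runs once per lookup value (4 times), not once per row
for name, (group, acc) in rules.items():
    mask = df["unit_name"] == name
    # Rows and column are selected in a single indexing step on df itself,
    # so there is no intermediate object that could be a copy
    df.loc[mask, "inv_group"] = group
    df.loc[mask, "acc_num"] = acc

print(df)
```

Note that `acc_num` ends up as a float column here because it started out as NaN; if every row is matched, `df["acc_num"].astype(int)` would turn it back into integers.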
Flying blind here, since I can't see your data:
import numpy as np
cond_list = [df["ph_name"] == look for look in [look1, look2, look3, look4]]
# Rows whose ph_name is outside of the list keep their original values
# (default= assumes the tgt1/tgt2 columns already exist in df)
df[tgt1] = np.select(cond_list, ["MEMORIAL", "WALMART", "TONYS", "VICTOR"], default=df[tgt1])
df[tgt2] = np.select(cond_list, [12345, 45678, 27359, 45378], default=df[tgt2])
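For anyone who wants to try the `np.select` approach without the original data, here is a small self-contained sketch. The `OTHER` row and the fallback values `"UNKNOWN"` and `-1` are made up for illustration; they show what happens to names that match none of the conditions. If `default` is left out, `np.select` fills those rows with 0.

```python
import numpy as np
import pandas as pd

# Sample data; "OTHER" is a hypothetical name that matches none of the lookups
df = pd.DataFrame({"ph_name": ["EMPLOYEE", "TONY'S", "CHESTER", "VICTOR'S", "OTHER"]})

looks = ["EMPLOYEE", "CHESTER", "TONY'S", "VICTOR'S"]

# One boolean Series per lookup value, in the same order as the choices below
cond_list = [df["ph_name"] == look for look in looks]

# For each row, np.select picks the choice of the first condition that is True;
# rows where no condition is True receive the default
df["inv_group"] = np.select(
    cond_list, ["MEMORIAL", "WALMART", "TONYS", "VICTOR"], default="UNKNOWN"
)
df["acc_num"] = np.select(cond_list, [12345, 45678, 27359, 45378], default=-1)

print(df)
```

Both columns are computed over the whole column at once, so there is no Python-level loop over rows and no chained assignment.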