根据相邻列值 Pandas 从列中提取第一个或第二个空格的字符串



我有一个包含产品代码和产品类型的数据框。

material_description            component_type_or_status
SF 1243545gbe ff ee rr oo       SF
LF 2324344ire ff ee rr oo       LF
BF 3434333fre ff gg hh 23       BF
IA SF 3434333fre ff gg 22 re    IA
ZZ LF 34391r33b ff tn 33        ZZ

我想创建一个名为材料代码的新列,该列根据产品类型的值从产品代码列的左侧提取第二个字符串或第三个字符串

如果SF、BFLF在左起第一个空格之后返回字符串

如果IAZZ在左起第二个空格后返回字符串

这是我的函数。它陷入了一个循环,我不确定我的逻辑是否正确。 与熊猫一起做到这一点的最佳方法是什么?

def parse_material_description(x):
df = infile.parse(sheet_name='Unit of Measure')
df['component_type_or_status'] = df['Material Description'].str[:2]
try:
if x['component_type_or_status'] == 'SF':
df['material_code'] = df['Material Description'].str.split(" ",1)
elif x['component_type_or_status'] == 'LF':
df['material_code'] = df['Material Description'].str.split(" ",1)
elif x['component_type_or_status'] == 'BF':
df['material_code'] = df['Material Description'].str.split(" ",1)
elif x['component_type_or_status'] == 'IA':
df['material_code'] = df['Material Description'].str.split(" ",2)            
elif x['component_type_or_status'] == 'ZZ':
df['material_code'] = df['Material Description'].str.split(" ",2)            
elif x['component_type_or_status'] == None:
return ''
except: IndexError
df['component_type_or_status'] = df.apply(parse_material_description, axis=1) 

我认为在一行中使用np.where可能会更容易一些:

import pandas as pd
import numpy as np
data = {'material':['SF 1243545gbe ff ee rr oo','LF 2324344ire ff ee rr oo','ZZ LF 34391r33b ff tn 33'],'type':['SF','LF','ZZ']}
df = pd.DataFrame(data)
df['material_code'] = np.where(df['type'].isin(['SF','LF','BF']),df['material'].str.split(" ").str.get(1),df['material'].str.split(" ").str.get(2))
print(df)

输出:

material type material_code
0  SF 1243545gbe ff ee rr oo   SF    1243545gbe
1  LF 2324344ire ff ee rr oo   LF    2324344ire
2   ZZ LF 34391r33b ff tn 33   ZZ     34391r33b

如果你的material_codes具有代表性,你可以这样做;

df['material_code'] = df['material_description'].str.extract(r's+([a-z0-9]+)s+')

这里有一种方法可以做到这一点:

法典:

df['material code']=''
for i in range(0,len(df['component_type_or_status'])):
if (df['component_type_or_status'][i] == 'SF') or (df['component_type_or_status'][i] == 'LF') or (df['component_type_or_status'][i] == 'BF'):
df['material code'][i]=' '.join(df['material_description'][i].split()[1:2])
else:
df['material code'][i]=' '.join(df['material_description'][i].split()[2:3])

输出:

material_description            component_type_or_status    material code
0   SF 1243545gbe ff ee rr oo       SF                          1243545gbe
1   LF 2324344ire ff ee rr oo       LF                          2324344ire
2   BF 3434333fre ff gg hh 23       BF                          3434333fre
3   IA SF 3434333fre ff gg 22 re    IA                          3434333fre
4   ZZ LF 34391r33b ff tn 33        ZZ                          34391r33b

df这里是您的初始数据帧。

相关内容

最新更新