Pandas DataFrame list column: apply str.split('\.[a-zA-Z]', 1).str[0].str.strip() to every element

import pandas as pd

data = {
    'IDs': ['G1', 'G2', 'G3', 'G4', 'G5', 'G6'],
    'hostname': [['Gp.xyz.com', 'Gp.wxyz.com'], ['GSS'], ['CS.xyz.com', 'CS_B.wxyz.com'], ['GS191'], ['C_P.g.com'], ['10.10.1.10']]
}
df = pd.DataFrame.from_dict(data)
df
Out[107]: 
  IDs                     hostname
0  G1    [Gp.xyz.com, Gp.wxyz.com]
1  G2                        [GSS]
2  G3  [CS.xyz.com, CS_B.wxyz.com]
3  G4                      [GS191]
4  G5                  [C_P.g.com]
5  G6                 [10.10.1.10]
df['hostname'].apply(lambda el: [ x.str.split('.[a-zA-Z]', 1).str[0].str.strip() for x in el])

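The same chain of .str calls worked on a column whose cells each hold a single string. On the list column above, however, it raises an error: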

line 1, in <listcomp>
df['hostname'].apply(lambda el: [ x.str.split('.[a-zA-Z]', 1).str[0].str.strip() for x in el])
AttributeError: 'str' object has no attribute 'str'

The expected output should look like this:

data1 = {
    'IDs': ['G1', 'G2', 'G3', 'G4', 'G5', 'G6'],
    'hostname': [['Gp', 'Gp'], ['GSS'], ['CS', 'CS_B'], ['GS191'], ['C_P'], ['10.10.1.10']]
}
df1 = pd.DataFrame.from_dict(data1)
df1
Out[108]: 
  IDs      hostname
0  G1      [Gp, Gp]
1  G2         [GSS]
2  G3    [CS, CS_B]
3  G4       [GS191]
4  G5         [C_P]
5  G6  [10.10.1.10]

Let's explode first and then split:

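s = df.explode('hostname')
# escape the dot so it matches a literal '.'; n is passed by keyword, which recent pandas versions require
s['hostname'] = s.hostname.str.split(r'\.[a-zA-Z]', n=1).str[0]
s.groupby(level=0).agg({'IDs': 'first', 'hostname': list})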
  IDs    hostname
0  G1    [Gp, Gp]
1  G2       [GSS]
2  G3  [CS, CS_B]
3  G4     [GS191]
4  G5       [C_P]
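
To make the explode-then-regroup idea easier to follow, here is a minimal sketch on a shortened, made-up version of the question's frame (three rows only, which is my assumption for brevity). It shows that explode repeats the original row label for each list element, which is what lets groupby(level=0) put the lists back together afterwards.

```python
import pandas as pd

# Shortened sample frame (an assumption; the question's frame has six rows)
df = pd.DataFrame({
    'IDs': ['G1', 'G2', 'G3'],
    'hostname': [['Gp.xyz.com', 'Gp.wxyz.com'], ['GSS'], ['10.10.1.10']],
})

# explode: one row per list element; the original row label is repeated
s = df.explode('hostname')
print(s.index.tolist())

# the column now holds plain strings, so the vectorized .str accessor works
s['hostname'] = s['hostname'].str.split(r'\.[a-zA-Z]', n=1).str[0].str.strip()

# groupby(level=0) regroups on the repeated row labels and rebuilds the lists
result = s.groupby(level=0).agg({'IDs': 'first', 'hostname': list})
print(result)
```

Grouping on the index level rather than on 'IDs' keeps rows separate even if two rows happened to share the same ID.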

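As for your error: because you use apply, each x is a plain Python string such as 'Gp.xyz.com', so it has no .str accessor. You can simply call x.split('.'), but the built-in str.split does not accept a regular expression. A workaround is: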

df['hostname'].apply(lambda el: [x.split('.')[0] for x in el])
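
If the regular expression from the question matters (for example, so that an IP address like '10.10.1.10' is left intact), the standard-library re module can be used inside the same list comprehension. This is only a sketch: the helper name strip_domain and the shortened sample frame are my own, not from the question.

```python
import re
import pandas as pd

# Shortened sample frame (an assumption, not the full data from the question)
df = pd.DataFrame({
    'IDs': ['G1', 'G2', 'G6'],
    'hostname': [['Gp.xyz.com', 'Gp.wxyz.com'], ['GSS'], ['10.10.1.10']],
})

# Same pattern as in the question: a literal dot followed by a letter
DOMAIN_START = re.compile(r'\.[a-zA-Z]')

def strip_domain(host):
    """Hypothetical helper: keep the part before the first '.<letter>'."""
    return DOMAIN_START.split(host, maxsplit=1)[0].strip()

# Regex split on each plain string inside each list
regex_result = df['hostname'].apply(lambda el: [strip_domain(x) for x in el])

# Plain split on '.', shown for comparison
plain_result = df['hostname'].apply(lambda el: [x.split('.')[0] for x in el])

print(regex_result.tolist())
print(plain_result.tolist())
```

The two differ only on the IP address: the plain split on '.' cuts it down to '10', while the regex version keeps '10.10.1.10' because no dot in it is followed by a letter.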

Another variation on Ben's solution is to use extract, which saves a few .str accesses:

s = df.explode('hostname')
# capture everything before the first dot; expand=False returns a Series
s['hostname'] = s.hostname.str.extract(r'^([^.]*)', expand=False)
s.groupby('IDs').agg(list)

Output:

      hostname
IDs           
G1    [Gp, Gp]
G2       [GSS]
G3  [CS, CS_B]
G4     [GS191]
G5       [C_P]
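
One caveat with the '^([^.]*)' pattern is that it stops at the first dot of any kind, so an IP address such as '10.10.1.10' would be reduced to '10'. A possible alternative, sketched below on a small made-up Series, is a lazy capture with a lookahead that stops only before a dot followed by a letter. The second pattern is my own suggestion, not part of either answer.

```python
import pandas as pd

# Small sample of hostnames (an assumption, taken from the question's values)
hosts = pd.Series(['Gp.xyz.com', 'CS_B.wxyz.com', 'GSS', '10.10.1.10'])

# Pattern from the answer: everything before the first dot
first_label = hosts.str.extract(r'^([^.]*)', expand=False)

# Alternative: shortest prefix followed by '.<letter>' or by the end of the string
before_domain = hosts.str.extract(r'^(.*?)(?=\.[a-zA-Z]|$)', expand=False)

print(first_label.tolist())
print(before_domain.tolist())
```

Both patterns agree on ordinary hostnames; they differ only on strings whose dots are followed by digits.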
