Python: keep only the part of a string after the second dash and before the third dash



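I have a column "Id" whose values look like this:

10020-100-700-800-2

How can I create a new column that contains only the third number (700 in this example) for each row?

Here is a sample dataframe: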

d = {'id': {0: '10023_11_762553_762552_11', 1: '10023_14_325341_359865_14', 2: '10023_17_771459_771453_17', 3: '10023_20_440709_359899_20', 4: '10023_24_773107_625033_24', 5: '10023_27_771462_771463_27', 6: '10023_30_771262_771465_30', 7: '10023_33_761971_762470_33'}, 'values': {0: 10023, 1: {}

Use str.split and take the third element (index 2) of the resulting list:

import pandas as pd

df = pd.DataFrame({'Col': ['10020-100-700-800-2']})
# Split on '-', pick the third field (index 2), and convert it to an integer
df['NewCol'] = df['Col'].str.split('-').str[2].astype(int)
print(df)
# Output
                   Col  NewCol
0  10020-100-700-800-2     700
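
Note that astype(int) assumes every row has at least three dash-separated fields. As a minimal sketch, assuming some rows might be shorter or missing (the second and third rows below are made up for illustration and are not from the question), pd.to_numeric with errors='coerce' produces missing values instead of raising:

```python
import pandas as pd

# Hypothetical data: rows 1 and 2 are invented to show the failure case
df = pd.DataFrame({'Col': ['10020-100-700-800-2', '10020-100', None]})

# .str[2] yields NaN when a row has fewer than three fields or is missing
third = df['Col'].str.split('-').str[2]

# Coerce anything unparsable to NaN, then store as nullable integers
df['NewCol'] = pd.to_numeric(third, errors='coerce').astype('Int64')
print(df)
```

The nullable Int64 dtype keeps the valid values as integers while still allowing missing entries.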

Update using your sample

data = {'Id': ['10020-100-700-800-2',
               '10022-400-900-900-2',
               '10045-600-800-900-3',
               '10000-300-400-300-3',
               '10020-200-200-200-2'],
        'Employed': [1, 0, 0, 1, 1],
        'Name': ['Alan', 'Joe', 'Sam', 'Amy', 'Chloe']}
df = pd.DataFrame(data)
df['Id2'] = df['Id'].str.split('-').str[2].astype(int)
print(df)
# Output
                    Id  Employed   Name  Id2
0  10020-100-700-800-2         1   Alan  700
1  10022-400-900-900-2         0    Joe  900
2  10045-600-800-900-3         0    Sam  800
3  10000-300-400-300-3         1    Amy  400
4  10020-200-200-200-2         1  Chloe  200

Update 2 using the new data (the separator here is an underscore rather than a dash)

data = {'id': ['10023_11_762553_762552_11',
               '10023_14_325341_359865_14',
               '10023_17_771459_771453_17',
               '10023_20_440709_359899_20',
               '10023_24_773107_625033_24',
               '10023_27_771462_771463_27',
               '10023_30_771262_771465_30',
               '10023_33_761971_762470_33'],
        'values': [10023, 10023, 10023, 10023, 10023, 10023, 10023, 10023]}
df = pd.DataFrame(data)
df['id2'] = df['id'].str.split('_').str[2].astype(int)
print(df)
# Output
                          id  values     id2
0  10023_11_762553_762552_11   10023  762553
1  10023_14_325341_359865_14   10023  325341
2  10023_17_771459_771453_17   10023  771459
3  10023_20_440709_359899_20   10023  440709
4  10023_24_773107_625033_24   10023  773107
5  10023_27_771462_771463_27   10023  771462
6  10023_30_771262_771465_30   10023  771262
7  10023_33_761971_762470_33   10023  761971
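
As an alternative sketch (not part of the original answer), the same field can be captured with a regular expression through str.extract, which matches the question's wording directly: skip two separator-terminated fields, then capture everything up to the next separator. The pattern below assumes an underscore separator; swap the character for a dash-separated column.

```python
import pandas as pd

df = pd.DataFrame({'id': ['10023_11_762553_762552_11',
                          '10023_14_325341_359865_14',
                          '10023_17_771459_771453_17']})

# (?:[^_]*_){2} skips the first two fields together with their underscores;
# ([^_]*) captures the text before the third underscore
pattern = r'^(?:[^_]*_){2}([^_]*)'

# expand=False returns a Series because the pattern has a single capture group
df['id2'] = df['id'].str.extract(pattern, expand=False).astype(int)
print(df)
```

Rows that do not match the pattern come back as NaN from str.extract, so astype(int) would raise for them, just as it does with the str.split approach.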
