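I have a column "Id" whose values look like this:
10020-100-700-800-2
How can I create a new column that contains only the third number of each row (700 in this example)?
Here is a sample dataframe: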
d = {'id': {0: '10023_11_762553_762552_11', 1: '10023_14_325341_359865_14', 2: '10023_17_771459_771453_17', 3: '10023_20_440709_359899_20', 4: '10023_24_773107_625033_24', 5: '10023_27_771462_771463_27', 6: '10023_30_771262_771465_30', 7: '10023_33_761971_762470_33'}, 'values': {0: 10023, 1: {}
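Use str.split to break each string on the separator,
then take the third element of the resulting list (index 2, since indexing is zero-based):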
import pandas as pd

df = pd.DataFrame({'Col': ['10020-100-700-800-2']})
# Split on '-', pick the third piece, and convert it from string to int
df['NewCol'] = df['Col'].str.split('-').str[2].astype(int)
print(df)
# Output
                   Col  NewCol
0  10020-100-700-800-2     700
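The one-liner above assumes every row has at least three parts and that the third part is numeric; otherwise astype(int) raises. As a rough sketch for messier data (the malformed rows below are made up for illustration and are not from the question), pd.to_numeric with errors='coerce' can be swapped in for astype(int):

```python
import pandas as pd

# Hypothetical data: the last two rows are malformed on purpose.
df = pd.DataFrame({'Col': ['10020-100-700-800-2',
                           '10020-100',
                           '10020-100-abc-800-2']})

# .str[2] yields NaN when a row has fewer than three parts.
third = df['Col'].str.split('-').str[2]

# errors='coerce' turns missing or non-numeric parts into NaN instead of raising;
# the nullable 'Int64' dtype keeps the valid values as integers.
df['NewCol'] = pd.to_numeric(third, errors='coerce').astype('Int64')
print(df)
```

Rows that cannot be parsed end up as missing values, so they can be inspected or dropped afterwards rather than stopping the whole conversion.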
Update with your sample:
data = {'Id': ['10020-100-700-800-2',
               '10022-400-900-900-2',
               '10045-600-800-900-3',
               '10000-300-400-300-3',
               '10020-200-200-200-2'],
        'Employed': [1, 0, 0, 1, 1],
        'Name': ['Alan', 'Joe', 'Sam', 'Amy', 'Chloe']}
df = pd.DataFrame(data)
df['Id2'] = df['Id'].str.split('-').str[2].astype(int)
print(df)
# Output
                    Id  Employed   Name  Id2
0  10020-100-700-800-2         1   Alan  700
1  10022-400-900-900-2         0    Joe  900
2  10045-600-800-900-3         0    Sam  800
3  10000-300-400-300-3         1    Amy  400
4  10020-200-200-200-2         1  Chloe  200
Update 2 with the new data (here the separator is '_' instead of '-'):
data = {'id': ['10023_11_762553_762552_11',
               '10023_14_325341_359865_14',
               '10023_17_771459_771453_17',
               '10023_20_440709_359899_20',
               '10023_24_773107_625033_24',
               '10023_27_771462_771463_27',
               '10023_30_771262_771465_30',
               '10023_33_761971_762470_33'],
        'values': [10023, 10023, 10023, 10023, 10023, 10023, 10023, 10023]}
df = pd.DataFrame(data)
df['id2'] = df['id'].str.split('_').str[2].astype(int)
print(df)
# Output
                          id  values     id2
0  10023_11_762553_762552_11   10023  762553
1  10023_14_325341_359865_14   10023  325341
2  10023_17_771459_771453_17   10023  771459
3  10023_20_440709_359899_20   10023  440709
4  10023_24_773107_625033_24   10023  773107
5  10023_27_771462_771463_27   10023  771462
6  10023_30_771262_771465_30   10023  771262
7  10023_33_761971_762470_33   10023  761971
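Since the three snippets differ only in the separator and the column name, the pattern could be wrapped in a small helper. This is just a sketch; nth_field is a made-up name, not a pandas function:

```python
import pandas as pd


def nth_field(series, sep, n):
    """Return the n-th (zero-based) separator-delimited field of each string as int.

    Hypothetical helper wrapping the str.split(...).str[n].astype(int) pattern.
    """
    return series.str.split(sep).str[n].astype(int)


dashed = pd.Series(['10020-100-700-800-2', '10022-400-900-900-2'])
underscored = pd.Series(['10023_11_762553_762552_11', '10023_14_325341_359865_14'])

print(nth_field(dashed, '-', 2).tolist())       # third field, '-' separator
print(nth_field(underscored, '_', 2).tolist())  # third field, '_' separator
```

With the dataframe above, the call would be df['id2'] = nth_field(df['id'], '_', 2).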