Add a string prefix to a pandas DataFrame column containing multiple comma-separated values



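I am trying to create a new column in a pandas DataFrame that combines a string prefix with the values from another column. Some cells in the value column hold multiple comma-separated values. For example: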

MIMNumber      
102610
114080,601079 

I would like the DataFrame to look like this:

MIMNumber       OMIM_Link
102610  https://www.omim.org/entry/102610
114080,601079   https://www.omim.org/entry/114080,https://www.omim.org/entry/601079

I tried:

df['OMIM_Link'] = df['MIMNumber'].map('https://www.omim.org/entry/{}'.format)  

However, this adds the prefix only once per cell, so the second value in a comma-separated cell is left without it:

MIMNumber       OMIM_Link
102610  https://www.omim.org/entry/102610
114080,601079   https://www.omim.org/entry/114080,601079

I also tried this:

url = 'https://www.omim.org/entry/'
df['OMIM_Link'] = df['MIMNumber'].apply(url.join)

But this inserts the string prefix between every character of each value:

MIMNumber       OMIM_Link
102610  1https://www.omim.org/entry/0https://www.omim.org/entry/2https://www.omim.org/entry/6https://www.omim.org/entry/1https://www.omim.org/entry/0
114080,601079   1https://www.omim.org/entry/1https://www.omim.org/entry/4https://www.omim.org/entry/0https://www.omim.org/entry/8https://www.omim.org/entry/0https://www.omim.org/entry/,https://www.omim.org/entry/6https://www.omim.org/entry/0https://www.omim.org/entry/1https://www.omim.org/entry/0https://www.omim.org/entry/7https://www.omim.org/entry/9

Any suggestions?

You can try a regex replacement:

df['out'] = df['MIMNumber'].replace(r'(\d+)', r'https://www.omim.org/entry/\1', regex=True)
print(df)
       MIMNumber                                                                  out
0         102610                                    https://www.omim.org/entry/102610
1  114080,601079  https://www.omim.org/entry/114080,https://www.omim.org/entry/601079
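
The same substitution can be checked outside pandas with the standard `re` module. This is a minimal sketch, assuming every value is a plain run of digits; `PREFIX` and `add_prefix` are names made up for this illustration:

```python
import re

PREFIX = 'https://www.omim.org/entry/'  # base URL taken from the question

def add_prefix(cell):
    # \d+ matches each run of digits; \g<0> re-inserts the matched digits after the prefix
    return re.sub(r'\d+', PREFIX + r'\g<0>', cell)

print(add_prefix('102610'))
print(add_prefix('114080,601079'))
# With pandas, the helper could be applied per cell: df['MIMNumber'].map(add_prefix)
```

Note that the digit class must be written `\d` and the backreference `\1` (or `\g<0>` for the whole match); without the backslashes the pattern matches the literal letter `d`.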

Replace each comma with ",https://www.omim.org/entry/" and prepend https://www.omim.org/entry/ to the start of the string:

df['OMIM_Link'] = 'https://www.omim.org/entry/' + df['MIMNumber'].str.replace(',', ',https://www.omim.org/entry/')
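
Another way to look at this is to split each cell on commas, prefix every piece, and join the pieces back together. It also shows why `url.join` in the question went wrong: `str.join` treats a string argument as a sequence of characters and places the separator between them. A sketch only; `URL` and `to_links` are hypothetical names, and stripping whitespace around each number is an assumption about the data:

```python
URL = 'https://www.omim.org/entry/'

def to_links(cell):
    # Split on commas, prefix each number, then rejoin with commas
    return ','.join(URL + number.strip() for number in cell.split(','))

print(to_links('114080,601079'))

# str.join iterates over the characters of a string argument,
# which is what produced the interleaved output in the question
print('-'.join('102'))
# With pandas, the helper could be applied per cell: df['MIMNumber'].map(to_links)
```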

Posting this here in case you have a variety of domains/paths:

import pandas as pd

# Each OMIM_Link cell holds one base URL per value in the matching MIMNumber cell
df = pd.DataFrame({'MIMNumber': ['102610', '114080,601079'],
                   'OMIM_Link': ['https://www.omim.org/entry/',
                                 'https://www.omim.org/entry/,https://www.omim.org/entry/']})
for i in range(len(df)):
    mim = df.loc[i, 'MIMNumber']
    if "," in mim:
        mims = mim.split(",")
        links = df.loc[i, 'OMIM_Link'].split(",")
        # Pair each base URL with its number, then rejoin with commas
        df.loc[i, 'OMIM_Link'] = ",".join('{o}{m}'.format(o=o, m=m)
                                          for o, m in zip(links, mims))
    else:
        link = df.loc[i, 'OMIM_Link']
        # .loc avoids the chained assignment df['OMIM_Link'][i] = ...
        df.loc[i, 'OMIM_Link'] = '{o}{m}'.format(o=link, m=mim)
print(df)

This does what you want:

MIMNumber                                          OMIM_Link
0         102610                  https://www.omim.org/entry/102610
1  114080,601079  https://www.omim.org/entry/114080,https://www....
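
If the explicit loop feels heavy, the per-row pairing can be factored into a small function. This is a sketch under the assumption that both cells always hold the same number of comma-separated items; `pair_links` is a made-up name:

```python
def pair_links(bases, numbers):
    # zip pairs the n-th base URL with the n-th number
    return ','.join(b + n for b, n in zip(bases.split(','), numbers.split(',')))

print(pair_links('https://www.omim.org/entry/', '102610'))
print(pair_links('https://www.omim.org/entry/,https://www.omim.org/entry/',
                 '114080,601079'))
# With pandas (assuming the column names used in the answer above):
# df['OMIM_Link'] = df.apply(lambda r: pair_links(r['OMIM_Link'], r['MIMNumber']), axis=1)
```

Because a string without a comma splits into a one-element list, the single-value and multi-value cases go through the same code path.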
