How to iteratively append to text



I have a DataFrame and need to append page suffixes, up to page/9/, to each row in Python.

df:
/soccer/england/premier-league-2020-2021/results/
/soccer/england/premier-league-2019-2020/results/
/soccer/england/premier-league-2018-2019/results/

For each row in df, I have to append #/, #/page/2/, #/page/3/, #/page/4/, and so on up to #/page/9/, as shown below.

How can I do this in Python?

Expected df:

/soccer/england/premier-league-2020-2021/results/#/
/soccer/england/premier-league-2020-2021/results/#/page/2/
/soccer/england/premier-league-2020-2021/results/#/page/3/
.
.
/soccer/england/premier-league-2020-2021/results/#/page/9/
/soccer/england/premier-league-2019-2020/results/#/
/soccer/england/premier-league-2019-2020/results/#/page/2/
/soccer/england/premier-league-2019-2020/results/#/page/3/
.
.
/soccer/england/premier-league-2019-2020/results/#/page/9/
/soccer/england/premier-league-2018-2019/results/#/
/soccer/england/premier-league-2018-2019/results/#/page/2/
/soccer/england/premier-league-2018-2019/results/#/page/3/
.
.
/soccer/england/premier-league-2018-2019/results/#/page/9/

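You can just run a simple loop:

import pandas as pd

df = pd.read_csv('data.csv')
links_with_pages = []
# 'Duration' is the column that holds the links in this CSV; substitute your own column name
for lid, link in enumerate(df['Duration'].tolist()):
    page_num = lid % 9 + 1  # row position mapped to a page number 1..9
    if page_num == 1:
        suffix = '#/'
    else:
        suffix = '#/page/' + str(page_num) + '/'
    links_with_pages.append(str(link) + suffix)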
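
The modulo in this loop numbers rows by position, so it reproduces the expected output only when each URL already occupies nine consecutive rows. A minimal sketch of that behaviour follows; the in-memory list `links` is a hypothetical stand-in for the CSV column, not something from the answer.

```python
# Hypothetical input: each base URL repeated 9 times in a row
# (a stand-in for the column read from data.csv).
base_urls = [
    "/soccer/england/premier-league-2020-2021/results/",
    "/soccer/england/premier-league-2019-2020/results/",
]
links = [url for url in base_urls for _ in range(9)]

links_with_pages = []
for position, link in enumerate(links):
    page_num = position % 9 + 1  # cycles through 1..9 by row position
    suffix = "#/" if page_num == 1 else f"#/page/{page_num}/"
    links_with_pages.append(link + suffix)

print(links_with_pages[0])   # first page of the first URL
print(links_with_pages[8])   # ninth page of the first URL
print(links_with_pages[9])   # first page of the second URL
```

If the input holds each URL only once, the rows have to be repeated first; otherwise consecutive rows simply receive consecutive page numbers regardless of which URL they contain.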

The sample DataFrame I used:

df=pd.DataFrame({'col': {0: '/soccer/england/premier-league-2020-2021/results/',
1: '/soccer/england/premier-league-2019-2020/results/',
2: '/soccer/england/premier-league-2018-2019/results/',
3: '/soccer/england/premier-league-2020-2021/results/',
4: '/soccer/england/premier-league-2019-2020/results/',
5: '/soccer/england/premier-league-2018-2019/results/',
6: '/soccer/england/premier-league-2020-2021/results/',
7: '/soccer/england/premier-league-2019-2020/results/',
8: '/soccer/england/premier-league-2018-2019/results/',
9: '/soccer/england/premier-league-2020-2021/results/',
10: '/soccer/england/premier-league-2019-2020/results/',
11: '/soccer/england/premier-league-2018-2019/results/'}})

You can try:

df['h'] = df.index % 9 + 1
# create a helper column holding the page number (1..9)
df['col'] = df['col'] + ("#/page/" + df['h'].astype(str) + '/').mask(df['h'].eq(1), "#/")
# conditionally append "#/page/<pagenumber>/", or just "#/" for page 1
df = df.drop(columns='h')
# remove the helper column

Now, if you print df, you will get the output you want.

Update:

IIUC, you need 9 URLs for each unique URL, so:

out = pd.DataFrame(df['col'].unique(), columns=['col'])
# create a DataFrame from the unique values of the 'col' column
out = out.reindex(out.index.repeat(9)).reset_index(drop=True)
# repeat each row 9 times
out['h'] = out.index % 9 + 1
# create a helper column holding the page number (1..9)
out['col'] = out['col'] + ("#/page/" + out['h'].astype(str) + '/').mask(out['h'].eq(1), "#/")
# conditionally append "#/page/<pagenumber>/", or just "#/" for page 1
out = out.drop(columns='h')
# remove the helper column

Now, if you print out, you will get the output you want.
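
Another way to get nine rows per unique URL is a cross join against a small table of page numbers, which avoids the repeat-and-modulo step. This is only a sketch under assumptions: the names `urls`, `pages`, and `suffix` are illustrative, the sample data is abbreviated, and `how="cross"` requires pandas 1.2 or newer.

```python
import pandas as pd

# Abbreviated sample data with one duplicate, to show that duplicates are dropped.
df = pd.DataFrame({"col": [
    "/soccer/england/premier-league-2020-2021/results/",
    "/soccer/england/premier-league-2019-2020/results/",
    "/soccer/england/premier-league-2018-2019/results/",
    "/soccer/england/premier-league-2020-2021/results/",
]})

urls = df[["col"]].drop_duplicates()          # one row per unique URL
pages = pd.DataFrame({"page": range(1, 10)})  # page numbers 1..9

# Cross join: every unique URL paired with every page number.
out = urls.merge(pages, how="cross")

# "#/" for page 1, "#/page/N/" for the rest.
suffix = ("#/page/" + out["page"].astype(str) + "/").mask(out["page"].eq(1), "#/")
out = (out["col"] + suffix).to_frame("col")

print(out.head(10))
```

The page number lives in its own column until the suffix is built, so changing the page range only means changing the `range(1, 10)` call.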

processed = dict()
# URLS is assumed to be an iterable of base URLs, each appearing once per page
for URL in URLS:
    if URL not in processed:
        # first occurrence: page 1 gets the bare '#/' suffix
        processed[URL] = 1
        print(f'{URL}#/')
    else:
        # later occurrences: count up and append the page number
        processed[URL] = processed[URL] + 1
        print(f'{URL}#/page/{processed[URL]}/')

year_results = [
    "/soccer/england/premier-league-2020-2021/results",
    "/soccer/england/premier-league-2019-2020/results",
    "/soccer/england/premier-league-2018-2019/results",
]
sub_pages = []
for year_res in year_results:
    for page in range(1, 10):
        # page 1 has no 'page/N/' part; pages 2..9 do
        if page != 1:
            page = f"page/{page}/"
        else:
            page = ''
        sub_pages.append(f"{year_res}/#/{page}")

Or equivalently:

sub_pages = [f"{year_res}/#/{f'page/{page}/' if page!=1 else ''}" for year_res in year_results for page in range(1, 10)]
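
Since the question asks for a DataFrame rather than a list, the same nested loop could be wrapped in a small helper that returns one. This is a sketch only: the function name `paginate` and its `last_page` parameter are hypothetical, not part of any answer above.

```python
import pandas as pd


def paginate(base_urls, last_page=9):
    """Build '<base>/#/' for page 1 and '<base>/#/page/N/' for pages 2..last_page."""
    rows = []
    for base in base_urls:
        base = base.rstrip("/")  # accept input with or without a trailing slash
        for page in range(1, last_page + 1):
            tail = "" if page == 1 else f"page/{page}/"
            rows.append(f"{base}/#/{tail}")
    return pd.DataFrame({"col": rows})


expected_df = paginate([
    "/soccer/england/premier-league-2020-2021/results/",
    "/soccer/england/premier-league-2019-2020/results/",
    "/soccer/england/premier-league-2018-2019/results/",
])
print(expected_df.to_string(index=False))
```

Stripping the trailing slash first lets the helper take the URLs exactly as they appear in the question's df, which end in "/", as well as the slash-free form used in `year_results`.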
