Remove unnecessary text from a list of strings



Existing df:

Id                                            dates
01         ['ATIVE 04/2018 to 03/2020',' XYZ mar 2020 – Jul 2022','June 2021 - 2023 XYZ']

Expected df:

Id                                            dates
01         ['04/2018 to 03/2020','mar 2020 – Jul 2022','June 2021 - 2023']

I want to clean up the lists in the dates column. I tried the function below, but it doesn't do what I need. Any leads?

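import re

def clean_dates_list(dates_list):
    cleaned_dates_list = []
    for date_str in dates_list:
        # Removes every character that is not a letter, whitespace, or digit.
        # This strips "/", "-" and "–" but leaves words such as "ATIVE" and "XYZ" in place.
        cleaned_date_str = re.sub(r'[^A-Za-z\s\d]+', '', date_str)
        cleaned_dates_list.append(cleaned_date_str)
    return cleaned_dates_list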
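The character class above only filters out individual punctuation characters, so it can never drop whole words. A minimal sketch of a word-level fix, assuming the unwanted words are known in advance (the `NOISE_WORDS` list is hypothetical and taken from the sample data), removes them with a `\b`-anchored alternation and then collapses the leftover spaces:

```python
import re

# Hypothetical list of noise words, taken from the sample data only.
NOISE_WORDS = ["ATIVE", "XYZ"]

# \b...\b matches whole words only, so "XYZ" is removed but the same letters
# inside a longer word would be left alone.
NOISE_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, NOISE_WORDS)) + r")\b")


def clean_dates_list(dates_list):
    cleaned = []
    for date_str in dates_list:
        without_noise = NOISE_RE.sub("", date_str)
        # Collapse the leading, trailing, and doubled spaces left by the removal.
        cleaned.append(" ".join(without_noise.split()))
    return cleaned


print(clean_dates_list(['ATIVE 04/2018 to 03/2020', ' XYZ mar 2020 – Jul 2022', 'June 2021 - 2023 XYZ']))
```

Because the pattern is built from the list with `re.escape`, adding another noise word only means extending `NOISE_WORDS`.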
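ls = ['ATIVE 04/2018 to 03/2020', ' XYZ mar 2020 – Jul 2022', 'June 2021 - 2023 XYZ']
ls_to_remove = ['ATIVE', 'XYZ']  # words to drop from each string

for item in ls:
    words = item.split()
    new_item = str()
    for word in words:  # separate name so the outer `item` is not shadowed
        if word in ls_to_remove:
            continue
        new_item += " " + word
    print(new_item.strip())  # strip the leading space added before the first word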

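I don't know which words you need to remove in general, so hard-coding a list of them is not good practice. In your case, though, it works.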
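If the noise words are not known in advance, one alternative is to keep only the part that looks like a date range instead of deleting words. This is a sketch under the assumption that every entry follows one of the three formats in the sample ("04/2018", "mar 2020", or a bare year, joined by "to", a hyphen, or an en dash); other formats such as "Mar. 2020" would fall through to the unchanged, stripped string. The name `extract_date_range` is hypothetical.

```python
import re

# One date token: "04/2018", a month word plus year ("mar 2020", "June 2021"),
# or a bare year. The alternatives are tried in this order at each position.
DATE = r"(?:\d{1,2}/\d{4}|[A-Za-z]+ \d{4}|\d{4})"

# A range: date, optional spaces, "to" / hyphen / en dash, optional spaces, date.
RANGE_RE = re.compile(DATE + r"\s*(?:to|[-\u2013])\s*" + DATE)


def extract_date_range(text):
    match = RANGE_RE.search(text)
    # Fall back to the stripped input when nothing range-like is found.
    return match.group(0) if match else text.strip()


ls = ['ATIVE 04/2018 to 03/2020', ' XYZ mar 2020 – Jul 2022', 'June 2021 - 2023 XYZ']
print([extract_date_range(s) for s in ls])
```

If the lists live in a pandas column, the function can be mapped over each list with `df['dates'].apply(lambda lst: [extract_date_range(s) for s in lst])`.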
