Suppose I have the following DataFrame:
import pandas as pd
data = [['Mallika', 23, 'Student'], ['Yash', 25, 'Tutor'], ['Abc', 14, 'Clerk']]
data_frame = pd.DataFrame(data, columns=['Student.first.name.word', 'Student.Current.Age.word', 'Student.Current.Profession.word'])
   Student.first.name.word  Student.Current.Age.word  Student.Current.Profession.word
0                  Mallika                        23                          Student
1                     Yash                        25                            Tutor
2                      Abc                        14                            Clerk
How can I remove the common words "Student" and "word" from the column headers, so that I get the following DataFrame?
   first.name  Current.Age  Current.Profession
0     Mallika           23             Student
1        Yash           25               Tutor
2         Abc           14               Clerk
You can use a regular expression to remove these words and the dots from the column names, then assign the result back:
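# The dot is escaped so that it matches a literal "."; regex=True is passed explicitly
data_frame.columns = data_frame.columns.str.replace(r"(Student|word|\.)", "", regex=True)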
This gives (the output shown is for the column names before the question was updated):
>>> data_frame
      name  Age  Profession
0  Mallika   23     Student
1     Yash   25       Tutor
2      Abc   14       Clerk
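To see why the dot has to be escaped in that pattern, here is a minimal sketch that uses plain re.sub on a list of names rather than the pandas string accessor. The list "columns" is an assumption for illustration; it simply mirrors the updated column names from the question.

```python
import re

# Illustrative copy of the updated column names from the question
columns = [
    "Student.first.name.word",
    "Student.Current.Age.word",
    "Student.Current.Profession.word",
]

# An unescaped "." matches any character, so every character gets removed
unescaped = [re.sub(r"(Student|word|.)", "", c) for c in columns]

# An escaped "\." matches only a literal dot
escaped = [re.sub(r"(Student|word|\.)", "", c) for c in columns]

print(unescaped)  # ['', '', '']
print(escaped)    # ['firstname', 'CurrentAge', 'CurrentProfession']
```

On the updated names, removing every dot also glues the remaining words together, which is why a different approach is needed once the dots inside the name have to survive.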
After the update, you can split, slice and join:
data_frame.columns = data_frame.columns.str.split(r".").str[1:-1].str.join(".")
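That is, split on the literal dot, drop the first and last elements, and finally join the remaining pieces back together with dots, to get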
   first.name  Current.Age  Current.Profession
0     Mallika           23             Student
1        Yash           25               Tutor
2         Abc           14               Clerk
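The three steps can be traced on a single name with ordinary string methods. This is only a sketch of what the vectorised .str calls do element by element; the variable names are made up for illustration.

```python
name = "Student.first.name.word"

parts = name.split(".")      # split on the literal dot
middle = parts[1:-1]         # drop the first and last pieces
result = ".".join(middle)    # glue the rest back together with dots

print(parts)   # ['Student', 'first', 'name', 'word']
print(middle)  # ['first', 'name']
print(result)  # first.name
```

Note that this relies on position only: it always drops exactly one leading and one trailing piece, whatever those pieces happen to be.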
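Here is an extension of my answer on removing a common prefix. The benefit of this approach is that it finds the prefix and the suffix generically, so no pattern has to be hard-coded.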
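import os
import re
cols = data_frame.columns
common_prefix = os.path.commonprefix(cols.tolist())
common_suffix = os.path.commonprefix([col[::-1] for col in cols])[::-1]
# Escape both affixes so "." stays literal, and anchor them to the start/end of each name
data_frame.columns = cols.str.replace(f"^{re.escape(common_prefix)}|{re.escape(common_suffix)}$", "", regex=True)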
      name  Age  Profession
0  Mallika   23     Student
1     Yash   25       Tutor
2      Abc   14       Clerk
Update: for the updated question, the same solution works generically (the output above was for the original column names):
   first.name  Current.Age  Current.Profession
0     Mallika           23             Student
1        Yash           25               Tutor
2         Abc           14               Clerk
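One thing worth knowing about this approach is that os.path.commonprefix compares strings character by character, not word by word. The sketch below wraps the same idea in a plain-Python helper (strip_common_affixes is a hypothetical name, not a pandas or standard-library function) and shows a case where the shared prefix cuts into a word. The second list of names is invented for the demonstration.

```python
import os
import re

def strip_common_affixes(names):
    """Remove the longest shared leading and trailing substrings from each name."""
    prefix = os.path.commonprefix(names)
    # Reverse each name to reuse commonprefix for the suffix, then reverse back
    suffix = os.path.commonprefix([n[::-1] for n in names])[::-1]
    # Escape the affixes and anchor them so only the ends of each name are touched
    pattern = f"^{re.escape(prefix)}|{re.escape(suffix)}$"
    return [re.sub(pattern, "", n) for n in names]

columns = [
    "Student.first.name.word",
    "Student.Current.Age.word",
    "Student.Current.Profession.word",
]
print(strip_common_affixes(columns))
# ['first.name', 'Current.Age', 'Current.Profession']

# Invented names whose middle words start with the same letter:
# the character-wise prefix is "Student.A", so the "A" is stripped too
tricky = ["Student.Age.word", "Student.Address.word"]
print(strip_common_affixes(tricky))
# ['ge', 'ddress']
```

If the column names can share leading letters like this, a word-based approach that splits on the dot first is likely the safer choice.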
To remove all common words, rather than only hard-coded ones, you can try:
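from functools import reduce

df = data_frame
# Split each column name on "." and keep only the words shared by every name
common_words = [i.split(".") for i in df.columns.tolist()]
common_words = reduce(lambda x, y: set(x).intersection(y), common_words)
# \b matches a word boundary; str[1:-1] trims the dots left at both ends after removal
pat = r'\b(?:{})\b'.format('|'.join(common_words))
df.columns = df.columns.str.replace(pat, "", regex=True).str[1:-1]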
Output:
print(df)
   first.name  Current.Age  Current.Profession
0     Mallika           23             Student
1        Yash           25               Tutor
2         Abc           14               Clerk
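The same idea can be written without a regex by filtering the split pieces directly, which also avoids the str[1:-1] trimming step. This is a sketch of a possible variant, not the method used above; drop_common_words and its sep parameter are hypothetical names, and it assumes a non-empty list of names.

```python
from functools import reduce

def drop_common_words(names, sep="."):
    """Drop every sep-delimited word that appears in all of the names."""
    token_lists = [n.split(sep) for n in names]
    # Words present in every name
    common = reduce(set.intersection, (set(tokens) for tokens in token_lists))
    # Keep the remaining words in their original order and rejoin them
    return [sep.join(t for t in tokens if t not in common) for tokens in token_lists]

columns = [
    "Student.first.name.word",
    "Student.Current.Age.word",
    "Student.Current.Profession.word",
]
print(drop_common_words(columns))
# ['first.name', 'Current.Age', 'Current.Profession']
```

With a DataFrame, the result could be assigned back with something like df.columns = drop_common_words(df.columns.tolist()).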