Editing column names that contain a repeated special character



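I have some column names that contain two question marks at different positions, e.g. "how old were you? when you started university?". I need to identify which columns contain two question marks. Any tips are welcome! Thanks.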

Data

import pandas as pd

df = pd.DataFrame(data={'id': [1, 2, 3, 4, 5], 'how old were you? when you started university?': [1,2,3,4,5], 'how old were you when you finished university?': [1,2,3,4,5], 'at what age? did you start your first job?': [1,2,3,4,5]})

Expected output

df1 = pd.DataFrame(data={'id': [1, 2, 3, 4, 5], 'how old were you when you finished university?': [1,2,3,4,5]})

One idea, using a list comprehension:

df = df[[c for c in df.columns if c.count("?") < 2]]
print(df)
   id  how old were you when you finished university?
0   1                                               1
1   2                                               2
2   3                                               3
3   4                                               4
4   5                                               5

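You can use boolean indexing. Note that `str.count` interprets its argument as a regular expression, so the question mark has to be escaped: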

x = df.loc[:, df.columns.str.count(r"\?") < 2]
print(x)

Prints:

   id  how old were you when you finished university?
0   1                                               1
1   2                                               2
2   3                                               3
3   4                                               4
4   5                                               5
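
As a minimal sketch of the boolean-indexing approach, the pattern can be built with `re.escape` so that it is treated as a literal character rather than a regex quantifier. The variable names `pattern`, `counts`, and `result` are illustrative and not part of the original answer.

```python
import re

import pandas as pd

df = pd.DataFrame(data={
    'id': [1, 2, 3, 4, 5],
    'how old were you? when you started university?': [1, 2, 3, 4, 5],
    'how old were you when you finished university?': [1, 2, 3, 4, 5],
    'at what age? did you start your first job?': [1, 2, 3, 4, 5],
})

# str.count expects a regular expression; a bare "?" is a quantifier with
# nothing to repeat, so escape it to match the literal character.
pattern = re.escape("?")

# Number of question marks in each column name.
counts = df.columns.str.count(pattern)

# Keep only the columns with fewer than two question marks.
result = df.loc[:, counts < 2]
print(list(result.columns))
```

Escaping with `re.escape` also covers other regex metacharacters such as `.` or `*`, should the repeated character be something other than a question mark.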

If you want to get all columns that have more than one question mark, you can use the following:

[c for c in df.columns if c.count("?")>1]
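
The same comprehension can be wrapped in a small helper, sketched here under the assumption that all column labels of interest are strings. The function name `columns_with_repeats` and its parameters are hypothetical, chosen only for illustration.

```python
import pandas as pd


def columns_with_repeats(frame, char="?", min_count=2):
    """Return the column names that contain `char` at least `min_count` times."""
    # Non-string labels (e.g. integers) are skipped, since they have no .count(char).
    return [c for c in frame.columns
            if isinstance(c, str) and c.count(char) >= min_count]


df = pd.DataFrame(data={
    'id': [1, 2, 3, 4, 5],
    'how old were you? when you started university?': [1, 2, 3, 4, 5],
    'how old were you when you finished university?': [1, 2, 3, 4, 5],
    'at what age? did you start your first job?': [1, 2, 3, 4, 5],
})

repeated = columns_with_repeats(df)
print(repeated)
```

Because `str.count` on a plain Python string matches literally, no regex escaping is needed in this variant.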

Edit: if you want to remove the extra "?" but keep the one at the end, use this:

# rename returns a new DataFrame, so assign the result back
df = df.rename(columns={c: c.replace("?", "") + "?" for c in df.columns if "?" in c})
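
A slightly more cautious variant, sketched below, only re-appends the question mark when the original name actually ended with one. This is an assumption about the desired behaviour rather than something stated in the answer, and the helper name `keep_trailing_only` is hypothetical.

```python
import pandas as pd


def keep_trailing_only(name, char="?"):
    """Remove every `char` from `name`, restoring one at the end if it was there."""
    if not isinstance(name, str) or char not in name:
        return name
    ended_with_char = name.endswith(char)
    cleaned = name.replace(char, "")
    return cleaned + char if ended_with_char else cleaned


df = pd.DataFrame(data={
    'id': [1, 2, 3, 4, 5],
    'how old were you? when you started university?': [1, 2, 3, 4, 5],
    'how old were you when you finished university?': [1, 2, 3, 4, 5],
    'at what age? did you start your first job?': [1, 2, 3, 4, 5],
})

# DataFrame.rename accepts a function that is applied to every column label.
renamed = df.rename(columns=keep_trailing_only)
print(list(renamed.columns))
```

Passing a function to `rename` avoids building the mapping dictionary by hand and leaves columns without a question mark untouched.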

df = df.drop([col for col in df.columns if col.count("?")>1], axis=1)
