How do I drop a column whose rows contain "url" or "http" without knowing the column name?

How can I drop column A in Python because its values contain "https://"?

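Background: I have a DataFrame with 500 columns, and 250 of them hold "https://" links in their rows that explain the preceding variable.

The goal is to loop through df and drop the columns that contain "http://".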

A	B
https://mywebsite	25
https://mywebsite	42

The following snippet should work. It drops any column that contains a URL:

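to_drop = []
for column in df:
    try:
        # True if at least one value in the column starts with 'https://'
        has_url = df[column].str.startswith('https://').any()
    except AttributeError:
        # dtype is not string, so reset the flag instead of reusing a stale value
        has_url = False

    if has_url:
        to_drop.append(column)
df.drop(columns=to_drop, inplace=True)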

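For each column, it checks whether any row starts with "https://". If at least one does, the column is added to the to_drop list of columns to delete. The columns in that list are then dropped in place.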
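As a minimal sketch of the same idea, here is the check applied to the two sample rows from the question. The regular expression `^https?://` is an assumption added here so that both "http://" and "https://" prefixes match, and casting with `astype(str)` is a swapped-in way to avoid the dtype error rather than the approach used in the snippet above.

```python
import pandas as pd

# Sample frame from the question: column A holds links, column B holds numbers.
df = pd.DataFrame({
    "A": ["https://mywebsite", "https://mywebsite"],
    "B": [25, 42],
})

# Cast every value to text so .str works on any dtype, then flag each column
# that has at least one value starting with http:// or https://.
is_url_col = df.astype(str).apply(
    lambda col: col.str.contains(r"^https?://").any()
)

# Keep only the columns that were not flagged.
cleaned = df.loc[:, ~is_url_col]
print(list(cleaned.columns))
```

This variant returns a new frame instead of modifying df in place, which makes it easier to compare the result with the original.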

The following version drops a column only if more than 50% of its values are URLs:

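to_drop = []
for column in df:
    try:
        # .mean() of the boolean result is the fraction of values that are URLs
        has_url = df[column].str.startswith('https://').mean() > 0.5
    except AttributeError:
        # dtype is not string, so reset the flag instead of reusing a stale value
        has_url = False

    if has_url:
        to_drop.append(column)
df.drop(columns=to_drop, inplace=True)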

You can change 0.5 to another number between 0 and 1 to adjust the share of URL values a column must have before it is dropped.
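To make the threshold easier to experiment with, the loop could be wrapped in a small helper. This is only a sketch: the function name `drop_url_columns`, its `threshold` and `prefix` parameters, and the example frame are hypothetical and not part of the original answer. It also casts values with `astype(str)`, so missing values count as non-URLs rather than being skipped.

```python
import pandas as pd

def drop_url_columns(df, threshold=0.5, prefix="https://"):
    """Return a copy of df without columns whose share of URL values exceeds threshold."""
    to_drop = []
    for column in df:
        # astype(str) lets the check run on any dtype; numbers simply never match.
        share = df[column].astype(str).str.startswith(prefix).mean()
        if share > threshold:
            to_drop.append(column)
    return df.drop(columns=to_drop)

# Hypothetical example data with different shares of links per column.
example = pd.DataFrame({
    "mostly_links": ["https://a", "https://b", "n/a"],
    "few_links": ["https://a", "text", "text"],
    "numbers": [1, 2, 3],
})

result = drop_url_columns(example, threshold=0.5)
print(list(result.columns))
```

With a threshold of 0.5 only the column where two of three values are links is removed; lowering the threshold makes the filter stricter.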
