Delete rows that match a specific condition



I have a DataFrame produced by a Python script; it gives the following output:

     Datetime             High          Low           Time
546  2021-06-15 14:30:00  15891.049805  15868.049805  14:30:00
547  2021-06-15 14:45:00  ...           ...           ...
548  2021-06-15 15:00:00  15881.500000  15866.500000  15:00:00
549  2021-06-15 15:15:00  15877.750000  15854.549805  15:15:00
550  2021-06-15 15:30:00  15869.250000  ...           ...

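Because pandas provides vectorized string operations on DataFrame columns, it is easy to select (or exclude) the rows that contain a given string: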

DataFrame:

>>> df
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00
4  6/15/2021 15:30  15869.25000  15869.25000  15:30:00
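To try the methods in this answer yourself, the frame above can be rebuilt by hand. This is only a sketch: it assumes `Datetime` and `Time` are stored as plain strings, which is what the string-based filters rely on.

```python
import pandas as pd

# Values copied from the frame shown above.
# Assumption: Datetime and Time are plain strings, not datetime objects.
df = pd.DataFrame(
    {
        "Datetime": [
            "6/15/2021 14:30",
            "6/15/2021 14:45",
            "6/15/2021 15:00",
            "6/15/2021 15:15",
            "6/15/2021 15:30",
        ],
        "High": [15891.04981, 15883.0, 15881.5, 15877.75, 15869.25],
        "Low": [15868.04981, 15869.90039, 15866.5, 15854.54981, 15869.25],
        "Time": ["14:30:00", "14:45:00", "15:00:00", "15:15:00", "15:30:00"],
    }
)

print(df)
```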

Result:

Method 1:

Using str.contains:

>>> df[~df['Time'].str.contains('15:30:00')]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

If you are filtering on the Datetime column instead:

>>> df[~df['Datetime'].str.contains('15:30')]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

>>> df[~df.Time.str.contains("15:30") == True]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

>>> df[df['Time'].str.contains('15:30') == False]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

>>> df[df['Time'].str.contains('15:30') == 0]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00
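One caveat with str.contains is that it matches substrings and, by default, interprets the pattern as a regular expression. The sketch below uses a made-up Series (the `15:30:45` entry and the missing value are hypothetical, not from the question) to show how a short pattern can match more rows than intended, and how `regex=False` and `na=False` make the mask more predictable.

```python
import pandas as pd

# Hypothetical values for illustration only; not taken from the question.
times = pd.Series(["14:30:00", "15:30:00", "15:30:45", None])

# str.contains matches substrings, so "15:30" also matches "15:30:45".
# regex=False treats the pattern as a literal string;
# na=False makes missing values count as "no match" so the mask is boolean.
mask = times.str.contains("15:30", regex=False, na=False)

# Keep everything that did NOT match.
kept = times[~mask]
print(kept)
```

If only the exact value `15:30:00` should be removed, an equality test (`!=` or `isin`) is the safer choice.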

Method 2:

Using isin:

>>> df[~df['Time'].isin(['15:30:00'])]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00
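Because isin takes a list, the same pattern extends to removing several times in one pass. A minimal sketch, assuming a trimmed-down copy of the frame and a hypothetical `unwanted` list:

```python
import pandas as pd

# Trimmed-down copy of the sample frame (Time kept as strings).
df = pd.DataFrame(
    {
        "Time": ["14:30:00", "14:45:00", "15:00:00", "15:15:00", "15:30:00"],
        "High": [15891.04981, 15883.0, 15881.5, 15877.75, 15869.25],
    }
)

# Hypothetical list of times to remove; isin does exact matching.
unwanted = ["14:45:00", "15:30:00"]

# ~ inverts the mask, so rows whose Time is in the list are excluded.
filtered = df[~df["Time"].isin(unwanted)]
print(filtered)
```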

Method 3:

Using the element-wise not-equal comparison (binary operator ne):

>>> df[df.Time != '15:30:00']
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

>>> df[df['Time'] != '15:30:00']
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00

>>> df[df['Time'].ne('15:30:00')]
          Datetime         High          Low      Time
0  6/15/2021 14:30  15891.04981  15868.04981  14:30:00
1  6/15/2021 14:45  15883.00000  15869.90039  14:45:00
2  6/15/2021 15:00  15881.50000  15866.50000  15:00:00
3  6/15/2021 15:15  15877.75000  15854.54981  15:15:00
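As a quick sanity check, the three methods can be compared directly on this data. The sketch below rebuilds a trimmed-down copy of the frame and uses `DataFrame.equals` to confirm the results are identical; the variable names are illustrative.

```python
import pandas as pd

# Trimmed-down copy of the sample frame (Time kept as strings).
df = pd.DataFrame(
    {
        "Time": ["14:30:00", "14:45:00", "15:00:00", "15:15:00", "15:30:00"],
        "High": [15891.04981, 15883.0, 15881.5, 15877.75, 15869.25],
    }
)

# Method 1: substring match (literal, not regex), inverted.
by_contains = df[~df["Time"].str.contains("15:30:00", regex=False)]
# Method 2: exact membership test, inverted.
by_isin = df[~df["Time"].isin(["15:30:00"])]
# Method 3: element-wise not-equal.
by_ne = df[df["Time"].ne("15:30:00")]

# True when all three frames have the same values, index and dtypes.
same = by_contains.equals(by_isin) and by_isin.equals(by_ne)
print(same)
```

On data like this the choice is mostly a matter of taste; the methods differ only when a substring match and an exact match would select different rows.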

My approach is as follows.

First, we take the time that should be removed from the dataset, in this case 15:30:00.

Because the Datetime column holds datetime values, its time cannot be compared against a string. Instead, we convert the target time to a datetime.time object.

import datetime as dt
rm_time = dt.time(15, 30)

With that in place, we can use DataFrame.drop():

df = df.drop(df[df.Datetime.dt.time == rm_time].index)  # drop() returns a new DataFrame, so assign the result
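Put together, the datetime-based approach might look like the sketch below. It assumes the `Datetime` column has a real datetime dtype (built here with `pd.to_datetime`) and uses only three of the rows; the boolean-mask variant is an alternative added for comparison, not part of the answer above.

```python
import datetime as dt

import pandas as pd

# Three rows from the question, with Datetime parsed to a datetime dtype.
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            ["2021-06-15 14:30:00", "2021-06-15 15:15:00", "2021-06-15 15:30:00"]
        ),
        "High": [15891.049805, 15877.75, 15869.25],
    }
)

rm_time = dt.time(15, 30)  # the time of day to remove

# Variant 1: look up the index labels of matching rows, then drop them.
dropped = df.drop(df[df["Datetime"].dt.time == rm_time].index)

# Variant 2 (alternative): keep the rows whose time differs.
masked = df[df["Datetime"].dt.time != rm_time]

print(dropped)
```

Both variants leave the original `df` untouched and return a new frame.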

You can try this:

import pandas as pd

test_data = pd.read_csv("test.csv")
# Keep only the rows whose Time is not 15:30:00
test_data = test_data[test_data["Time"] != "15:30:00"]
print(test_data)

Simply select the rows based on the condition.
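The contents of test.csv are not shown, so the sketch below substitutes a small in-memory CSV (an assumption, built from rows in the question) to make the same selection runnable as-is. `read_csv` leaves the `Time` column as strings here, which is why the string comparison works.

```python
import io

import pandas as pd

# Stand-in for test.csv; the real file's contents are not shown in the answer.
csv_text = """Datetime,High,Low,Time
6/15/2021 15:00,15881.5,15866.5,15:00:00
6/15/2021 15:15,15877.75,15854.54981,15:15:00
6/15/2021 15:30,15869.25,15869.25,15:30:00
"""

test_data = pd.read_csv(io.StringIO(csv_text))

# Boolean selection: keep rows where Time is anything other than 15:30:00.
test_data = test_data[test_data["Time"] != "15:30:00"]
print(test_data)
```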
