Python Pandas read_csv() needlessly picks up information about my laptop



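I am processing labels for some audio data. When I read in each CSV, I set the column names. However, for some reason two dataframes seem to be read in: one contains the information I care about from the CSV, and the other contains my username, the type of laptop I am using, and the current time on my computer.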

Code:

import os
import pandas as pd

# initializing the output dataframe that will contain all of the labels and the relevant metadata
# across each audio clip in a dataset. Should be in the format to work with it in the Python package
manual_df = pd.DataFrame()
# the ground truth labels lack column names, so I am filling them in closer to the end product
column_names = ["OFFSET", "MANUAL ID"]
for clip_annotations in os.listdir(label_path):
    # isolating the name of the clip from the csv file
    # will be used to extract the metadata from the equivalent wav file
    x = clip_annotations.split('.')
    clip_name = x[0]
    # taking in the labels for the audio clip
    clip_df = pd.read_csv(os.path.join(label_path, clip_annotations), names=column_names)
    print(clip_df)
    # removing the annotations that occur over the same interval in the clip
    # first step in converting multi-class classifier into binary classifier.
    clip_df = clip_df.drop_duplicates(subset=["OFFSET"])
    # second step to converting multi-class classifier to binary classifier
    # Isn't all that necessary since we don't use the MANUAL ID Column that much yet
    clip_df["MANUAL ID"] = "bird"
    # splitting the time into OFFSET and DURATION
    new = clip_df["OFFSET"].str.split("-", n=1, expand=True)
    clip_df["OFFSET"] = new[0]
    clip_df["DURATION"] = 5
    #print(clip_df)
    # converting hours minutes seconds format into seconds
    new = clip_df["OFFSET"].str.split(":", n=2, expand=True)
    #print(new)
    #new = new.rename(columns={"Hours","Minutes","Seconds"})
    #seconds_offset = new[0]*3600 + new[1]*60 + new[2]
    #print(seconds_offset)
new
Output:
OFFSET  
NaN jacob jacob-Aspire-E5-575  26.03.2021 13:49   
MANUAL ID  
NaN jacob jacob-Aspire-E5-575  file:///home/jacob/.config/libreoffice/4;  
OFFSET MANUAL ID
0    00:00:00-00:00:05   cintin1
1    00:00:05-00:00:10   cintin1
2    00:00:05-00:00:10   citwoo1
3    00:00:10-00:00:15   butwoo1
4    00:00:10-00:00:15   cintin1
..                 ...       ...
319  00:09:50-00:09:55    meapar
320  00:09:50-00:09:55   strwoo2
321  00:09:55-00:10:00   butwoo1
322  00:09:55-00:10:00   hauthr1
323  00:09:55-00:10:00    meapar
[324 rows x 2 columns]

My goal is to stop it from picking up this unnecessary information about my laptop.

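I went back and printed clip_annotations, and it turned out that the files I was interested in had some duplicate "lock" files next to them, with names that look like this: .~lock.PER49_20190131.csv#. I'm not sure why this happens, but in my case the script doesn't need to be general, so I just hard-coded this condition at the start of the loop: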


    if clip_annotations.startswith(".~lock."):
        continue
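
For what it's worth, the contents shown in the output above (user name, host name, timestamp, and a path under .config/libreoffice) look like what LibreOffice writes into a `.~lock.<name>#` file while a document is open, and since those fields are comma-separated, read_csv will happily parse them. If a slightly more general filter is wanted, a minimal sketch could collect only the real CSV files before looping. The helper name `list_label_csvs` and the demo file names are made up for illustration and are not part of the original script:

```python
import os
import tempfile


def list_label_csvs(label_dir):
    """Return the sorted names of CSV files in label_dir, skipping lock files."""
    names = []
    for name in sorted(os.listdir(label_dir)):
        # Skip lock files such as ".~lock.PER49_20190131.csv#"
        if name.startswith(".~lock."):
            continue
        # Keep only files whose name ends in .csv (case-insensitive)
        if not name.lower().endswith(".csv"):
            continue
        names.append(name)
    return names


# Small demo in a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as demo_dir:
    for fname in ["PER49_20190131.csv", ".~lock.PER49_20190131.csv#"]:
        open(os.path.join(demo_dir, fname), "w").close()
    print(list_label_csvs(demo_dir))  # ['PER49_20190131.csv']
```

The loop in the question could then iterate over `list_label_csvs(label_path)` instead of `os.listdir(label_path)`. The extension check alone would already drop the lock files, since their names end in `#`; the explicit prefix check is kept only to make the intent obvious.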
