Pandas column disappears after using bfill



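I have a DataFrame built from a subset of the columns of a larger DataFrame.

I use the bfill function to fill missing date values in some of the columns. However, in one typical case, one of the columns contains only null values, and after bfill that column disappears.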

import pandas as pd
import warnings
import shutil
import xlrd
import xlwt
import glob
from datetime import datetime
from datetime import timedelta
import os
from pandas import ExcelWriter

for f in glob.glob("Raw/*.xlsx"):
    xls = pd.ExcelFile(f)
    df1 = xls.parse(sheet_name=0)
    #print(df1.shape)
    new = df1.filter(['Actual Start',
        'FOR CLIENT SUBMISSION / APPROVAL-Issued for Self Discipline Check (SDC)(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Issued for Inter-disciplinary Check  (IDC)(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Submission to client(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Reviewed by Client(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Approved by Client (FAC)(Actual Finish Date)',
        'FOR CLIENT SUBMISSION / APPROVAL-Issue for Construction(Actual Finish Date)',
        'FOR PR with offer Evaluation-Discipline Completed (SDC)(Actual Finish Date)',
        'FOR PR with offer Evaluation-Issuance to procurement team (FRB)(Actual Finish Date)',
        'FOR PR with offer Evaluation-Tech. Evaluation Completion(Actual Finish Date)'], axis=1)
    #print(new.shape)

    cols = new.columns
    new[cols] = new[cols].apply(pd.to_datetime).bfill(axis=1)
    print(cols)

Output: the column 'FOR CLIENT SUBMISSION / APPROVAL-Approved by Client (FAC)(Actual Finish Date)' is no longer in the DataFrame:

Index(['Actual Start',
'FOR CLIENT SUBMISSION / APPROVAL-Issued for Self Discipline Check (SDC)(Actual Finish Date)',
'FOR CLIENT SUBMISSION / APPROVAL-Issued for Inter-disciplinary Check  (IDC)(Actual Finish Date)',
'FOR CLIENT SUBMISSION / APPROVAL-Submission to client(Actual Finish Date)',
'FOR CLIENT SUBMISSION / APPROVAL-Reviewed by Client(Actual Finish Date)',
'FOR CLIENT SUBMISSION / APPROVAL-Issue for Construction(Actual Finish Date)',
'FOR PR with offer Evaluation-Discipline Completed (SDC)(Actual Finish Date)',
'FOR PR with offer Evaluation-Issuance to procurement team (FRB)(Actual Finish Date)',
'FOR PR with offer Evaluation-Tech. Evaluation Completion(Actual Finish Date)'],
dtype='object')

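I don't see a problem that is directly related to bfill. Without sample data it is a bit hard to fully understand the issue. However, the way you select the columns is unidiomatic. Does the following work for you?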

import glob
import pandas as pd

# Columns of interest
cols = ['Actual Start',
    'FOR CLIENT SUBMISSION / APPROVAL-Issued for Self Discipline Check (SDC)(Actual Finish Date)',
    'FOR CLIENT SUBMISSION / APPROVAL-Issued for Inter-disciplinary Check  (IDC)(Actual Finish Date)',
    'FOR CLIENT SUBMISSION / APPROVAL-Submission to client(Actual Finish Date)',
    'FOR CLIENT SUBMISSION / APPROVAL-Reviewed by Client(Actual Finish Date)',
    'FOR CLIENT SUBMISSION / APPROVAL-Approved by Client (FAC)(Actual Finish Date)',
    'FOR CLIENT SUBMISSION / APPROVAL-Issue for Construction(Actual Finish Date)',
    'FOR PR with offer Evaluation-Discipline Completed (SDC)(Actual Finish Date)',
    'FOR PR with offer Evaluation-Issuance to procurement team (FRB)(Actual Finish Date)',
    'FOR PR with offer Evaluation-Tech. Evaluation Completion(Actual Finish Date)']

for f in glob.glob("Raw/*.xlsx"):
    xls = pd.ExcelFile(f)
    df1 = xls.parse(sheet_name=0)
    # Select the columns of interest (raises KeyError if any of them is missing)
    new1 = df1[cols]
    # Convert every selected column to datetime
    new2 = new1.apply(pd.to_datetime)
    # Backward-fill each column from the rows below it
    new3 = new2.bfill(axis=0)
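
One possible explanation for the missing column, offered only as an assumption since no sample data was posted: `DataFrame.filter(items=...)` keeps just the labels that actually exist in the frame and says nothing about the rest, whereas `df[cols]` raises a `KeyError`. If the column name in the spreadsheet differs slightly from the one in the list (an extra space, for instance), `filter` would drop it before bfill ever runs. A minimal sketch with made-up column names:

```python
import pandas as pd

# Hypothetical frame; the column names are illustrative only.
df = pd.DataFrame({
    "start": ["2020-01-01", None],
    "finish": [None, "2020-02-01"],
})
wanted = ["start", "finish", "approved"]  # "approved" does not exist in df

# filter() keeps only the labels that exist and ignores the others silently.
filtered = df.filter(items=wanted, axis=1)
filtered_cols = list(filtered.columns)

# Plain list indexing fails loudly when a label is missing.
try:
    df[wanted]
    raised = False
except KeyError:
    raised = True

# Requested labels that are absent from the frame.
missing = [c for c in wanted if c not in df.columns]
print(filtered_cols, raised, missing)
```

Checking `missing` before selecting the columns would show whether the column was ever present in the source sheet.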

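Update: bfill stands for backward fill. It fills NaN gaps using the next valid observation along the given axis. With axis=0, each column is filled from the rows below it, so if you want to fill the gaps within a column you should call df.bfill(axis=0). I believe the axis=1 in your code, which fills each row from the columns to its right, is not what you want. Note that after bfill an empty column (one that contains only NaN values) stays empty. bfill cannot remove a column.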
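
To illustrate the difference between the two axes, here is a small sketch on a toy frame. The column names `a`, `b`, `c` and the dates are invented for the example; `b` is deliberately all-NaT to show that it is kept in both cases.

```python
import pandas as pd

# Toy data: "b" contains only missing values.
df = pd.DataFrame({
    "a": pd.to_datetime(["2020-01-01", None, "2020-01-03"]),
    "b": pd.to_datetime([None, None, None]),
    "c": pd.to_datetime([None, "2020-02-02", None]),
})

# axis=0: each gap is filled from the next valid value further down the same column.
down = df.bfill(axis=0)

# axis=1: each gap is filled from the next valid value to the right in the same row.
across = df.bfill(axis=1)

# Neither call changes the set of columns.
print(list(down.columns), list(across.columns))

# An all-missing column has nothing to fill from when going down, so it stays empty.
print(down["b"].isna().all())

# With axis=1, "b" can pick up a value from "c" in the same row.
print(across.loc[1, "b"])
```

With axis=1 the empty column may receive values from its right-hand neighbours, which can hide the fact that it was empty in the source data.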
