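I have a DataFrame and want to insert a record with the input date, leaving the remaining columns as NA, if that date does not exist yet. If the input date already exists, nothing should happen.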
InputDate="5/1/2022"
if InputDate does not exist as a record, insert a new record
In
Invoice Date ... Check
1 2022-04-01 ... 1.30
2 2022-03-01 ... 1.19
Out
Invoice Date ... Check
0 2022-05-01 ... NaN
1 2022-04-01 ... 1.30
2 2022-03-01 ... 1.19
You can use:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Invoice Date': ['2022-04-01', '2022-03-01'], 'Check': [1.30, 1.19]})
df['Invoice Date'] = pd.to_datetime(df['Invoice Date'])
input_date = "5/1/2022"
new_date = pd.to_datetime(input_date)
# `x in series` tests the index labels, not the values, so compare against the column itself
if not (df['Invoice Date'] == new_date).any():
    row = {'Invoice Date': new_date}
    # fill every other column with NaN
    row.update({col: np.nan for col in df.columns if col not in row})
    df.loc[len(df)] = row  # assumes the default 0..n-1 RangeIndex
Output
Invoice Date Check
0 2022-04-01 1.30
1 2022-03-01 1.19
2 2022-05-01 NaN
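The same check can be wrapped in a small reusable function. This is a sketch under assumptions: insert_date_if_missing and its date_col parameter are hypothetical names, and it swaps the df.loc assignment for pd.concat, which fills any columns missing from the new one-row frame with NaN and does not rely on the index being 0..n-1.

```python
import pandas as pd


def insert_date_if_missing(df, input_date, date_col="Invoice Date"):
    """Hypothetical helper: add one row for input_date unless that date is already present.

    All other columns of the new row are left as NaN.
    """
    new_date = pd.to_datetime(input_date)
    # Compare against the column's values, not its index.
    if (df[date_col] == new_date).any():
        return df  # date already present: nothing to do
    # pd.concat fills the columns absent from new_row with NaN.
    new_row = pd.DataFrame({date_col: [new_date]})
    return pd.concat([df, new_row], ignore_index=True)


df = pd.DataFrame({
    "Invoice Date": pd.to_datetime(["2022-04-01", "2022-03-01"]),
    "Check": [1.30, 1.19],
})
df = insert_date_if_missing(df, "5/1/2022")
df = insert_date_if_missing(df, "5/1/2022")  # second call finds the date and does nothing
print(df)
```

Because the function returns a new DataFrame instead of mutating its argument, calling it twice with the same date is harmless.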
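If you want to sort the rows in reverse chronological order, as shown in the question, you can do this (sort_values returns a new DataFrame, so assign the result back):
df = df.sort_values(by='Invoice Date', ascending=False).reset_index(drop=True)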
Output
Invoice Date Check
0 2022-05-01 NaN
1 2022-04-01 1.30
2 2022-03-01 1.19
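As a small variation, sort_values also takes an ignore_index argument (available since pandas 1.0) that relabels the rows 0..n-1 directly, so the separate reset_index call can be dropped. A minimal sketch, assuming the row for 2022-05-01 has already been inserted:

```python
import pandas as pd

df = pd.DataFrame({
    "Invoice Date": pd.to_datetime(["2022-04-01", "2022-03-01", "2022-05-01"]),
    "Check": [1.30, 1.19, float("nan")],
})
# Newest date first; ignore_index=True has the same effect as reset_index(drop=True).
df = df.sort_values(by="Invoice Date", ascending=False, ignore_index=True)
print(df)
```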
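You can also set the date column as the index and then add the new row with pd.concat and verify_integrity=True.
If the index label already exists, this raises an error; if it does not, the row is added with NaN in the remaining columns. (DataFrame.append accepts the same verify_integrity argument, but it was deprecated in pandas 1.4 and removed in pandas 2.0.)
import pandas as pd
import numpy as np
df = pd.DataFrame({'Invoice Date': ['2022-04-01', '2022-03-01'], 'Check': [1.30, 1.19], 'Check2': [1.30, 1.19]})
df = df.set_index('Invoice Date')
# a one-row, all-NaN frame labelled with the new date
new_row = pd.DataFrame(np.nan, columns=df.columns, index=['2023-04-01'])
pd.concat([df, new_row], verify_integrity=True)
Output:
Check Check2
2022-04-01 1.30 1.30
2022-03-01 1.19 1.19
2023-04-01 NaN NaN
The following code raises ValueError: Indexes have overlapping values: Index(['2022-03-01'], dtype='object'), so you can catch it!
new_row = pd.DataFrame(np.nan, columns=df.columns, index=['2022-03-01'])
pd.concat([df, new_row], verify_integrity=True)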
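Catching that error turns the index-based approach into an "insert only if missing" operation. This is a sketch: append_row_if_new is a hypothetical name, and it uses pd.concat in place of the removed DataFrame.append.

```python
import numpy as np
import pandas as pd


def append_row_if_new(df, label):
    """Hypothetical helper: append an all-NaN row labelled `label` unless it already exists."""
    new_row = pd.DataFrame(np.nan, columns=df.columns, index=[label])
    try:
        # verify_integrity=True raises ValueError if the label is already in the index.
        return pd.concat([df, new_row], verify_integrity=True)
    except ValueError:
        return df  # label already present: leave df unchanged


df = pd.DataFrame(
    {"Check": [1.30, 1.19], "Check2": [1.30, 1.19]},
    index=["2022-04-01", "2022-03-01"],
)
df = append_row_if_new(df, "2023-04-01")  # new label: row is added
df = append_row_if_new(df, "2022-03-01")  # existing label: skipped
print(df)
```

Catching ValueError this broadly could also hide unrelated errors; an explicit `if label not in df.index:` check before concatenating expresses the same intent without relying on an exception.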