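Background: I load data into pandas from csv/xlsx files created by a text-to-data application. Besides saving time, the automatic reading is also quite accurate. Below I have simplified one such load to illustrate a specific problem I am struggling to sort out: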
import pandas as pd
from tabulate import tabulate
df_is = {"Var":["Sales","Gogs","Op prof","Depreciation","Net fin","PBT","Tax","PAT"],
"2021":[100,-50,50,-10,-5,35,"",""],
"2022":[125,-55,70,-15,-10,45,-10,25],
"":["","","","","","",-15,30]}
df_want = {"Var":["Sales","Gogs","Op prof","Depreciation","Net fin","PBT","Tax","PAT"],
"2021":[100,-50,50,-10,-5,35,-10,25],
"2022":[125,-55,70,-15,-10,45,-15,30]}
print(tabulate(df_is))
print()
print(tabulate(df_want))
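Problem: As running the code shows, the application did not read the data in the first table correctly. The last two data points of the second and third columns ended up in the third and last columns, respectively.
The second table shows how I want the data to look. The actual problem is more complex and more general, so a local fix that rewrites individual values is not feasible. What I would like is something like what I would do in Excel: delete the empty cells in the second column and shift all other data in the row left or right (depending on the task).
What I tried: As a newbie, I searched for a solution, but none of my search terms turned up anything relevant.
I also used df.iloc to create a variable holding the four misplaced data cells and then tried to append them to columns 1 and 2. That only added copies of the last two rows.
Any suggestions would be greatly appreciated!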
Versions: conda 4.11.0, Python 3.9.7
Pandas 1.3.4
Try the following:
import pandas as pd
import numpy as np
f_is = {"Var":["Sales","Gogs","Op prof","Depreciation","Net fin","PBT","Tax","PAT"],
"2021":[100,-50,50,-10,-5,35,"",""],
"2022":[125,-55,70,-15,-10,45,-10,25],
"":["","","","","","",-15,30]}
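input_df = pd.DataFrame(f_is)
# Transpose so each original row becomes a column, turn empty strings into NaN,
# drop the NaNs (which packs the remaining values together), then transpose back.
output_df = input_df.T.replace('', np.nan).apply(lambda x: pd.Series(x.dropna().to_numpy())).T
# The shifted frame has lost its labels, so restore the three real column names.
output_df.columns = ['Var','2021','2022']
print(output_df)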
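Since the question says the real data is more general than this example, here is a minimal sketch of the same "delete empty cells and shift left" idea wrapped in a reusable function that does not hard-code the column names. The helper names `is_empty` and `shift_left` are hypothetical, not part of pandas. It assumes that an empty string or NaN always means "misplaced cell" and that values should always move left; a shift to the right would need the padding on the other side.

```python
import pandas as pd


def is_empty(value) -> bool:
    """Hypothetical helper: a cell counts as empty if it is '' or NaN/None."""
    return value == "" or pd.isna(value)


def shift_left(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pack the non-empty cells of every row to the left."""
    rows = df.to_numpy(dtype=object).tolist()
    # Keep only the non-empty values of each row, preserving their order.
    compact = [[v for v in row if not is_empty(v)] for row in rows]
    width = max((len(values) for values in compact), default=0)
    # Pad shorter rows with None so that all rows have the same length.
    padded = [values + [None] * (width - len(values)) for values in compact]
    # Assumption: the leftmost original column names are the ones to keep.
    result = pd.DataFrame(padded, columns=df.columns[:width], index=df.index)
    # Let pandas pick proper dtypes (e.g. int64) for the repaired columns.
    return result.infer_objects()


f_is = {"Var": ["Sales", "Gogs", "Op prof", "Depreciation", "Net fin", "PBT", "Tax", "PAT"],
        "2021": [100, -50, 50, -10, -5, 35, "", ""],
        "2022": [125, -55, 70, -15, -10, 45, -10, 25],
        "": ["", "", "", "", "", "", -15, 30]}

result = shift_left(pd.DataFrame(f_is))
print(result)
```

Working row by row in plain Python keeps the logic easy to follow, but it will be slower than the transpose approach on very large frames, so treat it as a starting point rather than a tuned solution.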