Vertically appending DataFrames with unequal columns



I have two DataFrames like this:

>>> df1
   A  B
1  3  4
2  6  7
>>> df2
      C     D     E     F
1  20.0  30.0  61.2  29.1
2  40.0  50.0  33.8  36.4

What I want to do is append df2 vertically to the end of df1, header row included, so that it looks like this:

      A     B
1     3     4
2     6     7
3     C     D     E     F
4  20.0  30.0  61.2  29.1
5  40.0  50.0  33.8  36.4

So far I have tried pd.concat([df1, df2]) with axis = 0 and axis = 1, as well as append(), but none of them worked. pd.concat appends df2 to df1 horizontally, which does not serve my purpose.

pd.concat([df1, df2], axis = 0, ignore_index = True) outputs:

   A  B     C     D     E     F
1  3  4
2  6  7
3        20.0  30.0  61.2  29.1
4        40.0  50.0  33.8  36.4

pd.concat([df1, df2], axis = 1) outputs this:

   A  B     C     D     E     F
1  3  4  20.0  30.0  61.2  29.1
2  6  7  40.0  50.0  33.8  36.4

Any ideas or suggestions on how to do this? This is for Python 3.

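  • Option 1:

In the comments, you mentioned:

[I] am actually collecting specific important data from different CSVs.

If possible, I would take advantage of passing header=None and index_col=[0] as arguments to pd.read_csv. With header=None, the header line of each file is read as an ordinary data row, so you can easily achieve the following: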

import pandas as pd
from io import StringIO
# imitating the csv files here
file1 = StringIO("""
,A,B
1,3,4
2,6,7
""")
file2 = StringIO("""
,C,D,E,F
1,20.0,30.0,61.2,29.1
2,40.0,50.0,33.8,36.4
""")
list_files = [file1, file2]
list_dfs = list()
for file in list_files:
    list_dfs.append(pd.read_csv(file, sep=',', header=None, index_col=[0]))
df_new = pd.concat(list_dfs, axis=0, ignore_index=True)
print(df_new)
      1     2     3     4
0     A     B   NaN   NaN
1     3     4   NaN   NaN
2     6     7   NaN   NaN
3     C     D     E     F
4  20.0  30.0  61.2  29.1
5  40.0  50.0  33.8  36.4
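
As a side note on why this works: pd.concat aligns frames on their column labels, so frames labelled A, B and C, D end up side by side in a NaN-padded union, whereas frames that share positional labels stack directly under each other. The snippet below is a minimal sketch of that behaviour; the names left, right, by_label and by_position are made up for illustration and are not part of the question.

```python
import pandas as pd

# Hypothetical stand-ins for two frames with different column names
left = pd.DataFrame({"A": [3, 6], "B": [4, 7]})
right = pd.DataFrame({"C": [20.0, 40.0], "D": [30.0, 50.0]})

# Different labels: the result carries the union of the columns, padded with NaN
by_label = pd.concat([left, right], axis=0, ignore_index=True)
print(by_label.columns.tolist())  # ['A', 'B', 'C', 'D']

# Same positional labels (0, 1): the rows stack directly under each other
left_pos = left.copy()
left_pos.columns = list(range(left_pos.shape[1]))
right_pos = right.copy()
right_pos.columns = list(range(right_pos.shape[1]))
by_position = pd.concat([left_pos, right_pos], axis=0, ignore_index=True)
print(by_position.shape)  # (4, 2)
```

Reading with header=None gives every file the same kind of positional labels, which is why the rows line up in the output above.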

Now, at this point, you could of course change df_new.columns to df_new.iloc[0] (i.e. ['A', 'B', nan, nan]), but this would leave you with duplicate NaN values as column names:

df_new.columns = df_new.iloc[0].values.tolist()
df_new = df_new.iloc[1:]
print(df_new)
      A     B   NaN   NaN
1     3     4   NaN   NaN
2     6     7   NaN   NaN
3     C     D     E     F
4  20.0  30.0  61.2  29.1
5  40.0  50.0  33.8  36.4

This is both very impractical and quite likely to cause errors later on, when you want to manipulate the data by column (index) reference.
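
If you do want names taken from the first row, one possible workaround is to fill the missing header cells with unique placeholders. This is only a sketch under assumptions: the "col_<position>" naming scheme is hypothetical, and the frame below is a hand-built stand-in for the concatenated result (all values kept as strings, as they would be when the header line is read as data).

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the concatenated frame, with positional labels 1..4
df_new = pd.DataFrame(
    [["A", "B", np.nan, np.nan],
     ["3", "4", np.nan, np.nan],
     ["6", "7", np.nan, np.nan],
     ["C", "D", "E", "F"],
     ["20.0", "30.0", "61.2", "29.1"],
     ["40.0", "50.0", "33.8", "36.4"]],
    columns=[1, 2, 3, 4],
)

header = df_new.iloc[0]
# Keep real names; replace missing ones with a hypothetical "col_<position>" label
new_columns = [
    name if pd.notna(name) else f"col_{pos}"
    for pos, name in zip(df_new.columns, header)
]

renamed = df_new.iloc[1:].copy()  # drop the row that was promoted to header
renamed.columns = new_columns
print(renamed.columns.tolist())  # ['A', 'B', 'col_3', 'col_4']
```

Every column then has a distinct label, so label-based selection stays unambiguous.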


  • Option 2:

If the first option is not feasible (e.g. you have no access to the original CSVs), you can get the same result as follows:

data1 = {'A': {1: 3, 2: 6}, 'B': {1: 4, 2: 7}}
df1 = pd.DataFrame(data1)
data2 = {'C': {1: 20.0, 2: 40.0},
         'D': {1: 30.0, 2: 50.0},
         'E': {1: 61.2, 2: 33.8},
         'F': {1: 29.1, 2: 36.4}}
df2 = pd.DataFrame(data2)
list_dfs = [df1, df2]
for i, item in enumerate(list_dfs):
    # add the column names as an extra row labelled -1
    item.loc[-1] = item.columns
    # shift the index so that the new row sorts to the top
    item.index = item.index + 1
    item = item.sort_index()
    # replace the names with positional labels so the frames align
    item.columns = list(range(1, len(item.columns) + 1))  # or start at 0
    list_dfs[i] = item
df_new = pd.concat(list_dfs, axis=0, ignore_index=True)
print(df_new)
      1     2     3     4
0     A     B   NaN   NaN
1     3     4   NaN   NaN
2     6     7   NaN   NaN
3     C     D     E     F
4  20.0  30.0  61.2  29.1
5  40.0  50.0  33.8  36.4
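
Note that the loop above writes the extra header row into df1 and df2 themselves. If the original frames need to stay untouched, a variant could build fresh frames from the column names plus the values. The following is a rough sketch of that idea, not a drop-in replacement; stack_with_headers is a hypothetical helper name, and the resulting columns are labelled from 0 rather than 1.

```python
import pandas as pd

def stack_with_headers(frames):
    """Stack frames vertically, turning each frame's column names into a data row."""
    blocks = []
    for frame in frames:
        # first row: the column names, followed by the frame's own rows
        rows = [frame.columns.tolist()] + frame.values.tolist()
        # a new frame gets positional column labels 0, 1, 2, ...
        blocks.append(pd.DataFrame(rows))
    return pd.concat(blocks, axis=0, ignore_index=True)

df1 = pd.DataFrame({"A": [3, 6], "B": [4, 7]}, index=[1, 2])
df2 = pd.DataFrame(
    {"C": [20.0, 40.0], "D": [30.0, 50.0], "E": [61.2, 33.8], "F": [29.1, 36.4]},
    index=[1, 2],
)

stacked = stack_with_headers([df1, df2])
print(stacked)
```

Because the helper only reads from its inputs, df1 and df2 keep their original shape and column names afterwards.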
