I have two DataFrames like these:
>>>df1
A B
1 3 4
2 6 7
>>>df2
C D E F
1 20.0 30.0 61.2 29.1
2 40.0 50.0 33.8 36.4
Now what I want to do is append df2 vertically to the end of df1, so that it looks like this:
A B
1 3 4
2 6 7
3 C D E F
4 20.0 30.0 61.2 29.1
5 40.0 50.0 33.8 36.4
So far I have tried pd.concat([df1, df2]) with axis=0 and with axis=1, as well as pd.append(), but none of them worked. pd.concat appends df2 to df1 horizontally, which does not serve my purpose.
pd.concat([df1, df2], axis = 0, ignore_index = True)
Output:
A B C D E F
1 3 4
2 6 7
3 20.0 30.0 61.2 29.1
4 40.0 50.0 33.8 36.4
pd.concat([df1, df2], axis = 1)
Output:
A B C D E F
1 3 4 20.0 30.0 61.2 29.1
2 6 7 40.0 50.0 33.8 36.4
Any ideas or suggestions on how to do this? This is for Python 3.
- Option 1:
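In the comments, you mentioned:
[I] am actually collecting specific, important data from different CSVs.
If possible, I would take advantage of passing header=None and index_col=[0] as arguments to pd.read_csv. With header=None, each file's header line is read as an ordinary data row, so you can easily achieve the following: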
import pandas as pd
from io import StringIO
# imitating the csv files here
file1 = StringIO("""
,A,B
1,3,4
2,6,7
""")
file2 = StringIO("""
,C,D,E,F
1,20.0,30.0,61.2,29.1
2,40.0,50.0,33.8,36.4
""")
list_files = [file1, file2]
list_dfs = list()
for file in list_files:
    # header=None keeps each file's header line as a regular data row
    list_dfs.append(pd.read_csv(file, sep=',', header=None, index_col=[0]))
df_new = pd.concat(list_dfs, axis=0, ignore_index=True)
print(df_new)
1 2 3 4
0 A B NaN NaN
1 3 4 NaN NaN
2 6 7 NaN NaN
3 C D E F
4 20.0 30.0 61.2 29.1
5 40.0 50.0 33.8 36.4
At this point you could, of course, set df_new.columns to df_new.iloc[0] (i.e. ['A', 'B', nan, nan]), but that would leave you with duplicate NaN values as column names:
df_new.columns = df_new.iloc[0].values.tolist()
df_new = df_new.iloc[1:]
print(df_new)
A B NaN NaN
1 3 4 NaN NaN
2 6 7 NaN NaN
3 C D E F
4 20.0 30.0 61.2 29.1
5 40.0 50.0 33.8 36.4
This is both highly impractical and quite likely to cause errors later on, when you want to reference or manipulate the data by column (index).
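If you do want named columns anyway, one possible workaround is to replace the missing labels with positional placeholders so that every column name is unique. This is only a minimal sketch: the small frame below is a stand-in for df_new, and the `col_<position>` naming scheme is an assumption for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

# Stand-in for df_new after promoting the first row to column names
# (two of the labels are NaN, as in the output above).
df = pd.DataFrame(
    [[3, 4, np.nan, np.nan], ["C", "D", "E", "F"]],
    columns=["A", "B", np.nan, np.nan],
)

# Replace each missing label with a placeholder based on its position.
# The "col_<position>" scheme is hypothetical; pick whatever suits your data.
df.columns = [
    label if pd.notna(label) else f"col_{pos}"
    for pos, label in enumerate(df.columns)
]

print(df.columns.tolist())
```

After this, each column can be selected unambiguously by name, e.g. df["col_2"].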
- Option 2:
If the first option is not feasible (e.g. you do not have access to the original CSVs), you can get the same result as follows:
data1 = {'A': {1: 3, 2: 6}, 'B': {1: 4, 2: 7}}
df1 = pd.DataFrame(data1)
data2 = {'C': {1: 20.0, 2: 40.0},
         'D': {1: 30.0, 2: 50.0},
         'E': {1: 61.2, 2: 33.8},
         'F': {1: 29.1, 2: 36.4}}
df2 = pd.DataFrame(data2)
list_dfs = [df1,df2]
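for i, item in enumerate(list_dfs):
    # add the column names as an extra row (note: this modifies df1/df2 in place)
    item.loc[-1] = item.columns
    # shift the index so the new row gets label 0, then sort it to the top
    item.index = item.index + 1
    item = item.sort_index()
    # replace the column names with positions so the frames align on concat
    item.columns = list(range(1, len(item.columns) + 1))  # or start at 0
    list_dfs[i] = item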
df_new = pd.concat(list_dfs, axis=0, ignore_index=True)
print(df_new)
1 2 3 4
0 A B NaN NaN
1 3 4 NaN NaN
2 6 7 NaN NaN
3 C D E F
4 20.0 30.0 61.2 29.1
5 40.0 50.0 33.8 36.4
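Note that the loop in Option 2 adds the header row to df1 and df2 themselves. If you would rather leave the originals untouched, the same idea could be wrapped in a small helper along these lines. This is just a sketch: `stack_with_headers` is a made-up name, and the resulting columns are numbered from 0 rather than 1.

```python
import pandas as pd


def stack_with_headers(frames):
    """Hypothetical helper: stack frames vertically, keeping each header as a data row."""
    parts = []
    for frame in frames:
        # One-row frame holding the column labels, with positional column names 0..n-1
        header = pd.DataFrame([list(frame.columns)])
        # The data itself, also with positional column names; the input frame is not modified
        body = pd.DataFrame(frame.to_numpy(dtype=object))
        parts.append(pd.concat([header, body], ignore_index=True))
    # Frames with fewer columns are padded with NaN on the right
    return pd.concat(parts, axis=0, ignore_index=True)


df1 = pd.DataFrame({'A': {1: 3, 2: 6}, 'B': {1: 4, 2: 7}})
df2 = pd.DataFrame({'C': {1: 20.0, 2: 40.0},
                    'D': {1: 30.0, 2: 50.0},
                    'E': {1: 61.2, 2: 33.8},
                    'F': {1: 29.1, 2: 36.4}})

df_new = stack_with_headers([df1, df2])
print(df_new)
```

The values end up in object-dtype columns, as in the outputs above, because header strings and numbers share the same columns.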