I'm trying to concatenate two Series with pd.concat([a, b], axis=1), but the result is a DataFrame full of NaNs. Here is what I mean:
Generating the two Series:
by_status = odr.set_index('order_status')
g = by_status.groupby(['dt', 'product_id'])
payed_orders = g.size()
payed_orders.name = 'payed_orders'
refund_g = by_status.loc[[1,2,3], :].groupby(['dt', 'product_id'])
refund_orders = refund_g.size()
refund_orders.name = 'refund_orders'
# I'm going to concat refund_orders and payed_orders
>>>payed_orders.head()
dt          product_id
2015-01-15  10001          1
            10007          1
            10016         14
            10022          1
            10023          1
Name: payed_orders, dtype: int64
>>>refund_orders.head()
dt          product_id
2015-01-15  10007         1
            10016         4
            10030         1
2015-01-16  10007         3
            10008         1
Name: refund_orders, dtype: int64
>>>pd.concat([payed_orders.head(), refund_orders.head()], axis=1, ignore_index=False)
                       payed_orders  refund_orders
dt         product_id
2015-01-15 10001                NaN            NaN
           10007                NaN            NaN
           10016                NaN            NaN
           10022                NaN            NaN
           10023                NaN            NaN
           10030                NaN            NaN
2015-01-16 10007                NaN            NaN
           10008                NaN            NaN
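I think I must be making some obvious mistake, but I really can't figure it out. Please help.
Note: the code was copied from an IPython notebook, so please excuse the formatting.
Update: I tried passing ignore_index=True, and the result is as follows: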
>>>pd.concat([payed_orders.tail(), refund_orders.tail()], axis=1, ignore_index=True)
                         0    1
dt         product_id
2015-09-07 1000081     NaN  NaN
           1000084     NaN  NaN
           1000094     NaN  NaN
           1000096     NaN  NaN
           1000124     NaN  NaN
           1000131     NaN  NaN
           1000132     NaN  NaN
           1000133     NaN  NaN
           1000134     NaN  NaN
           1000137     NaN  NaN
About the index format
Here are two Series that do not concatenate properly:
>>>a4.head().to_dict()
{'actual_suborders': {(datetime.date(2015, 1, 15), 10001): 1,
(datetime.date(2015, 1, 15), 10016): 10,
(datetime.date(2015, 1, 15), 10022): 1,
(datetime.date(2015, 1, 15), 10023): 1,
(datetime.date(2015, 1, 15), 10024): 1}}
>>>a5.head().to_dict()
{'refund_suborders': {(datetime.date(2015, 1, 15), 10007): 1,
(datetime.date(2015, 1, 15), 10016): 4,
(datetime.date(2015, 1, 15), 10030): 1,
(datetime.date(2015, 1, 16), 10007): 4,
(datetime.date(2015, 1, 16), 10008): 1}}
>>>pd.concat([a4.head(), a5.head()], axis=1)
                       actual_suborders  refund_suborders
dt         product_id
2015-01-15 10001                    NaN               NaN
           10007                    NaN               NaN
           10016                    NaN               NaN
           10022                    NaN               NaN
           10023                    NaN               NaN
           10024                    NaN               NaN
           10030                    NaN               NaN
2015-01-16 10007                    NaN               NaN
           10008                    NaN               NaN
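Finally, thanks to everyone who decides to take a look at this. Great community.
I have serialized the heads of the Series above and uploaded them to Evernote, together with code to load and concatenate them:
https://www.evernote.com/l/AH4AdfgOJJROuZSfGfDR_jZvA0zEpIHgyq0

To make it work, I had to build the set of unique values from the combined old indexes of the two Series, and then pass it as the join_axes argument when concatenating: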
import datetime as dt
import pandas as pd
s1 = pd.Series([1, 10, 1, 1, 1],
name='actual_suborders',
index=[(dt.date(2015, 1, 15), 10001),
(dt.date(2015, 1, 15), 10016),
(dt.date(2015, 1, 15), 10022),
(dt.date(2015, 1, 15), 10023),
(dt.date(2015, 1, 15), 10024)])
s2 = pd.Series([1, 4, 1, 4, 1],
name='refund_suborders',
index=[(dt.date(2015, 1, 15), 10007),
(dt.date(2015, 1, 15), 10016),
(dt.date(2015, 1, 15), 10030),
(dt.date(2015, 1, 16), 10007),
(dt.date(2015, 1, 16), 10008)])
idx = set(pd.concat([s1.reset_index()['index'],
s2.reset_index()['index']],
ignore_index=True))
>>> pd.concat([s1, s2], axis=1, join_axes=[idx])
                     actual_suborders  refund_suborders
(2015-01-15, 10022)                 1               NaN
(2015-01-15, 10001)                 1               NaN
(2015-01-15, 10023)                 1               NaN
(2015-01-16, 10008)               NaN                 1
(2015-01-15, 10030)               NaN                 1
(2015-01-15, 10016)                10                 4
(2015-01-15, 10007)               NaN                 1
(2015-01-16, 10007)               NaN                 4
(2015-01-15, 10024)                 1               NaN
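Note that join_axes was deprecated in pandas 0.25 and removed in 1.0, so the call above only runs on older versions. Here is a minimal sketch of the same idea for newer pandas: take the union of the two indexes, concatenate, then reindex. It assumes both Series carry a proper MultiIndex (built here with pd.MultiIndex.from_tuples, which is my assumption and not what the snippet above does) and uses shortened sample data:

```python
import datetime as dt
import pandas as pd

d15, d16 = dt.date(2015, 1, 15), dt.date(2015, 1, 16)
names = ["dt", "product_id"]

# Assumption: explicit MultiIndex instead of a flat index of tuples.
s1 = pd.Series(
    [1, 10, 1],
    name="actual_suborders",
    index=pd.MultiIndex.from_tuples(
        [(d15, 10001), (d15, 10016), (d15, 10022)], names=names
    ),
)
s2 = pd.Series(
    [1, 4, 4],
    name="refund_suborders",
    index=pd.MultiIndex.from_tuples(
        [(d15, 10007), (d15, 10016), (d16, 10007)], names=names
    ),
)

# The labels we want in the result: every key present in either Series.
idx = s1.index.union(s2.index)

# Old pandas: pd.concat([s1, s2], axis=1, join_axes=[idx])
# Newer pandas: concatenate first, then reindex onto the wanted labels.
combined = pd.concat([s1, s2], axis=1).reindex(idx)
print(combined)
```

The default outer join of concat already yields the union of the labels, so the reindex mainly pins down which labels appear and in what order. Passing a different index (for example s1.index) would restrict the result to those rows.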
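Also, your index seems to have changed somewhere along the way. Your by_status.groupby(['dt', 'product_id']) operation should produce a MultiIndex, but the a4.head() and a5.head() output pasted above suggests that it was converted to pairs of tuples at some point. I suspect this may be the root of the problem.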
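If the index really has turned into a flat index of tuples, one thing worth trying is to rebuild the MultiIndex before concatenating. The following is only a sketch under that assumption: to_multiindex is a hypothetical helper of mine, the level names are taken from the groupby keys, and the sample data is shortened. I have not verified that this is what fixes the original data.

```python
import datetime as dt
import pandas as pd

d15, d16 = dt.date(2015, 1, 15), dt.date(2015, 1, 16)


def to_multiindex(s, names=("dt", "product_id")):
    """Hypothetical helper: return `s` with a MultiIndex built from tuple labels."""
    if isinstance(s.index, pd.MultiIndex):
        return s  # already in the expected shape
    out = s.copy()
    out.index = pd.MultiIndex.from_tuples(list(s.index), names=list(names))
    return out


# tupleize_cols=False keeps the tuples as plain labels in a flat object index,
# which mimics the suspected state of a4 and a5.
s1 = pd.Series(
    [1, 10],
    name="actual_suborders",
    index=pd.Index([(d15, 10001), (d15, 10016)], tupleize_cols=False),
)
s2 = pd.Series(
    [4, 1],
    name="refund_suborders",
    index=pd.Index([(d15, 10016), (d16, 10008)], tupleize_cols=False),
)

print(type(s1.index))  # a flat Index, not a MultiIndex

# Rebuild the MultiIndex on both sides, then align on it.
combined = pd.concat([to_multiindex(s1), to_multiindex(s2)], axis=1)
print(combined)
```

Checking type(a4.index) and type(a5.index) on the real data is a quick way to confirm or rule out this suspicion.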
Edit
I don't understand why concat isn't working, but I managed to achieve your goal using merge.
First, reset the index. Then merge the DataFrames on dt and product_id:
a4.reset_index(inplace=True)
a5.reset_index(inplace=True)
>>> a4.merge(a5, on=['dt', 'product_id'], how='outer')
           dt  product_id  actual_suborders  refund_suborders
0  2015-01-15       10001                 1               NaN
1  2015-01-15       10016                10                 4
2  2015-01-15       10022                 1               NaN
3  2015-01-15       10023                 1               NaN
4  2015-01-15       10024                 1               NaN
5  2015-01-15       10007               NaN                 1
6  2015-01-15       10030               NaN                 1
7  2015-01-16       10007               NaN                 4
8  2015-01-16       10008               NaN                 1
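If you want the result indexed by (dt, product_id) again, as the concat output would have been, you can set the index back after the merge. Here is a self-contained sketch of that round trip. The two small DataFrames are shortened stand-ins for a4 and a5 after reset_index, so the values are illustrative only:

```python
import datetime as dt
import pandas as pd

d15, d16 = dt.date(2015, 1, 15), dt.date(2015, 1, 16)

# Stand-ins for a4 and a5 after reset_index(): keys are ordinary columns.
a4 = pd.DataFrame(
    {"dt": [d15, d15], "product_id": [10001, 10016], "actual_suborders": [1, 10]}
)
a5 = pd.DataFrame(
    {"dt": [d15, d16], "product_id": [10016, 10008], "refund_suborders": [4, 1]}
)

# Outer merge keeps keys that appear in only one of the two frames.
merged = a4.merge(a5, on=["dt", "product_id"], how="outer")

# Restore the (dt, product_id) MultiIndex and sort it for a stable row order.
result = merged.set_index(["dt", "product_id"]).sort_index()
print(result)
```

Rows missing on one side come back as NaN, which also turns the affected count columns into floats.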