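I am very confused about the concept of views in pandas, and about how a pandas DataFrame references its contents when it is copied. I am sure the experts here can give me some straight answers. I am rather thick when it comes to this, or to the NumPy philosophy, which I assume is related.
Here is a typical example: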
import pandas as pd
import datetime
from random import randint

print("done importing modules")

def aRandomRow():
    # Build one row: a date, a scalar, a list of ints and a list of dates
    out = {}
    out['aDate'] = datetime.datetime(randint(2010, 2018), randint(1, 12), randint(1, 28))
    out['aScalar'] = randint(1, 1000)
    out['anArray'] = [randint(1, 1000), randint(1, 1000), randint(1, 1000)]
    out['anArDate'] = [
        datetime.datetime(randint(2010, 2018), randint(1, 12), randint(1, 28)),
        datetime.datetime(randint(2010, 2018), randint(1, 12), randint(1, 28)),
        datetime.datetime(randint(2010, 2018), randint(1, 12), randint(1, 28)),
    ]
    return out

# Now the dataframes for the examples
df = pd.DataFrame([aRandomRow(), aRandomRow(), aRandomRow()])
df1 = df.copy()
df2 = df.copy(deep=True)
# I get something like this:
       aDate  aScalar                                           anArDate          anArray
0 2016-07-28        5  [2015-02-06 00:00:00, 2015-12-14 00:00:00, 201...   [121, 67, 277]
1 2014-05-04       39  [2015-11-03 00:00:00, 2014-04-23 00:00:00, 201...  [939, 105, 714]
2 2010-12-01      157  [2015-07-05 00:00:00, 2012-05-06 00:00:00, 201...    [43, 79, 230]
# Now I modify the copies and check the result on the original
df1.loc[0, 'aDate'] = 1001
df2.loc[0, 'aDate'] = 1002
# df1 and df2 get modified but df does not, as I intuitively expected.
df1.loc[0, 'anArray'].append(1001)
df2.loc[0, 'anArray'].append(1002)
# The list in cell 0 of df.anArray now also contains 1001 and 1002,
# even though I only touched the copies. That is what puzzles me!
       aDate  aScalar                                           anArDate                           anArray
0 2016-07-28        5  [2015-02-06 00:00:00, 2015-12-14 00:00:00, 201...  [121, 67, 277, 1001, 1001, 1002]
1 2014-05-04       39  [2015-11-03 00:00:00, 2014-04-23 00:00:00, 201...                   [939, 105, 714]
2 2010-12-01      157  [2015-07-05 00:00:00, 2012-05-06 00:00:00, 201...                     [43, 79, 230]
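The documentation for pd.DataFrame.copy states:
"Note that when deep=True data is copied, actual python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data."
In other words, each copy gets its own array of cell values, but for an object column such as anArray those values are references to the same Python lists, so appending through df1 or df2 also changes what df shows.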
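To make the quoted behaviour concrete, here is a minimal sketch that checks object identity across a deep copy. The small frame and the names `copied` and `shares_list` are made up for illustration; they are not from the question.

```python
import pandas as pd

# Illustrative frame: one object column holding a list, one plain integer column
df = pd.DataFrame({"anArray": [[1, 2, 3]], "aScalar": [5]})
copied = df.copy(deep=True)

# The cell in the copy refers to the very same list object as the original
shares_list = df.at[0, "anArray"] is copied.at[0, "anArray"]

# Assigning a new scalar only affects the copy ...
copied.at[0, "aScalar"] = 1001
# ... but mutating the shared list in place is visible through both frames
copied.at[0, "anArray"].append(1001)

print(shares_list)
print(df)
```

Assignment replaces the reference stored in the copy, while `append` mutates the object that both frames point to, which would explain the difference between the `aDate` and `anArray` cases in the question.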
You can use the copy module to make a truly independent copy:
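import copy
import pandas as pd

df = pd.DataFrame({'a': [[1, 2]]})
df1 = df.copy(deep=True)
# Replace every cell with a recursive copy so no Python object is shared with df
for c in df1:
    df1[c] = [copy.deepcopy(e) for e in df1[c]]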
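If you need this in more than one place, the loop can be wrapped in a small helper. This is only a sketch: `deepcopy_object_columns` is a hypothetical name, not a pandas function, and it assumes unique column names and that only object-dtype columns hold mutable Python objects.

```python
import copy

import pandas as pd


def deepcopy_object_columns(frame):
    """Return a copy of `frame` whose object-dtype cells are recursively copied.

    Hypothetical helper, not part of pandas.
    """
    out = frame.copy(deep=True)
    for col in out.columns:
        if out[col].dtype == object:
            # Rebuild the column from recursive copies of each cell
            out[col] = pd.Series(
                [copy.deepcopy(value) for value in out[col]],
                index=out.index,
                dtype=object,
            )
    return out


original = pd.DataFrame({"a": [[1, 2], [3]], "b": [1, 2]})
independent = deepcopy_object_columns(original)

# Mutating a list in the copy no longer touches the original
independent.at[0, "a"].append(99)
print(original)
print(independent)
```

Skipping non-object columns avoids needless work on numeric data, which `copy(deep=True)` already duplicates.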