I have a DataFrame in which one column is filled with NumPy arrays.
A B C
0 1.0 0.000000 [[0. 1.],[0. 1.]]
1 2.0 0.000000 [[85. 1.],[52. 0.]]
2 3.0 0.000000 [[5. 1.],[0. 0.]]
3 1.0 3.333333 [[0. 1.],[41. 0.]]
4 2.0 3.333333 [[85. 1.],[0. 21.]]
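The problem is that when I save it as a CSV file and load it in another Python file, the NumPy column is read back as text.
I tried converting the column with np.fromstring() or np.loadtxt(), but it did not work.
Example of an array after pd.read_csv():
"[[ 85. 1.]\n [ 52. 0. ]]"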
Thanks.
You can try .to_json():
output = pd.DataFrame([
{'a':1,'b':np.arange(4)},
{'a':2,'b':np.arange(5)}
]).to_json()
But when you reload it, you only get lists:
import io
df = pd.read_json(io.StringIO(output))
Convert them back to NumPy arrays with:
df['b'] = [np.array(v) for v in df['b']]
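For reference, here is a minimal end-to-end sketch of that JSON round trip. The variable names (original, payload, restored) are only illustrative, and wrapping the string in io.StringIO is my own choice so that the example does not depend on read_json accepting a raw JSON string.

```python
import io

import numpy as np
import pandas as pd

# Build a small frame whose 'b' column holds NumPy arrays of different lengths.
original = pd.DataFrame([
    {"a": 1, "b": np.arange(4)},
    {"a": 2, "b": np.arange(5)},
])

# Serialize to a JSON string; the arrays are written out as plain JSON lists.
payload = original.to_json()

# Read the JSON back through a file-like wrapper.
restored = pd.read_json(io.StringIO(payload))

# The cells come back as Python lists, so convert each one to an array again.
restored["b"] = [np.array(v) for v in restored["b"]]
```

Note that the dtype of each array is inferred again on conversion, so it is worth checking it if the original dtype matters.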
The code below should work. I adapted it from another question, which has more explanation: "Convert a string with brackets to a numpy array".
import pandas as pd
import numpy as np
from ast import literal_eval
# Recreating DataFrame
data = np.array([0, 1, 0, 1, 85, 1, 52, 0, 5, 1, 0, 0, 0, 1, 41, 0, 85, 1, 0, 21], dtype='float')
data = data.reshape((5,2,2))
write_df = pd.DataFrame({'A': [1.0,2.0,3.0,1.0,2.0],
'B': [0,0,0,3+1/3,3+1/3],
'C': data.tolist()})
# Saving DataFrame to CSV
fpath = r'D:\Data\test.csv'  # raw string, otherwise '\t' is read as a tab character
write_df.to_csv(fpath)
# Reading DataFrame from CSV
read_df = pd.read_csv(fpath)
# literal_eval converts the string to a nested list (a list of lists)
# np.array can convert this nested list directly into an array
def makeArray(rawdata):
    nested = literal_eval(rawdata)
    return np.array(nested)
# Applying the function element-wise to the column; there could be a more efficient way
read_df['C'] = read_df['C'].apply(makeArray)
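As a variation on the same idea, the parsing can probably be done while the file is being read, using the converters argument of pd.read_csv. This is only a sketch: parse_array is a hypothetical helper name, an in-memory buffer stands in for the real file, and it assumes the column was saved as lists (comma-separated text), as in the code above.

```python
import io
from ast import literal_eval

import numpy as np
import pandas as pd


def parse_array(cell):
    """Turn a list-literal string such as '[[0.0, 1.0], [0.0, 1.0]]' into an array."""
    return np.array(literal_eval(cell))


# Stand-in for a file on disk, so the sketch runs without touching the filesystem.
frame = pd.DataFrame({
    "A": [1.0, 2.0],
    "C": [[[0.0, 1.0], [0.0, 1.0]], [[85.0, 1.0], [52.0, 0.0]]],
})
buffer = io.StringIO()
frame.to_csv(buffer, index=False)
buffer.seek(0)

# converters maps a column name to a function applied to each raw cell string.
loaded = pd.read_csv(buffer, converters={"C": parse_array})
```

With this, column C already contains arrays when read_csv returns, so no separate apply step is needed.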
Here is an ugly solution.
import pandas as pd
import numpy as np
### Create dataframe
a = [1.0, 2.0, 3.0, 1.0, 2.0]
b = [0.000000,0.000000,0.000000,3.333333,3.333333]
c = [np.array([[0. ,1.],[0. ,1.]]),
np.array([[85. ,1.2],[52. ,0.]]),
np.array([[5. ,1.],[0. ,0.]]),
np.array([[0. ,1.],[41. ,0.]]),
np.array([[85. ,1.],[0. ,21.]]),]
df = pd.DataFrame({"a":a,"b":b,"c":c})
#### Save to csv
df.to_csv("to_trash.csv")
df = pd.read_csv("to_trash.csv")
### Bad string manipulation that could be done better with regex
df["c"] = ("np.array("+(df
.c
.str.split()
.str.join(' ')
.str.replace(" ",",")
.str.replace(",,",",")
.str.replace("[,", "[", regex=False)
)+")").apply(lambda x: eval(x))
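If you would rather avoid eval, something along these lines might do the same job with a few regular expressions and ast.literal_eval. Treat it as a sketch: parse_numpy_text is a hypothetical helper, and it assumes the cell holds the whitespace-separated text NumPy prints for an array of finite numbers. It will not handle nan or inf, nor large arrays that NumPy abbreviates with "...".

```python
import re
from ast import literal_eval

import numpy as np


def parse_numpy_text(text):
    """Parse the whitespace-separated text NumPy prints for an array of finite numbers."""
    compact = re.sub(r"\s+", " ", text.strip())  # collapse newlines and runs of spaces
    compact = re.sub(r"\[\s+", "[", compact)     # drop padding after '['
    compact = re.sub(r"\s+\]", "]", compact)     # drop padding before ']'
    with_commas = compact.replace(" ", ",")      # remaining spaces separate items
    return np.array(literal_eval(with_commas), dtype=float)


# The kind of text that ends up in the CSV cell for a small 2x2 array.
sample = str(np.array([[85.0, 1.2], [52.0, 0.0]]))
parsed = parse_numpy_text(sample)
```

On the DataFrame above this would be applied as df["c"] = df["c"].apply(parse_numpy_text).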
The best solution I found is to use a pickle file.
You can save the DataFrame as a pickle file.
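import pickle
import cv2
import pandas as pd
img = cv2.imread('img1.jpg')
# Wrap the image in a list so the whole array is stored in a single cell
data = pd.DataFrame({'img': [img]})
data.to_pickle('dataset.pkl')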
Then you can read it back as a pickle file:
with open(ref_path + 'dataset.pkl', "rb") as openfile:
    df_file = pickle.load(openfile)
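A self-contained version of the same round trip, in case it helps. This sketch swaps in pd.read_pickle for the manual pickle.load call and writes to a temporary directory; the frame contents are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# A frame whose 'C' column stores one 2x2 array per row.
frame = pd.DataFrame({
    "A": [1.0, 2.0],
    "C": [
        np.array([[0.0, 1.0], [0.0, 1.0]]),
        np.array([[85.0, 1.0], [52.0, 0.0]]),
    ],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.pkl")
    frame.to_pickle(path)            # arrays are stored as real objects, not text
    restored = pd.read_pickle(path)  # counterpart of to_pickle
```

The arrays come back as ndarray objects with their shape and dtype intact. Keep in mind that pickle files should only be loaded from sources you trust.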
Let me know if it works.