How to create a DataFrame by appending heterogeneous column values



I want to create a dataframe with 2 columns and several rows, like this:

[
['text1',[float1, float2, float3]]
['text2',[float4, float5, float6]]
.
.
.
]

The column names should be content and embeddings, with text1, text2, ... in the content column and the lists of floats in the embeddings column.

The code I wrote is:

mycontent = ["i live in space","i live my life to fullest", "dogs live in kennel","we live to eat and not eat to live","cricket lives in heart of every indian","live and let live","my house is in someplace","my office is in someotherplace","chair is red"]
contents_and_embeddings_df = pd.DataFrame(columns=['content','embeddings'])
for content in mycontent:
    embedding = get_embedding(content,engine='textsearchcuriedoc001mc') #returns list of floats
    contents_and_embeddings_df.append(pd.DataFrame([content,embedding]))

contents_and_embeddings_df
In the output, I get several warnings like this:

/tmp/ipykernel_15879/3971327095.py:8: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  contents_and_embeddings_df.append(pd.DataFrame([content,embedding]))

The dataframe is empty. I only see the two headers, content and embeddings.

I tried a few other approaches, but could not create the desired dataframe:

for content in mycontent:
    embedding = get_embedding(content,engine='textsearchcuriedoc001mc')
    #pd.concat(contents_and_embeddings_df,pd.DataFrame([content,embedding])) --> doesn't work
    #contents_and_embeddings_df.append(pd.DataFrame([content,embedding])) --> doesn't work
    tempdf = pd.DataFrame([content,embedding]) #doesn't work.
    #tempdf = pd.DataFrame([content,embedding], columns=['content','embeddings']) --> doesn't compile
    contents_and_embeddings_df.add(tempdf) # doesn't work.

contents_and_embeddings_df #shows empty

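Using DataFrame.append is discouraged. It is inefficient, because every call allocates new memory to hold the resulting DataFrame, and it was removed in pandas 2.0. It also returns a new DataFrame instead of modifying the original in place, which is why the frame in the question stays empty: the result is never assigned.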
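If rows really do have to be collected inside a loop, the following is a minimal sketch of one way to do it with pd.concat, which is what the FutureWarning recommends. pd.concat takes a list of frames as its first argument and returns a new frame, so the result has to be assigned. The get_embedding defined here is a hypothetical stand-in that returns made-up floats so the snippet runs without an API call; in real use, the get_embedding from the question would take its place.

```python
import pandas as pd


def get_embedding(text, engine=None):
    """Hypothetical stand-in for the question's get_embedding; returns made-up floats."""
    return [float(len(text)), float(text.count(" ")), 0.0]


mycontent = ["i live in space", "live and let live", "chair is red"]

frames = []
for content in mycontent:
    embedding = get_embedding(content, engine="textsearchcuriedoc001mc")
    # The outer list marks a single row with two cells; a flat
    # [content, embedding] does not match the two column names.
    frames.append(pd.DataFrame([[content, embedding]], columns=["content", "embeddings"]))

# pd.concat expects a list of frames and returns a new frame, so assign the result.
contents_and_embeddings_df = pd.concat(frames, ignore_index=True)
print(contents_and_embeddings_df)
```

Concatenating once after the loop avoids copying the growing frame on every iteration.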

It is better to build the two columns separately and then combine them into a DataFrame:

embeddings = [get_embedding(content, engine="textsearchcuriedoc001mc") for content in mycontent]
contents_and_embeddings_df = pd.DataFrame({
    "content": mycontent,
    "embeddings": embeddings
})

You can simply use a list comprehension to generate the data for the DataFrame:

data = [(content, get_embedding(content,engine='textsearchcuriedoc001mc')) for content in mycontent]
contents_and_embeddings_df = pd.DataFrame(data, columns=['content','embeddings'])
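For reference, here is a self-contained sketch that runs the column-wise and the row-wise construction side by side. As an assumption for illustration, get_embedding is replaced by a hypothetical stand-in that returns made-up floats, so nothing here reflects the output of the real embedding call.

```python
import pandas as pd


def get_embedding(text, engine=None):
    """Hypothetical stand-in for the question's get_embedding; returns made-up floats."""
    return [float(len(text)), float(text.count(" ")), 0.0]


mycontent = ["i live in space", "live and let live", "chair is red"]
engine = "textsearchcuriedoc001mc"

# Column-wise: build the list of embeddings first, then pass both columns as a dict.
embeddings = [get_embedding(content, engine=engine) for content in mycontent]
df_from_columns = pd.DataFrame({"content": mycontent, "embeddings": embeddings})

# Row-wise: build one (content, embedding) tuple per row and name the columns.
data = [(content, get_embedding(content, engine=engine)) for content in mycontent]
df_from_rows = pd.DataFrame(data, columns=["content", "embeddings"])

print(df_from_rows.shape)
print(df_from_rows.loc[0, "embeddings"])
```

Both frames hold the same data. Each cell in the embeddings column is a plain Python list, stored in a column of object dtype.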
