How to add a column to an empty DataFrame in Julia



I want to append a vector as a column to an empty DataFrame. Suppose I define an empty DataFrame like this:

import DataFrames
dataframe = DataFrames.DataFrame()

Then I want to append this vector to dataframe as a column:

vec = [1,2,3]

I tried push!(dataframe, vec), but got the following error:

DimensionMismatch("Length of `row` does not match `DataFrame` column count.")
Stacktrace:
[1] push!(df::DataFrames.DataFrame, row::Vector{Int64}; promote::Bool)
    @ DataFrames C:\Users\Shayan\.julia\packages\DataFrames\BM4OQ\src\dataframe\dataframe.jl:1691
[2] push!(df::DataFrames.DataFrame, row::Vector{Int64})
    @ DataFrames C:\Users\Shayan\.julia\packages\DataFrames\BM4OQ\src\dataframe\dataframe.jl:1680
[3] top-level scope
    @ c:\Users\Shayan\Documents\PyJul Scripts\Jul-test.ipynb:2
[4] eval
    @ .\boot.jl:373 [inlined]
[5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base .\loading.jl:1196
[6] #invokelatest#2
    @ .\essentials.jl:716 [inlined]
[7] invokelatest
    @ .\essentials.jl:714 [inlined]
[8] (::VSCodeServer.var"#164#165"{VSCodeServer.NotebookRunCellArguments, String})()
    @ VSCodeServer c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\VSCodeServer\src\serve_notebook.jl:19
[9] withpath(f::VSCodeServer.var"#164#165"{VSCodeServer.NotebookRunCellArguments, String}, path::String)
    @ VSCodeServer c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\VSCodeServer\src\repl.jl:184
[10] notebook_runcell_request(conn::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint, Base.PipeEndpoint}, params::VSCodeServer.NotebookRunCellArguments)
    @ VSCodeServer c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\VSCodeServer\src\serve_notebook.jl:13
[11] dispatch_msg(x::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint, Base.PipeEndpoint}, dispatcher::VSCodeServer.JSONRPC.MsgDispatcher, msg::Dict{String, Any})
    @ VSCodeServer.JSONRPC c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\JSONRPC\src\typed.jl:67
[12] serve_notebook(pipename::String, outputchannel_logger::Base.CoreLogging.SimpleLogger; crashreporting_pipename::String)
    @ VSCodeServer c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\VSCodeServer\src\serve_notebook.jl:136
[13] top-level scope
    @ c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\notebook\notebook.jl:32
[14] include(mod::Module, _path::String)
    @ Base .\Base.jl:418
[15] exec_options(opts::Base.JLOptions)
    @ Base .\client.jl:292
[16] _start()
    @ Base .\client.jl:495

I also tried insert!(dataframe, vec), but got this:

MethodError: no method matching insert!(::DataFrames.DataFrame, ::Vector{Int64})
Closest candidates are:
  insert!(!Matched::DataStructures.AVLTree{K}, ::K) where K at C:\Users\Shayan\.julia\packages\DataStructures\vSp4s\src\avl_tree.jl:128
  insert!(!Matched::DataStructures.SortedSet, ::Any) at C:\Users\Shayan\.julia\packages\DataStructures\vSp4s\src\sorted_set.jl:114
  insert!(!Matched::DataStructures.SortedDict{K, D, Ord}, ::Any, !Matched::Any) where {K, D, Ord<:Base.Order.Ordering} at C:\Users\Shayan\.julia\packages\DataStructures\vSp4s\src\sorted_dict.jl:268

What should I do? Any help would be appreciated.

Additional note: vec is not defined before dataframe, and that is intentional! I mean, I have to create the empty DataFrame first!

Depending on what you need, there are the following options.

  1. Add the vector without copying
julia> x = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> df = DataFrame()
0×0 DataFrame
julia> df.x = x
3-element Vector{Int64}:
1
2
3
julia> df.x === x
true

julia> x = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> df = DataFrame()
0×0 DataFrame
julia> df[!, :x] = x
3-element Vector{Int64}:
1
2
3
julia> df.x === x
true
  2. Add the vector with copying
julia> x = [1, 2, 3]
3-element Vector{Int64}:
1
2
3
julia> df = DataFrame()
0×0 DataFrame
julia> df[:, :x] = x
3-element Vector{Int64}:
1
2
3
julia> df.x == x
true
julia> df.x === x
false
  3. If you have a scalar, you can do this (it also works for vectors):
julia> df = DataFrame()
0×0 DataFrame
julia> insertcols!(df, :x => 1)
1×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
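
The practical difference between options 1 and 2 shows up once the original vector is mutated. The following is a minimal sketch of that behavior; the names `shared` and `copied` are chosen here for illustration and do not appear in the answer above.

```julia
using DataFrames

x = [1, 2, 3]

# No copy: the column and `x` are the same array in memory.
shared = DataFrame()
shared[!, :x] = x

# Copy: the column is an independent array with equal contents.
copied = DataFrame()
copied[:, :x] = x

# Mutating `x` in place is visible only through the shared column.
x[1] = 100

println(shared.x[1])  # the shared column sees the change
println(copied.x[1])  # the copied column keeps the old value
```

If the vector will be reused or modified elsewhere, the copying form is the safer choice; the non-copying form avoids an allocation when the vector is handed over to the DataFrame for good.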

You can do the following:

julia> r=DataFrame(:a=>rand(5),:b=>rand(5))
5×2 DataFrame
 Row │ a         b
     │ Float64   Float64
─────┼────────────────────
   1 │ 0.8613    0.207534
   2 │ 0.994096  0.561571
   3 │ 0.220975  0.429286
   4 │ 0.884805  0.835078
   5 │ 0.964035  0.653509
julia> r[:,:c]=rand(5)
5-element Vector{Float64}:
0.5722614445699863
0.1582911302051686
0.14114436033460553
0.20981872218154363
0.07636493031324465
julia> r
5×3 DataFrame
 Row │ a         b         c
     │ Float64   Float64   Float64
─────┼───────────────────────────────
   1 │ 0.8613    0.207534  0.572261
   2 │ 0.994096  0.561571  0.158291
   3 │ 0.220975  0.429286  0.141144
   4 │ 0.884805  0.835078  0.209819
   5 │ 0.964035  0.653509  0.0763649

Note: this also works when starting from an empty DataFrame:

julia> r=DataFrame()
0×0 DataFrame
julia> r[:,:c]=rand(5)
5-element Vector{Float64}:
0.6792303081607677
0.08094072339097869
0.5171831771259873
0.35343166177619845
0.44751700973394026
julia> r
5×1 DataFrame
 Row │ c
     │ Float64
─────┼───────────
   1 │ 0.67923
   2 │ 0.0809407
   3 │ 0.517183
   4 │ 0.353432
   5 │ 0.447517

Update & summary (completed using Bogumił Kamiński's answer)

You can do:

d[:,:colname] = x_vector # copy of x
d[!,:colname] = x_vector # no copy of x (shared)

If x is a scalar, see Bogumił Kamiński's answer.
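
As background for the DimensionMismatch in the question: push! on a DataFrame appends a row, so the length of the pushed vector has to match the number of columns, and an empty DataFrame has none. The sketch below contrasts the two operations, assuming a reasonably recent DataFrames.jl release; the values are made up for illustration.

```julia
using DataFrames

df = DataFrame()

# Columns are added by name; insertcols! accepts a vector as well as a scalar.
insertcols!(df, :x => [1, 2, 3])

# push! appends one row, so the row needs one value per existing column.
push!(df, [4])        # a Vector is treated as a single row
push!(df, (x = 5,))   # a NamedTuple row, matched to columns by name

println(size(df))
```

Once the column exists, push! is the right tool for growing the table row by row; for adding columns, use the assignment forms or insertcols! shown above.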
