DataFrames: no method matching setindex!(::DataFrame, ::Tuple{Float64, Float64}, ::Colon, ::String)



When I try to use the dot operator (element-wise operation) on DataFrame columns with a function that returns a tuple, I get the following error.

Here is a toy example:

df = DataFrame()
df[:, :x] = rand(5)
df[:, :y] = rand(5)
#Function that returns two values in the form of a tuple
add_minus_two(x,y) = (x-y,x+y)
df[:,"x+y"] = add_minus_two.(df[:,:x], df[:,:y])[2]
#Out > ERROR: MethodError: no method matching setindex!(::DataFrame, ::Tuple{Float64, Float64}, ::Colon, ::String)
#However removing the dot operator works fine
df[:,"x+y"] = add_minus_two(df[:,:x], df[:,:y])[2]
#Out > 5 x 3 DataFrame
#Furthermore, if the function returns just one value, it works fine with or without the dot
add_two(x,y) = x+y
df[:, "x+y"] = add_two(df[:,:x], df[:,:y])
df[:, "x+y"] = add_two.(df[:,:x], df[:,:y])
#out > 5 x 3 DataFrame

What is the reason? I thought that for element-wise operations you need to use the "dot" operator.

Also, for my actual problem (where a function returns 2 values in a tuple), not using the dot operator gives

ERROR: MethodError: no method matching compute_T(::Vector{Float64}, ::Vector{Float64})

Using the dot operator gives

ERROR: MethodError: no method matching setindex!(::DataFrame, ::Tuple{Float64, Float64}, ::Colon, ::String)  

Returning a single value (as in the toy example) also works fine.

Any idea what I am doing wrong here?

This is not a DataFrames.jl issue; it is how Julia Base works.

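I will focus only on the right-hand side (RHS), since the left-hand side (LHS) is irrelevant here (and the RHS has nothing to do with DataFrames.jl).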

First, how to write what you want. Initialization:

julia> using DataFrames
julia> df = DataFrame()
0×0 DataFrame
julia> df[:, :x] = rand(5)
5-element Vector{Float64}:
0.6146045473316457
0.6319531776216596
0.599267794937812
0.40864382019544965
0.3738682778395166
julia> df[:, :y] = rand(5)
5-element Vector{Float64}:
0.07891853567296825
0.2143545316544586
0.5943274462916335
0.2182702556068421
0.5810132720450707
julia> add_minus_two(x,y) = (x-y,x+y)
add_minus_two (generic function with 1 method)

Now you get:

julia> add_minus_two(df[:,:x], df[:,:y])
([0.5356860116586775, 0.417598645967201, 0.004940348646178538, 0.19037356458860755, -0.2071449942055541], [0.693523083004614, 0.8463077092761182, 1.1935952412294455, 0.6269140758022917, 0.9548815498845873])
julia> add_minus_two.(df[:,:x], df[:,:y])
5-element Vector{Tuple{Float64, Float64}}:
(0.5356860116586775, 0.693523083004614)
(0.417598645967201, 0.8463077092761182)
(0.004940348646178538, 1.1935952412294455)
(0.19037356458860755, 0.6269140758022917)
(-0.2071449942055541, 0.9548815498845873)
julia> add_minus_two(df[:,:x], df[:,:y])[2]
5-element Vector{Float64}:
0.693523083004614
0.8463077092761182
1.1935952412294455
0.6269140758022917
0.9548815498845873
julia> add_minus_two.(df[:,:x], df[:,:y])[2]
(0.417598645967201, 0.8463077092761182)
julia> getindex.(add_minus_two.(df[:,:x], df[:,:y]), 2) # this is probably what you want
5-element Vector{Float64}:
0.693523083004614
0.8463077092761182
1.1935952412294455
0.6269140758022917
0.9548815498845873
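The difference between the two call styles can be condensed into a small sketch that uses only Julia Base. The input vectors xs and ys are made-up illustration values, not data from the question. Without the dot the function is called once on whole vectors and returns a tuple of vectors; with the dot it is called once per element and returns a vector of tuples.

```julia
# Same function as in the question.
add_minus_two(x, y) = (x - y, x + y)

# Illustrative inputs (assumed values, chosen so the results are easy to check).
xs = [3.0, 5.0]
ys = [1.0, 2.0]

# No dot: one call on whole vectors -> a Tuple of two Vectors.
tuple_of_vectors = add_minus_two(xs, ys)   # ([2.0, 3.0], [4.0, 7.0])

# Dot: one call per element -> a Vector of Tuples.
vector_of_tuples = add_minus_two.(xs, ys)  # [(2.0, 4.0), (3.0, 7.0)]

# Indexing with [2] therefore means different things:
sums_undotted = tuple_of_vectors[2]        # second vector: [4.0, 7.0]
second_row    = vector_of_tuples[2]        # second tuple:  (3.0, 7.0)

# To pull the second component out of every tuple, broadcast the indexing too.
sums_dotted = getindex.(vector_of_tuples, 2)  # [4.0, 7.0]
```

Since each tuple here has exactly two components, first.(vector_of_tuples) and last.(vector_of_tuples) are an equivalent way to extract the two columns.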

Now the point is that when you write:

df[:,"x+y"] = whatever_you_pass

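the whatever_you_pass part must be an AbstractVector with the appropriate number of elements (one per row of the data frame). This means that the following will work: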

  • add_minus_two.(df[:,:x], df[:,:y]) (stores a single column whose elements are tuples)
  • add_minus_two(df[:,:x], df[:,:y])[2]
  • getindex.(add_minus_two.(df[:,:x], df[:,:y]), 2)

and the following will fail (in these cases the RHS is a Tuple rather than an AbstractVector):

  • add_minus_two(df[:,:x], df[:,:y])
  • add_minus_two.(df[:,:x], df[:,:y])[2]

Pick whichever of the working syntaxes you prefer.
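As a minimal sketch of one of the working options, the snippet below computes the vector of tuples once and then assigns each component as its own column. The data values and the intermediate name results are assumptions made for illustration; only add_minus_two and the column names come from the question.

```julia
using DataFrames

add_minus_two(x, y) = (x - y, x + y)

# Illustrative data (assumed values, not the random data from the question).
df = DataFrame(x = [1.0, 2.0, 3.0], y = [0.5, 1.0, 1.5])

# Broadcast once; `results` is a Vector{Tuple{Float64, Float64}}.
results = add_minus_two.(df[:, :x], df[:, :y])

# Each RHS below is an AbstractVector with one element per row,
# so the assignment is accepted.
df[:, "x-y"] = first.(results)
df[:, "x+y"] = last.(results)
```

Computing results once avoids broadcasting the function twice when both components are needed.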

As general advice: when doing an assignment, always inspect the RHS on its own and check whether it has the correct structure.
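One way to follow this advice is to test the RHS before assigning it. The helper is_valid_column below is hypothetical (it is not part of Julia Base or DataFrames.jl) and only encodes the rule stated above: the value must be an AbstractVector with one element per row. The inputs are made-up illustration values.

```julia
add_minus_two(x, y) = (x - y, x + y)

# Hypothetical helper, not a library function: true when `rhs` could serve
# as a column for a table with `nrows` rows.
is_valid_column(rhs, nrows) = rhs isa AbstractVector && length(rhs) == nrows

# Illustrative inputs (assumed values).
xs = [3.0, 5.0, 7.0]
ys = [1.0, 2.0, 3.0]

good_rhs = getindex.(add_minus_two.(xs, ys), 2)  # Vector{Float64} of length 3
bad_rhs  = add_minus_two.(xs, ys)[2]             # a single Tuple{Float64, Float64}

is_valid_column(good_rhs, length(xs))  # true
is_valid_column(bad_rhs, length(xs))   # false: a Tuple is not an AbstractVector
```

In an interactive session, simply calling typeof on the RHS expression gives the same information.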

Also, it is worth noting that this will work:

julia> transform(df, [:x, :y] => ByRow(add_minus_two) => ["x-y", "x+y"])
5×4 DataFrame
 Row │ x         y          x-y          x+y
     │ Float64   Float64    Float64      Float64
─────┼────────────────────────────────────────────
   1 │ 0.614605  0.0789185   0.535686    0.693523
   2 │ 0.631953  0.214355    0.417599    0.846308
   3 │ 0.599268  0.594327    0.00494035  1.1936
   4 │ 0.408644  0.21827     0.190374    0.626914
   5 │ 0.373868  0.581013   -0.207145    0.954882

(You have not asked about this, but maybe it is what you actually want. Unlike setindex!, this syntax is specific to DataFrames.jl.)
