How to create a DataFrame from a DelimitedFiles.readdlm() object



I am trying to create a DataFrame as follows:

[root@srvr0 ~]# julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.1 (2020-04-14)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using DataFrames
julia> using DelimitedFiles

julia> P,H = readdlm("programminglanguages.csv",',';header=true);
julia> P
73×2 Array{Any,2}:
1951  "Regional Assembly Language"
1952  "Autocode"
1954  "IPL"
1955  "FLOW-MATIC"
1957  "FORTRAN"
1957  "COMTRAN"
1958  "LISP"
1958  "ALGOL 58"
1959  "FACT"
1959  "COBOL"
1959  "RPG"
1962  "APL"
1962  "Simula"
1962  "SNOBOL"
1963  "CPL"
1964  "Speakeasy"
1964  "BASIC"
1964  "PL/I"
1966  "JOSS"
1967  "BCPL"
1968  "Logo"
1969  "B"
1970  "Pascal"
1970  "Forth"
⋮  
1995  "Ada 95"
1995  "Java"
1995  "Delphi "
1995  "JavaScript"
1995  "PHP"
1997  "Rebol"
2000  "ActionScript"
2001  "C#"
2001  "D"
2002  "Scratch"
2003  "Groovy"
2003  "Scala"
2005  "F#"
2006  "PowerShell"
2007  "Clojure"
2009  "Go"
2010  "Rust"
2011  "Dart"
2011  "Kotlin"
2011  "Red"
2011  "Elixir"
2012  "Julia"
2014  "Swift"
julia> H
1×2 Array{AbstractString,2}:
"year"  "language"
julia> typeof(P)
Array{Any,2}
julia> typeof(H)
Array{AbstractString,2}
julia> vec(H)
2-element Array{AbstractString,1}:
"year"
"language"
julia> typeof(vec(H))
Array{AbstractString,1}
julia> DataFrame(P, H)

But I get the following error:

ERROR: MethodError: no method matching DataFrame(::Array{Any,2}, ::Array{AbstractString,2})
Closest candidates are:
DataFrame(::AbstractArray{T,2} where T) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
DataFrame(::AbstractArray{T,2} where T, ::AbstractArray{Symbol,1}; makeunique) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
DataFrame(::T; copycols) where T at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/other/tables.jl:23
Stacktrace:
[1] top-level scope at REPL[10]:1

Update 1: trying the solution suggested by Dr. Bogumil:

julia> DataFrame(P, vec(H))
ERROR: MethodError: no method matching DataFrame(::Array{Any,2}, ::Array{AbstractString,1})
Closest candidates are:
DataFrame(::AbstractArray{T,2} where T) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
DataFrame(::AbstractArray{T,2} where T, ::AbstractArray{Symbol,1}; makeunique) at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/dataframe/dataframe.jl:209
DataFrame(::T; copycols) where T at /opt/julia/julia-1.4.1/share/julia/stdlib/v1.4/packages/DataFrames/yH0f6/src/other/tables.jl:23
Stacktrace:
[1] top-level scope at REPL[13]:1
julia> 

Please guide me on how to create a DataFrame with a header from the readdlm output.

Update 2:

I got it working by trial and error:

julia> df1=DataFrame(P, Symbol.(vec(H)))
73×2 DataFrame
│ Row │ year │ language                   │
│     │ Any  │ Any                        │
├─────┼──────┼────────────────────────────┤
│ 1   │ 1951 │ Regional Assembly Language │
│ 2   │ 1952 │ Autocode                   │
│ 3   │ 1954 │ IPL                        │
│ 4   │ 1955 │ FLOW-MATIC                 │
│ 5   │ 1957 │ FORTRAN                    │
│ 6   │ 1957 │ COMTRAN                    │
│ 7   │ 1958 │ LISP                       │
│ 8   │ 1958 │ ALGOL 58                   │
│ 9   │ 1959 │ FACT                       │
│ 10  │ 1959 │ COBOL                      │
│ 11  │ 1959 │ RPG                        │
│ 12  │ 1962 │ APL                        │
│ 13  │ 1962 │ Simula                     │
│ 14  │ 1962 │ SNOBOL                     │
│ 15  │ 1963 │ CPL                        │
│ 16  │ 1964 │ Speakeasy                  │
│ 17  │ 1964 │ BASIC                      │
│ 18  │ 1964 │ PL/I                       │
│ 19  │ 1966 │ JOSS                       │
│ 20  │ 1967 │ BCPL                       │
│ 21  │ 1968 │ Logo                       │
⋮
│ 52  │ 1995 │ Java                       │
│ 53  │ 1995 │ Delphi                     │
│ 54  │ 1995 │ JavaScript                 │
│ 55  │ 1995 │ PHP                        │
│ 56  │ 1997 │ Rebol                      │
│ 57  │ 2000 │ ActionScript               │
│ 58  │ 2001 │ C#                         │
│ 59  │ 2001 │ D                          │
│ 60  │ 2002 │ Scratch                    │
│ 61  │ 2003 │ Groovy                     │
│ 62  │ 2003 │ Scala                      │
│ 63  │ 2005 │ F#                         │
│ 64  │ 2006 │ PowerShell                 │
│ 65  │ 2007 │ Clojure                    │
│ 66  │ 2009 │ Go                         │
│ 67  │ 2010 │ Rust                       │
│ 68  │ 2011 │ Dart                       │
│ 69  │ 2011 │ Kotlin                     │
│ 70  │ 2011 │ Red                        │
│ 71  │ 2011 │ Elixir                     │
│ 72  │ 2012 │ Julia                      │
│ 73  │ 2014 │ Swift                      │
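
Note that both columns above still have element type Any. A minimal sketch of narrowing them afterwards, assuming every year is an integer and every language is a string. The three-row P and H here are stand-ins for the readdlm output, not the real file:

```julia
using DataFrames

# Stand-ins for the values returned by readdlm(...; header=true)
P = Any[1951 "Regional Assembly Language"; 1952 "Autocode"; 1954 "IPL"]
H = AbstractString["year" "language"]   # 1×2 matrix, like the real header

# Same construction as in Update 2
df1 = DataFrame(P, Symbol.(vec(H)))

# Convert each column from Any to a concrete element type
df1.year = Int.(df1.year)
df1.language = String.(df1.language)
```

If a column might contain missing or mixed values, these conversions would throw, so treat this as a starting point only.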

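It is hard to answer this precisely, but the error is simply telling you that you cannot pass two matrices to the DataFrame constructor: H is a 1×2 matrix, whereas the column names have to be a vector (and, judging by the "Closest candidates" in your error message, a vector of Symbols on your DataFrames version).

The possible constructors for DataFrame are listed in the documentation. The closest to what you probably want is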

DataFrame(columns::AbstractVecOrMat, names::Union{AbstractVector, Symbol};
makeunique::Bool=false, copycols::Bool=true)

Applied to your use case (I'm creating a random P and a simple vector H holding the column names, since of course I don't have your data):

julia> P = Any[rand() for i ∈ 1:3, j ∈ 1:3]
3×3 Matrix{Any}:
0.0413352  0.41672   0.266163
0.487072   0.308392  0.810582
0.470833   0.459017  0.165082
julia> H = string.('a':'c')
3-element Vector{String}:
"a"
"b"
"c"
julia> DataFrame(P, H)
3×3 DataFrame
 Row │ a          b         c
     │ Any        Any       Any
─────┼───────────────────────────────
   1 │ 0.0413352  0.41672   0.266163
   2 │ 0.487072   0.308392  0.810582
   3 │ 0.470833   0.459017  0.165082
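
The example above uses a plain vector for H, while readdlm returns the header as a 1-row matrix. A rough end-to-end sketch of bridging that gap follows. The in-memory IOBuffer is a hypothetical stand-in for programminglanguages.csv, and converting to Symbols is used because it should work on both older and newer DataFrames versions:

```julia
using DelimitedFiles, DataFrames

# Hypothetical two-row stand-in for the CSV file in the question
io = IOBuffer("year,language\n1951,Regional Assembly Language\n1952,Autocode\n")

# With header=true, readdlm returns (data, header)
P, H = readdlm(io, ','; header=true)

# The header is a 1×2 matrix, which is why DataFrame(P, H) fails
size(H)

# vec flattens it to a vector; Symbol.() turns each name into a Symbol
df = DataFrame(P, Symbol.(vec(H)))
```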

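Edit: I should also suggest simply using the excellent CSV package. As Bogumil pointed out in the comments, the problem you are facing is that readdlm returns the header as a matrix. With CSV, you could have done: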

using CSV, DataFrames
df = CSV.read("programminglanguages.csv", DataFrame)
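
To try this without the original file, here is a small sketch that writes a made-up two-row CSV to a temporary path first (the file contents are invented for illustration). Unlike the readdlm route, the column types are inferred rather than left as Any:

```julia
using CSV, DataFrames

# Write a tiny, made-up CSV to a temporary file
path = tempname() * ".csv"
write(path, "year,language\n1951,Regional Assembly Language\n1952,Autocode\n")

# Column names are taken from the first row of the file
df = CSV.read(path, DataFrame)

# The year column gets an integer element type instead of Any
eltype(df.year)
```

The exact string type of the language column depends on the CSV.jl version, so it is not shown here.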
