R - Parsing back into a 'messy' API structure



I pull data from an online database (REDCap) through its API, and the data arrives as a comma-separated string like this:

RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n", "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("", "charset")))

I have a script that parses it nicely into a data frame:

(df <- read.table(file = textConnection(RAW.API), header = TRUE, 
sep = ",", na.strings = "", stringsAsFactors = FALSE))
  id     event_arm name        dob pushed_text pushed_calc complete
1  1 event_1_arm_1 John 1979-05-01        <NA>          NA        2
2  1 event_2_arm_1 John 2012-09-02         abc         123        1
3  1 event_3_arm_1 John 2012-09-10        <NA>          NA        2
4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
5  2 event_2_arm_1 Mary 1978-09-12        <NA>          NA        2
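The printed data frame hints at why the return trip is not trivial: the parse changes column types along the way. The following is a minimal sketch on a made-up two-row sample (`sample_csv`, `parsed` and `col_classes` are hypothetical names, not part of the original script) that inspects what `read.table` does to a zero-padded, quoted id.

```r
## Hypothetical two-row sample in the same layout as RAW.API
sample_csv <- "id,pushed_text,pushed_calc\n\"01\",\"\",\"\"\n\"02\",\"abc\",\"123\"\n"

## Same parsing options as the script above
parsed <- read.table(text = sample_csv, header = TRUE, sep = ",",
                     na.strings = "", stringsAsFactors = FALSE)

## Inspect the class each column ended up with
col_classes <- sapply(parsed, class)
print(col_classes)

## The quoted "01" has been converted to a number, so the leading zero is gone
print(parsed$id)
```

Any function that writes the data back therefore has to restore both the zero padding and the empty strings that the parse discarded.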

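I then do some calculations and write the results to pushed_text and pushed_calc, after which I need to format the data back into the messy comma-separated structure it came in.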

I imagine something like this:

API.back <- `some magic command`(df, ...)
identical(RAW.API, API.back)
[1] TRUE

I am looking for some command that can format the data from my data frame, df, back into the structure of the original API object, RAW.API.

Any help would be appreciated.

This seems to work:

some_magic <- function(df) {
    ## Replace NA with "", converting column types as needed
    df[] <- lapply(df, function(X) {
                if(any(is.na(X))) {X[is.na(X)] <- ""; X} else {X}
            })
    ## Print integers in first column as 2-digit character strings
    ## (DO NOTE: Hardwiring the number of printed digits here is probably
    ## inadvisable, though needed to _exactly_ reconstitute RAW.API.) 
    df[[1]] <- sprintf("%02.0f", df[[1]])
    ## Separately build header and table body, then suture them together 
    l1 <- paste(names(df), collapse=",")
    l2 <- capture.output(write.table(df, sep=",", col.names=FALSE, 
                                     row.names=FALSE))
    out <- paste0(c(l1, l2, ""), collapse="\n")
    ## Reattach attributes
    att <- list("`Content-Type`" = structure(c("text/html", "utf-8"), 
                .Names = c("", "charset")))
    attributes(out) <- att
    out
}
identical(some_magic(df), RAW.API)
# [1] TRUE
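One possible way around the hardwired two-digit width is to keep the id column as character when parsing, so the leading zeros are never lost. The sketch below assumes that changing the parsing step is acceptable; `raw`, `df_chr` and `to_api_string` are hypothetical names introduced for illustration, and the `Content-Type` attribute is left out, so the comparison is against the bare string rather than RAW.API itself.

```r
## The same payload as RAW.API, rebuilt without attributes so the sketch runs on its own
raw <- paste0(
  "id,event_arm,name,dob,pushed_text,pushed_calc,complete\n",
  "\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n",
  "\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n",
  "\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n",
  "\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n",
  "\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n"
)

## Assumption: reading id as character is acceptable for the later calculations
df_chr <- read.table(text = raw, header = TRUE, sep = ",", na.strings = "",
                     stringsAsFactors = FALSE,
                     colClasses = c(id = "character"))

to_api_string <- function(df) {
  ## Replace NA with "" only in columns that contain NA, so that
  ## complete (integer, no NA) keeps its type and is written unquoted
  df[] <- lapply(df, function(x) {
    if (anyNA(x)) x[is.na(x)] <- ""
    x
  })
  ## Header is unquoted; write.table quotes the character columns in the body
  header <- paste(names(df), collapse = ",")
  body <- capture.output(write.table(df, sep = ",", col.names = FALSE,
                                     row.names = FALSE))
  ## The trailing "" yields the final newline
  paste0(c(header, body, ""), collapse = "\n")
}

identical(to_api_string(df_chr), raw)
```

The trade-off is that id is no longer numeric inside R, which only matters if the calculations treat it as a number.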
