r - How to parse blocks of CSV data separated by blank lines into a list of data frames



I have the following text file, which contains several table blocks. The blocks are separated by blank lines.

GENERALIZED BORN:
Complex Energy Terms
Frame #,BOND,ANGLE
0,6603.0521,7264
1,7434.9885,7602
Receptor Energy Terms
Frame #,BOND,ANGLE
0,6140.6338,5383.1241
1,6885.2965,5653.6637
Ligand Energy Terms
Frame #,BOND,ANGLE
0,462.4183,1881.428
1,549.692,1949.0482

How can I use R to parse this single text file into a list of three data frames or tibbles?

I tried this, but it failed:

library(readr)
readr::read_lines_chunked("myfile.txt", skip =1, chunk_size = 4)

It fails because readr::read_lines_chunked does not recognize the blank lines between blocks as separators.

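I'm not sure the function is meant to be used this way; my guess is that it isn't. But you can parse the data manually: create a list, parse out each block, and store it in the list.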

xy <- readLines(con = "test.txt")
xy <- xy[-1]  # remove GENERALIZED BORN
xy <- xy[which(xy != "")]
# Start of a break is needed for names and subsetting in a loop.
breaks <- which(grepl("^.*Energy Terms", x = xy))
dfs <- vector(mode = "list", length = length(breaks))
names(dfs) <- xy[breaks]
# Adding one accounts for the Energy Terms line. It's either here
# or in the loop.
chunks <- breaks + 1
for (chunk in seq_along(chunks)) {
  # If we extract the name and use it to subset, the order of the
  # dfs doesn't really matter.
  chk.name <- xy[chunks[chunk] - 1]

  from <- chunks[chunk]
  to <- chunks[chunk + 1] - 2

  # When working with the last chunk, chunks[chunk + 1] is NA,
  # so this sets the end to the last line of the text.
  if (is.na(to)) {
    to <- length(xy)
  }

  chk <- xy[from:to]
  # comment.char = "" keeps the "#" in "Frame #" from being
  # treated as the start of a comment.
  tmp <- read.table(
    text = paste(chk, collapse = "\n"),
    header = TRUE,
    comment.char = "",
    sep = ","
  )

  dfs[[chk.name]] <- tmp
}
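As an alternative to the explicit loop, the same idea can be sketched with cumsum() and split(): every title line starts a new group, and each group is then parsed on its own. This is only a minimal sketch in base R. The sample text is inlined so it runs without a file, and the names txt, block_id, parse_block and dfs_alt are made up for illustration. It assumes every block title ends in "Energy Terms".

```r
# Sample text mirroring the file in the question, inlined for the sketch.
txt <- c(
  "GENERALIZED BORN:",
  "",
  "Complex Energy Terms",
  "Frame #,BOND,ANGLE",
  "0,6603.0521,7264",
  "1,7434.9885,7602",
  "",
  "Receptor Energy Terms",
  "Frame #,BOND,ANGLE",
  "0,6140.6338,5383.1241",
  "1,6885.2965,5653.6637",
  "",
  "Ligand Energy Terms",
  "Frame #,BOND,ANGLE",
  "0,462.4183,1881.428",
  "1,549.692,1949.0482"
)

lines <- txt[-1]                          # drop the GENERALIZED BORN line
lines <- lines[nzchar(trimws(lines))]     # drop blank lines

# Assumption: a block title is any line ending in "Energy Terms".
is_title <- grepl("Energy Terms$", lines)

# cumsum() gives every line the running count of titles seen so far,
# so all lines of one block share the same id.
block_id <- cumsum(is_title)
blocks <- split(lines, block_id)

# Hypothetical helper: drop the title line, parse the rest as CSV.
parse_block <- function(b) {
  read.csv(text = paste(b[-1], collapse = "\n"), comment.char = "")
}

dfs_alt <- lapply(blocks, parse_block)
names(dfs_alt) <- lines[is_title]
```

With a real file, txt would come from readLines() as in the loop version. The result is again a named list with one data frame per block.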

Result

$`Complex Energy Terms`
  Frame..     BOND ANGLE
1       0 6603.052  7264
2       1 7434.989  7602

$`Receptor Energy Terms`
  Frame..     BOND    ANGLE
1       0 6140.634 5383.124
2       1 6885.297 5653.664

$`Ligand Energy Terms`
  Frame..     BOND    ANGLE
1       0 462.4183 1881.428
2       1 549.6920 1949.048
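
Note that the first column comes out as Frame.. because read.table() turns headers into syntactically valid names by default. If you would rather keep the original "Frame #" header, check.names = FALSE should do it. A small sketch on a single block, where block and df are just illustrative names:

```r
# One block from the question, without its title line.
block <- c(
  "Frame #,BOND,ANGLE",
  "0,462.4183,1881.428",
  "1,549.692,1949.0482"
)

df <- read.table(
  text = paste(block, collapse = "\n"),
  header = TRUE,
  sep = ",",
  comment.char = "",     # keep "#" from starting a comment
  check.names = FALSE    # keep "Frame #" instead of "Frame.."
)

names(df)
# Non-syntactic names need [[ ]] or backticks to access.
df[["Frame #"]]
```

The trade-off is that df$`Frame #` needs backticks, so the mangled name can be more convenient if you do not care about the exact header.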
