I have the following text file, which contains several table blocks. The blocks are separated by blank lines.
GENERALIZED BORN:

Complex Energy Terms
Frame #,BOND,ANGLE
0,6603.0521,7264
1,7434.9885,7602

Receptor Energy Terms
Frame #,BOND,ANGLE
0,6140.6338,5383.1241
1,6885.2965,5653.6637

Ligand Energy Terms
Frame #,BOND,ANGLE
0,462.4183,1881.428
1,549.692,1949.0482
How can I use R to parse this single text file into a list of three data frames or tibbles?
I tried the following, but it failed:
library(readr)
readr::read_lines_chunked("myfile.txt", skip =1, chunk_size = 4)
It fails because readr::read_lines_chunked does not recognize the blank lines between blocks as separators.
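I'm not sure the function is meant to be used that way; my guess is that it is not. You can parse the data manually, though: create a list, extract each block, and store it in the list.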
xy <- readLines(con = "myfile.txt")
xy <- xy[-1] # remove GENERALIZED BORN
xy <- xy[which(xy != "")]
# Start of a break is needed for names and subsetting in a loop.
breaks <- which(grepl("^.*Energy Terms", x = xy))
dfs <- vector(mode = "list", length = length(breaks))
names(dfs) <- xy[breaks]
# Adding one accounts for the Energy Terms line. It's either here
# or in the loop.
chunks <- breaks + 1
for (chunk in seq_along(chunks)) {
  # If we extract the name and use it to subset, the order of the
  # dfs doesn't really matter.
  chk.name <- xy[chunks[chunk] - 1]
  from <- chunks[chunk]
  # The next block's title sits two lines before its header line.
  to <- chunks[chunk + 1] - 2
  # When working with the last chunk, this sets the end of the text.
  if (is.na(to)) {
    to <- length(xy)
  }
  chk <- xy[from:to]
  tmp <- read.table(
    text = paste(chk, collapse = "\n"),  # join lines with a newline
    header = TRUE,
    comment.char = "",  # keep "#" in "Frame #" from starting a comment
    sep = ","
  )
  dfs[[chk.name]] <- tmp
}
Result:
$`Complex Energy Terms`
  Frame..     BOND ANGLE
1       0 6603.052  7264
2       1 7434.989  7602

$`Receptor Energy Terms`
  Frame..     BOND    ANGLE
1       0 6140.634 5383.124
2       1 6885.297 5653.664

$`Ligand Energy Terms`
  Frame..     BOND    ANGLE
1       0 462.4183 1881.428
2       1 549.6920 1949.048
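As a variation on the loop, the same blocks could probably be built without tracking start and end indices, by labelling every line with its block number and splitting on that label. The sketch below is only one possible way to do it. It assumes each block starts with a title line ending in "Energy Terms"; parse_blocks and txt are hypothetical names introduced for illustration, and the inline sample stands in for the file. check.names = FALSE is used so the "Frame #" header is kept instead of being mangled to "Frame..".

```r
# Hypothetical inline sample mirroring the file from the question.
txt <- c(
  "GENERALIZED BORN:", "",
  "Complex Energy Terms", "Frame #,BOND,ANGLE",
  "0,6603.0521,7264", "1,7434.9885,7602", "",
  "Receptor Energy Terms", "Frame #,BOND,ANGLE",
  "0,6140.6338,5383.1241", "1,6885.2965,5653.6637", "",
  "Ligand Energy Terms", "Frame #,BOND,ANGLE",
  "0,462.4183,1881.428", "1,549.692,1949.0482"
)

# Hypothetical helper: split lines into blocks and parse each block as CSV.
parse_blocks <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]              # drop blank separator lines
  is_title <- grepl("Energy Terms$", lines)  # assumption: titles end this way
  group <- cumsum(is_title)                  # block number for every line
  keep <- group > 0                          # skip lines before the first title
  blocks <- split(lines[keep], group[keep])
  out <- lapply(blocks, function(b) {
    # b[1] is the title; the rest is a header line plus data rows.
    read.csv(text = paste(b[-1], collapse = "\n"), check.names = FALSE)
  })
  names(out) <- vapply(blocks, `[`, character(1), 1, USE.NAMES = FALSE)
  out
}

dfs2 <- parse_blocks(txt)
```

With the real file, the call would be parse_blocks(readLines("myfile.txt")).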