R: how to split a string on whitespace while keeping the decimal point



Can you help me split a string column into 7 columns while keeping the decimals?

Here is an example:

library(dplyr)
library(tidyr)
library(stringr)
x <- data.frame(Flat = c("2000M01  XZ ELDL U K EER 213.9", "2000M02  XY MLML O T RRE 255.6" , "2000M03  UY LEEE M P SSE 259.4" ))  
x %>% separate(Flat, c("A1","B2","C3","D4","E5","F6","Value"))

The output is:

       A1 B2   C3 D4 E5  F6 Value
1 2000M01 XZ ELDL  U  K EER   213
2 2000M02 XY MLML  O  T RRE   255
3 2000M03 UY LEEE  M  P SSE   259
Warning message:
Expected 7 pieces. Additional pieces discarded in 3 rows [1, 2, 3].

instead of the desired output:

       A1 B2   C3 D4 E5  F6   Value
1 2000M01 XZ ELDL  U  K EER   213.9
2 2000M02 XY MLML  O  T RRE   255.6
3 2000M03 UY LEEE  M  P SSE   259.4

I tried many `sep =` options in the separate function, but nothing helped.

Thank you,

John

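Pass a whitespace pattern as the sep argument of separate, so that the decimal point is no longer treated as a separator. You can also use convert = TRUE to convert the Value column to numeric automatically.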

# The backslash must be doubled inside an R string literal
tidyr::separate(x, Flat,
  c("A1","B2","C3","D4","E5","F6","Value"), sep = '\\s+', convert = TRUE)
#       A1 B2   C3 D4 E5  F6 Value
#1 2000M01 XZ ELDL  U  K EER 213.9
#2 2000M02 XY MLML  O  T RRE 255.6
#3 2000M03 UY LEEE  M  P SSE 259.4
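
To see why the call in the question dropped the decimals, here is a small base R sketch. It assumes the documented default of separate, sep = "[^[:alnum:]]+", and imitates the splitting with strsplit instead of calling tidyr; the variable names are only for illustration.

```r
# Base R only; the string is the first row of the question's example.
flat <- "2000M01  XZ ELDL U K EER 213.9"

# Assumed default pattern of separate(): any run of non-alphanumeric
# characters is a separator, and that includes the ".".
default_pieces <- strsplit(flat, "[^[:alnum:]]+")[[1]]

# Splitting on whitespace only leaves the decimal point alone.
space_pieces <- strsplit(flat, "\\s+")[[1]]

length(default_pieces)  # 8 pieces: "213" and "9" are split apart
length(space_pieces)    # 7 pieces: the last one is "213.9"
```

With the default pattern there are eight pieces for seven column names, which matches the warning "Expected 7 pieces. Additional pieces discarded".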

We can also do this in base R with read.table:

read.table(text = x$Flat, header = FALSE,
col.names =c("A1","B2","C3","D4","E5","F6","Value"))
       A1 B2   C3 D4 E5  F6 Value
1 2000M01 XZ ELDL  U  K EER 213.9
2 2000M02 XY MLML  O  T RRE 255.6
3 2000M03 UY LEEE  M  P SSE 259.4

A data.table option with tstrsplit:

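library(data.table)

# Split on runs of whitespace, name the pieces, then let
# type.convert() turn Value into a numeric column
type.convert(
  setDT(x)[, setNames(
    tstrsplit(Flat, "\\s+"),
    c("A1", "B2", "C3", "D4", "E5", "F6", "Value")
  )],
  as.is = TRUE
)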

which gives

        A1 B2   C3 D4 E5  F6 Value
1: 2000M01 XZ ELDL  U  K EER 213.9
2: 2000M02 XY MLML  O  T RRE 255.6
3: 2000M03 UY LEEE  M  P SSE 259.4

This is certainly not as elegant as dear @akrun's answer, but it also works:

library(tidyr)
# One capture group per output column; backslashes are doubled for R strings
x %>%
  extract(Flat, c("A1","B2","C3","D4","E5","F6","Value"),
    "(\\d+M\\d+)  ([[:upper:]]+) ([[:upper:]]+) ([[:upper:]]) ([[:upper:]]+) ([[:upper:]]+) (\\d+\\.\\d)")
       A1 B2   C3 D4 E5  F6 Value
1 2000M01 XZ ELDL  U  K EER 213.9
2 2000M02 XY MLML  O  T RRE 255.6
3 2000M03 UY LEEE  M  P SSE 259.4

data.table

library(data.table)
ans <- data.table()[,c("A1","B2","C3","D4","E5","F6","Value") := tstrsplit(x$Flat, "[ ]+")]
#         A1 B2   C3 D4 E5  F6 Value
# 1: 2000M01 XZ ELDL  U  K EER 213.9
# 2: 2000M02 XY MLML  O  T RRE 255.6
# 3: 2000M03 UY LEEE  M  P SSE 259.4
