I am working with a few strings, shown below:
Col1
--------------------------
554 - partial-completion_3
4011 - structure painted
5459 - 1 int mam-corrosion issue
996 - cast iron countershock
My goal is to split each of these strings into two parts:
Col1_Id  Col2_Desc
--------------------------
554      partial-completion_3
4011     structure painted
5459     1 int mam-corrosion issue
996      cast iron countershock
I tried using the `separate` function:
df_sep = df %>%
  separate(Col1, c("Col1_Id", "Col2_Desc"), "-")
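This only works when the string contains a single -. When there are two, as in the string
`5459 - 1 int mam-corrosion issue`
`separate` drops the part of the description after the second -, and the output looks like
`5459 - 1 int mam`
which is not what I expect. I expect output like this: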
Col1_Id  Col2_Desc
--------------------------
554      partial-completion_3
4011     structure painted
5459     1 int mam-corrosion issue
996      cast iron countershock
Any hints or suggestions are much appreciated.
We can use `sub` to replace the first - with a , and then read the result with `read.csv`:
read.csv(text = sub("-", ",", df1$Col1), header = FALSE,
         col.names = c("Col1_Id", "Col2_Desc"), stringsAsFactors = FALSE,
         strip.white = TRUE)  # drop the space left after the replaced hyphen
#  Col1_Id                 Col2_Desc
#1     554      partial-completion_3
#2    4011         structure painted
#3    5459 1 int mam-corrosion issue
#4     996    cast iron countershock
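The approach above depends on `sub` replacing only the first match, whereas `gsub` replaces every match. A minimal sketch of that difference follows. The vector `x` is a made-up stand-in for `df1$Col1`, and stripping the spaces around the hyphen inside the pattern is an assumption added here, not part of the original answer.

```r
# Hypothetical example vector standing in for df1$Col1
x <- c("5459 - 1 int mam-corrosion issue", "996 - cast iron countershock")

# sub() replaces only the first hyphen (plus surrounding spaces),
# so the hyphen inside "mam-corrosion" survives
first_only <- sub("\\s*-\\s*", ",", x)

# gsub() replaces every hyphen, which would create a spurious third field
every_one <- gsub("\\s*-\\s*", ",", x)

print(first_only)
print(every_one)
```

Only `first_only` is safe to hand to `read.csv` with two column names, since each element then contains exactly one comma.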
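With `separate`, there is an `extra` argument that handles this case. Setting extra = "merge" splits at most length(into) times, so everything after the first separator stays in the last column: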
library(tidyr)
separate(df1, Col1, into = c("Col1_Id", "Col2_Desc"), extra="merge")
#  Col1_Id                 Col2_Desc
#1     554      partial-completion_3
#2    4011         structure painted
#3    5459 1 int mam-corrosion issue
#4     996    cast iron countershock
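The call above relies on the default `sep` of `separate`, which matches any run of non-alphanumeric characters. As a hedged variation, the separator can be spelled out and the id column converted to a number in the same step. The two-row data frame below is a shortened copy of the data used for illustration, and `sep = " - "` and `convert = TRUE` are choices made here rather than something the original answer used.

```r
library(tidyr)

# Shortened copy of the example data, for illustration only
df_demo <- data.frame(Col1 = c("554 - partial-completion_3",
                               "5459 - 1 int mam-corrosion issue"),
                      stringsAsFactors = FALSE)

# sep = " - " splits on the spaced hyphen only; extra = "merge" keeps
# anything after the first split in the last column;
# convert = TRUE turns the id column into an integer
res <- separate(df_demo, Col1, into = c("Col1_Id", "Col2_Desc"),
                sep = " - ", extra = "merge", convert = TRUE)
str(res)
```

Being explicit about the separator makes the intent easier to read, at the cost of failing to split rows where the hyphen is not surrounded by single spaces.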
Data
df1 <- structure(list(Col1 = c("554 - partial-completion_3", "4011 - structure painted",
"5459 - 1 int mam-corrosion issue", "996 - cast iron countershock"
)), .Names = "Col1", class = "data.frame", row.names = c(NA,
-4L))
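A base R alternative is `strsplit`, which splits the column into a list, followed by `rbind.data.frame` to build the data.frame. `setNames` is used to set the names conveniently in the same line. Splitting on " - " (with the surrounding spaces) leaves the unspaced hyphen in mam-corrosion untouched.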
setNames(do.call(rbind.data.frame, strsplit(df1$Col1, split=" - ")),
c("Col1_Id", "Col2_Desc"))
  Col1_Id                 Col2_Desc
1     554      partial-completion_3
2    4011         structure painted
3    5459 1 int mam-corrosion issue
4     996    cast iron countershock
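The `strsplit` approach splits at every " - ", so it would produce three pieces if a description itself contained a spaced hyphen. A minimal base R sketch that splits at the first occurrence only uses `regexpr` with `regmatches(..., invert = TRUE)`. The second string in `x` is a hypothetical case that does not appear in the original data.

```r
# First string is from the question; the second is a hypothetical
# description that itself contains " - "
x <- c("554 - partial-completion_3", "77 - left side - needs repaint")

# regexpr() locates only the first " - "; invert = TRUE returns the text
# before and after that single match
parts <- regmatches(x, regexpr(" - ", x, fixed = TRUE), invert = TRUE)

# Bind the pairs into a two-column data frame and name the columns
res <- setNames(as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE),
                c("Col1_Id", "Col2_Desc"))
res$Col1_Id <- as.integer(res$Col1_Id)
print(res)
```

This assumes every string contains at least one " - "; a string without it would yield a single piece and the `rbind` step would no longer line up.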