r - How to split this string into multiple columns



I have this string and I need to split it into separate columns:

legend = "Frequency..Derivatives.measure...Derivatives.instrument...Derivatives.risk.category...Derivatives.reporting.country...Derivatives.counterparty.sector...Derivatives.counterparty.country...Derivatives.underlying.risk.sector...Derivatives.currency.leg.1...Derivatives.currency.leg.2...Derivatives.maturity...Derivatives.rating...Derivatives.execution.method...Derivatives.basis...Period..30.06.1998.31.12.1998.30.06.1999.31.12.1999.30.06.2000.31.12.2000.30.06.2001.31.12.2001.30.06.2002.31.12.2002.30.06.2003.31.12.2003.30.06.2004.31.12.2004.30.06.2005.31.12.2005.30.06.2006.31.12.2006.30.06.2007.31.12.2007.30.06.2008.31.12.2008.30.06.2009.31.12.2009.30.06.2010.31.12.2010.30.06.2011.31.12.2011.30.06.2012.31.12.2012.30.06.2013.31.12.2013.30.06.2014.31.12.2014.30.06.2015.31.12.2015.30.06.2016.31.12.2016.30.06.2017.31.12.2017.30.06.2018.31.12.2018.30.06.2019"

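Every run of three dots should start a new column, up to the word Period. Note that the first word, Frequency, is separated from the second, Derivatives.measure, by only two dots rather than three.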

After that comes a series of dates (6 months apart), which should be split this way: "split every time there is a 4-digit number".

How can I do this? Thanks

We can use strsplit with fixed = TRUE to split the string at "..." into a list of vectors, and then rbind those vectors to create a data.frame:

df1 <- do.call(rbind.data.frame, strsplit(legend, "...", fixed = TRUE))
names(df1) <- paste0("V", seq_along(df1))
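Note that a fixed split on "..." leaves Frequency..Derivatives.measure in one piece, because only two dots separate those two names. Below is a minimal sketch of one way around this, assuming it is acceptable to split on any run of two or more dots. Here legend_short is a shortened stand-in for the full string, and the names pieces and df_short are illustrative only.

```r
# Shortened stand-in for the full legend string (assumption, for illustration)
legend_short <- "Frequency..Derivatives.measure...Derivatives.instrument...Derivatives.basis...Period..30.06.1998.31.12.1998"

# Split wherever two or more consecutive dots occur; single dots inside a
# name such as "Derivatives.measure" are left untouched
pieces <- strsplit(legend_short, "[.]{2,}")[[1]]

# One-row data.frame with one column per piece
df_short <- as.data.frame(t(pieces), stringsAsFactors = FALSE)
names(df_short) <- paste0("V", seq_along(df_short))
```

With this pattern the word Period and the run of dates that follows it also land in separate pieces, since they are separated by two dots.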

If we also need to handle the last condition, splitting the "Period" part into its dates:

library(dplyr)
library(tidyr)
library(stringr)
library(data.table)  # for rowid()

tibble(col = legend) %>%
  mutate(rn = row_number()) %>%
  # one row per piece separated by exactly three dots
  separate_rows(col, sep = "[.]{3}") %>%
  mutate(rn2 = str_c("V", rowid(rn))) %>%
  pivot_wider(names_from = rn2, values_from = col) %>%
  # the last column holds "Period.." followed by the dates
  rename_at(ncol(.), ~ "Period") %>%
  mutate(Period = str_remove(Period, "Period\\.+")) %>%
  # split at the dot that follows each 4-digit year
  separate_rows(Period, sep = "(?<=\\.[0-9]{4})\\.")
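The date part can also be handled in base R by extracting each dd.mm.yyyy token directly instead of splitting after the 4-digit year. The following is only a sketch under that assumption. Here period_short is a shortened stand-in for the text that follows "Period..", and the variable names are illustrative.

```r
# Shortened stand-in for the dates that follow "Period.." (assumption)
period_short <- "30.06.1998.31.12.1998.30.06.1999"

# Extract every token of the form dd.mm.yyyy
date_pattern <- "[0-9]{2}[.][0-9]{2}[.][0-9]{4}"
dates_chr <- regmatches(period_short, gregexpr(date_pattern, period_short))[[1]]

# Optionally convert the character tokens to Date objects
dates <- as.Date(dates_chr, format = "%d.%m.%Y")
```

Extracting matches avoids the lookbehind and gives the same pieces as long as every date follows the dd.mm.yyyy layout.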
