小贝子编程

R split column names with different occurrences of delimiter into strings and assign unique strings/string counts to a new dataframe

Keywords: string, assign, unique, data frame, split, delimiter, r, dataframe, split, strsplit
Updated: 2023-09-21

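I have a large data frame whose column names look like the ones below. I have not tried working with any of the data yet, only the column names.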

strainA_1_batch1	strainB_1_batch1	strainC_1_batch2	strainC_2_batch2	strainD_a_1_batch1	strainD_b_1_batch1

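I think splitting at "underscore, digit, underscore" gives a solution for what you describe above. Note that this drops the digit and whatever information it carries. Does that matter?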

names <- c("strainA_1_batch1", "strainA_2_batch2", "strainB_1_batch1", "strainC_1_batch2", "strainC_2_batch2",
           "strainD_a_1_batch1", "strainD_b_1_batch1")
# split at underscore, digit, underscore (the backslash must be doubled inside an R string)
splitList <- strsplit(names, "_\\d_")
# convert to a data frame with one row per column name
df <- data.frame(t(as.data.frame.list(splitList)))
# clean up the data frame
rownames(df) <- NULL
names(df) <- c("Strain", "Batch")
df
# report how often each unique strain and batch occurs
table(df$Strain)
table(df$Batch)
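
The counts printed by table() can also be stored in a new data frame, which is what the question title asks for. The following is only a sketch under the assumption that the seven column names above are representative; the variable names col_names, parts and strain_counts and the column label "Count" are illustrative and do not come from the original answer.

```r
# Illustrative variable names; the column names are the ones used above.
col_names <- c("strainA_1_batch1", "strainA_2_batch2", "strainB_1_batch1",
               "strainC_1_batch2", "strainC_2_batch2",
               "strainD_a_1_batch1", "strainD_b_1_batch1")

# Split at underscore, digit, underscore; every name yields two pieces.
parts <- do.call(rbind, strsplit(col_names, "_\\d_"))
df <- data.frame(Strain = parts[, 1], Batch = parts[, 2])

# Converting a table to a data frame gives one row per unique value
# together with the number of times it occurs.
strain_counts <- as.data.frame(table(Strain = df$Strain),
                               responseName = "Count",
                               stringsAsFactors = FALSE)
batch_counts <- as.data.frame(table(Batch = df$Batch),
                              responseName = "Count",
                              stringsAsFactors = FALSE)
```

Because the split happens only at "_digit_", strainD_a and strainD_b are counted as two separate strains here.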

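Another option is to replace the underscores on either side of the digit with a " " (or some other character) and then split on the space.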

names <- gsub("_(\\d)_", " \\1 ", names)
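
The second approach still needs the split itself. Here is a possible sketch, assuming every column name contains exactly one "_digit_" pattern so that each name yields three pieces; the variable names and the column labels Strain, Replicate and Batch are assumptions made for illustration.

```r
# Illustrative variable names; the column names are the ones used above.
col_names <- c("strainA_1_batch1", "strainA_2_batch2", "strainB_1_batch1",
               "strainC_1_batch2", "strainC_2_batch2",
               "strainD_a_1_batch1", "strainD_b_1_batch1")

# Replace the underscores around the single digit with spaces, keeping the digit.
spaced <- gsub("_(\\d)_", " \\1 ", col_names)

# Split on the literal space; every name yields three pieces.
parts <- do.call(rbind, strsplit(spaced, " ", fixed = TRUE))

# Assumed labels for the three pieces.
df3 <- data.frame(Strain = parts[, 1],
                  Replicate = parts[, 2],
                  Batch = parts[, 3])
```

Unlike splitting at "_\\d_", this keeps the digit as its own column, so no information is lost.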
