How to convert a vector containing semicolon-separated lists into a presence/absence matrix?



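I have a vector in which each element is a string holding a list of attributes separated by semicolons and/or commas. I want to take that vector and turn it into a presence/absence matrix with one column per attribute in the list.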

My approach so far has been to first collect all of the semicolon-separated elements in the vector, like this:

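# Concatenate all factor levels into one comma-separated string
OrientationList <- c(NULL)
for (i in levels(stroller_attributes$Orientation)) {
  OrientationList <- paste(OrientationList, ",", i)
}
# Split on ";" or ",", trim surrounding whitespace, keep the unique attributes
OrientationList <- unique(gsub("^[[:space:]]+|[[:space:]]+$", "", unlist(strsplit(OrientationList, split = ";|,"))))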
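For comparison, the same attribute list can be built without the loop by splitting the factor levels directly. This is a minimal sketch: `orientation` is a hypothetical small stand-in for `stroller_attributes$Orientation`, and `trimws()` requires R 3.2.0 or later. It does not lowercase anything, so "Lie flat" and "Lie Flat" would still count as different attributes.

```r
# Hypothetical sample standing in for stroller_attributes$Orientation
orientation <- factor(c("", "Forward Facing", "Forward Facing ",
                        "Forward Facing; Parent Facing",
                        "Lie Flat; Forward Facing"))

# Split every level on ";" or ",", trim surrounding whitespace
attrs <- trimws(unlist(strsplit(levels(orientation), "[;,]")))

# Drop empty strings (from the "" level) and de-duplicate
attrs <- unique(attrs[attrs != ""])
print(attrs)
```

Because `trimws()` strips all leading and trailing whitespace, the levels "Forward Facing" and "Forward Facing " collapse into a single attribute.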

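This gives me a list of all the attributes contained in the vector. Now I want to create a new matrix with length(OrientationList) columns and nrow(stroller_attributes) rows, which I do like this: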

OrientationFactorsMatrix <- matrix(ncol=length(OrientationList), nrow=nrow(stroller_attributes))
colnames(OrientationFactorsMatrix) <- OrientationList
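A small side note, offered as a sketch: `matrix()` called without a `data` argument fills every cell with `NA`. Starting from `FALSE` and setting the column names through `dimnames` means any cell that is never set already reads as "absent". The attribute names and row count below are hypothetical placeholders.

```r
# Hypothetical attribute names and row count, for illustration only
attrs  <- c("Forward Facing", "Parent Facing", "Lie Flat")
n_rows <- 4

# Start every cell at FALSE (absent) instead of matrix()'s default NA
presence <- matrix(FALSE, nrow = n_rows, ncol = length(attrs),
                   dimnames = list(NULL, attrs))
print(presence)
```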

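Next, I need to go through the original vector, stroller_attributes$Orientation, work out which attributes each element contains, and then mark each attribute as present or absent in OrientationFactorsMatrix with TRUE or FALSE. My first instinct was to do something like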

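OrientationList %in% stroller_attributes$Orientation[16], which would automatically produce the presence/absence values for every attribute in the matrix (hooray!). Unfortunately, if the element contains two different items in its comma/semicolon-separated list, it returns FALSE. Essentially, I want an %in% check that asks "does this contain the term?" rather than "is this exactly the term?"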
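The difference between the checks can be seen on a single string. This is a sketch with a hypothetical value `x`: `%in%` compares against the whole string, `grepl(fixed = TRUE)` tests for a substring, and splitting first lets `%in%` test membership of whole items. Plain substring matching can over-match, since "Lie Flat On Buggy" also contains "Lie Flat".

```r
x <- "Forward Facing; Lie Flat"

# %in% tests exact equality with the whole string, so this is FALSE
exact <- "Lie Flat" %in% x

# grepl(fixed = TRUE) tests whether the term occurs anywhere: TRUE
contains <- grepl("Lie Flat", x, fixed = TRUE)

# Split into trimmed items first, then %in% tests whole-item membership: TRUE
items  <- trimws(unlist(strsplit(x, "[;,]")))
member <- "Lie Flat" %in% items

# Substring matching over-matches: this is TRUE even though the item differs
over <- grepl("Lie Flat", "Lie Flat On Buggy", fixed = TRUE)

print(c(exact = exact, contains = contains, member = member, over = over))
```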

I'd appreciate any help. dput of the data:

structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 4L, 4L, 4L, 4L, 4L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 12L, 
2L, 2L, 2L, 2L, 2L, 2L, 12L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 12L, 2L, 2L, 12L, 2L, 21L, 21L, 23L, 22L, 17L, 17L, 17L, 
16L, 1L, 1L, 1L, 24L, 11L, 11L, 2L, 1L, 2L, 2L, 2L, 19L, 12L, 
17L, 17L, 19L, 19L, 17L, 17L, 21L, 17L, 1L, 17L, 1L, 1L, 2L, 
9L, 2L, 2L, 2L, 1L, 1L, 25L, 25L, 25L, 25L, 25L, 25L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 25L, 
13L, 2L, 25L, 1L, 26L, 2L, 25L, 25L, 13L, 2L, 2L, 1L, 25L, 25L, 
25L, 25L, 25L, 2L, 18L, 18L, 18L, 18L, 13L, 21L, 2L, 13L, 1L, 
6L, 1L, 1L, 2L, 1L, 2L, 12L, 2L, 12L, 12L, 12L, 2L, 2L, 10L, 
10L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 12L, 
2L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 2L, 12L, 12L, 2L, 
12L, 12L, 12L, 2L, 2L, 2L, 2L, 12L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 25L, 25L, 25L, 25L, 25L, 25L, 2L, 8L, 
14L, 14L, 14L, 8L, 8L, 7L, 8L, 15L, 15L, 8L, 8L, 8L, 15L, 14L, 
8L, 2L, 5L, 5L, 5L, 2L, 2L, 24L, 24L, 13L, 13L, 13L, 13L, 20L, 
20L, 20L, 20L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("", "Forward Facing", 
"Forward Facing ", "Forward Facing, Parent Facing", "Forward Facing; Full lie flat", 
"Forward Facing; Infant Car Seat", "Forward facing; Lie flat", 
"Forward Facing; Lie Flat", "Forward Facing; Lie flat option for Infants", 
"Forward Facing; Lie Flat; 2 Children Forward-Facing; 2 Children 1x Forward Facing, 1x Lie Flat; 2 Children 1x Forward Facing, 1x Parent Facing (Infant Car Seat); 1x Parent Facing (Infant Car Seat)", 
"Forward Facing; Lie-Flat Configuration For Newborns", "Forward Facing; Parent Facing", 
"Forward Facing; Parent Facing; Lie Flat", "Forward Facing; Parent Facing; Lie Flat On Buggy; Lie Flat Off Buggy", 
"Forward Facing; Parent Facing; Recline", "Forward Facing; Rear Facing; Lie Flat", 
"Lie Flat; Forward Facing", "Lie Flat; Forward Facing; Parent Facing", 
"Lie Flat; Forward Facing; Travel System", "Lie Flat; Forward-Facing", 
"Lie Flat; Parent Facing; Forward Facing", "Lie Flat; Travel System; Forward Facing; Second Seat", 
"Lie Flat; Travel System; Forward Facing; Second Seat; Parent Facing", 
"Off Stroller Bassinet; Forward Facing; Parent Facing; Lie Flat", 
"Reversible Seat", "Travel System; Forward Facing; Second Seat; Parent Facing"
), class = "factor")

OK, as often happens, writing out the problem in detail helped me find the answer myself. Here is the solution:

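for (i in seq_len(nrow(stroller_attributes))) {
  # Split this row's string on "," or ";", then strip all whitespace and lowercase
  items <- tolower(gsub("[[:space:]]", "", unlist(strsplit(as.character(stroller_attributes$Orientation[i]), split = ",|;"))))
  # Normalize the attribute names the same way and test each one for membership
  result <- gsub("[[:space:]]", "", tolower(OrientationList)) %in% items
  OrientationFactorsMatrix[i, ] <- result
}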

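The key part is that I had to convert the comma/semicolon-separated list in the original vector into a vector of items with strsplit and unlist. I then cleaned it up by removing all whitespace and converting it to lowercase. I apply the same basic operations to the contents of OrientationList, and then the %in% operator produces the output I want.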
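The same normalize-then-%in% idea can also be written in vectorized form, splitting every row once and building the whole matrix in one step instead of filling it row by row. This is a sketch under assumptions: the helper `normalize()` and the sample `orientation` factor are hypothetical. Unlike the loop, it takes the column names from the normalized items, so "Lie flat" and "Lie Flat" collapse into one column.

```r
# Hypothetical helper: strip all whitespace and lowercase, as the loop does
normalize <- function(s) tolower(gsub("[[:space:]]", "", s))

# Hypothetical sample standing in for stroller_attributes$Orientation
orientation <- factor(c("Forward Facing", "",
                        "Lie Flat; Forward Facing",
                        "Forward facing; Lie flat"))

# One character vector of normalized items per row
items <- lapply(strsplit(as.character(orientation), "[;,]"), normalize)

# Column names: every distinct normalized item, ignoring empty strings
attrs <- unique(unlist(items))
attrs <- attrs[attrs != ""]

# For each row, test every attribute for membership, then fill row by row
presence <- matrix(unlist(lapply(items, function(v) attrs %in% v)),
                   nrow = length(items), byrow = TRUE,
                   dimnames = list(NULL, attrs))
print(presence)
```

Building the matrix with `byrow = TRUE` from the concatenated rows keeps the orientation correct even when there is only one attribute, a case where `t(sapply(...))` would return a single row instead of a single column.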
