I have a data cleaning problem. A short example of the dataset is below:
df <- data.frame(Text1 = c(NA, "Name","Jack","Jack"),
Text2 = c(NA, "District","Leon","Leon"),
Text3 = c(NA, "School","Black","Black"),
Text4 = c(NA, "Grade",1,1),
Scoring = c("Text", "Teacher","Ana","Ana"),
Item1 = c("11_ada", "/a/","1.0","0.0"),
Text7 = c(NA, "/c/","1.0","1.0"),
Text8 = c(NA, "/a/","1.0","1.0"),
Item2 = c("10_bada", "/x/","1.0","0.0"),
Text10 = c(NA, "/y/","0.0","1.0"),
Text11 = c(NA, "/z/","1.0","0.0"))
> df
  Text1    Text2  Text3 Text4 Scoring  Item1 Text7 Text8   Item2 Text10 Text11
1  <NA>     <NA>   <NA>  <NA>    Text 11_ada  <NA>  <NA> 10_bada   <NA>   <NA>
2  Name District School Grade Teacher    /a/   /c/   /a/     /x/    /y/    /z/
3  Jack     Leon  Black     1     Ana    1.0   1.0   1.0     1.0    0.0    1.0
4  Jack     Leon  Black     1     Ana    0.0   1.0   1.0     0.0    1.0    0.0
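The first part of the dataset contains demographic information. The rest holds item information, with correct (1.0) and incorrect (0.0) responses. I am reading this data from an Excel file. I need to restructure the first two rows as shown below. In particular, the Item columns need to be renamed to 1.1 1.2 1.3 2.1 2.2 2.3. This is only part of the dataset: each Item spans multiple columns, and there are many items.
How can I clean this dataset so that it looks like the following?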
df1
Name District School Grade Teacher 1.1 1.2 1.3 2.1 2.2 2.3
Jack     Leon  Black     1     Ana 1.0 1.0 1.0 1.0 0.0 1.0
Jack     Leon  Black     1     Ana 0.0 1.0 1.0 0.0 1.0 0.0
Also, how can I get the second desired dataset?
df2
Name District School Grade Teacher    1.1    1.2    1.3     2.1     2.2     2.3
  NA       NA     NA    NA      NA 11_ada 11_ada 11_ada 10_bada 10_bada 10_bada
Jack     Leon  Black     1     Ana    1.0    1.0    1.0     1.0     0.0     1.0
Jack     Leon  Black     1     Ana    0.0    1.0    1.0     0.0     1.0     0.0
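library(janitor)
library(dplyr)
# Running count of "Item" columns: 0 before the first item, then 1, 2, ...
i1 <- cumsum(grepl("Item", names(df)))
# Position of the first item column
i2 <- match(1, i1)
# One name per item column: "1", "1.1", "1.2", "2", "2.1", "2.2"
nm1 <- make.unique(as.character(i1[i1 > 0]))
# Promote row 2 to the header, de-duplicate the names, then rename the item columns
row_to_names(df, 2) %>%
  setNames(make.unique(names(.))) %>%
  rename_with(~nm1, i2:last_col())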
-output
  Name District School Grade Teacher   1 1.1 1.2   2 2.1 2.2
3 Jack     Leon  Black     1     Ana 1.0 1.0 1.0 1.0 0.0 1.0
4 Jack     Leon  Black     1     Ana 0.0 1.0 1.0 0.0 1.0 0.0
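The output above names the item columns 1, 1.1, 1.2, 2, 2.1, 2.2, whereas the question asks for 1.1 1.2 1.3 2.1 2.2 2.3. A minimal base R sketch of one way to build those exact names is below. It assumes, as in the example, that every item block starts at a column whose name contains "Item"; `cols`, `within_item` and `nm_exact` are illustrative names, not part of the original answer.

```r
# Column names copied from the question's example data frame.
cols <- c("Text1", "Text2", "Text3", "Text4", "Scoring",
          "Item1", "Text7", "Text8", "Item2", "Text10", "Text11")

# Running item number per column; 0 marks the demographic columns.
item_id <- cumsum(grepl("Item", cols))
item_id <- item_id[item_id > 0]          # keep the item columns only

# Position of each column inside its own item block.
within_item <- ave(item_id, item_id, FUN = seq_along)

# Combine item number and position: "1.1" "1.2" "1.3" "2.1" "2.2" "2.3"
nm_exact <- paste(item_id, within_item, sep = ".")
nm_exact
```

Passing `nm_exact` instead of `nm1` to `rename_with()` should then give the header requested in the question.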
For the second case
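# Start again from the original df
i1 <- cumsum(grepl("Item", names(df)))
i2 <- match(1, i1)
nm1 <- make.unique(as.character(i1[i1 > 0]))
# Row that holds the real header ("Name", "District", ...)
i3 <- match("Name", df$Text1)
# Overwrite the item cells of that row with the new names, then promote it to the header
df[i3, i2:ncol(df)] <- nm1
names(df) <- unlist(df[i3,])
df <- df[-i3,]
# In the label row (Name is NA), carry each item label across its block
i4 <- is.na(df$Name)
df[i4, i2:ncol(df)] <- t(apply(df[i4, i2:ncol(df)], 1,
    function(x) x[!is.na(x)][cumsum(!is.na(x))]))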
-output
> df
  Name District School Grade Teacher      1    1.1    1.2       2     2.1     2.2
1 <NA>     <NA>   <NA>  <NA>    Text 11_ada 11_ada 11_ada 10_bada 10_bada 10_bada
3 Jack     Leon  Black     1     Ana    1.0    1.0    1.0     1.0     0.0     1.0
4 Jack     Leon  Black     1     Ana    0.0    1.0    1.0     0.0     1.0     0.0
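The `x[!is.na(x)][cumsum(!is.na(x))]` expression in the last step is a compact forward fill. The sketch below isolates it in a helper, here called `fill_forward` (an illustrative name, not from the original answer), to show what it does on one row. It assumes the first element is not NA; with leading NAs the index would contain zeros and the result would come back shorter than the input.

```r
# Forward fill: replace each NA with the most recent non-NA value to its left.
# Assumes x[1] is not NA (see the note above).
fill_forward <- function(x) {
  seen <- !is.na(x)          # TRUE where a label is present
  x[seen][cumsum(seen)]      # cumsum(seen) gives the index of the label in force
}

labels <- c("11_ada", NA, NA, "10_bada", NA, NA)
fill_forward(labels)
```

In the output above the label row still carries `Text` in the Teacher column, whereas the desired df2 has NA there. If that matters, assigning `df$Teacher[i4] <- NA` after the steps above should blank that cell.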