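I have a large dataset in which every 5 columns hold the measurements for one artery, but only the first of each group of 5 columns is named. An example: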
df <- structure(list("agatston", "area", "volume", "density", "mass",
"agatston", "area", "volume", "density", "mass", "agatston",
"area", "volume", "density", "mass"),
.Names = c("Artery_1", NA, NA, NA, NA, "Artery_2", NA, NA, NA, NA, "Artery_3", NA, NA, NA, NA),
row.names = c(NA, -1L),
class = c("tbl_df", "tbl", "data.frame"))
It looks like this:
df
# A tibble: 1 x 15
Artery_1 `` `` `` `` Artery_2 `` `` `` `` Artery_3 `` `` `` ``
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 agatston area volume density mass agatston area volume density mass agatston area volume density mass
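I am trying to use a for loop or apply() to fill each missing column name with the nearest preceding non-missing column name. What I want to achieve is this: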
# A tibble: 1 x 15
Artery_1 Artery_1 Artery_1 Artery_1 Artery_1 Artery_2 Artery_2 Artery_2 Artery_2 Artery_2 Artery_3 Artery_3 Artery_3 Artery_3 Artery_3
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 agatston area volume density mass agatston area volume density mass agatston area volume density mass
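Any help?
Edit: As a next step, I would like to make the column names unique by combining each column name with the value in the row beneath it, producing the following output: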
# A tibble: 1 x 15
Artery_1_agatston Artery_1_area Artery_1_volume Artery_1_density Artery_1_mass Artery_2_agatston Artery_2_area Artery_2_volume
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 agatston area volume density mass agatston area volume
# ... with 7 more variables: Artery_2_density <chr>, Artery_2_mass <chr>, Artery_3_agatston <chr>, Artery_3_area <chr>,
# Artery_3_volume <chr>, Artery_3_density <chr>, Artery_3_mass <chr>
You can use zoo::na.locf to replace the NA values with the last non-missing name:
names(df) <- zoo::na.locf(names(df))
names(df)
# [1] "Artery_1" "Artery_1" "Artery_1" "Artery_1" "Artery_1" "Artery_2"
# [7] "Artery_2" "Artery_2" "Artery_2" "Artery_2" "Artery_3" "Artery_3"
#[13] "Artery_3" "Artery_3" "Artery_3"
However, having duplicate column names is not good practice, so you can use make.unique to make the column names unique.
names(df) <- make.unique(zoo::na.locf(names(df)))
names(df)
# [1] "Artery_1" "Artery_1.1" "Artery_1.2" "Artery_1.3" "Artery_1.4"
# [6] "Artery_2" "Artery_2.1" "Artery_2.2" "Artery_2.3" "Artery_2.4"
#[11] "Artery_3" "Artery_3.1" "Artery_3.2" "Artery_3.3" "Artery_3.4"
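As a small aside, make.unique also takes a sep argument, which may be handy if a dot in the names is not wanted. A minimal sketch, where nms is a hypothetical stand-in for the already filled column names:

```r
# Hypothetical stand-in for the filled column names (two arteries only).
nms <- c("Artery_1", "Artery_1", "Artery_1", "Artery_2", "Artery_2")

# sep controls the separator placed before the running number.
unique_nms <- make.unique(nms, sep = "_")
print(unique_nms)
# [1] "Artery_1"   "Artery_1_1" "Artery_1_2" "Artery_2"   "Artery_2_1"
```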
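To combine the column names with the first row instead (starting from names that do not yet carry the make.unique suffixes), you can use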
names(df) <- paste(zoo::na.locf(names(df)), df[1, ], sep = '_')
names(df)
# [1] "Artery_1_agatston" "Artery_1_area" "Artery_1_volume"
# [4] "Artery_1_density" "Artery_1_mass" "Artery_2_agatston"
# [7] "Artery_2_area" "Artery_2_volume" "Artery_2_density"
#[10] "Artery_2_mass" "Artery_3_agatston" "Artery_3_area"
#[13] "Artery_3_volume" "Artery_3_density" "Artery_3_mass"
Then you will probably want to remove the first row with df <- df[-1, ].
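Once the header row is gone, the remaining columns are still character, because they used to share a column with the labels. A minimal sketch of one way to restore numeric types, assuming the real data has measurement rows below the label row; df2 and its values are made up purely for illustration:

```r
# Hypothetical data: row 1 holds the labels, the remaining rows hold
# made-up measurements stored as text.
df2 <- data.frame(
  Artery_1_agatston = c("agatston", "12.5", "40"),
  Artery_1_area     = c("area", "3", "7"),
  stringsAsFactors  = FALSE
)

df2 <- df2[-1, ]                                   # drop the label row
df2[] <- lapply(df2, type.convert, as.is = TRUE)   # re-guess each column's type
rownames(df2) <- NULL                              # renumber rows from 1

str(df2)
```

Assigning into df2[] keeps the object a data frame while replacing every column; as.is = TRUE stops type.convert from turning leftover text into factors.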
Base R solution:
names(df) <- paste(
na.omit(names(df))[cumsum(!(is.na(names(df))))],
df[1,,drop=TRUE],
sep = "_"
)
Or, to drop the first row at the same time:
clean_df <- setNames(
df[-1,],
paste(
na.omit(names(df))[cumsum(!(is.na(names(df))))],
df[1,,drop=TRUE],
sep = "_"
)
)
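The indexing idiom in the Base R solution can be hard to read at first, so here it is unrolled on a short, hypothetical vector of names (nms, is_named, idx and filled are illustrative variable names, not part of the answer):

```r
# Hypothetical names vector: only the first column of each group is named.
nms <- c("Artery_1", NA, NA, "Artery_2", NA)

is_named <- !is.na(nms)       # TRUE FALSE FALSE TRUE FALSE
idx <- cumsum(is_named)       # 1 1 1 2 2: the group each column belongs to
filled <- nms[is_named][idx]  # look up each group's name

print(filled)
# [1] "Artery_1" "Artery_1" "Artery_1" "Artery_2" "Artery_2"
```

This relies on the first name being non-missing. If the vector started with NA, idx would begin with 0, and indexing with 0 silently drops that element, so the result would be shorter than the number of columns.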