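For the data frame below, I want to use the row whose first column starts with one or more words as the column names. Here that is the second row, with the word Company. However, the row can vary (for example row 1, 5 or 10 in a different data frame), and so can the word, e.g. Investment and others.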
my_df <- structure(list(X1 = c("", "Company #", "Investments:"),
    X2 = c("", "Type", ""), X3 = c("", "Reference", ""),
    X4 = c(NA_real_, NA_real_, NA_real_), X5 = c("", "Footnotes", ""),
    X6 = c(NA_character_, NA_character_, NA_character_)),
    row.names = c(NA, 3L), class = "data.frame")
  X1           X2    X3           X4 X5        X6
  <chr>        <chr> <chr>     <dbl> <chr>     <chr>
1                                 NA           NA
2 Company #    Type  Reference    NA Footnotes NA
3 Investments:                    NA           NA
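My idea is to first get the number of the row whose first column starts with a word, and then use that row as the column names. Maybe there is a better way: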
names(my_df) <- my_df[row_number, ]
my_df <- my_df[-row_number, ]
Desired output:
Company # Type Reference NA Footnotes NA
<chr> <chr> <chr> <dbl> <chr> <chr>
3 Investments: NA NA
# Row number of the first non-empty cell in the first column
row_n <- min(which(nzchar(my_df[[1]])))
janitor::row_to_names(my_df, row_n)
Output:
# Company # Type Reference NA Footnotes NA
#3 Investments: NA <NA>
Note that doing this leaves you with non-unique column names (two columns called NA). You can fix this quickly with janitor::clean_names().
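If you would rather not depend on janitor for this step, base R can also remove the duplicates. A small sketch on the header from this example: make.unique() keeps the text and only suffixes repeats, while make.names(unique = TRUE) additionally turns the names into syntactic R names.

```r
hdr <- c("Company #", "Type", "Reference", "NA", "Footnotes", "NA")

anyDuplicated(hdr)  # 6: position of the first repeated name

# Keep the original text, only suffix repeats: "NA", "NA_1"
make.unique(hdr, sep = "_")

# Syntactic, de-duplicated names: invalid characters such as the
# space and "#" are replaced with "."
make.names(hdr, unique = TRUE)
```

The result of either call can be assigned with names(my_df) <- ... after promoting the header row.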
You can try:
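# First row whose first column starts with a letter
idx <- which(grepl('^[A-Za-z]', my_df$X1))[1]
colnames(my_df) <- my_df[idx, ]
# Keep only the rows after the header row (also safe when idx is the last row)
my_df <- my_df[-seq_len(idx), ]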
Output:
Company # Type Reference NA Footnotes NA
3 Investments: NA <NA>
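This finds the first row whose value in the first column starts with a letter, uses that row as the column names, and keeps only the rows after it.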
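The two regular expressions below are easy to confuse: `[^A-Za-z]` (a negated character class) is true for any string that contains a non-letter anywhere, while `^[A-Za-z]` only checks the first character. On the example data both happen to select row 2, because "Company #" contains a space, but they diverge on other inputs. A comparison on a few made-up strings:

```r
x <- c("", "Company #", "Investments:", "2023 totals", "#")

# TRUE if the string contains at least one non-letter anywhere
grepl("[^A-Za-z]", x)  # FALSE TRUE TRUE TRUE TRUE

# TRUE only if the first character is a letter
grepl("^[A-Za-z]", x)  # FALSE TRUE TRUE FALSE FALSE

# Index of the first value that starts with a letter
which(grepl("^[A-Za-z]", x))[1]  # 2
```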
Base R (the header row is hard-coded as row 2 here):
colnames(df) <- df[2,]
df <- df[-2,]
df
Company # Type Reference NA Footnotes NA
1 NA <NA>
3 Investments: NA <NA>
You can use which to find the row that contains the string:
# Search only the first column, so the result is a row number
idx <- which(my_df[[1]] == "Company #")
names(my_df) <- my_df[idx, ]
my_df <- my_df[-idx,]
my_df
#> Company # Type Reference NA Footnotes NA
#> 1 NA <NA>
#> 3 Investments: NA <NA>
Created on 2023-01-05 with reprex v2.0.2
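Comparing a whole data frame with a string returns a logical matrix, so which() on it alone gives a position counted down the columns; that equals the row number only when the match sits in the first column. Passing arr.ind = TRUE returns the row and column explicitly. A sketch on the example data:

```r
my_df <- data.frame(
  X1 = c("", "Company #", "Investments:"),
  X2 = c("", "Type", ""),
  X3 = c("", "Reference", ""),
  X4 = c(NA_real_, NA_real_, NA_real_),
  X5 = c("", "Footnotes", ""),
  X6 = c(NA_character_, NA_character_, NA_character_)
)

# Logical matrix: TRUE where a cell equals the key, NA for NA cells
# (which() ignores the NAs)
hits <- which(my_df == "Company #", arr.ind = TRUE)
hits  # one row per match, with "row" and "col" columns

# Row to promote to column names
idx <- hits[1, "row"]
```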
You can use which on the first column to find the header row, use names to assign that row as the column names, and finally remove the row. All in base R:
df <- structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference",""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes", ""),
X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA, 3L), class = "data.frame")
# Index of the first row whose first column is not empty
header_row <- which(nzchar(df$X1))[1]
names(df) <- df[header_row, ]
df[-header_row, ]
Output:
Company # Type Reference NA Footnotes NA
1 NA <NA>
3 Investments: NA <NA>
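This uses a keyword in the first column to find the row, then uses make.unique to make sure there are no duplicate names and replace to make sure there are no NAs (character or numeric).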
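To see what the renaming step does on its own, here it is applied to a hand-written header vector that contains both forms the answer guards against: the string "NA" and a real NA value.

```r
header <- c("Company #", "Type", "Reference", "NA", "Footnotes", NA)

# Replace both the string "NA" and real NA values with a placeholder
cleaned <- replace(header, header %in% "NA" | is.na(header), "Missing")

# Make the names unique: the second "Missing" becomes "Missing_1"
result <- make.unique(cleaned, sep = "_")
result
```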
# dat is the example data frame from the question
key <- "Company #"
str <- as.character(dat[dat[,1] == key,][1,])
colnames(dat) <- make.unique(
replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")
Result:
dat
Company # Type Reference Missing Footnotes Missing_1
1 NA <NA>
2 Company # Type Reference NA Footnotes <NA>
3 Investments: NA <NA>
If the first non-empty cell should be picked instead, use this:
str <- as.character(dat[nchar(dat[,1]) != 0, ][1,])
colnames(dat) <- make.unique(
replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")
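The ideas from the answers above can be combined into one reusable function. This is a sketch under two assumptions: the header row is the first one whose first column starts with a letter (override pattern to match a keyword instead), and the rows above the header should be dropped, matching the janitor output shown earlier. promote_header_row() is a hypothetical name, not a function from any package.

```r
# Example data from the question
my_df <- data.frame(
  X1 = c("", "Company #", "Investments:"),
  X2 = c("", "Type", ""),
  X3 = c("", "Reference", ""),
  X4 = c(NA_real_, NA_real_, NA_real_),
  X5 = c("", "Footnotes", ""),
  X6 = c(NA_character_, NA_character_, NA_character_)
)

# Hypothetical helper: promote the first row whose first column matches
# `pattern` to column names, then drop that row and every row above it.
promote_header_row <- function(df, pattern = "^[[:alpha:]]") {
  # grepl() returns FALSE for NA cells, so they never match
  hits <- which(grepl(pattern, as.character(df[[1]])))
  if (length(hits) == 0) stop("no row in the first column matches `pattern`")
  idx <- hits[1]

  # Turn the header row into a character vector; missing cells become "NA"
  header <- vapply(df[idx, ], as.character, character(1))
  header[is.na(header)] <- "NA"

  # make.unique() turns repeated names into "NA", "NA_1", ...
  names(df) <- make.unique(unname(header), sep = "_")

  # Keep only the rows below the header row
  df[-seq_len(idx), , drop = FALSE]
}

res <- promote_header_row(my_df)
names(res)
res
```

Passing a keyword pattern such as "^Investment" selects the header row by its first word instead of taking the first row that starts with any letter.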