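I'm trying to round all the numeric values in a data frame.
The problem is that my data frame also contains strings, and they are not confined to any particular column or row. I'd like to avoid writing a loop that visits every row/column cell and checks whether the value is a number before rounding it.
Is there a function (or a combination of functions) that would let me do this?
So far I have tried lambda's round_df() and various lapply() and apply() combinations. However, whether a column gets rounded is decided by its first value only (i.e., if the first value is a number, the whole column is treated as numeric and rounded).
This causes two problems: when the first value is a string, the whole column is left unrounded, and in the opposite case my code throws an error because it tries to round a string.
My function is: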
library(readxl)
library(knitr)
library(gplots)
library(doBy)
library(dplyr)
library(plyr)
library(printr)
library(xtable)
library(gmodels)
library(survival)
library(pander)
library(psych)
library(questionr)
library(DT)
library(data.table)
library(expss)
library(xtable)
options(xtable.floating = FALSE)
options(xtable.timestamp = "")
library(kableExtra)
library(magrittr)
library(Hmisc)
library(forestmangr)
library(summarytools)
library(gmodels)
library(stats)
summaryTable <- function(y, bygroup, digit,
                         title="", caption_heading="", caption="", freq.tab, y.label="",
                         y.names="", boxplot) {
  if (freq.tab) {
    m = multi.fun(y)
  }
  else if (!missing(bygroup)) {
    m = data.frame(y.label = "")
    m = merge(m, data.frame(describeBy(y, bygroup, mat = T)))
    m = select(m, y.label, n, mean, sd, min, median, max)
  }
  else {
    m = data.frame(y.label = "")
    m = merge(m, data.frame(sumconti(y)))
  }
  if (!freq.tab) {
    m$y.label = y.names
  }
  m = round_df(m, digit, "signif")
  if (freq.tab) {
    colnames(m) = c(y.label, "Frequency", "%")
  }
  else if (missing(freq.tab) | !freq.tab) {
    colnames(m) = c(y.label, "n", "Mean", "Std", "Min", "Median", "Max")
  }
  if (!missing(boxplot)) {
    if (boxplot) {
      attach(m)
      layout(matrix(c(1, 1, 2, 1)), 2, 1)
      kable(m, align = "c", "latex", booktabs = T, caption=figTitle(x, title, y.label)) %>%
        kable_styling(position = 'center',
                      latex_options = c("striped", "repeat_header", "hold_position")) %>%
        footnote(general = caption, general_title = caption_heading, footnote_as_chunk = T,
                 title_format = c("italic", "underline"), threeparttable = T)
      boxplot(y ~ bygroup, main = figTitle(y, title, y.label), names = y.names, ylab = title,
              xlab = y.label, col = c("red", "blue", "orange", "pink",
                                      "green", "purple", "grey", "yellow"), border = "black",
              horizontal = F, varwidth = T)
    }
  }
  kable(m,
        align = "c",
        "latex",
        booktabs = T,
        caption = figTitle(x, title, y.label)) %>%
    kable_styling(position = 'center',
                  latex_options = c("striped", "repeat_header", "hold_position")) %>%
    footnote(general = caption,
             general_title = caption_heading,
             footnote_as_chunk = T,
             title_format = c("italic", "underline"),
             threeparttable = T)
}
figTitle = function(x, title, y.label) {
  if (y.label != "") {
    paste("Summary of", title, "by", y.label)
  }
  else if (title != "") {
    paste("Summary of", title)
  }
  else {
    paste("")
  }
}
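The question does not include data, so we don't really know what the problem is (please always provide a complete minimal reproducible example). We have therefore split the answer into two parts, one for each likely interpretation of the problem, and provide test data for each. No packages are used.
Rounding only the numeric columns
If the problem is that you have a mix of numeric and character columns and only want to round the numeric ones, here are a few approaches.
1) Compute which columns are numeric, giving the logical vector ok, and then round just those columns. We use the built-in Puromycin data set as an example.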
ok <- sapply(Puromycin, is.numeric)
replace(Puromycin, ok, round(Puromycin[ok], 1))
giving:
  conc rate   state
1  0.0   76 treated
2  0.0   47 treated
3  0.1   97 treated
4  0.1  107 treated
5  0.1  123 treated
6  0.1  139 treated
...etc...
1a) If you don't mind overwriting the input, the last line could also be written like this:
Puromycin[ok] <- round(Puromycin[ok], 1)
2) Another approach is to perform the check inside lapply:
Round <- function(x, k) if (is.numeric(x)) round(x, k) else x
replace(Puromycin, TRUE, lapply(Puromycin, Round, 1))
2a) or, overwriting the input:
Puromycin[] <- lapply(Puromycin, Round, 1)
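Since the function in the question calls round_df(m, digit, "signif"), the same pattern can be sketched with the rounding function passed in as an argument. This is only a sketch: round_numeric() is a hypothetical helper name and the values in m are made up for illustration.

```r
# round_numeric() is a hypothetical helper, not part of any package.
# It applies `fun` (round by default) to the numeric columns only.
round_numeric <- function(df, digits, fun = round) {
  ok <- vapply(df, is.numeric, logical(1))   # TRUE for numeric columns
  df[ok] <- lapply(df[ok], fun, digits)      # other columns are left untouched
  df
}

# Made-up summary table with one character column
m <- data.frame(
  label = c("treated", "untreated"),
  mean  = c(141.5833, 110.7273),
  sd    = c(59.5121, 41.5402),
  stringsAsFactors = FALSE
)

by_places  <- round_numeric(m, 1)                # one decimal place
by_signif  <- round_numeric(m, 3, fun = signif)  # three significant digits
print(by_places)
print(by_signif)
```

Passing signif as fun keeps the call close to what the question's code intends, without depending on how any particular package decides which columns are numeric.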
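Rounding everything
If the problem is that all columns are supposed to be numeric but some are actually character (or factor) even though they represent numbers, apply type.convert, as shown here with the following data frame as an example.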
# create test data having numeric, character and factor columns but
# all intended to represent numbers
DF <- structure(list(Time = c("0.1", "0.12", "0.3", "0.14", "0.5",
"0.7"), demand = c(0.83, 1.03, 1.9, 1.6, 1.56, 1.98), Time2 = structure(c(1L,
2L, 4L, 3L, 5L, 6L), .Label = c("0.1", "0.12", "0.14", "0.3",
"0.5", "0.7"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L))
# as.is = TRUE is passed explicitly; recent versions of R warn if it is omitted
round(replace(DF, TRUE, lapply(DF, type.convert, as.is = TRUE)), 1)
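One caveat: calling round() on a whole data frame stops with an error if any column is still non-numeric after conversion. A more defensive variant, sketched below, converts every column and then rounds only those that actually became numeric. The helper name convert_and_round() and the test data DF2 are assumptions made for this illustration.

```r
# convert_and_round() is a hypothetical helper, not part of any package.
convert_and_round <- function(df, digits = 1) {
  # Convert each column from its character representation;
  # columns that contain non-numbers stay character.
  df[] <- lapply(df, function(col) type.convert(as.character(col), as.is = TRUE))
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], round, digits)  # round only what is numeric now
  df
}

# Made-up test data: Time holds only numbers, Mix holds a stray string
DF2 <- data.frame(
  Time = c("0.1", "0.12", "0.3"),
  Mix  = c("3.38", "a", "9.32"),
  stringsAsFactors = FALSE
)

res <- convert_and_round(DF2)
str(res)
```

Here Time becomes numeric and is rounded, while Mix is returned unchanged because one of its entries is not a number.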
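To add one last possibility to the options above:
Suppose you have character columns that contain numbers stored as strings, mixed with genuine non-numeric strings. Then the following approach might help.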
library(dplyr)
library(purrr)
# I use the data from the answer above, with an additional mixed column
DF <- structure(
list(
Time = c("0.1", "0.12", "0.3", "0.14", "0.5",
"0.7"),
demand = c(0.83, 1.03, 1.9, 1.6, 1.56, 1.98),
Mix = c("3.38", "4.403", "a", "5.34", "c", "9.32"),
Time2 = structure(
c(1L,
2L, 4L, 3L, 5L, 6L),
.Label = c("0.1", "0.12", "0.14", "0.3",
"0.5", "0.7"),
class = "factor"
)
),
class = "data.frame",
row.names = c(NA,-6L)
)
TBL <- as_tibble(DF)
# These are the functions we use
round_string_number <- function(x) {
  # as.double() yields NA (with a warning) for non-numeric strings,
  # so those elements are returned unchanged
  num <- suppressWarnings(as.double(x))
  ifelse(!is.na(num),
         as.character(round(num, digits = 1)),
         x)
}
round_string_factor <- compose(round_string_number, as.character)
# Here the recoding happens
TBL %>%
  mutate_if(is.numeric, ~ round(., digits = 1)) %>%
  mutate_if(is.factor, round_string_factor) %>%
  mutate_if(~!is.numeric(.), round_string_number)
This converts this data
  Time  demand Mix   Time2
  <chr>  <dbl> <chr> <fct>
1 0.1     0.83 3.38  0.1
2 0.12    1.03 4.403 0.12
3 0.3     1.9  a     0.3
4 0.14    1.6  5.34  0.14
5 0.5     1.56 c     0.5
6 0.7     1.98 9.32  0.7
into this:
  Time  demand Mix   Time2
  <chr>  <dbl> <chr> <chr>
1 0.1      0.8 3.4   0.1
2 0.1      1   4.4   0.1
3 0.3      1.9 a     0.3
4 0.1      1.6 5.3   0.1
5 0.5      1.6 c     0.5
6 0.7      2   9.3   0.7
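For readers who prefer to avoid extra packages, the same element-wise idea can be sketched in base R. This is an illustration under assumptions, not the code from the answer above: round_strings() and round_mixed() are hypothetical names, and DF3 is a shortened, made-up version of the test data.

```r
# Hypothetical helper: round the elements of a character/factor vector
# that parse as numbers and leave every other element as it is.
round_strings <- function(x, digits = 1) {
  x <- as.character(x)
  num <- suppressWarnings(as.numeric(x))   # NA where the string is not a number
  ifelse(is.na(num), x, as.character(round(num, digits)))
}

# Hypothetical helper: numeric columns are rounded directly,
# all other columns go through round_strings().
round_mixed <- function(df, digits = 1) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) round(col, digits) else round_strings(col, digits)
  })
  df
}

DF3 <- data.frame(
  Time   = c("0.1", "0.12", "0.3"),
  demand = c(0.83, 1.03, 1.9),
  Mix    = c("3.38", "a", "9.32"),
  Time2  = factor(c("0.1", "0.12", "0.3")),
  stringsAsFactors = FALSE
)

out <- round_mixed(DF3)
print(out)
```

As in the dplyr version, columns that started as character or factor come back as character, so convert them afterwards if numeric columns are needed.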