R - Computing percentage change using arrange, group_by and indexing

Here is a reprex:

```
library(crypto2)
library(dplyr)

# Active coins only, restricted to Bitcoin and Ethereum
coins = crypto_list(only_active = TRUE)
coins = coins[(coins$symbol %in% c("BTC","ETH")),]
# Roughly 13 months (390 days) of daily history
thirteen.months.data = crypto_history(coins, start_date=Sys.Date() - (13 * 30))
# Keep the rows for yesterday, ~1 month ago and ~1 year ago, sorted by coin, then date
mydf <- thirteen.months.data[substr(thirteen.months.data$timestamp,1,10) %in% as.character((Sys.Date()-c(1,31,366))),] %>%
  select(timestamp,name,close,market_cap) %>%
  arrange(name,timestamp) %>%
  as.data.frame
# Present
df1  <- mydf %>% group_by(name) %>% slice(3) %>% select(-1)
# M-o-M growth
df2  <- mydf %>% group_by(name) %>% summarise(m.o.m  = (close[3]-close[2])/close[2]*100)
# Y-o-Y growth
df3 <- mydf %>% group_by(name) %>% summarise(y.o.y = (close[3]-close[1])/close[1]*100)
```

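I have two questions about the program above.

1. Will the `group_by` done after `arrange` scramble the order established by `arrange`?
2. Will the M-o-M (month-over-month) and Y-o-Y (year-over-year) calculations work as expected? In other words, if I use `close[2]` after `group_by`, will it use the second element in each group? Is this kind of indexing allowed?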

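No. `group_by` on its own has no effect at all on the order of the data. Looking at how the grouping is stored shows that its group indices are based on the frame's existing row order: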
```
library(dplyr)
X <- data.frame(id=1:3, grp=c(4,6,4))
group_by(X, grp) %>%
attr("groups") %>%
str()
# tibble [2 × 2] (S3: tbl_df/tbl/data.frame)
#  $ grp  : num [1:2] 4 6
#  $ .rows: list<int> [1:2] 
#   ..$ : int [1:2] 1 3
#   ..$ : int 2
#   ..@ ptype: int(0) 
#  - attr(*, ".drop")= logi TRUE
```
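The `groups` attribute of a grouped frame is not normally displayed in raw form, although its contents inform how the frame prints (`# Groups: grp [2]`). In this example, the first element of `.rows` is `c(1, 3)`, meaning that the first group consists of rows 1 and 3.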
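From this you can see that grouping is handled by an internal structure that tracks the rows in whatever order they happen to be. (With a little more effort, you can also see that if the rows are reordered, the `groups`/`.rows` attribute is adjusted accordingly.)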
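Both claims can be checked directly with dplyr's `group_rows()`, which returns the row indices stored for each group. This is a minimal sketch assuming dplyr 1.0 or later; the data frame `X` here is a hypothetical variant of the one above, with groups changed to `c(4, 4, 6)` so that reordering visibly changes the stored indices:

```r
library(dplyr)

X <- data.frame(id = 1:3, grp = c(4, 4, 6))

# 1. arrange() then group_by(): grouping does not move any rows
Y <- X %>% arrange(desc(id)) %>% group_by(grp)
Y$id            # still 3 2 1, the order arrange() produced
group_rows(Y)   # grp 4 -> rows 2, 3; grp 6 -> row 1

# 2. group_by() then arrange(): the stored group indices follow the new order
Z <- X %>% group_by(grp)
group_rows(Z)   # grp 4 -> rows 1, 2; grp 6 -> row 3
Z2 <- Z %>% arrange(desc(id))
group_rows(Z2)  # grp 4 -> rows 2, 3; grp 6 -> row 1
```

In both cases the groups only record which rows belong to them; the row order itself is whatever `arrange()` (or the original data) put there.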

Yes, `[` indexing works as expected. As another example:

```
library(dplyr)

mtcars %>%
mutate(disp2a = disp[2]) %>%
group_by(cyl) %>%
mutate(disp2b = disp[2]) %>%
ungroup()
# # A tibble: 32 × 13
#      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb disp2a disp2b
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
#  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4    160   160 
#  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4    160   160 
#  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1    160   147.
#  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1    160   160 
#  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2    160   360 
#  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1    160   160 
#  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    160   360 
#  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2    160   147.
#  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2    160   147.
# 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4    160   160 
# # … with 22 more rows
# # ℹ Use `print(n = ...)` to see more rows
```

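Note that `disp2a` (the second element of `disp` without grouping) is 160 for every row, while `disp2b` (the second element of `disp` within each group) varies between groups but is constant within each group.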
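Applied to the question itself, the following minimal sketch uses made-up prices: the `toy` data frame is hypothetical and only mimics the shape of `mydf`, with rows deliberately shuffled. After `arrange(name, timestamp)`, `close[1]`, `close[2]` and `close[3]` inside `summarise()` refer to the oldest, middle and newest row of each coin:

```r
library(dplyr)

# Hypothetical stand-in for `mydf`: three closing prices per coin
# (~1 year ago, ~1 month ago, yesterday), rows deliberately shuffled
toy <- data.frame(
  timestamp = as.Date(c("2023-01-10", "2022-01-10", "2022-12-10",
                        "2022-12-10", "2023-01-10", "2022-01-10")),
  name  = c("Bitcoin", "Bitcoin", "Bitcoin", "Ethereum", "Ethereum", "Ethereum"),
  close = c(110, 100, 120, 20, 30, 40)
)

growth <- toy %>%
  arrange(name, timestamp) %>%   # oldest -> newest within each coin
  group_by(name) %>%             # group_by() keeps that order
  summarise(
    m.o.m = (close[3] - close[2]) / close[2] * 100,  # yesterday vs ~1 month ago
    y.o.y = (close[3] - close[1]) / close[1] * 100   # yesterday vs ~1 year ago
  )
growth
# Bitcoin:  m.o.m = -8.33, y.o.y =  10
# Ethereum: m.o.m = 50,    y.o.y = -25
```

This relies on every coin having exactly three rows. If one of the dates is missing for a coin, the positions shift and `close[3]` silently becomes `NA`.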

However, as @MartinGal suggested, the `nth` helper function is also useful here:

```
library(dplyr)

mtcars %>%
mutate(disp2a = nth(disp, 2)) %>%
group_by(cyl) %>%
mutate(disp2b = nth(disp, 2)) %>%
ungroup()
```

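Its arguments effectively provide the same functionality as `[`: `n=` is the index (a single position); `order_by=mpg` can be mimicked with `disp[order(mpg)][2]` (for `n=2`); and `default=` lets you change what happens when the index is out of range (R's default behavior is to return `NA`):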

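```
(1:3)[4]
# [1] NA
nth(1:3, 4)
# [1] NA
# recent dplyr versions cast `default` to the type of `x`, so use a double `x` for a double default
nth(c(1, 2, 3), 4, default = Inf)
# [1] Inf
```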
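The `order_by=` equivalence described above can also be checked per group. This sketch (assuming dplyr 1.0 or later) computes the second `disp` value by ascending `mpg` within each `cyl` group in both ways; the two columns should agree:

```r
library(dplyr)

chk <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    via_nth   = nth(disp, 2, order_by = mpg),  # dplyr helper
    via_index = disp[order(mpg)][2]            # plain base-R indexing
  )
chk
# cyl 4: 120.1, cyl 6: 225, cyl 8: 460 in both columns
```

Both `order()` and `nth()` use a stable ordering, so ties in `mpg` (the two 10.4 values in the 8-cylinder group) are resolved in the same way.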
