我有长格式的月报表。对于每个月,我想根据公司的业绩将它们分成10个大小相等的组(十分位数)。
我的数据样本看起来如下(这里按值升序排列,这是每月的回报):
print(R4, n=10)
# A tibble: 1,125 x 5
# Groups: year [3]
company year month value mktvalue
<chr> <dbl> <dbl> <dbl> <dbl>
1 STIFEL FINANCIAL 1997 7 0.821 50.3
2 RAYMOND JAMES FINL. 1997 10 0.833 1070.
3 CHARLES SCHWAB 1997 3 0.853 6250.
4 STATE STREET 1997 3 0.863 6178.
5 STIFEL FINANCIAL 1996 7 0.871 31.6
6 FRANKLIN RESOURCES 1997 3 0.872 7459.
7 BERKSHIRE HATHAWAY 'A' 1997 8 0.879 53857.
8 ALLIANCEBERNSTEIN HLDG. UNT. 1997 3 0.879 2257.
9 STATE STREET 1997 8 0.890 8504.
10 MORGAN STANLEY 1996 7 0.891 8764.
# ... with 1,115 more rows
因为我想为每个月的公司分配等级,所以我首先过滤了一个月:
R5 <- R4 %>%
filter(year == 1997, month == 12, !is.na(value)) %>%
arrange(value) %>%
mutate(rank = rank(value))
print(R5)
# A tibble: 13 x 6
# Groups: year [1]
company year month value mktvalue rank
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 FRANKLIN RESOURCES 1997 12 0.967 11505. 1
2 STATE STREET 1997 12 0.978 9250. 2
3 JEFFERIES FINANCIAL GROUP 1997 12 0.998 2175. 3
4 BERKSHIRE HATHAWAY 'A' 1997 12 1.02 55184. 4
5 BANK OF NEW YORK MELLON 1997 12 1.08 21152. 5
6 CHARLES SCHWAB 1997 12 1.09 10854. 6
7 MORGAN STANLEY 1997 12 1.09 32712. 7
8 RAYMOND JAMES FINL. 1997 12 1.11 1207. 8
9 MGIC INVESTMENT 1997 12 1.14 7030. 9
10 ALLIANCEBERNSTEIN HLDG. UNT. 1997 12 1.15 3107. 10
11 AFFILIATED MANAGERS 1997 12 1.16 373. 11
12 RADIAN GP. 1997 12 1.16 1284. 12
13 STIFEL FINANCIAL 1997 12 1.17 81.4 13
我试着用这个问题的答案把公司分成几组:R
分组排名请告诉我是否有更聪明的方法来创建十分位数组
percent.rank <- function(x) trunc(rank(x)/length(x)*100)
R6 <- within(R5, pr <- percent.rank(rank))
R6$decile <- cut(R6$pr, breaks = c(0,10,20,30,40,50,60,70,80,90,100), labels = c(1:10))
print(R6)
# A tibble: 13 x 8
# Groups: year [1]
company year month value mktvalue rank pr decile
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
1 FRANKLIN RESOURCES 1997 12 0.967 11505. 1 7 1
2 STATE STREET 1997 12 0.978 9250. 2 15 2
3 JEFFERIES FINANCIAL GROUP 1997 12 0.998 2175. 3 23 3
4 BERKSHIRE HATHAWAY 'A' 1997 12 1.02 55184. 4 30 3
5 BANK OF NEW YORK MELLON 1997 12 1.08 21152. 5 38 4
6 CHARLES SCHWAB 1997 12 1.09 10854. 6 46 5
7 MORGAN STANLEY 1997 12 1.09 32712. 7 53 6
8 RAYMOND JAMES FINL. 1997 12 1.11 1207. 8 61 7
9 MGIC INVESTMENT 1997 12 1.14 7030. 9 69 7
10 ALLIANCEBERNSTEIN HLDG. UNT. 1997 12 1.15 3107. 10 76 8
11 AFFILIATED MANAGERS 1997 12 1.16 373. 11 84 9
12 RADIAN GP. 1997 12 1.16 1284. 12 92 10
13 STIFEL FINANCIAL 1997 12 1.17 81.4 13 100 10
到目前为止,我手工做了一个月。我如何在不手动过滤的情况下对数据帧的每个月和年份应用此功能?我的全部数据是从1996年1月到2020年12月。组/十分位数必须大小相等,并根据每月的表现(值)排名。有什么循环的方法或者其他聪明的方法吗?复制数据:
R4 <- data.frame(company = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L"),
year = c(1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996, 1996,1996, 1996, 1996,
1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997),
month = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),
value = c(1.15999568000864, 0.983783783783784, 1.07692307692308, 1, 0.989795918367347,
0.989690721649484, 1, 1.04166666666667, 0.99, 1.05050505050505, 1.07211538461538,
1.01345291479821, 0.942477876106195, 1.06572769953052, 0.986784140969163,
0.879464285714286, 1.11167512690355, 0.922374429223744, 1.15841584158416,
1.18803418803419, 0.973018705035971, 1.10906058132519, 0.92, 1.00724637681159))
我不确定我是否完全遵循你想要做的事情,但是考虑到你的数据,我能够为每个month
/year
组合创建value
的排名分组:
library(dplyr)
R4 %>%
group_by(year, month) %>%
mutate(rank = rank(value)) %>%
arrange(month, year, rank)
这个收益率:
# A tibble: 60 × 6
# Groups: year, month [24]
company year month value mktvalue rank
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 ALLIANCEBERNSTEIN HLDG. UNT. 1996 1 0.984 1802. 1
2 BERKSHIRE HATHAWAY 'A' 1996 1 0.994 37088. 2
3 BANK OF NEW YORK MELLON 1996 1 1.05 9517. 3
4 ALLIANCEBERNSTEIN HLDG. UNT. 1997 1 1.07 2307. 1
5 BANK OF NEW YORK MELLON 1997 1 1.08 13652. 2
6 BANK OF NEW YORK MELLON 1996 2 1.01 10488. 1
7 ALLIANCEBERNSTEIN HLDG. UNT. 1996 2 1.08 1931. 2
8 BERKSHIRE HATHAWAY 'A' 1996 2 1.11 39584. 3
9 ALLIANCEBERNSTEIN HLDG. UNT. 1997 2 0.987 2441. 1
10 BANK OF NEW YORK MELLON 1997 2 1.06 15457. 2