R - DPLYR:应用具有不同列名的计算

  • 本文关键字:计算 DPLYR 应用 r dplyr
  • 更新时间 :
  • 英文 :


我正在尝试创建以下公式:

Interest expense / (Total Debt(for all years)) / # number of years

数据如下所示;

GE2017    GE2016    GE2015    GE2014
Interest Expense              -2753000  -2026000  -1706000  -1579000
Long Term Debt               108575000 105080000 144659000 186596000
Short/Current Long Term Debt 134591000 136211000 197602000 261424000
Total_Debt                   243166000 241291000 342261000 448020000
GOOG2017 GOOG2016 GOOG2015 GOOG2014
Interest Expense              -109000  -124000  -104000  -101000
Long Term Debt                3943000  3935000  1995000  2992000
Short/Current Long Term Debt  3969000  3935000  7648000  8015000
Total_Debt                    7912000  7870000  9643000 11007000
NVDA2018 NVDA2017 NVDA2016 NVDA2015
Interest Expense               -61000   -58000   -47000   -46000
Long Term Debt                1985000  1985000     7000  1384000
Short/Current Long Term Debt  2000000  2791000  1434000  1398000
Total_Debt                    3985000  4776000  1441000  2782000

也就是说,对于GE,我试图将最近-2753000一年的利息支出除以GE所有 4 年的Total Debt平均值。

所以;

-2753000 / AVERAGE(243166000 + 241291000 + 342261000 + 448020000) = 0.0086

但是,我在取平均值时遇到了group_by()问题,因为GE和其他公司由于年份不同而具有不同的列名称。

cost_of_debt %>%
t() %>%
data.frame() %>%
rownames_to_column('rn') %>%
group_by(rn)
#Calcualtion here

其次;如果可能的话,我想做与上面相同的计算,但只使用每家公司的最后两年。

-2753000 / AVERAGE(243166000 + 241291000) = 0.01136

也许grepl函数在这里工作吗?

我有一个名为symbols的向量。

symbols <- c("NVDA", "GOOG", "GE")

数据:

cost_of_debt <- structure(list(GE2017 = c(-2753000, 108575000, 134591000, 243166000
), GE2016 = c(-2026000, 105080000, 136211000, 241291000), GE2015 = c(-1706000, 
144659000, 197602000, 342261000), GE2014 = c(-1579000, 186596000, 
261424000, 448020000), GOOG2017 = c(-109000, 3943000, 3969000, 
7912000), GOOG2016 = c(-124000, 3935000, 3935000, 7870000), GOOG2015 = c(-104000, 
1995000, 7648000, 9643000), GOOG2014 = c(-101000, 2992000, 8015000, 
11007000), NVDA2018 = c(-61000, 1985000, 2e+06, 3985000), NVDA2017 = c(-58000, 
1985000, 2791000, 4776000), NVDA2016 = c(-47000, 7000, 1434000, 
1441000), NVDA2015 = c(-46000, 1384000, 1398000, 2782000)), .Names = c("GE2017", 
"GE2016", "GE2015", "GE2014", "GOOG2017", "GOOG2016", "GOOG2015", 
"GOOG2014", "NVDA2018", "NVDA2017", "NVDA2016", "NVDA2015"), row.names = c("Interest Expense", 
"Long Term Debt", "Short/Current Long Term Debt", "Total_Debt"
), class = "data.frame")

对于第一种情况,在将行名称创建为一列(rownames_to_column- 从tibble(创建后,通过在"年"的开始和公司结束之间的交界处拆分,将其分隔为"公司"和"年份",按"公司"分组的名称通过取"利息.费用"的比例与mean值"Total_Debt"来创建"新"列。 然后,我们可以按"年"arrange,得到每个"公司"的最后两个"Total_Debt"的mean,然后除以"利息.费用">

library(dplyr)
cost_of_debt %>%
t() %>%
data.frame() %>%
rownames_to_column('rn')  %>%
separate(rn, into = c("firm", "year"),
"(?<=[A-Z])(?=[0-9])", convert = TRUE) %>%
group_by(firm) %>%
mutate(New = Interest.Expense/mean(Total_Debt)) %>%
arrange(firm, year) %>%
mutate(NewLast = Interest.Expense/mean(tail(Total_Debt, 2)))

我认为您需要先清理数据,以便更容易理解什么是观察值和什么是变量。谷歌整洁的数据:)这是我的解决方案。首先,我使数据整洁,然后计算简单。

library(tidyverse)
library(stringr)
                                                                                                                                                                                                                                                                                                                                                                                         ), class = "data.frame")
# Clean and make the data tidy
cost_of_debt <- cost_of_debt %>% 
as_tibble() %>% 
rownames_to_column(var = "indicator") %>% 
mutate(indicator = str_replace_all(indicator, regex("\s|\/"), "_")) %>% 
gather(k, value, -indicator) %>% 
separate(k, into = c("company", "year"), -4) %>% 
spread(indicator, value) %>% 
rename_all(tolower)

结果数据如下所示:

company year  interest_expense long_term_debt short_current_long_term_debt total_debt
<chr>   <chr>            <dbl>          <dbl>                        <dbl>      <dbl>
1 GE      2014          -1579000      186596000                    261424000  448020000
2 GE      2015          -1706000      144659000                    197602000  342261000
3 GE      2016          -2026000      105080000                    136211000  241291000
4 GE      2017          -2753000      108575000                    134591000  243166000
5 GOOG    2014           -101000        2992000                      8015000   11007000

然后我们可以回答您的问题:

cost_of_debt <- cost_of_debt %>%
group_by(company) %>% 
mutate(int_over_totdept4 = interest_expense / mean(total_debt),
int_over_totdept2 = interest_expense / mean(total_debt[year %in% c("2017", "2016")]))

这给出了一个数据帧(您的新变量最靠右

(:
company year  interest_expense long_term_debt short_current_long_term_debt total_debt int_over_totdept4 int_over_totdept2
<chr>   <chr>            <dbl>          <dbl>                        <dbl>      <dbl>             <dbl>             <dbl>
1 GE      2014          -1579000      186596000                    261424000  448020000          -0.00495          -0.00652
2 GE      2015          -1706000      144659000                    197602000  342261000          -0.00535          -0.00704
3 GE      2016          -2026000      105080000                    136211000  241291000          -0.00636          -0.00836
4 GE      2017          -2753000      108575000                    134591000  243166000          -0.00864          -0.0114 
5 GOOG    2014           -101000        2992000                      8015000   11007000          -0.0111           -0.0128 

如果您想要问题的摘要形式:

# First question:
cost_of_debt %>% filter(company == "GE", year == "2017") %>% select(company, year, int_over_totdept4)
# Second question:
cost_of_debt %>% filter(year == "2017") %>% select(company, year, int_over_totdept2)

最新更新