How do I get the total count of one variable per value of another variable in R? Both variables are non-numeric



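I have this dataset, and I am trying to create a new variable (n_commitments) that gives me the total number of paragraphs per country. I know this is very basic, but somehow I have been stuck on it for an hour. I think it has to do with both variables being of class character, while I want a number as the output.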

Please help, so I can finally move on. Thanks.

structure(list(country = c("Afghanistan", "Afghanistan"), paragraphs = c("The representative of Afghanistan confirmed that his Government would ensure the transparency of its ongoing privatization programme. He stated that his Government would provide reports to WTO Members on developments in its privatisation programme, periodically and upon request, as long as the programme would be in existence, and along the lines of the information already provided to the Working Party during the accession process. The Working Party took note of this commitment. ", 
"The representative of Afghanistan confirmed that from the date of accession, State-trading enterprises (including State-owned and State-controlled enterprises, enterprises with special or exclusive privileges, and unitary enterprises) in Afghanistan would make any purchases or sales, which were not for the Government's own use or consumption, solely in accordance with commercial considerations, including price, quality, availability, marketability, transportation and other conditions of purchase or sale. He further confirmed that these State trading enterprises would afford the enterprises of other Members adequate opportunity, in accordance with customary business practice, to compete for participation in purchases from or sales to Afghanistan's State enterprises. The Working Party took note of these commitments.  "
)), row.names = 1:2, class = "data.frame")
Columns: 8
$ country            <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", "Afghanis…
$ category           <chr> "State Ownership and Privatization; State-Trading Entities", "State Ownership and Pr…
$ paragraphs         <chr> "The representative of Afghanistan confirmed that his Government would ensure the tr…
$ year_complete      <int> 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, 2016, …
$ year_start         <int> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, …
$ accession_duration <int> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, …
$ wto                <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ n_commitments      <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", …

Here is how to count the unique paragraphs per country:

library(dplyr)

df %>%
  group_by(country) %>%
  summarize(n_unique_paragraphs = n_distinct(paragraphs))

If, as you say, "each row of the data is a unique paragraph", then we can simplify and just count the rows:

df %>%
  group_by(country) %>%
  summarize(n = n())
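The two approaches only differ when a paragraph is repeated within a country. A minimal sketch of that difference, assuming dplyr is installed; the `toy` data frame and its values are made up for illustration and are not the asker's data:

```r
library(dplyr)

# Hypothetical toy data: the second Afghanistan paragraph appears twice
toy <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Afghanistan", "Albania"),
  paragraphs = c("text A", "text B", "text B", "text C"),
  stringsAsFactors = FALSE
)

counts <- toy %>%
  group_by(country) %>%
  summarize(
    n_rows = n(),                                 # every row counts
    n_unique_paragraphs = n_distinct(paragraphs)  # duplicates count once
  )

print(counts)
```

If the two columns agree for every country in the real data, the simpler row count is enough.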

dplyr also has a convenience function for this:

df %>% count(country)
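The snippets above return one summary row per country. Since the question asks for a new variable inside the dataset, here is a rough sketch of how the count could be written back onto every row instead. It assumes dplyr is installed and that `n_commitments` starts out as an empty character column, as in the question's output; the small `df` below is a made-up stand-in, not the real data:

```r
library(dplyr)

# Hypothetical stand-in for the asker's data frame
df <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Albania"),
  paragraphs = c("First commitment text",
                 "Second commitment text",
                 "Another commitment text"),
  n_commitments = c("", "", ""),
  stringsAsFactors = FALSE
)

# Overwrite the empty character column with an integer per-country row count
df_counted <- df %>%
  group_by(country) %>%
  mutate(n_commitments = n()) %>%
  ungroup()

# Shortcut: drop the empty column, then let add_count() create it
df_counted2 <- df %>%
  select(-n_commitments) %>%
  add_count(country, name = "n_commitments")

print(df_counted)
```

Both variants keep all original rows; the character class of `country` and `paragraphs` does not matter, because only rows are being counted.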
