R - Sort by flexible criteria



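I am building a product recommendation engine using collaborative filtering (in R). To place more profitable items at the top of the recommendations, we have developed flexible business rules that look like figure 1.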

+---------------+----------+-----------------+
| Sort Priority | Level 1  | Level 2         |
+---------------+----------+-----------------+
| 1             | Brand    | Versatile Foods |
+---------------+----------+-----------------+
|               |          | Agro            |
+---------------+----------+-----------------+
|               |          | Specialty Foods |
+---------------+----------+-----------------+
|               |          |                 |
+---------------+----------+-----------------+
| 2             | Category | Dairy           |
+---------------+----------+-----------------+
|               |          | Produce         |
+---------------+----------+-----------------+
|               |          | Seafood         |
+---------------+----------+-----------------+
|               |          |                 |
+---------------+----------+-----------------+
| 3             | Seasonal | Y               |
+---------------+----------+-----------------+
|               |          | N               |
+---------------+----------+-----------------+
            figure 1

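Business rules: when sorting the table, the Brand column takes priority over Category, which in turn takes priority over Seasonal. This is determined by the values in the Sort Priority column.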

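When sorting within the Brand column, Versatile Foods takes priority over Agro, and Agro over Specialty Foods. If a value in the Brand column does not appear in the rules, those values must be sorted alphabetically.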

The same sorting logic should apply to every entry in the rule definition.

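As the recommendation algorithm evolves, the business rules may be changed or edited to have fewer or more levels. For example, an additional Level 1 entry such as Type (Kosher, Vegan, Halal) might be added in the future. The rules would then look like this: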

+---------------+----------+-----------------+
| Sort Priority | Level 1  | Level 2         |
+---------------+----------+-----------------+
| 1             | Brand    | Versatile Foods |
+---------------+----------+-----------------+
|               |          | Agro            |
+---------------+----------+-----------------+
|               |          | Specialty Foods |
+---------------+----------+-----------------+
|               |          |                 |
+---------------+----------+-----------------+
| 2             | Category | Dairy           |
+---------------+----------+-----------------+
|               |          | Produce         |
+---------------+----------+-----------------+
|               |          | Seafood         |
+---------------+----------+-----------------+
|               |          |                 |
+---------------+----------+-----------------+
| 3             | Type     | Kosher          |
+---------------+----------+-----------------+
|               |          | Halal           |
+---------------+----------+-----------------+
|               |          | Vegan           |
+---------------+----------+-----------------+
|               |          |                 |
+---------------+----------+-----------------+
| 4             | Seasonal | Y               |
+---------------+----------+-----------------+
|               |          | N               |
+---------------+----------+-----------------+
            figure 2
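
One way the rule table could be held as data rather than code is sketched below. The names rules_raw, fill_down and rule_list are hypothetical, and the blank cells of figure 2 are assumed to arrive as NA. The sketch fills the blanks downward and produces one ordered vector of values per column, in priority order.

```r
# Figure 2 entered as a sparse table; blank cells are assumed to be NA.
rules_raw <- data.frame(
  priority = c(1, NA, NA, 2, NA, NA, 3, NA, NA, 4, NA),
  level1   = c("Brand", NA, NA, "Category", NA, NA,
               "Type", NA, NA, "Seasonal", NA),
  level2   = c("Versatile Foods", "Agro", "Specialty Foods",
               "Dairy", "Produce", "Seafood",
               "Kosher", "Halal", "Vegan", "Y", "N"),
  stringsAsFactors = FALSE
)

# Carry the last non-missing value forward.
# Assumes the first element of x is not NA.
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))   # position of the most recent non-NA entry
  x[!is.na(x)][idx]
}

rules <- rules_raw
rules$priority <- fill_down(rules$priority)
rules$level1   <- fill_down(rules$level1)

# Put the rule groups in Sort Priority order (ties keep their row order).
rules <- rules[order(rules$priority), ]

# Named list: one vector of preferred values per column, highest priority first.
rule_list <- split(rules$level2,
                   factor(rules$level1, levels = unique(rules$level1)))
names(rule_list)
```

With the rules in this shape, adding a Level 1 entry means adding rows to the table, not editing the sorting script.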

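I need help building a script in R that sorts the table above (loaded into a data frame) according to the business rules described above. The real problem I am trying to solve is that I don't want to change the code every time a new entry is added to the rules.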

The input data (the recommendation engine's output) will look like this (figure 3).

+-----+-----------------+----------+----------+
| SKU | Brand           | Category | Seasonal |
+-----+-----------------+----------+----------+
| 1   | Versatile Foods | Dairy    | Y        |
+-----+-----------------+----------+----------+
| 2   | Agro            | Produce  | Y        |
+-----+-----------------+----------+----------+
| 3   | Specialty Foods | Seafood  | N        |
+-----+-----------------+----------+----------+
| 4   | Agro            | Produce  | N        |
+-----+-----------------+----------+----------+
| 5   | Specialty Foods | Organic  | Y        |
+-----+-----------------+----------+----------+
| 6   | Agro            | Meat     | N        |
+-----+-----------------+----------+----------+
| 7   | Versatile Foods | Seafood  | N        |
+-----+-----------------+----------+----------+
| 8   | USA Bread       | Bakery   | Y        |
+-----+-----------------+----------+----------+
| 9   | Specialty Foods | Seafood  | N        |
+-----+-----------------+----------+----------+
| 10  | Versatile Foods | Seafood  | N        |
+-----+-----------------+----------+----------+
                  figure 3
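
For anyone who wants to reproduce this, figure 3 can be typed in as a data frame roughly as follows. The variable name df is an assumption, and the columns are kept as plain character vectors rather than factors.

```r
# Figure 3 as a data frame; `df` is an assumed name.
df <- data.frame(
  SKU      = 1:10,
  Brand    = c("Versatile Foods", "Agro", "Specialty Foods", "Agro",
               "Specialty Foods", "Agro", "Versatile Foods", "USA Bread",
               "Specialty Foods", "Versatile Foods"),
  Category = c("Dairy", "Produce", "Seafood", "Produce", "Organic",
               "Meat", "Seafood", "Bakery", "Seafood", "Seafood"),
  Seasonal = c("Y", "Y", "N", "N", "Y", "N", "N", "Y", "N", "N"),
  stringsAsFactors = FALSE   # keep strings as character in older R versions too
)
str(df)
```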

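Using the rule definition shown in figure 1, the output of the script should look like the table below.
Note how Brand = USA Bread, which does not occur in the business rules, is placed at the bottom of the sorted list.
Also, for items 4 and 6, the record with Category = 'Produce' is placed above the record with Category = 'Meat', because 'Meat' is not found in the business rules whereas 'Produce' is.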

+-----+-----------------+----------+----------+
| SKU | Brand           | Category | Seasonal |
+-----+-----------------+----------+----------+
| 1   | Versatile Foods | Dairy    | Y        |
+-----+-----------------+----------+----------+
| 7   | Versatile Foods | Seafood  | N        |
+-----+-----------------+----------+----------+
| 10  | Versatile Foods | Seafood  | N        |
+-----+-----------------+----------+----------+
| 2   | Agro            | Produce  | Y        |
+-----+-----------------+----------+----------+
| 4   | Agro            | Produce  | N        |
+-----+-----------------+----------+----------+
| 6   | Agro            | Meat     | N        |
+-----+-----------------+----------+----------+
| 3   | Specialty Foods | Seafood  | N        |
+-----+-----------------+----------+----------+
| 9   | Specialty Foods | Seafood  | N        |
+-----+-----------------+----------+----------+
| 5   | Specialty Foods | Organic  | Y        |
+-----+-----------------+----------+----------+
| 8   | USA Bread       | Bakery   | Y        |
+-----+-----------------+----------+----------+
                 figure 4

You can use factor encoding to order things however you like. For example:

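> # Brands named in the rules come first, in rule order; any others follow alphabetically
> lvl <- c('Versatile Foods', 'Agro', 'Specialty Foods')
> lvl <- append(lvl, sort(setdiff(unique(df$Brand), lvl)))
> df$Brand <- factor(df$Brand, levels=lvl)
> 
> # Same treatment for Category
> lvl <- c("Dairy", "Produce", "Seafood")
> lvl <- append(lvl, sort(setdiff(unique(df$Category), lvl)))
> df$Category <- factor(df$Category, levels=lvl)
> 
> # Seasonal has only two values, so both are listed explicitly
> df$Seasonal <- factor(df$Seasonal, levels=c('Y', 'N'))
> 
> # order() sorts factors by level position, not alphabetically
> df[order(df$Brand, df$Category, df$Seasonal), ]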
   SKU           Brand Category Seasonal
1    1 Versatile Foods    Dairy        Y
7    7 Versatile Foods  Seafood        N
10  10 Versatile Foods  Seafood        N
2    2            Agro  Produce        Y
4    4            Agro  Produce        N
6    6            Agro     Meat        N
3    3 Specialty Foods  Seafood        N
9    9 Specialty Foods  Seafood        N
5    5 Specialty Foods  Organic        Y
8    8       USA Bread   Bakery        Y
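
To avoid touching the code when the rules change, the same factor idea can be wrapped in a function that reads the rules from data. This is only a sketch under assumptions: sort_by_rules and the named-list layout of rules are hypothetical, with list names giving the columns in priority order and each element giving that column's preferred values.

```r
# Figure 3 input (df is an assumed name).
df <- data.frame(
  SKU      = 1:10,
  Brand    = c("Versatile Foods", "Agro", "Specialty Foods", "Agro",
               "Specialty Foods", "Agro", "Versatile Foods", "USA Bread",
               "Specialty Foods", "Versatile Foods"),
  Category = c("Dairy", "Produce", "Seafood", "Produce", "Organic",
               "Meat", "Seafood", "Bakery", "Seafood", "Seafood"),
  Seasonal = c("Y", "Y", "N", "N", "Y", "N", "N", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# Figure 1 rules: list order is the sort priority.
rules <- list(
  Brand    = c("Versatile Foods", "Agro", "Specialty Foods"),
  Category = c("Dairy", "Produce", "Seafood"),
  Seasonal = c("Y", "N")
)

# Hypothetical helper: sort df by every rule column it actually contains.
sort_by_rules <- function(df, rules) {
  cols <- intersect(names(rules), names(df))  # rule columns missing from df are skipped
  if (length(cols) == 0) return(df)
  keys <- lapply(cols, function(col) {
    x <- as.character(df[[col]])
    # Listed values first in rule order, then any remaining values alphabetically
    lvl <- c(rules[[col]], sort(setdiff(unique(x), rules[[col]])))
    factor(x, levels = lvl)
  })
  # order() compares factors by level position; ties fall through to the next key
  df[do.call(order, keys), , drop = FALSE]
}

sorted <- sort_by_rules(df, rules)
sorted$SKU
# 1 7 10 2 4 6 3 9 5 8

# Adding a Level 1 entry later is a change to the data only.
rules2 <- append(rules, list(Type = c("Kosher", "Halal", "Vegan")), after = 2)
```

Because df has no Type column yet, sort_by_rules(df, rules2) gives the same order; once the column exists it is picked up at its priority without editing the function.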

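This approach involves defining a table of sort ranks, merging it with the main table, and then sorting on the new column. In this example only Brand follows the rule order; Category and Seasonal are sorted alphabetically.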

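library(dplyr)
# Lookup table giving each rule-listed brand its sort rank
rank <- tibble(Brand = c('Versatile Foods','Agro','Specialty Foods'),
               Brand_rank = c(1,2,3))
# Brands missing from the lookup get NA for Brand_rank; arrange() puts NA last
df <- left_join(df, rank, by="Brand") %>%
    arrange(Brand_rank, Brand, Category, Seasonal) %>%
    select(-Brand_rank)
df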
# A tibble: 10 × 4
#    SKU           Brand Category Seasonal
#    <dbl>           <chr>    <chr>    <chr>
#1      1 Versatile Foods    Dairy        Y
#2      7 Versatile Foods  Seafood        N
#3     10 Versatile Foods  Seafood        N
#4      6            Agro     Meat        N
#5      4            Agro  Produce        N
#6      2            Agro  Produce        Y
#7      5 Specialty Foods  Organic        Y
#8      3 Specialty Foods  Seafood        N
#9      9 Specialty Foods  Seafood        N
#10     8       USA Bread   Bakery        Y
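
The rank-column idea can also be applied to every rule column without a join, using match() in base R. The sketch below is an assumption-laden variant, not the code above: sort_by_rank and the named-list rules are hypothetical. Each column contributes two sort keys, its rule rank (NA when the value is not in the rules, which order() places last) and the raw value as an alphabetical tie-break.

```r
# Figure 3 input (df is an assumed name).
df <- data.frame(
  SKU      = 1:10,
  Brand    = c("Versatile Foods", "Agro", "Specialty Foods", "Agro",
               "Specialty Foods", "Agro", "Versatile Foods", "USA Bread",
               "Specialty Foods", "Versatile Foods"),
  Category = c("Dairy", "Produce", "Seafood", "Produce", "Organic",
               "Meat", "Seafood", "Bakery", "Seafood", "Seafood"),
  Seasonal = c("Y", "Y", "N", "N", "Y", "N", "N", "Y", "N", "N"),
  stringsAsFactors = FALSE
)

# Figure 1 rules: list order is the sort priority.
rules <- list(
  Brand    = c("Versatile Foods", "Agro", "Specialty Foods"),
  Category = c("Dairy", "Produce", "Seafood"),
  Seasonal = c("Y", "N")
)

# Hypothetical helper: rank each rule column with match(), then order on all keys.
sort_by_rank <- function(df, rules) {
  cols <- intersect(names(rules), names(df))
  if (length(cols) == 0) return(df)
  keys <- unlist(lapply(cols, function(col) {
    x <- as.character(df[[col]])
    list(match(x, rules[[col]]),  # rule rank; NA for values not in the rules
         x)                       # alphabetical tie-break among unranked values
  }), recursive = FALSE)
  # order() puts NA ranks last by default (na.last = TRUE)
  df[do.call(order, keys), , drop = FALSE]
}

sorted <- sort_by_rank(df, rules)
sorted$SKU
# 1 7 10 2 4 6 3 9 5 8
```

Unlike the factor approach, this leaves the columns as character vectors, which may be convenient if later steps expect plain strings.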
