Stripping out selected querystring attribute/value pairs so Varnish will not vary its cache by them



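My goal is to "whitelist" certain querystring attributes and their values, that is, have Varnish ignore them so its cache does not vary between otherwise identical URLs.

Example: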

Url 1: http://foo.com/someproduct.html?utm_code=google&type=hello  
Url 2: http://foo.com/someproduct.html?utm_code=yahoo&type=hello  
Url 3: http://foo.com/someproduct.html?utm_code=yahoo&type=goodbye

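In the example above I want to whitelist "utm_code" but not "type", so that after the first URL is hit, Varnish serves the cached content for the second URL.

In the case of the third URL, however, the value of the "type" attribute is different, so that request should be a Varnish cache miss.

I tried the two methods below (found in a Drupal help article I can't locate right now), and neither seemed to work. It could be because my regex is wrong.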

# 1. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_(campaign|content|medium|source|term)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");
# 2. strip out certain querystring values so varnish does not vary cache.
set req.url = regsuball(req.url, "([\?|&])utm_campaign=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])foo_bar=[^&\s]*&?", "\1");
set req.url = regsuball(req.url, "([\?|&])bar_baz=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[\?|&]+$", "");

I figured this out and wanted to share. I found this code, which defines a subroutine that does what I need.

sub vcl_recv {
    # strip out certain querystring params that varnish should not vary cache by
    call normalize_req_url;
    # snip a bunch of other code
}
sub normalize_req_url {
    # Strip out Google Analytics campaign variables. They are only needed
    # by the javascript running on the page
    # utm_source, utm_medium, utm_campaign, gclid, ...
    if(req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%.-_A-z0-9]+&?", "");
    }
    set req.url = regsub(req.url, "(\?&?)$", "");
}

There is something wrong with that regex.
I changed the regular expressions used in both regsub calls:

sub normalize_req_url {
    # Clean up root URL
    if (req.url ~ "^/(?:\?.*)?$") {
        set req.url = "/";
    }
    # Strip out Google Analytics campaign variables
    # They are only needed by the javascript running on the page
    # utm_source, utm_medium, utm_campaign, gclid, ...
    if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%\._A-z0-9-]+&?", "");
    }
    set req.url = regsub(req.url, "(\?&|\?|&)$", "");
}

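The first change is the "[%\._A-z0-9-]" part. A dash between two characters in a character class acts as a range operator, which is why I moved it to the end, and I escaped the dot (inside a character class the dot is already literal, so the escape is harmless but not required).

The second change removes not only a trailing question mark from the rest of the URL, but also a trailing ampersand, or a question mark followed by an ampersand.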
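
To make the character-class point concrete, here is a minimal sketch of the corrected value pattern with the pitfall spelled out in comments. The subroutine name `strip_tracking_demo`, the shortened parameter list, and the backend address are illustrative assumptions, not part of the answer above.

```vcl
vcl 4.0;

# Placeholder backend so the file compiles; host and port are assumptions.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Hypothetical helper, shown only to illustrate the character class.
sub strip_tracking_demo {
    # In "[%.-_A-z0-9]" the sequence ".-_" is a range from "." to "_",
    # so a literal "-" is NOT in the class: for "utm_source=a-b" the
    # value match stops after "a" and "-b" is left behind in the URL.
    # Putting "-" last in the class makes it a literal character:
    set req.url = regsuball(req.url,
        "(gclid|utm_[a-z]+)=[%._A-Za-z0-9-]+&?", "");
    # Remove a dangling "?&", "?" or "&" at the end of the URL.
    set req.url = regsub(req.url, "(\?&|\?|&)$", "");
}

sub vcl_recv {
    call strip_tracking_demo;
}
```

The sketch uses `A-Za-z` rather than `A-z`, because `A-z` also covers the punctuation characters that sit between `Z` and `a` in ASCII.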

From https://github.com/mattiasgeniar/varnish-4.0-configuration-templates:

# Some generic URL manipulation, useful for all templates that follow
# First remove the Google Analytics added parameters, useless for our backend
if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=") {
  set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "");
  set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "?");
  set req.url = regsub(req.url, "\?&", "?");
  set req.url = regsub(req.url, "\?$", "");
}

You want to strip utm_code, but neither of the regexes you are using covers it.

Try the following:

# Strip out specific utm_ values from request URL query parameters
set req.url = regsuball(req.url, "([\?|&])utm_(campaign|content|medium|source|term|code)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[?|&]+$", "");

Or, if you want to strip all URL parameters that begin with utm_, you can use:

# Strip out ALL utm_ values from request URL query parameters
set req.url = regsuball(req.url, "([\?|&])utm_(\w+)=[^&\s]*&?", "\1");
# get rid of trailing & or ?
set req.url = regsuball(req.url, "[?|&]+$", "");
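
Applied to the three URLs from the question, a minimal sketch that strips only utm_code could look like this. The backend address is a placeholder assumption, and the results in the comments were traced by hand rather than taken from a test run.

```vcl
vcl 4.0;

# Placeholder backend so the file compiles; host and port are assumptions.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Drop utm_code=<value>, keeping the "?" or "&" that preceded it.
    #   /someproduct.html?utm_code=google&type=hello  -> /someproduct.html?type=hello
    #   /someproduct.html?utm_code=yahoo&type=hello   -> /someproduct.html?type=hello
    #   /someproduct.html?utm_code=yahoo&type=goodbye -> /someproduct.html?type=goodbye
    set req.url = regsuball(req.url, "([?&])utm_code=[^&]*&?", "\1");
    # Remove a "?" or "&" left dangling at the end of the URL.
    set req.url = regsub(req.url, "[?&]+$", "");
}
```

Because the built-in hash is computed from req.url and the Host header, the first two requests then share one cache object, while the third gets its own.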

A copy of runamok's answer, adjusted because I was getting %20 in my parameters:

sub vcl_recv {
    # strip out certain querystring params that varnish should not vary cache by
    call normalize_req_url;
    # snip a bunch of other code
}
sub normalize_req_url {
    # Strip out Google Analytics campaign variables.
    # I also strip fb_local, which is used by Facebook's javascript.
    # They are only needed by the javascript running on the page
    # utm_source, utm_medium, utm_campaign, gclid, ...
    if(req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|fb_local|mr:[A-z]+)=") {
        set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|fb_local|mr:[A-z]+)=[%.+-_A-z0-9]+&?", "");
    }
    set req.url = regsub(req.url, "(\?&?)$", "");
}

Have you tried this? https://github.com/dridi/libvmod-querystring

Example:

import querystring;  # at the top of the VCL file
set req.url = querystring.regfilter(req.url, "utm_.*");  # inside vcl_recv

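I made some improvements to Runamok's answer by adding support for empty parameters and sorting the remaining parameters. Here is the complete VTC file I used to verify correctness.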

varnishtest "Test for URL normalization - Varnish 4"
server s1 {
  rxreq
  txresp -hdr "Backend: up" -body "Some content"
} -repeat 11 -start
varnish v1 -vcl+backend {
  import std;
  sub vcl_recv {
    # Strip out marketing variables. They are only needed by
    # the javascript running on the page.
    if (req.url ~ "(\?|&)(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)(=|&|$)") {
      # Process params with value ('-' is last in the class so it stays literal).
      set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=[%._A-z0-9-]+&?", "");
      # Process params without value.
      set req.url = regsuball(req.url, "(gclid|cx|ie|cof|siteurl|zanpid|origin|utm_[a-z]+|mr:[A-z]+)=?(&|$)", "");
    }
    # Remove trailing '?', '?&'
    set req.url = regsub(req.url, "(\?&?)$", "");
    # Sort query params, also removes trailing '&'
    set req.url = std.querysort(req.url);
  }
  sub vcl_deliver {
    set resp.http.X-Normalized-URL = req.url;
  }
} -start
client c1 {
  # Basic, no params.
  txreq -url "/test/some-url"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"
  # One blacklisted param.
  txreq -url "/test/some-url?utm_campaign=1"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"
  # One blacklisted param, without value.
  txreq -url "/test/some-url?utm_campaign"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"
  # Two blacklisted params.
  txreq -url "/test/some-url?utm_campaign=1&origin=hpg"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"
  # Two blacklisted params, one without value
  txreq -url "/test/some-url?utm_campaign&origin=123-abc%20"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"
  # Two blacklisted params, both without value
  txreq -url "/test/some-url?utm_campaign&origin="
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"
  # Three blacklisted params.
  txreq -url "/test/some-url?utm_campaign=ABC&origin=hpg&siteurl=br2"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"
  # Three blacklisted params, two without value
  txreq -url "/test/some-url?utm_campaign=1&origin=&siteurl"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url"
  # Three blacklisted params; one param to keep, with space encoded as +.
  txreq -url "/test/some-url?qss=hello+one&utm_campaign=some-value&origin=hpg&siteurl=br2"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url?qss=hello+one"
  # Three blacklisted params; one param to keep, with space encoded as %20, passed in-between blacklisted ones.
  txreq -url "/test/some-url?utm_campaign=1&qss=hello%20one&origin=hpg&siteurl=br2"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url?qss=hello%20one"
  # Three blacklisted params; three params to keep.
  txreq -url "/test/some-url?utm_campaign=a-value&qss=hello+one&origin=hpg&siteurl=br2&keep2=abc&keep1"
  rxresp
  expect resp.http.X-Normalized-URL == "/test/some-url?keep1&keep2=abc&qss=hello+one"
} -run
varnish v1 -expect client_req == 11
