URL whitelist in PHP



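We are developing a chat application where students and teachers can communicate. The website has assignments, and a student who has a question about one can include the assignment's URL in a message. For the safety of teachers and students, we want to whitelist certain URLs.

Here is how it should work:

Message: Question related to this assignment https://school.com/assignment/1425

The link is clickable because it is whitelisted.

Message: There is some problem with this assignment https://schoool.com/assignment/1425

This link has an extra o, so in our case it should be flagged as spam and we will remove the link.

We could not work out how to do this. Our expected output is listed below: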

https://school.com whitelisted

https://www.school.com whitelisted

http://school.com whitelisted

http://wwwschool.com whitelisted

school.com whitelisted

www.school.com whitelisted

www.schoool.com spam URL

https://www.schoool.com spam URL

www.schoool.com spam URL

http://www.schoool.com spam URL

schoool.com spam URL

Our current code:

function filter_url($string = null)
{
    $url = '/(((https?:\/\/)?www)?\.?[a-z0-9]+\.[a-z0-9]+[a-z0-9-\/?&#%=]+)/';
    $whitelist = '/\b(school)\b/';
    if(preg_match($url,$string,$output))
    {
        if(preg_match($whitelist,$output[0]))
        {
            // whitelisted string
            return $string;
        }
        else
        {
            return null;
        }
     }
 }

The problem with this code is that it also whitelists URLs like these:

school.stealpassword.com

school.xxx
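The reason these slip through is that the whitelist check only looks for the word `school` anywhere in the matched text, and a dot counts as a word boundary. A minimal sketch that isolates that check, assuming the whitelist pattern is `/\bschool\b/`; the `$candidates` list is made up for illustration:

```php
<?php
// The whitelist check from the question, isolated (pattern assumed to be /\bschool\b/).
$whitelist = '/\bschool\b/';

// Illustrative inputs: one legitimate host, two look-alikes, one typo domain.
$candidates = [
    'school.com',
    'school.stealpassword.com',
    'school.xxx',
    'schoool.com',
];

foreach ($candidates as $candidate) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $matched = preg_match($whitelist, $candidate) === 1;
    echo $candidate . ' => ' . ($matched ? 'whitelisted' : 'rejected') . PHP_EOL;
}
```

Only `schoool.com` is rejected here, because the pattern never compares the full host name.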

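Define a whitelist of the domains you want to allow, then use PHP's built-in parse_url function to extract the domain from the URL and check it against the whitelist.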

$testLinks = [
    'https://school.com',
    'https://www.school.com',
    'http://school.com',
    'http://wwwschool.com',
    'school.com',
    'www.school.com',
    'www.schoool.com',
    'https://www.schoool.com',
    'www.schoool.com',
    'http://www.schoool.com',
    'schoool.com'
];
$whitelistDomains = [
    'school.com'
];
foreach($testLinks as $link){
    print $link . ' is ' . (checkUrl($link,$whitelistDomains)===TRUE? 'valid':'spam'). PHP_EOL;
}

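// Returns true when the link's host is on the whitelist.
function checkUrl($link, $whitelistDomains)
{
    $urlData = parse_url($link);
    // parse_url() only sets 'host' when the link has a scheme;
    // otherwise fall back to comparing the whole link.
    $domain = isset($urlData['host']) ? $urlData['host'] : $link;
    return in_array($domain, $whitelistDomains, true);
}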

This will output:

https://school.com is valid
https://www.school.com is spam
http://school.com is valid
http://wwwschool.com is spam
school.com is valid
www.school.com is spam
www.schoool.com is spam
https://www.schoool.com is spam
www.schoool.com is spam
http://www.schoool.com is spam
schoool.com is spam
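Note that in the output above, `school.com` is accepted only because the fallback compares the entire link with the whitelist: parse_url reports a host only when the link has a scheme. A scheme-less link with a path, such as `school.com/assignment/1425`, would therefore be rejected. The following is a minimal sketch of one way to handle that, assuming it is acceptable to prepend `http://` before parsing and to compare hosts case-insensitively; `extractHost` and `isWhitelisted` are hypothetical names, not part of the code above:

```php
<?php
// Hypothetical helper: returns the lowercased host of a link, or null if none can be parsed.
function extractHost(string $link): ?string
{
    $link = trim($link);
    // parse_url() only reports a host when a scheme is present, so prepend one
    // for scheme-less links such as "school.com/assignment/1425".
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $link)) {
        $link = 'http://' . $link;
    }
    $host = parse_url($link, PHP_URL_HOST);
    return is_string($host) && $host !== '' ? strtolower($host) : null;
}

// Hypothetical helper: true when the link's host is exactly one of the whitelisted domains.
function isWhitelisted(string $link, array $whitelistDomains): bool
{
    $host = extractHost($link);
    return $host !== null && in_array($host, $whitelistDomains, true);
}
```

Because the comparison is made on the parsed host rather than on the raw string, a link like `school.stealpassword.com` is still rejected.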

Adding www.school.com and wwwschool.com to the whitelist will output the following:

https://school.com is valid
https://www.school.com is valid  // this becomes valid
http://school.com is valid
http://wwwschool.com is valid // this becomes valid
school.com is valid
www.school.com is valid // this becomes valid
www.schoool.com is spam
https://www.schoool.com is spam
www.schoool.com is spam
http://www.schoool.com is spam
schoool.com is spam
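To apply the same idea to a whole chat message, as the question asks, one option is to find each URL-like token and drop those whose host is not whitelisted. The following is a rough sketch rather than a complete URL detector: the token pattern, the function name `filterMessageLinks`, and the choice to delete rejected links instead of flagging them are all assumptions.

```php
<?php
// Hypothetical helper: strips URL-like tokens whose host is not whitelisted.
function filterMessageLinks(string $message, array $whitelistDomains): string
{
    // Assumed, deliberately simple pattern: optional http(s) scheme, dot-separated
    // labels ending in an alphabetic TLD, then an optional path/query/fragment.
    $urlPattern = '~\b(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/\S*)?~i';

    $result = preg_replace_callback(
        $urlPattern,
        function (array $match) use ($whitelistDomains): string {
            $link = $match[0];
            // parse_url() needs a scheme to recognise the host.
            $withScheme = preg_match('~^https?://~i', $link) ? $link : 'http://' . $link;
            $host = parse_url($withScheme, PHP_URL_HOST);
            $allowed = is_string($host)
                && in_array(strtolower($host), $whitelistDomains, true);
            return $allowed ? $link : '';
        },
        $message
    );

    // preg_replace_callback() returns null on a regex error; fail closed.
    return $result ?? '';
}

$whitelist = ['school.com', 'www.school.com'];
echo filterMessageLinks('Question about https://school.com/assignment/1425', $whitelist), PHP_EOL;
echo filterMessageLinks('Problem with https://schoool.com/assignment/1425', $whitelist), PHP_EOL;
```

The first message comes back unchanged and the second loses its link. The pattern is intentionally loose, so it would also treat a token like `notes.txt` as a link and remove it; tighten it to suit the messages you expect.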

How about this?

preg_match("/(([hH]ttps?:\/\/)?[wW]ww)?\.?([sS]chool\.com.*)/", $input, $output);

http://www.phpliveregex.com/p/fAU

All whitelisted URLs have "school.com" in common, so add that whole string to the regexp.
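Because that pattern is not anchored and ends in `.*`, it would also accept strings such as `notschool.com` or `school.com.evil.example`. Below is a hedged sketch of a stricter variant, assuming only `school.com` and `www.school.com` should pass and that nothing but a port, path, query, or fragment may follow the host; the function name `matchesSchoolDomain` is hypothetical:

```php
<?php
// Hypothetical stricter variant: the host must be exactly school.com or www.school.com.
function matchesSchoolDomain(string $input): bool
{
    // ^ pins the start of the string; the lookahead allows only ":", "/", "?", "#",
    // or the end of the string directly after "school.com".
    $pattern = '~^(?:https?://)?(?:www\.)?school\.com(?=[:/?#]|$)~i';
    return preg_match($pattern, $input) === 1;
}
```

Under these assumptions `http://wwwschool.com` is rejected too, so it would need its own alternative in the pattern if it really is meant to be allowed.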
