Regular expression for getting include and require directives



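I'm trying to get all include directives from a PHP file using a regular expression (in Java).

The expression should pick up only those file names that are expressed as unconcatenated string literals. Includes that use constants or variables are not needed.

Detection should work for both single and double quotes, for include and require, for the _once variants, and, last but not least, for both keyword-style and function-style invocations.

A rough input sample: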

<?php
require('a.php');
require 'b.php';
require("c.php");
require "d.php";
include('e.php');
include 'f.php';
include("g.php");
include "h.php";
require_once('i.php');
require_once 'j.php';
require_once("k.php");
require_once "l.php";
include_once('m.php');
include_once 'n.php';
include_once("o.php");
include_once "p.php";
?>

And the output:

["a.php","b.php","c.php","d.php","e.php","f.php","g.php","h.php","i.php","j.php","k.php","l.php","m.php","n.php","o.php","p.php"]

Any ideas?

Use token_get_all. It's safe and won't give you headaches. If you need userland code, you can also use PEAR's PHP_Parser.

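To do this accurately, you really need to fully parse the PHP source code. This is because the text sequence require('a.php'); can appear in places where it is not really an include at all, such as comments, strings and HTML markup. For example, the following are NOT real PHP includes, but will be matched by the regex: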

<?php // Examples where a regex solution gets false positives:
    /* PHP multi-line comment with: require('a.php'); */
    // PHP single-line comment with: require('a.php');
    $str = "double quoted string with: require('a.php');";
    $str = 'single quoted string with: require("a.php");';
?>
    <p>HTML paragraph with: require('a.php');</p>
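
A middle ground between a raw regex and a full parse is to blank out comments before matching. Below is a minimal Java sketch of that idea. The names CommentMasker and maskComments are made up for illustration, and it assumes the input is pure PHP code (no HTML outside the PHP tags, no heredocs). It removes only the comment false positives; an include-like sequence inside a string literal would still be matched afterwards.

```java
final class CommentMasker {
    // Replace PHP comments with spaces so a later regex cannot match inside them.
    // String literals are copied through unchanged, so "http://x" is not
    // mistaken for the start of a // comment.
    static String maskComments(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int n = src.length();
        int i = 0;
        while (i < n) {
            char c = src.charAt(i);
            if (c == '\'' || c == '"') {
                // Skip to the matching quote, stepping over backslash escapes.
                int j = i + 1;
                while (j < n && src.charAt(j) != c) {
                    if (src.charAt(j) == '\\') j++;
                    j++;
                }
                j = Math.min(j + 1, n);
                out.append(src, i, j);
                i = j;
            } else if (src.startsWith("/*", i)) {
                // Block comment: blank everything up to and including "*/".
                int end = src.indexOf("*/", i + 2);
                int j = (end < 0) ? n : end + 2;
                appendBlanks(out, src, i, j);
                i = j;
            } else if (src.startsWith("//", i) || c == '#') {
                // Line comment: blank everything up to the end of the line.
                int j = i;
                while (j < n && src.charAt(j) != '\n') j++;
                appendBlanks(out, src, i, j);
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Append spaces for src[from, to), keeping newlines so offsets and line numbers survive.
    private static void appendBlanks(StringBuilder out, String src, int from, int to) {
        for (int k = from; k < to; k++) {
            out.append(src.charAt(k) == '\n' ? '\n' : ' ');
        }
    }

    public static void main(String[] args) {
        String php = "/* require('a.php'); */\n"
                   + "require 'b.php'; // require('c.php');\n";
        System.out.print(maskComments(php));
    }
}
```

The masked text has the same length as the input, so any match offsets found afterwards still point into the original source.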

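That said, if you are happy with getting a few false positives, the following single-regex solution will do a pretty good job of scraping the filenames from all the PHP include variations: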

// Get all filenames from PHP include variations and return in array.
function getIncludes($text) {
    $count = preg_match_all('/
        # Match PHP include variations with single string literal filename.
        \b              # Anchor to word boundary.
        (?:             # Group for include variation alternatives.
          include       # Either "include"
        | require       # or "require"
        )               # End group of include variation alternatives.
        (?:_once)?      # Either one may be the "once" variation.
        \s*             # Optional whitespace.
        (               # $1: Optional opening parentheses.
          \(            # Literal open parentheses,
          \s*           # followed by optional whitespace.
        )?              # End $1: Optional opening parentheses.
        (?|             # "Branch reset" group of filename alts.
          \'([^\']+)\'  # Either $2{1}: Single quoted filename,
        | "([^"]+)"     # or $2{2}: Double quoted filename.
        )               # End branch reset group of filename alts.
        (?(1)           # If there were opening parentheses,
          \s*           # then allow optional whitespace
          \)            # followed by the closing parentheses.
        )               # End group $1 if conditional.
        \s*             # End statement with optional whitespace
        ;               # followed by semi-colon.
        /ix', $text, $matches);
    if ($count > 0) {
        $filenames = $matches[2];
    } else {
        $filenames = array();
    }
    return $filenames;
}

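Addendum 2011-07-24: It turns out the OP wants a solution in Java, not PHP. Here is a tested Java program that is nearly identical. Note that I'm not a Java expert and don't know how to dynamically size an array, so the solution below (crudely) uses a fixed-size array (100 entries) to hold the filenames.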

import java.util.regex.*;
public class TEST {
    // Set maximum size of array of filenames.
    public static final int MAX_NAMES = 100;
    // Get all filenames from PHP include variations and return in array.
    public static String[] getIncludes(String text)
    {
        int count = 0;                          // Count of filenames.
        String filenames[] = new String[MAX_NAMES];
        String filename;
        Pattern p = Pattern.compile(
            "# Match include variations with single string filename. \n" +
            "\\b             # Anchor to word boundary.              \n" +
            "(?:             # Group include variation alternatives. \n" +
            "  include       # Either 'include',                     \n" +
            "| require       # or 'require'.                         \n" +
            ")               # End group of include variation alts.  \n" +
            "(?:_once)?      # Either one may have '_once' suffix.   \n" +
            "\\s*            # Optional whitespace.                  \n" +
            "(?:             # Group for optional opening paren.     \n" +
            "  \\(           # Literal open parentheses,             \n" +
            "  \\s*          # followed by optional whitespace.      \n" +
            ")?              # Opening parentheses are optional.     \n" +
            "(?:             # Group for filename alternatives.      \n" +
            "  '([^']+)'     # $1: Either a single quoted filename,  \n" +
            "| \"([^\"]+)\"  # or $2: a double quoted filename.      \n" +
            ")               # End group of filename alternatives.   \n" +
            "(?:             # Group for optional closing paren.     \n" +
            "  \\s*          # Optional whitespace,                  \n" +
            "  \\)           # followed by the closing parentheses.  \n" +
            ")?              # Closing parentheses is optional.      \n" +
            "\\s*            # End statement with optional ws,       \n" +
            ";               # followed by a semi-colon.               ",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);
        Matcher m = p.matcher(text);
        while (m.find() && count < MAX_NAMES) {
            // The filename is in either $1 or $2
            if (m.group(1) != null) filename = m.group(1);
            else                    filename = m.group(2);
            // Add this filename to array of filenames.
            filenames[count++] = filename;
        }
        return filenames;
    }
    public static void main(String[] args)
    {
        // Test string full of various PHP include statements.
        String text = "<?php\n"+
            "\n"+
            "require('a.php');\n"+
            "require 'b.php';\n"+
            "require(\"c.php\");\n"+
            "require \"d.php\";\n"+
            "\n"+
            "include('e.php');\n"+
            "include 'f.php';\n"+
            "include(\"g.php\");\n"+
            "include \"h.php\";\n"+
            "\n"+
            "require_once('i.php');\n"+
            "require_once 'j.php';\n"+
            "require_once(\"k.php\");\n"+
            "require_once \"l.php\";\n"+
            "\n"+
            "include_once('m.php');\n"+
            "include_once 'n.php';\n"+
            "include_once(\"o.php\");\n"+
            "include_once \"p.php\";\n"+
            "\n"+
            "?>\n";
        String filenames[] = getIncludes(text);
        for (int i = 0; i < MAX_NAMES && filenames[i] != null; i++) {
            System.out.print(filenames[i] +"\n");
        }
        }
    }
}
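
On the fixed-size array: a java.util.ArrayList grows on demand, which removes the need for a limit like MAX_NAMES. Here is a minimal sketch of the same idea with a list. The class name IncludeList is made up for illustration, and the pattern is a compact, comment-free rewrite of the one above rather than a tested drop-in replacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class IncludeList {
    // Group 1 holds a single-quoted filename, group 2 a double-quoted one.
    private static final Pattern INCLUDE = Pattern.compile(
        "\\b(?:include|require)(?:_once)?\\s*(?:\\(\\s*)?"
        + "(?:'([^']+)'|\"([^\"]+)\")"
        + "(?:\\s*\\))?\\s*;",
        Pattern.CASE_INSENSITIVE);

    // Collect every matched filename into a list that grows as needed.
    static List<String> getIncludes(String text) {
        List<String> filenames = new ArrayList<>();
        Matcher m = INCLUDE.matcher(text);
        while (m.find()) {
            filenames.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return filenames;
    }

    public static void main(String[] args) {
        String text = "<?php\nrequire('a.php');\ninclude_once \"p.php\";\n?>\n";
        System.out.println(getIncludes(text)); // prints [a.php, p.php]
    }
}
```

If a plain array is still needed, filenames.toArray(new String[0]) converts the list at the end.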

/(?:require|include)(?:_once)?[( ]['"](.*)\.php['"]\)?;/

should work for all the cases you've specified, and captures only the filename without the extension. Test script:

<?php
$text = <<<EOT
require('a.php');
require 'b.php';
require("c.php");
require "d.php";
include('e.php');
include 'f.php';
include("g.php");
include "h.php";
require_once('i.php');
require_once 'j.php';
require_once("k.php");
require_once "l.php";
include_once('m.php');
include_once 'n.php';
include_once("o.php");
include_once "p.php";
EOT;
$re = '/(?:require|include)(?:_once)?[( ][\'"](.*)\.php[\'"]\)?;/';
$result = array();
preg_match_all($re, $text, $result);
var_dump($result);

To get the filenames you want, read $result[1].
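
Since the question asks for Java, the same compact pattern could be ported roughly as follows. This is only a sketch: the class and method names (BaseNames, baseNames) are invented for illustration, and because (.*) is greedy it assumes at most one include statement per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BaseNames {
    // Same pattern as above, with backslashes and double quotes escaped
    // for a Java string literal. Group 1 is the filename without ".php".
    static final Pattern RE = Pattern.compile(
        "(?:require|include)(?:_once)?[( ]['\"](.*)\\.php['\"]\\)?;");

    static List<String> baseNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = RE.matcher(text);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String text = "require('a.php');\ninclude_once \"p.php\";\n";
        System.out.println(baseNames(text)); // prints [a, p]
    }
}
```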

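I should probably point out that I'm also partial to cweiske's answer: unless you really just want an exercise in regular expressions (or want to do this with, say, grep), you should use the tokenizer.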

The following should work pretty well:

/^(require|include)(_once)?(\(|\s+)("|')(.*?)("|')(\)|\s*);$/

The filename is in the fifth capture group.

This one works for me:

preg_match_all('/\b(require|include|require_once|include_once)\b(\(| )(\'|")(.+)\.php(\'|")\)?;/i', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[4];
