simplehtmldom parsing script to break up data in plain text



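Here is my script, in which I fetch three items: drug name, generic name and drug class. My problem is that I get the drug names as separate values, but the generic name and the class come back as one combined string. If you run the script you will see what I mean. I want to store the generic name and the class as separate columns in a table.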
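Once the three fields are extracted as separate values, putting each one in its own column is the easy part. A minimal sketch, assuming a CSV file is an acceptable "table" and that each row is an array with the keys `drug_name`, `generic_name` and `classes` (the shape the answer's code builds). The function name `write_rows_as_csv` and the sample row are hypothetical, for illustration only.

```php
<?php
// Sketch: write parsed rows as a table with one column per field.
// ASSUMPTION: each row has the keys drug_name, generic_name and
// classes (an array of strings); the sample data below is hypothetical.

/**
 * @param array    $rows   parsed drug rows
 * @param resource $stream writable stream (file, php://memory, php://output)
 */
function write_rows_as_csv(array $rows, $stream): void
{
    // Header row: one column per field.
    fputcsv($stream, ['Drug Name', 'Generic Name', 'Drug Class'], ',', '"', '');
    foreach ($rows as $row) {
        fputcsv($stream, [
            $row['drug_name'],
            $row['generic_name'],
            // Several classes share one cell, joined by "; ".
            implode('; ', $row['classes']),
        ], ',', '"', '');
    }
}

// Usage with a hypothetical row, written to an in-memory stream.
$rows = [
    ['drug_name' => 'BrandName', 'generic_name' => 'genericname', 'classes' => ['class one', 'class two']],
];
$out = fopen('php://memory', 'w+');
write_rows_as_csv($rows, $out);
rewind($out);
$csv = stream_get_contents($out);
fclose($out);
echo $csv;
```

Joining the classes into one cell keeps the table flat; if each class needs its own row, a separate table keyed by drug name would be the usual alternative.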

Script:

<?php
error_reporting(0);
//simple html dom file
require('simple_html_dom.php');
//target url
$html = file_get_html('http://www.drugs.com/condition/atrial-flutter.html?rest=1');
//crawl td columns
foreach($html->find('td') as $element)
{   
    //get drug names (the bold tags)
    $drug_names = $element->find('b');
    foreach($drug_names as $drug_name)
    {
        echo "Drug Name:-".$drug_name;
        foreach($element->find('span[class=small] a',2) as $t)
        {
            //get the plain text
            $data = $t->plaintext;
            echo $data;
        }
        echo "<br/>";
    }
}
?>

Thanks in advance.

Your current code is a bit far from what you need, but you can use CSS selectors to target those elements more directly.

Example:

require('simple_html_dom.php'); // Simple HTML DOM library
$data = array();
$html = file_get_html('http://www.drugs.com/condition/atrial-flutter.html?rest=1');
foreach($html->find('tr td[1]') as $td) { // no need to loop over every td: target the first td of each row
    $bold = $td->find('a b', 0);                 // drug name: the bold tag inside the anchor
    $other_info = $td->find('span.small[2]', 0); // the span holding the other info
    if (!$bold || !$other_info) {
        continue; // skip rows (e.g. headers) without the expected markup
    }
    $anchors = $other_info->find('a');           // all anchors inside the info span
    if (empty($anchors)) {
        continue;
    }
    $generic_name = $anchors[0]->innertext;      // first anchor: generic name
    $classes = array();
    foreach (array_slice($anchors, 1) as $a) {   // remaining anchors: drug classes
        $classes[] = $a->innertext;              // push each into its own container
    }
    $data[] = array(
        'drug_name' => $bold->innertext,
        'generic_name' => $generic_name,
        'classes' => $classes,
    );
}
echo '<pre>';
print_r($data);
echo '</pre>';
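If the Simple HTML DOM library is not available, the same split can be sketched with PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`). This is an alternative technique, not the answer's own code. The function name `parse_drug_rows` and the sample markup are hypothetical: the markup only mimics the structure the selectors above assume (a bold drug name inside an anchor, then a second `span.small` whose first anchor is the generic name and whose remaining anchors are drug classes), so the XPath queries may need adjusting for the live page.

```php
<?php
// Sketch: split each row into drug name, generic name and drug classes
// using PHP's built-in DOM extension instead of simple_html_dom.
// ASSUMPTION: the sample markup below is a hypothetical stand-in for the
// real page; adjust the XPath queries to the live markup.

function parse_drug_rows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect HTML instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];

    // First <td> of every table row.
    foreach ($xpath->query('//tr/td[1]') as $td) {
        // Drug name: the <b> inside an <a>.
        $bold = $xpath->query('.//a/b', $td)->item(0);
        // Second <span class="small"> child holds the generic name and classes.
        $info = $xpath->query(
            './span[contains(concat(" ", normalize-space(@class), " "), " small ")][2]',
            $td
        )->item(0);
        if ($bold === null || $info === null) {
            continue; // skip rows without the expected markup
        }

        $anchors = $xpath->query('.//a', $info);
        if ($anchors->length === 0) {
            continue;
        }

        // First anchor is the generic name; the rest are drug classes.
        $classes = [];
        for ($i = 1; $i < $anchors->length; $i++) {
            $classes[] = trim($anchors->item($i)->textContent);
        }

        $rows[] = [
            'drug_name'    => trim($bold->textContent),
            'generic_name' => trim($anchors->item(0)->textContent),
            'classes'      => $classes,
        ];
    }
    return $rows;
}

// Hypothetical sample with placeholder names, not real page data.
$sample = <<<HTML
<table>
  <tr><th>Drug</th></tr>
  <tr>
    <td>
      <a href="/brandname.html"><b>BrandName</b></a>
      <span class="small">Rx</span>
      <span class="small">Generic name: <a href="#">genericname</a>
        Drug class: <a href="#">class one</a>, <a href="#">class two</a></span>
    </td>
  </tr>
</table>
HTML;

print_r(parse_drug_rows($sample));
```

To try it on the real page, the HTML could first be fetched (for example with `file_get_contents`, if `allow_url_fopen` is enabled) and passed to `parse_drug_rows()`.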
