所以我有这个url $url = "localhost:8000/vehicles"我想通过cron作业获取但页面返回的是HTML所以我想用symfony dom crawler获取所有车辆而不是正则表达式
在我的文件的顶部我添加了
use SymfonyComponentDomCrawlerCrawler;
创建一个新的实例,我试了:
$crawler = new Crawler($data);
and I tried
$crawler = Crawler::create($data);
,但这给了我一个错误,也尝试添加
SymfonyComponentDomCrawlerCrawler::class,
到服务提供者,但是当我执行命令:
composer dump-autoload
它给了我以下错误
In Crawler.php line 66:
SymfonyComponentDomCrawlerCrawler::__construct(): Argument #1 ($node) must be of type DOMNodeList|DOMNode|array|string|null, IlluminateFoundationApplication given, called in C:xampphtdocsDrostMachinehandelDrostMachinehandelvendorlaravelfr
ameworksrcIlluminateFoundationProviderRepository.php on line 208
Script @php artisan package:discover --ansi handling the post-autoload-dump event returned with error code 1
我不知道如何解决这个问题。
获取url的函数如下:
public function handle()
{
$url = SettingsController::fetchSetting("fetch:vehicles");
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$data = curl_exec($ch);
$vehicles = $this->scrapeVehicles($data, $url);
Log::debug($vehicles);
curl_close($ch);
}
private function scrapeVehicles(string $data, string $url): array
{
$crawler = Crawler::create($data);
$vehicles = $crawler->filter(".vehicleTile");
return $vehicles;
}
$data的内容:
https://pastebin.com/GJ300KEv
由于没有经过测试,我不确定。
确保你安装了正确的软件包
composer require symfony/dom-crawler
初始化,使用全路径。(因为它不是Laravel的方式(包))
$crawler = SymfonyComponentDomCrawlerCrawler::create($data);
composer require——dev" symfony/dom-crawler"; "^6.3.x-dev">
示例爬虫方法名称空间的应用程序 Http 控制器;
use GuzzleHttpExceptionGuzzleException;
use SymfonyComponentDomCrawlerCrawler;
class CrawlerController extends Controller
{
private $url;
public function __construct()
{
$this->url = "https://www.everything5pounds.com/en/Shoes/c/shoes/results?q=&page=6";
}
public function index()
{
$client = new GuzzleHttpClient();
try {
$response = $client->request('GET', $this->url);
if ($response->getStatusCode() == 200) {
$res = json_decode($response->getBody());
$results = $res->results;
return $results;
/*$results = (array)json_decode($res);
$products = array();
foreach ($results as $result) {
$product = [
"name" => $result["name"],
];
array_push($products, $product);
}
return $products;*/
//return $result;
//return $this->parseContent($result);
} else {
return $response->getReasonPhrase();
}
} catch (GuzzleException $e) {
return $e->getMessage();
}
}
解析内容并存储
public function parseContent($result)
{
$crawler = new Crawler($result);
$elements = $crawler->filter('.productGridItem')->each(function (Crawler $node, $i) {
return $node;
});
$products = array();
foreach ($elements as $item) {
$image = $item->filter('.thumb .productMainLink img')->attr('src');
$title = $item->filter('.productGridItem .details')->text();
$price = $item->filter('.productGridItem .priceContainer')->text();
$product = [
"image" => 'https:' . $image,
"title" => $title,
"price" => $price,
];
array_push($products, $product);
}
return $products;
}