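I'm trying to scrape data with bs4. For each page I want to get all the product IDs. This works for the first page, but from page 2 onward it always shows the product IDs of the first page. Here is my code (even though I changed page to 5):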
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://tiki.vn/lam-sach-da-mat/c11232?sort=top_seller%3Fpage%3D5&page=5')
bs = BeautifulSoup(html, 'html.parser')
result = bs.find_all(lambda tag: tag.get('class') == ['product-item'])
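This is the result my code gives for page 5: it is the first page's product list.
What I want instead are the product IDs that the site itself shows on page 5.
I want to get the product IDs of page 5, but I don't understand why my code still shows the first page's results.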
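It looks like there are 107 products in total, including advertisements. Here is one way to query the API endpoint directly and get all of them: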
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
headers = {
    'accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36',
}
url = 'https://tiki.vn/api/personalish/v1/blocks/listings?limit=300&include=advertisement&aggregations=2&trackity_id=527749d7-0a68-f53e-54b5-fe2da48136f2&category=11232&page=1&sort=top_seller%3Fpage%3D5&urlKey=lam-sach-da-mat'
r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data'])
print(df)
Result:
id sku name url_key url_path type author_name book_cover brand_name short_description price list_price badges badges_new discount discount_rate rating_average review_count order_count favourite_count thumbnail_url thumbnail_width thumbnail_height freegift_items has_ebook inventory_status is_visible productset_id productset_group_name seller is_flower is_gift_card inventory url_attendant_input_form option_color stock_item salable_type seller_product_id installment_info url_review bundle_deal video_url tiki_live original_price shippable impression_info availability quantity_sold.text quantity_sold.value advertisement.ad advertisement quantity_sold
0 33606848 9815250596996 Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html?spid=33606849 None Paula's Choice 849000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'official_store', 'icon': 'https://salt.tikicdn.com/ts/upload/5d/4c/f7/0261315e75127c2ff73efd7a1f1ffdf2.png', 'icon_height': 14, 'icon_width': 68, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship+', 'type': 'under_rating_text'}, {'code': 'freegift_items', 'placement': 'under_rating', 'text': 'Quà tặng', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 144 ASA (48k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 100000 11 4.8 42 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/2e/43/38/ca1cbb77f9993e07db7ba3e107644d56.jpg 280 280 [] False available False 345 None False False None [] None 33606849 None False None None 949000 True [{'impression_id': 'thanos-product-VaFjRtzzGwSO09QA', 'metadata': {'price': 849000, 'rating_average': 4.8, 'reviews_count': 42, 'seller_product_id': 33606849}}, {'impression_id': '97c3dfe2-cf95-4161-94a3-529235d45ae1', 'metadata': {'advert_id': 3492748, 'business_id': 4769, 'flags': {'ad-2752': 5, 'p_cate': 8206, 'predictor': 'cb10', 'src': 'cat'}, 'product_id': 33606849, 'service_name': 'makesense', 'user_bucket': 570}}, 
{'impression_id': '99e7098f-6f87-4ee3-bd7a-7be637d2f402', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 207 207.0 [{'match_id': 0, 'advert_id': 0, 'business_id': 0, 'seller_id': 3946, 'clickUrl': '//tka.tiki.vn/pixel/pixel?data=djAwMUXcpZFK1GmJWEEH5wJs-tR0dD_QzrnUur3S0rkR7vFGb3Cx5LSWiqPQqZanWSaNuWGOindrBvik_aCq_9cjyQF8D906qQBOt-t08-ZoBweFhLNgyc5q11ZIVWlIHUQ19sfWQ8KQU_v-jzTMDXWv8osQqhXDUvwVkcKHHbvPz_q81AFAXFZp2IFbKnoZAYFibzaoW-UcoQiAnYZVtCWBvbQg2Qzx5TUVh6LJAgL0aMNk5tts6O8clarx2ICB8U95RnWeT6o8QjqNUl3NRakOED4nqSFEgtddT2Rci9Xqr-7vt_JEYULCGuKG2Oj7zqT-sAhWduFt3dkzhmsozBZvSURwk9vgVt1K4wvBf8wMX33iRyMCM1VIjd3PKGEV0QaEkQMGl_ulC_3fST17wZvrfdcFVqSPoGj98O63eir50lnrVNXYbpFlgDmIYMUqMs-rEkx_XvtSo76XIKoDmgn5GvLe2aewoNYvkI27vRCvV8Ufj7qhD9RAUXVFHv_DY5lVJRJ0j1vtnPYbnv8USOGUKu4RPRc93gxXukOuRxHq84a69M8zczLS25KVVfHnMmqbe3TZHvVg3zCtlc6tAiXiyJ0YhkxrhlRvEU32wpdX9Cv4M97rReqQJa7mZfFGxZ0rnWO58FSRO3Yt3xs_iAsMPHQ-0i8XTgFrh6BPaxS0xvM3EtgIjcjF63byLGz0NXcVj77whvoi2f9TFZMxy1O_Tte_Htf7TnrFNUCdo7xYVlIdymv3Jsfcy-YwW38uQ7Q9_4f9tcZGKF8BDuxaCPrKBZTp32HXOSd7zWsMX2wv3t0l4r4VjBaC2CZSSfvZNRlME4o0m-Q6YVNRb4Wk33DocSnphdBXztLhwpMaWSAFoErDNsZL9Qgqk4y-U8wb-UAV8BppXUKMpJkDwG8GxtGNUZ_PhZN4G1Jb4C-IdwLyeZxfwgcUV2LZ5k1D4WJ8lv707sAIHBCADaCxdRDJmjcX6-A7kDfpfT05W6tQzak5ElGkYC4YZxcr7TRW8EoJaV72glEkSBFdj1J5GNQqiajyh3XC88zQNSce-hI1keyZe-0qUJmZxcNEOUVoYZ7ifX-lz5jtWx_kbIYnZ_R3bvxf-FWyaNEOpXnh3s3iCluPd74Lmpea8AeLY2jGvqD77_cuvelXgbae4mP44E6uDCe_YOPF2ud3XgcssU_LoqxQrjj2x-nAYYy9tTIYPDsweEC89EKD0KciSFSB4UH5AQLzpTluPSFBGR9ki46I49xvOM3SS7fLaqcwyqWnnqMIDQ&CLICK&reqid=vFRZNdJMRr&pos=1&redirect=https%3A%2F%2Ftiki.vn%2Fdung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html%3Fspid%3D33606849', 'impUrl': 
'//tka.tiki.vn/pixel/pixel?data=djAwMR4kJOdkR6ThYEk7cciKp_1p0OpnJbH7SMwoWHj-Kjq2Cfp3Kr1qYggzrNrQE94SkWBR63dQO0Rj9g43Lhxrv8ggtrbKXBXQrAcmo32rEnvX_c4JiVgX_dPpAdubrRE_zagc7UdYpVqjJEiBQxE1_ioxavawDQ2SN12opCjx-yV3SuJOQyd9daAyxHl76CHW8acVYXE4wCHCeuLpW4YdhZwd4gXzMxPiyQyxGUVYLOJZBObM9md59_Ow96AddUaasyn0Yry3RUv5GZ_46O7u0eFENwZDlwEE2jrz6IYGHPOre4hWQmTtK1HSZWi6UmPDyy4qDjcw45nqIbs8hFmXBoCwCEa3oRLQ1sP8iVHJvYTN1eNQblQmPqWgVECEm3bd2kEMOd2H3qoVw__KqnBfD4G4avOZ5CnN-DFQTURnjUvqeyKJNuIAsvx8CLyZx25T6Ni6S1gL16_v4X3w3-0NbRHpqZrbJ1vFYZqdXba65MtDtsLz35yyWc3fo2iNTgvXMf5qkYGiAnMNoYaP6yv0YuvAozh0ekqDyS9qbEqDIwa7R87K4f_IDuKWwrhqfvC6gLAlPZk8M3vTOi1lxV15y5jI3WLsW5-sv0T7ypCmNnv56QMxfPMDJipD5ae9XFZWpBuQRoIHYOgASgWFTKrs85Escv0JjXkcLKkNoJM5KrPzNVxs5JeQv1qLvUXzyR0UYwnc3qVBlxIpbSoZOoK2bien691jxWED6osm94CVSVv_fw7yHW12fP9sh-Req2vGQvGq3D40ndF2ag1xpAdytgsJoHVxgoAPQS58TneDHE-GBdigYJSIjIKJ5c0Hxh7pvkajdnMHqvhxR44_Zo6LRgXvE_ZL048BwQOlz0gXJrzj8rVmwALMWC6N5R4_0bLjV4kUFEdg6LhdWC3Uu6CYsW5qyXcpfB2ampI2Q9Y3vOPgWKSa59e68kaUvZU_XHqFDotp2Kxc6fsQWH3f_e9z2NF6_jlLjN1N_xO-EODoEFEYGC2lxuCHo7h0qZLQO_TsIHo2eEfTqtEfHj7uWM1gHnlyf7mi2esr2Z2ow2Ul8NYYlSoMi_HHdynp-ogLWbva8_Zg-2h27X8&SHOW&reqid=vFRZNdJMRr&pos=1', 'trueImpUrl': 
'//tka.tiki.vn/pixel/pixel?data=djAwMcxZvpGC5q0kmKZdC0f7E7NTPgcQz3QIsrPU8HOdN3XcwRPLThqpL4IOS8wEXwiUOs06YO5ZboJEkr9lfAKAxN3tx0uDN278ihaJR75TE4VxdG2GeapxtNnUPEAY_92eVJfWBCwT2w4bGphxkBPuXbfRGo2viNSDHGPBz0EQbY94JAW8aYFV-O0_zl71Umd-6E5gd1-MlcnwMuV-VFQIKrjyTS1Udes_05EiM0KNl6glpvJPbN0dnL1SwCHHIWYZoxBNUfnfnS6aFqieWT-Zy_JirTZ3zWycjMX7gGaNkel3nUf4lQz4v7y6VHDkbfNtuKMJ5-OpzOF4A39-_ehIGtMohJjhx853kTRf3M5CO6_Fsw7LOKlKBPt6vioGblWSYiMF7fgO7qmrpv6Pv5DIaMEjJVE1qaGiKZmClN0xRLQS47bUFfrv7MTpAgCpj8YneFs0q7to23HiInMZpMSYBhFIXFIMk_RtA32f2pTeO399OG1_fV6J05VxACqfzfm_1zxWt1LSlUGME_Sb1F25uJP542ZPru8sgo6Q5Vy46A-zpRfk02MqXefuHEtw2cSRKxAzaK4yK7xyZiPK8cBarDgbPv8TuEpsT0bhu6x9lOB-fyvTZFAN8LS9uQ1nfeFn2icW0d472Q6w1TWtO5IC_fst1or5KHW9qKC_P4EDERumpobf7GAb-34ojWcAQtOWPHJpmqRGvLV0MR_ISNFwIkXmOW8Bgytfkuenox9_7Niqg9x3qEEdmA1Rr0R0KbEkiIkZRDS7ZsGeVIVjn1cTSaLbGwLNIVAIytKyNa6_BhYNvnjihOFjVweuIaNhvjmyWAMht-X9NkzNnWsQPNZy2ha3JABfCw4fkD4cOk8m16NH6avet0fG1cLGOYmKOneUWk7ihIWWRNFB0OoN2HQDsREfhQ2qEUWBrmDPHI-1Slt5MbcnS8cm0sF8mBu5LO4-RV-9i3e7VI0Ti7Cbcj5f6kD8QVkQSAijsSCDAwKA3oRPpjl7eeCu1qrmXCMW2VEBe0ftOFZSNg6PKQ-Pyvrxwkdpjf_yICGdOfGH_t0KNPi3DnKOlVrTKjg_osk&VIEW&reqid=vFRZNdJMRr&pos=1', 'properties': {'product_id': '33606849', 'matched_query': 0, 'image': '', 'url': ''}, 'type': 'ProductAdvertType', 'impression_info': [], 'image_ratio': 0}] NaN NaN
1 33606786 5722576750824 Lotion tẩy tế bào chết làm sáng da Paula’s Choice Skin Perfecting 8% AHA Lotion 100ml 2060 lotion-tay-te-bao-chet-lam-sang-da-paula-s-choice-skin-perfecting-8-aha-lotion-100ml-2060-p33606786 lotion-tay-te-bao-chet-lam-sang-da-paula-s-choice-skin-perfecting-8-aha-lotion-100ml-2060-p33606786.html?spid=66640131 None Paula's Choice 663000 0 [] [{'code': 'tikinow', 'icon': 'https://salt.tikicdn.com/ts/upload/3f/76/87/4c636b7bea11521f46f733b7839df4de.png', 'icon_height': 16, 'icon_width': 32, 'placement': 'delivery_info', 'text': 'Giao siêu tốc 2H', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship+', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 73 ASA (24k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 4.7 20 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/2f/6d/8d/081edbe77b16439c4fa0b18263cbede7.jpg 280 280 [] False available False 345 None False False None [] None 66640131 None False None None 663000 True [{'impression_id': 'thanos-product-QvglqBASVoDvOhFS', 'metadata': {'price': 663000, 'rating_average': 4.7, 'reviews_count': 20, 'seller_product_id': 66640131}}, {'impression_id': '75f5fa8b-3b32-4280-86a4-141778a1cb1f', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 36 36.0 NaN NaN NaN
2 11239286 9792297299199 Gel tẩy da chết Arrahan Lemon White Peeling Gel (180ml) gel-tay-da-chet-arrahan-lemon-white-peeling-gel-180ml-p11239286 gel-tay-da-chet-arrahan-lemon-white-peeling-gel-180ml-p11239286.html?spid=20116852 None Arrahan 61900 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship+', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 7 ASA (2k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 4.7 77 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/93/cb/da/afd6b13fe3654bf4351b260b801c41e3.jpg 280 280 [] False available False 345 None False False None [] None 20116852 None False None None 61900 True [{'impression_id': 'thanos-product-UDr0lE1YpujdRftZ', 'metadata': {'price': 61900, 'rating_average': 4.7, 'reviews_count': 77, 'seller_product_id': 20116852}}, {'impression_id': 'e81a5e80-1c98-4d19-abdc-a60250309814', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 539 539.0 NaN NaN NaN
3 33606848 8573828662870 Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html?spid=66638723 None Paula's Choice 529000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship+', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 58 ASA (19k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 4.7 7 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/f8/10/ef/714f6b435ade504ce920caeff4ace16f.jpg 280 280 [] False available False 345 None False False None [] None 66638723 None False None None 529000 True [{'impression_id': 'thanos-product-zTwvu1Q7UONamIJN', 'metadata': {'price': 529000, 'rating_average': 4.7, 'reviews_count': 7, 'seller_product_id': 66638723}}, {'impression_id': '95894dfd-049c-41e4-848a-e179e8a5c03b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 27 27.0 NaN NaN NaN
4 20525156 3751926198377 Tẩy Tế Bào Chết 3W Clinic Collagen Crystal Peeling Gel 180ml tay-te-bao-chet-3w-clinic-collagen-crystal-peeling-gel-180ml-p20525156 tay-te-bao-chet-3w-clinic-collagen-crystal-peeling-gel-180ml-p20525156.html?spid=20525157 None 3W Clinic 119000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 3 ASA (981 ₫)<br/>≈ 0.8% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 31000 21 4.7 12 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/9a/01/46/71fa72df01b8addc69770f67b3bcedab.jpg 280 280 [] False available False 345 None False False None [] None 20525157 None False None None 150000 True [{'impression_id': 'thanos-product-avJJKrffpf79iq8i', 'metadata': {'price': 119000, 'rating_average': 4.7, 'reviews_count': 12, 'seller_product_id': 20525157}}, {'impression_id': 'c9e9e842-8065-4d9e-8269-0d3b466e311b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 51 51.0 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
102 4701145 6917512701766 Kem Tẩy tế bào chết cho mặt Byphasse Exfoliant Face Scrub Dành cho mọi loại da kem-tay-te-bao-chet-cho-mat-byphasse-exfoliant-face-scrub-danh-cho-moi-loai-da-p4701145 kem-tay-te-bao-chet-cho-mat-byphasse-exfoliant-face-scrub-danh-cho-moi-loai-da-p4701145.html?spid=27924960 None Byphasse 119000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship+', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 20 ASA (7k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 4.0 8 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/a1/7c/77/7acfba66ad481b870be5fdb1d10a4662.jpg 280 280 [] False available False 345 None False False None [] None 27924960 None False None None 119000 True [{'impression_id': 'thanos-product-Khf0Kz3w7kEtxJ1U', 'metadata': {'price': 119000, 'rating_average': 4, 'reviews_count': 8, 'seller_product_id': 27924960}}, {'impression_id': 'cabf20ec-d814-48ba-b000-7126ed1a22d5', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 49 49.0 NaN NaN NaN
103 38465349 8446201287222 Gel Tẩy Tế Bào Chết Keana Baking Soda Moist Peeling (120G) - HÀNG CHÍNH HÃNG gel-tay-te-bao-chet-keana-baking-soda-moist-peeling-120g-hang-chinh-hang-p38465349 gel-tay-te-bao-chet-keana-baking-soda-moist-peeling-120g-hang-chinh-hang-p38465349.html?spid=38465350 None Keana 421200 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship+', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 72 ASA (24k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 118800 22 4.5 2 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/7e/a0/59/8dd959d52b59306d83523204062ad713.jpg 280 280 [] False available False 345 None False False None [] None 38465350 None False None None 540000 True [{'impression_id': 'thanos-product-Rcuvm8ucH1kfYALI', 'metadata': {'price': 421200, 'rating_average': 4.5, 'reviews_count': 2, 'seller_product_id': 38465350}}, {'impression_id': '3281c96d-3a6c-4b26-838b-4b50e9f9a618', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 3 3.0 NaN NaN NaN
104 15213464 4407455438680 Tẩy bào chết Belif Mild And Effective Facial Scrub 100ml tay-bao-chet-belif-mild-and-effective-facial-scrub-100ml-p15213464 tay-bao-chet-belif-mild-and-effective-facial-scrub-100ml-p15213464.html?spid=76083479 None Belif 630000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 16 ASA (5k ₫)<br/>≈ 0.8% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 0.0 0 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/2a/cf/b8/fb265c0ce6944bbb6aa822eca1642be3.png 280 280 [] False available False 345 None False False None [] None 76083479 None False None None 630000 True [{'impression_id': 'thanos-product-v8EPXz3gtAHsbWNz', 'metadata': {'price': 630000, 'rating_average': 0, 'reviews_count': 0, 'seller_product_id': 76083479}}, {'impression_id': 'f801f336-5090-46f9-ba75-41a7e22a0dc2', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 1 1.0 NaN NaN NaN
105 51088975 9244203860400 Dấm táo The Inkey List Apple Cider Vinegar Acid Peel 30ml dam-tao-the-inkey-list-apper-cider-vinegar-acid-peel-30ml-p51088975 dam-tao-the-inkey-list-apper-cider-vinegar-acid-peel-30ml-p51088975.html?spid=51088976 None The Inkey List 589000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship+', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 65 ASA (21k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 5.0 1 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/87/6f/86/0bae14bd8ebd26ae57a95f8bb47de9da.png 280 280 [] False available False 345 None False False None [] None 51088976 None False None None 589000 True [{'impression_id': 'thanos-product-RKuBQgdldtq0QZut', 'metadata': {'price': 589000, 'rating_average': 5, 'reviews_count': 1, 'seller_product_id': 51088976}}, {'impression_id': '4a762b28-db43-4da1-a039-6bf01d078413', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 3 3.0 NaN NaN NaN
106 24408456 5696255831404 Gel Giúp Loại Bỏ Tế Bào Chết IASO gel-giup-loai-bo-te-bao-chet-p24408456 gel-giup-loai-bo-te-bao-chet-p24408456.html?spid=24408458 None IASO 441000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship+', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 49 ASA (16k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 49000 10 0.0 0 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/78/29/34/b52258f69bfe3349bfe9c55a7dd9095c.jpg 280 280 [] False available False 345 None False False None [] None 24408458 None False None None 490000 True [{'impression_id': 'thanos-product-mSLaaOhB4aLTtZa7', 'metadata': {'price': 441000, 'rating_average': 0, 'reviews_count': 0, 'seller_product_id': 24408458}}, {'impression_id': '4fd674dd-ee1d-459a-9c0d-1cf00d99662b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 NaN NaN NaN NaN NaN
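Since the question only asks for product IDs, the full DataFrame is more than is needed. Below is a minimal sketch of pulling just the IDs out of the JSON, assuming the response has the shape seen above (a `data` list whose items carry `id` and `seller_product_id`). The name `extract_ids` and the sample payload are illustrative only; the sample values are copied from the rows above.

```python
def extract_ids(payload, key="id"):
    """Return the values of `key` for every product, in order, without duplicates."""
    seen = set()
    ids = []
    for product in payload.get("data", []):
        value = product.get(key)
        # Advertisement rows can repeat a product that is also listed normally.
        if value is not None and value not in seen:
            seen.add(value)
            ids.append(value)
    return ids


# Tiny stand-in for r.json(); the values are taken from the output above.
sample = {
    "data": [
        {"id": 33606848, "seller_product_id": 33606849},  # advertisement row
        {"id": 33606786, "seller_product_id": 66640131},
        {"id": 11239286, "seller_product_id": 20116852},
        {"id": 33606848, "seller_product_id": 66638723},  # same product, other seller
    ]
}

print(extract_ids(sample))                       # product ids
print(extract_ids(sample, "seller_product_id"))  # the spid values used in product links
```

With the real response this would be called as `extract_ids(r.json())`.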
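No matter which page's link you use, the first page is always what loads first, and the page is then updated dynamically (with JavaScript and the API). You can see this in the Network tab of devtools: you may need to refresh the page after opening it, and make sure the "Preserve log" option is not checked. Click the name of the first request in the log (it should end with the same link as in the address bar); the html under "Response" is what requests.get or urlopen fetches, and you may notice that this html is the first page's.
By the way, you can use soup.select_one('li > a.current[data-view-id="product_list_pagination_item"][data-view-label]').get('data-view-label') to check which page number the html actually is.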
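If you scroll through the other requests in the log, you should find one to
https://tiki.vn/api/personalish/v1/blocks/listings?limit=40&include=advertisement&aggregations=2&trackity_id=3dddf2b8-1eb2-e891-0cdf-c23b37663c28&category=11232&page=5&sort=top_seller%3Fpage%3D5&urlKey=lam-sach-da-mat
and the products are probably loaded from there. Apart from trackity_id, every parameter is either fixed or can be found in the page URL. If you look at the request's initiator chain, you can see which JavaScript file makes the request and could try to work out how trackity_id is generated, but personally I find it easier to just use selenium.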
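The mapping from page URL to API URL can be sketched as follows. It assumes the category page URL always has the form `/<urlKey>/c<category>` and that the endpoint accepts `limit` and `page` the way the logged request does. `listing_api_url` is a name made up for this example, and `trackity_id` is simply left out, so whether the endpoint answers without it needs to be checked.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

API_BASE = "https://tiki.vn/api/personalish/v1/blocks/listings"


def listing_api_url(page_url, page=1, limit=40):
    """Build a listings API URL from a category page URL (hypothetical helper)."""
    parts = urlsplit(page_url)
    # The path looks like /lam-sach-da-mat/c11232 -> urlKey and category id.
    match = re.fullmatch(r"/([^/]+)/c(\d+)", parts.path)
    if match is None:
        raise ValueError(f"not a category page URL: {page_url}")
    url_key, category = match.groups()
    params = {"limit": limit, "category": category, "page": page, "urlKey": url_key}
    # Carry the sort order over only if the page URL specifies one.
    sort = parse_qs(parts.query).get("sort")
    if sort:
        params["sort"] = sort[0]
    return f"{API_BASE}?{urlencode(params)}"


print(listing_api_url("https://tiki.vn/lam-sach-da-mat/c11232?sort=top_seller", page=5))
```

Note that in the URL from the question the sort value is `top_seller%3Fpage%3D5`, which decodes to `top_seller?page=5`: the `?page=5` has ended up inside the sort parameter, which looks accidental.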
Suggested solution 1: it looks like you can actually use the API with just the parameters we already know (category, urlKey, sort):
import cloudscraper

r = cloudscraper.create_scraper().get('https://tiki.vn/api/personalish/v1/blocks/listings?limit=300&category=11232&sort=top_seller%3Fpage%3D5&urlKey=lam-sach-da-mat')
productList = r.json()['data']
print('### [{id}_{sku}: {name}] for first 10 products of', f'{len(productList)} ###\n')
for p in productList[:10]:
    print(f"{p['id']}_{p['sku']}: {p['name']}")
(I used cloudscraper because I'm not very familiar with urlopen, and I'm not good at setting the right headers with requests to avoid 403 errors.) This prints
### [{id}_{sku}: {name}] for first 10 products of 100 ###
33606786_5722576750824: Lotion tẩy tế bào chết làm sáng da Paula’s Choice Skin Perfecting 8% AHA Lotion 100ml 2060
11239286_9792297299199: Gel tẩy da chết Arrahan Lemon White Peeling Gel (180ml)
33606848_8573828662870: Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660
20525156_3751926198377: Tẩy Tế Bào Chết 3W Clinic Collagen Crystal Peeling Gel 180ml
67089667_9204550497315: Combo 2 chai tiện lợi - Natureine AQUA PEEL Moisture Peeling Gel - Gel tẩy tế bào da chết, cấp ẩm Nhật Bản - Chính Hãng
21481823_9335684703529: Gel tẩy tế bào chết sáng da hồng sâm Hàn Quốc My Gold Korea Red Ginseng Peeling Gel (130ml) – Hàng Chính Hãng
46203526_8584500833846: Bông Tẩy Da Chết Cosrx One-Step Original Clear Pad 70 Sheets (New 2019)
1941543_2999847759227: Kem tẩy tế bào chết mặt Organic Shop Organic Coffee & Powder 75ml
57783000_9733773668061: Natureine AQUA PEEL Moisture Peeling Gel - Gel tẩy tế bào da chết, cấp ẩm Nhật Bản - Chính Hãng
7319657_7325473003642: Trial Tinh chất dành cho da mụn cao cấp Resist BHA 9 0.83 ml
However, I feel there should be more than 100 products: paginating with selenium (below) suggests 171 (40 + 40 + 40 + 40 + 11).
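One way to check whether the API really stops at 100 would be to request it page by page instead of with one large `limit`. The loop below is only a sketch: `fetch_page` is a placeholder for whatever performs the request (for instance the cloudscraper call with `page=n` and `limit=40` added), and it assumes that a page past the end comes back with an empty `data` list, which is an assumption rather than something confirmed here.

```python
def collect_products(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... until a page has no products."""
    products = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page).get("data", [])
        if not batch:  # assumed end-of-listing signal
            break
        products.extend(batch)
    return products


# Offline stand-in for the real request: two pages of fake products, then nothing.
def fake_fetch(page):
    pages = {1: [{"id": 1}, {"id": 2}], 2: [{"id": 3}]}
    return {"data": pages.get(page, [])}


print(len(collect_products(fake_fetch)))
```

The `max_pages` cap is there so that a misbehaving endpoint cannot keep the loop running forever.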
Suggested solution 2: you can loop through the pages with a function I wrote for fetching and parsing html (using selenium + bs4):
maxPages = 10 # or as you prefer
nextUrl = 'https://tiki.vn/lam-sach-da-mat/c11232?sort=top_seller'
pgi_sel = 'data-view-id="product_list_pagination_item"'
for pn in range(1, maxPages+1):
    curPage_xpath = f'//li/a[@class="current"][@{pgi_sel}][@data-view-label="{pn}"]'
    soup = linkToSoup_selenium(nextUrl, ecx=curPage_xpath)
    if soup is None or isinstance(soup, str): break

    ###################### EXTRACT DATA ######################
    # this is just printing the page# and 1st five IDs, but you can extract whatever you need from soup at this point
    curPg = soup.select_one(f'li > a.current[{pgi_sel}][data-view-label]')
    curPg = f'page {curPg.get("data-view-label")}' if curPg else '!! page ERROR !!'
    pageProds = soup.select('a.product-item[href*=".html?spid="]')
    curPg += f" [{len(pageProds)} products]:"
    first5ids = [a.get('href').split('.html?spid=')[-1] for a in pageProds][:5]
    print(f'{curPg:>22} ', " ".join([f'{i:>10}' for i in first5ids]), '...')
    ##########################################################

    nxtPg = soup.select_one(f'li > a[{pgi_sel}][href]:has(img[alt="arrow-right"])')
    if nxtPg is None or 'disabled' in nxtPg.get('class', ''): break
    nextUrl = nxtPg.get('href')
which prints
page 1 [40 products]: 66640131 20116852 66638723 20525157 67089668 ...
page 2 [40 products]: 63465592 20911921 54388844 58555745 13385021 ...
page 3 [40 products]: 1515345 57703788 1060978 54929902 2076819 ...
page 4 [40 products]: 35737314 26299382 7029351 14970693 32139853 ...
page 5 [11 products]: 52274203 51988147 50422842 36828505 45439018 ...
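The body of linkToSoup_selenium is not included in this post. The version below is a guess at what such a helper could look like, not the original. It assumes Chrome with a matching driver is available, treats `ecx` as a single XPath to wait for, and mimics the return convention the loop relies on (a soup on success, a string on failure). The `timeout` parameter is invented for this sketch.

```python
def linkToSoup_selenium(url, ecx=None, timeout=25):
    """Load url in a browser, wait for the XPath in ecx, return the parsed html.

    Returns a BeautifulSoup object on success, or a string describing the error.
    """
    # Imported here so the function can be defined without a browser set up.
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    try:
        driver = webdriver.Chrome()
    except Exception as e:
        return f"could not start browser: {e!r}"
    try:
        driver.get(url)
        if ecx:
            # Block until the element exists, i.e. the JavaScript update is done.
            WebDriverWait(driver, timeout).until(
                EC.presence_of_element_located((By.XPATH, ecx))
            )
        return BeautifulSoup(driver.page_source, "html.parser")
    except Exception as e:
        return f"failed to load {url}: {e!r}"
    finally:
        driver.quit()
```

Waiting for the pagination link whose `data-view-label` equals the expected page number is what keeps the function from returning the first page's html before the page has been updated.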
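(If you don't want to limit it to maxPages, you can just use while True instead of for pn in range(1, maxPages+1), but you'd also need a counter or something similar to get the pn for ecx, because that is what tells the function to wait until that part of the html has loaded.)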
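The unbounded variant could look like the sketch below, where `itertools.count` plays the role of `while True` plus a counter. To keep it runnable without a browser, the page loader and the next-link lookup are passed in as functions. `crawl_pages`, `load_page` and `find_next_url` are names invented here, with `load_page(url, pn)` standing in for a call to linkToSoup_selenium with an `ecx` built from `pn`.

```python
from itertools import count


def crawl_pages(first_url, load_page, find_next_url):
    """Yield (page_number, page) until loading fails or there is no next link."""
    next_url = first_url
    for pn in count(1):  # unbounded counter replaces range(1, maxPages+1)
        page = load_page(next_url, pn)
        if page is None or isinstance(page, str):
            break
        yield pn, page
        next_url = find_next_url(page)
        if next_url is None:
            break


# Fake three-page site so the loop can be exercised offline.
site = {"/p1": {"next": "/p2"}, "/p2": {"next": "/p3"}, "/p3": {"next": None}}
for pn, page in crawl_pages("/p1", lambda url, pn: site.get(url), lambda p: p["next"]):
    print(pn, page)
```

The loop ends on the same conditions as the bounded version: a failed load, or no usable "next" link on the current page.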