May 10

Scrapy crawler problem


I'm practicing web crawling with Scrapy and ran into the problem below. I'm still fairly new to Python and don't really understand what's going on, so I've been stuck for quite a while. Could someone help me figure this out?

 

Here is the example I'm practicing with:

import scrapy
from bs4 import BeautifulSoup


class AppleCrawler(scrapy.Spider):
    name = 'apple'
    start_urls = ["http://www.appledaily.com.tw/realtimenews/section/new/"]

    def parse(self, response):
        domain = ['http://www.appledaily.com.tw']
        res = BeautifulSoup(response.body)
        for news in res.select('.rtddt'):
            yield scrapy.Request(domain + news.select('a')[0]['href'], self.parse_detail)

    def parse_detail(self, response):
        res = BeautifulSoup(response.body)
        print(res.select('#h1')[0].text)

 

Error traceback:

Traceback (most recent call last):
  File "c:\users\labpc-1\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
    yield next(it)
  File "c:\users\labpc-1\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
    for x in result:
  File "c:\users\labpc-1\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "c:\users\labpc-1\anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "c:\users\labpc-1\anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "C:\Users\labpc-1\apple\apple\spiders\crawler.py", line 11, in parse
    yield scrapy.Request(domain + news.select('a')[0]['href'], self.parse_detail)
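
In case it helps pin this down, here is a tiny standalone sketch of just the URL-building step from the line the traceback points to. The HTML snippet and the article path in it are made up for illustration; the point is that domain is a one-element list while the href is a plain string, so the + on that line is adding a list to a string:

from bs4 import BeautifulSoup

# Made-up stand-in for one '.rtddt' block on the listing page (for illustration only).
html = '<div class="rtddt"><a href="/realtimenews/article/new/12345">demo</a></div>'

domain = ['http://www.appledaily.com.tw']   # same shape as in the spider: a one-element list
res = BeautifulSoup(html, 'html.parser')

news = res.select('.rtddt')[0]
href = news.select('a')[0]['href']          # a plain string, e.g. '/realtimenews/article/new/12345'

try:
    url = domain + href                     # list + str, the same expression as in parse()
except TypeError as exc:
    print('TypeError:', exc)                # prints the concatenation error

Running this by itself raises a TypeError at the list + string line, which is the same line my traceback ends at, though I can't be sure it's the same error because the last line of my traceback didn't get pasted.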

 

This is all I can paste for now. If you need any other part of the code, please let me know and I'll post it. Thanks, everyone.

 
