Scrapy
Scrapy at a glance
Pick a website
Define the data you want to scrape
Write a Spider to extract the data
Run the spider to extract the data
Review scraped data
What else?
What's next?
Installation guide
Installing Scrapy
Platform specific installation notes
Scrapy Tutorial
Creating a project
Defining our Item
Our first Spider
Storing the scraped data
Next steps
Examples
Command line tool
Default structure of Scrapy projects
Using the scrapy tool
Available tool commands
Custom project commands
Items
Declaring Items
Item Fields
Working with Items
Extending Items
Item objects
Field objects
Spiders
Spider arguments
Built-in spiders reference
Selectors
Using selectors
Built-in Selectors reference
Item Loaders
Using Item Loaders to populate items
Input and Output processors
Declaring Item Loaders
Declaring Input and Output Processors
Item Loader Context
ItemLoader objects
Reusing and extending Item Loaders
Available built-in processors
Scrapy shell
Launch the shell
Using the shell
Example of shell session
Invoking the shell from spiders to inspect responses
Item Pipeline
Writing your own item pipeline
Item pipeline example
Activating an Item Pipeline component
Feed exports
Serialization formats
Storages
Storage URI parameters
Storage backends
Settings
Link Extractors
Built-in link extractors reference
Logging
Log levels
How to set the log level
How to log messages
Logging from Spiders
scrapy.log module
Logging settings
Stats Collection
Common Stats Collector uses
Available Stats Collectors
Sending e-mail
Quick example
MailSender class reference
Mail settings
Telnet Console
How to access the telnet console
Available variables in the telnet console
Telnet console usage examples
Telnet Console signals
Telnet settings
Web Service
Frequently Asked Questions (FAQ)
How does Scrapy compare to BeautifulSoup or lxml?
What Python versions does Scrapy support?
Does Scrapy support Python 3?
Did Scrapy "steal" X from Django?
Does Scrapy work with HTTP proxies?
How can I scrape an item with attributes in different pages?
Scrapy crashes with: ImportError: No module named win32api
How can I simulate a user login in my spider?
Does Scrapy crawl in breadth-first or depth-first order?
My Scrapy crawler has memory leaks. What can I do?
How can I make Scrapy consume less memory?
Can I use Basic HTTP Authentication in my spiders?
Why does Scrapy download pages in English instead of my native language?
Where can I find some example Scrapy projects?
Can I run a spider without creating a project?
I get "Filtered offsite request" messages. How can I fix them?
What is the recommended way to deploy a Scrapy crawler in production?
Can I use JSON for large exports?
Can I return (Twisted) deferreds from signal handlers?
What does the response status code 999 mean?
Can I call pdb.set_trace() from my spiders to debug them?
What's the simplest way to dump all my scraped items into a JSON/CSV/XML file?
What does the huge cryptic __VIEWSTATE parameter used in some forms mean?
What's the best way to parse big XML/CSV data feeds?
Does Scrapy manage cookies automatically?
How can I see the cookies being sent and received from Scrapy?
How can I instruct a spider to stop itself?
How can I prevent my Scrapy bot from getting banned?
Should I use spider arguments or settings to configure my spider?
I'm scraping an XML document and my XPath selector doesn't return any items
Debugging Spiders
Parse Command
Scrapy Shell
Open in browser
Logging
Spiders Contracts
Custom Contracts
Common Practices
Run Scrapy from a script
Running multiple spiders in the same process
Distributed crawls
Avoiding getting banned
Dynamic creation of Item classes
Broad Crawls
Increase concurrency
Reduce log level
Disable cookies
Disable retries
Reduce download timeout
Disable redirects
Enable crawling of "Ajax Crawlable Pages"
Using Firefox for scraping
Caveats with inspecting the live browser DOM
Useful Firefox add-ons for scraping
Using Firebug for scraping
Introduction
Getting links to follow
Extracting the data
Debugging memory leaks
Common causes of memory leaks
Debugging memory leaks with trackref
Debugging memory leaks with Guppy
Leaks without leaks
Downloading Item Images
Using the Images Pipeline
Usage example
Enabling your Images Pipeline
Images Storage
Additional features
Implementing your custom Images Pipeline
Custom Images pipeline example
Ubuntu packages
Scrapyd
AutoThrottle extension
Design goals
How it works
Throttling algorithm
Settings
Benchmarking
Jobs: pausing and resuming crawls
Job directory
How to use it
Keeping persistent state between batches
Persistence gotchas
DjangoItem
Using DjangoItem
DjangoItem caveats
Django settings set up
Architecture overview
Overview
Components
Data flow
Event-driven networking
Downloader Middleware
Activating a downloader middleware
Writing your own downloader middleware
Built-in downloader middleware reference
Spider Middleware
Activating a spider middleware
Writing your own spider middleware
Built-in spider middleware reference
Extensions
Extension settings
Loading & activating extensions
Available, enabled and disabled extensions
Disabling an extension
Writing your own extension
Built-in extensions reference
Core API
Crawler API
Settings API
SpiderManager API
Signals API
Stats Collector API
Requests and Responses
Request objects
Request.meta special keys
Request subclasses
Response objects
Response subclasses
Settings
Designating the settings
Populating the settings
How to access settings
Rationale for setting names
Built-in settings reference
Signals
Deferred signal handlers
Built-in signals reference
Exceptions
Built-in Exceptions reference
Item Exporters
Using Item Exporters
Built-in Item Exporters reference
Release notes
0.24.4 (2014-08-09)
0.24.3 (2014-08-09)
0.24.2 (2014-07-08)
0.24.1 (2014-06-27)
0.24.0 (2014-06-26)
0.22.2 (released 2014-02-14)
0.22.1 (released 2014-02-08)
0.22.0 (released 2014-01-17)
0.20.2 (released 2013-12-09)
0.20.1 (released 2013-11-28)
0.20.0 (released 2013-11-08)
0.18.4 (released 2013-10-10)
0.18.3 (released 2013-10-03)
0.18.2 (released 2013-09-03)
0.18.1 (released 2013-08-27)
0.18.0 (released 2013-08-09)
0.16.5 (released 2013-05-30)
0.16.4 (released 2013-01-23)
0.16.3 (released 2012-12-07)
0.16.2 (released 2012-11-09)
0.16.1 (released 2012-10-26)
0.16.0 (released 2012-10-18)
0.14.4
0.14.3
0.14.2
0.14.1
0.14
0.12
0.10
0.9
0.8
0.7
Contributing to Scrapy
Reporting bugs
Writing patches
Submitting patches
Coding style
Scrapy Contrib
Documentation policies
Tests
Versioning and API Stability
Versioning
API Stability
Experimental features
Add commands using external libraries
Index
_
__nonzero__() (scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
A
adapt_response() (scrapy.contrib.spiders.XMLFeedSpider method)
add_css() (scrapy.contrib.loader.ItemLoader method)
add_value() (scrapy.contrib.loader.ItemLoader method)
add_xpath() (scrapy.contrib.loader.ItemLoader method)
adjust_request_args() (scrapy.contracts.Contract method)
AJAXCRAWL_ENABLED
setting
AjaxCrawlMiddleware (class in scrapy.contrib.downloadermiddleware.ajaxcrawl)
allowed_domains (scrapy.spider.Spider attribute)
AUTOTHROTTLE_DEBUG
setting
AUTOTHROTTLE_ENABLED
setting
AUTOTHROTTLE_MAX_DELAY
setting
AUTOTHROTTLE_START_DELAY
setting
AWS_ACCESS_KEY_ID
setting
AWS_SECRET_ACCESS_KEY
setting
B
BaseItemExporter (class in scrapy.contrib.exporter)
bench
command
bindaddress
reqmeta
body (scrapy.http.Request attribute)
(scrapy.http.Response attribute)
body_as_unicode() (scrapy.http.TextResponse method)
BOT_NAME
setting
C
check
command
ChunkedTransferMiddleware (class in scrapy.contrib.downloadermiddleware.chunked)
clear_stats() (scrapy.statscol.StatsCollector method)
close_spider()
(scrapy.statscol.StatsCollector method)
closed() (scrapy.spider.Spider method)
CloseSpider
CLOSESPIDER_ERRORCOUNT
setting
CLOSESPIDER_ITEMCOUNT
setting
CLOSESPIDER_PAGECOUNT
setting
CLOSESPIDER_TIMEOUT
setting
command
bench
check
crawl
deploy
edit
fetch
genspider
list
parse
runspider
settings
shell
startproject
version
view
COMMANDS_MODULE
setting
Compose (scrapy.contrib.loader.processor 中的类)
COMPRESSION_ENABLED
setting
CONCURRENT_ITEMS
setting
CONCURRENT_REQUESTS
setting
CONCURRENT_REQUESTS_PER_DOMAIN
setting
CONCURRENT_REQUESTS_PER_IP
setting
connect() (scrapy.signalmanager.SignalManager method)
context (scrapy.contrib.loader.ItemLoader attribute)
Contract (class in scrapy.contracts)
cookiejar
reqmeta
COOKIES_DEBUG
setting
COOKIES_ENABLED
setting
CookiesMiddleware (class in scrapy.contrib.downloadermiddleware.cookies)
copy() (scrapy.http.Request method)
(scrapy.http.Response method)
(scrapy.settings.Settings method)
CoreStats (class in scrapy.contrib.corestats)
crawl
command
crawl() (scrapy.crawler.Crawler method)
(scrapy.crawler.CrawlerRunner method)
crawl_deferreds (scrapy.crawler.CrawlerRunner attribute)
Crawler (class in scrapy.crawler)
crawler (scrapy.spider.Spider attribute)
CrawlerRunner (class in scrapy.crawler)
crawlers (scrapy.crawler.CrawlerRunner attribute)
CrawlSpider (class in scrapy.contrib.spiders)
CRITICAL() (in module scrapy.log)
css() (scrapy.http.TextResponse method)
(scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
CSVFeedSpider (class in scrapy.contrib.spiders)
CsvItemExporter (class in scrapy.contrib.exporter)
D
DEBUG() (in module scrapy.log)
default_input_processor (scrapy.contrib.loader.ItemLoader attribute)
DEFAULT_ITEM_CLASS
setting
default_item_class (scrapy.contrib.loader.ItemLoader attribute)
default_output_processor (scrapy.contrib.loader.ItemLoader attribute)
DEFAULT_REQUEST_HEADERS
setting
default_selector_class (scrapy.contrib.loader.ItemLoader attribute)
DefaultHeadersMiddleware (class in scrapy.contrib.downloadermiddleware.defaultheaders)
delimiter (scrapy.contrib.spiders.CSVFeedSpider attribute)
deploy
command
DEPTH_LIMIT
setting
DEPTH_PRIORITY
setting
DEPTH_STATS
setting
DEPTH_STATS_VERBOSE
setting
DepthMiddleware (class in scrapy.contrib.spidermiddleware.depth)
disconnect() (scrapy.signalmanager.SignalManager method)
disconnect_all() (scrapy.signalmanager.SignalManager method)
DNSCACHE_ENABLED
setting
dont_redirect
reqmeta
dont_retry
reqmeta
DOWNLOAD_DELAY
setting
DOWNLOAD_HANDLERS
setting
DOWNLOAD_HANDLERS_BASE
setting
DOWNLOAD_TIMEOUT
setting
DOWNLOADER
setting
DOWNLOADER_MIDDLEWARES
setting
DOWNLOADER_MIDDLEWARES_BASE
setting
DOWNLOADER_STATS
setting
DownloaderMiddleware (class in scrapy.contrib.downloadermiddleware)
DownloaderStats (class in scrapy.contrib.downloadermiddleware.stats)
DownloadTimeoutMiddleware (class in scrapy.contrib.downloadermiddleware.downloadtimeout)
DropItem
DummyStatsCollector (class in scrapy.statscol)
DUPEFILTER_CLASS
setting
DUPEFILTER_DEBUG
setting
E
edit
command
EDITOR
setting
encoding (scrapy.contrib.exporter.BaseItemExporter attribute)
(scrapy.http.TextResponse attribute)
engine (scrapy.crawler.Crawler attribute)
engine_started
signal
engine_started() (in module scrapy.signals)
engine_stopped
signal
engine_stopped() (in module scrapy.signals)
ERROR() (in module scrapy.log)
export_empty_fields (scrapy.contrib.exporter.BaseItemExporter attribute)
export_item() (scrapy.contrib.exporter.BaseItemExporter method)
EXTENSIONS
setting
extensions (scrapy.crawler.Crawler attribute)
EXTENSIONS_BASE
setting
extract() (scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
F
FEED_EXPORTERS
setting
FEED_EXPORTERS_BASE
setting
FEED_FORMAT
setting
FEED_STORAGES
setting
FEED_STORAGES_BASE
setting
FEED_STORE_EMPTY
setting
FEED_URI
setting
fetch
command
Field (class in scrapy.item)
fields (scrapy.item.Item attribute)
fields_to_export (scrapy.contrib.exporter.BaseItemExporter attribute)
find_by_request() (scrapy.spidermanager.SpiderManager method)
finish_exporting() (scrapy.contrib.exporter.BaseItemExporter method)
flags (scrapy.http.Response attribute)
FormRequest (class in scrapy.http)
freeze() (scrapy.settings.Settings method)
from_crawler() (scrapy.spider.Spider method)
from_response() (scrapy.http.FormRequest class method)
from_settings() (scrapy.mail.MailSender class method)
(scrapy.spidermanager.SpiderManager method)
frozencopy() (scrapy.settings.Settings method)
G
genspider
command
get() (scrapy.settings.Settings method)
get_collected_values() (scrapy.contrib.loader.ItemLoader method)
get_css() (scrapy.contrib.loader.ItemLoader method)
get_input_processor() (scrapy.contrib.loader.ItemLoader method)
get_media_requests() (scrapy.contrib.pipeline.images.ImagesPipeline method)
get_oldest() (in module scrapy.utils.trackref)
get_output_processor() (scrapy.contrib.loader.ItemLoader method)
get_output_value() (scrapy.contrib.loader.ItemLoader method)
get_stats() (scrapy.statscol.StatsCollector method)
get_value() (scrapy.contrib.loader.ItemLoader method)
(scrapy.statscol.StatsCollector method)
get_xpath() (scrapy.contrib.loader.ItemLoader method)
getbool() (scrapy.settings.Settings method)
getdict() (scrapy.settings.Settings method)
getfloat() (scrapy.settings.Settings method)
getint() (scrapy.settings.Settings method)
getlist() (scrapy.settings.Settings method)
H
handle_httpstatus_list
reqmeta
headers (scrapy.contrib.spiders.CSVFeedSpider attribute)
(scrapy.http.Request attribute)
(scrapy.http.Response attribute)
HtmlResponse (class in scrapy.http)
HttpAuthMiddleware (class in scrapy.contrib.downloadermiddleware.httpauth)
HTTPCACHE_DBM_MODULE
setting
HTTPCACHE_DIR
setting
HTTPCACHE_ENABLED
setting
HTTPCACHE_EXPIRATION_SECS
setting
HTTPCACHE_IGNORE_HTTP_CODES
setting
HTTPCACHE_IGNORE_MISSING
setting
HTTPCACHE_IGNORE_SCHEMES
setting
HTTPCACHE_POLICY
setting
HTTPCACHE_STORAGE
setting
HttpCacheMiddleware (class in scrapy.contrib.downloadermiddleware.httpcache)
HttpCompressionMiddleware (class in scrapy.contrib.downloadermiddleware.httpcompression)
HTTPERROR_ALLOW_ALL
setting
HTTPERROR_ALLOWED_CODES
setting
HttpErrorMiddleware (class in scrapy.contrib.spidermiddleware.httperror)
HttpProxyMiddleware (class in scrapy.contrib.downloadermiddleware.httpproxy)
I
Identity (class in scrapy.contrib.loader.processor)
IgnoreRequest
IMAGES_EXPIRES
setting
IMAGES_MIN_HEIGHT
setting
IMAGES_MIN_WIDTH
setting
IMAGES_STORE
setting
IMAGES_THUMBS
setting
ImagesPipeline (class in scrapy.contrib.pipeline.images)
inc_value() (scrapy.statscol.StatsCollector method)
INFO() (in module scrapy.log)
item (scrapy.contrib.loader.ItemLoader attribute)
Item (class in scrapy.item)
item_completed() (scrapy.contrib.pipeline.images.ImagesPipeline method)
item_dropped
signal
item_dropped() (in module scrapy.signals)
ITEM_PIPELINES
setting
ITEM_PIPELINES_BASE
setting
item_scraped
signal
item_scraped() (in module scrapy.signals)
ItemLoader (class in scrapy.contrib.loader)
iter_all() (in module scrapy.utils.trackref)
iterator (scrapy.contrib.spiders.XMLFeedSpider attribute)
itertag (scrapy.contrib.spiders.XMLFeedSpider attribute)
J
Join (class in scrapy.contrib.loader.processor)
JsonItemExporter (class in scrapy.contrib.exporter)
JsonLinesItemExporter (class in scrapy.contrib.exporter)
L
list
command
list() (scrapy.spidermanager.SpiderManager method)
load() (scrapy.spidermanager.SpiderManager method)
load_item() (scrapy.contrib.loader.ItemLoader method)
log() (scrapy.spider.Spider method)
LOG_ENABLED
setting
LOG_ENCODING
setting
LOG_FILE
setting
LOG_LEVEL
setting
LOG_STDOUT
setting
LogStats (class in scrapy.contrib.logstats)
LxmlLinkExtractor (class in scrapy.contrib.linkextractors.lxmlhtml)
M
MAIL_FROM
setting
MAIL_HOST
setting
MAIL_PASS
setting
MAIL_PORT
setting
MAIL_SSL
setting
MAIL_TLS
setting
MAIL_USER
setting
MailSender (class in scrapy.mail)
make_requests_from_url() (scrapy.spider.Spider method)
MapCompose (class in scrapy.contrib.loader.processor)
max_value() (scrapy.statscol.StatsCollector method)
MEMDEBUG_ENABLED
setting
MEMDEBUG_NOTIFY
setting
MemoryStatsCollector (class in scrapy.statscol)
MEMUSAGE_ENABLED
setting
MEMUSAGE_LIMIT_MB
setting
MEMUSAGE_NOTIFY_MAIL
setting
MEMUSAGE_REPORT
setting
MEMUSAGE_WARNING_MB
setting
meta (scrapy.http.Request attribute)
(scrapy.http.Response attribute)
METAREFRESH_ENABLED
setting
MetaRefreshMiddleware (class in scrapy.contrib.downloadermiddleware.redirect)
method (scrapy.http.Request attribute)
min_value() (scrapy.statscol.StatsCollector method)
msg() (in module scrapy.log)
N
name (scrapy.spider.Spider attribute)
namespaces (scrapy.contrib.spiders.XMLFeedSpider attribute)
NEWSPIDER_MODULE
setting
NotConfigured
NotSupported
O
object_ref (class in scrapy.utils.trackref)
OffsiteMiddleware (class in scrapy.contrib.spidermiddleware.offsite)
open_spider()
(scrapy.statscol.StatsCollector method)
P
parse
command
parse() (scrapy.spider.Spider method)
parse_node() (scrapy.contrib.spiders.XMLFeedSpider method)
parse_row() (scrapy.contrib.spiders.CSVFeedSpider method)
parse_start_url() (scrapy.contrib.spiders.CrawlSpider method)
PickleItemExporter (class in scrapy.contrib.exporter)
post_process() (scrapy.contracts.Contract method)
PprintItemExporter (class in scrapy.contrib.exporter)
pre_process() (scrapy.contracts.Contract method)
print_live_refs() (in module scrapy.utils.trackref)
process_exception() (scrapy.contrib.downloadermiddleware.DownloaderMiddleware method)
process_item()
process_request() (scrapy.contrib.downloadermiddleware.DownloaderMiddleware method)
process_response() (scrapy.contrib.downloadermiddleware.DownloaderMiddleware method)
process_results() (scrapy.contrib.spiders.XMLFeedSpider method)
process_spider_exception() (scrapy.contrib.spidermiddleware.SpiderMiddleware method)
process_spider_input() (scrapy.contrib.spidermiddleware.SpiderMiddleware method)
process_spider_output() (scrapy.contrib.spidermiddleware.SpiderMiddleware method)
process_start_requests() (scrapy.contrib.spidermiddleware.SpiderMiddleware method)
Python Enhancement Proposals
PEP 8, [1]
R
RANDOMIZE_DOWNLOAD_DELAY
setting
re() (scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)
REDIRECT_ENABLED
setting
REDIRECT_MAX_METAREFRESH_DELAY
setting, [1]
REDIRECT_MAX_TIMES
setting, [1]
REDIRECT_PRIORITY_ADJUST
setting
redirect_urls
reqmeta
RedirectMiddleware (class in scrapy.contrib.downloadermiddleware.redirect)
REFERER_ENABLED
setting
RefererMiddleware (class in scrapy.contrib.spidermiddleware.referer)
register_namespace() (scrapy.selector.Selector method)
remove_namespaces() (scrapy.selector.Selector method)
replace() (scrapy.http.Request method)
(scrapy.http.Response method)
replace_css() (scrapy.contrib.loader.ItemLoader method)
replace_value() (scrapy.contrib.loader.ItemLoader method)
replace_xpath() (scrapy.contrib.loader.ItemLoader method)
reqmeta
bindaddress
cookiejar
dont_redirect
dont_retry
handle_httpstatus_list
redirect_urls
Request (class in scrapy.http)
request (scrapy.http.Response attribute)
request_scheduled
signal
request_scheduled() (in module scrapy.signals)
Response (class in scrapy.http)
response_downloaded
signal
response_downloaded() (in module scrapy.signals)
response_received
signal
response_received() (in module scrapy.signals)
RETRY_ENABLED
setting
RETRY_HTTP_CODES
setting
RETRY_TIMES
setting
RetryMiddleware (class in scrapy.contrib.downloadermiddleware.retry)
ReturnsContract (class in scrapy.contracts.default)
ROBOTSTXT_OBEY
setting
RobotsTxtMiddleware (class in scrapy.contrib.downloadermiddleware.robotstxt)
Rule (class in scrapy.contrib.spiders)
rules (scrapy.contrib.spiders.CrawlSpider attribute)
runspider
command
S
SCHEDULER
setting
ScrapesContract (class in scrapy.contracts.default)
scrapy.contracts (module)
scrapy.contracts.default (module)
scrapy.contrib.closespider (module)
scrapy.contrib.closespider.CloseSpider (class in scrapy.contrib.closespider)
scrapy.contrib.corestats (module)
scrapy.contrib.debug (module)
scrapy.contrib.debug.Debugger (class in scrapy.contrib.debug)
scrapy.contrib.debug.StackTraceDump (class in scrapy.contrib.debug)
scrapy.contrib.downloadermiddleware (module)
scrapy.contrib.downloadermiddleware.ajaxcrawl (module)
scrapy.contrib.downloadermiddleware.chunked (module)
scrapy.contrib.downloadermiddleware.cookies (module)
scrapy.contrib.downloadermiddleware.defaultheaders (module)
scrapy.contrib.downloadermiddleware.downloadtimeout (module)
scrapy.contrib.downloadermiddleware.httpauth (module)
scrapy.contrib.downloadermiddleware.httpcache (module)
scrapy.contrib.downloadermiddleware.httpcompression (module)
scrapy.contrib.downloadermiddleware.httpproxy (module)
scrapy.contrib.downloadermiddleware.redirect (module)
scrapy.contrib.downloadermiddleware.retry (module)
scrapy.contrib.downloadermiddleware.robotstxt (module)
scrapy.contrib.downloadermiddleware.stats (module)
scrapy.contrib.downloadermiddleware.useragent (module)
scrapy.contrib.exporter (module)
scrapy.contrib.linkextractors (module)
scrapy.contrib.linkextractors.lxmlhtml (module)
scrapy.contrib.loader (module)
scrapy.contrib.loader.processor (module)
scrapy.contrib.logstats (module)
scrapy.contrib.memdebug (module)
scrapy.contrib.memdebug.MemoryDebugger (class in scrapy.contrib.memdebug)
scrapy.contrib.memusage (module)
scrapy.contrib.memusage.MemoryUsage (class in scrapy.contrib.memusage)
scrapy.contrib.pipeline.images (module)
scrapy.contrib.spidermiddleware (module)
scrapy.contrib.spidermiddleware.depth (module)
scrapy.contrib.spidermiddleware.httperror (module)
scrapy.contrib.spidermiddleware.offsite (module)
scrapy.contrib.spidermiddleware.referer (module)
scrapy.contrib.spidermiddleware.urllength (module)
scrapy.contrib.spiders (module)
scrapy.contrib.statsmailer (module)
scrapy.contrib.statsmailer.StatsMailer (class in scrapy.contrib.statsmailer)
scrapy.crawler (module)
scrapy.exceptions (module)
scrapy.http (module)
scrapy.item (module)
scrapy.log (module)
scrapy.mail (module)
scrapy.selector (module)
scrapy.settings (module)
scrapy.signalmanager (module)
scrapy.signals (module)
scrapy.spider (module)
scrapy.spidermanager (module)
scrapy.statscol (module), [1]
scrapy.telnet (module), [1]
scrapy.telnet.TelnetConsole (class in scrapy.telnet)
scrapy.utils.trackref (module)
selector (scrapy.contrib.loader.ItemLoader attribute)
(scrapy.http.TextResponse attribute)
Selector (class in scrapy.selector)
SelectorList (class in scrapy.selector)
send() (scrapy.mail.MailSender method)
send_catch_log() (scrapy.signalmanager.SignalManager method)
send_catch_log_deferred() (scrapy.signalmanager.SignalManager method)
serialize_field() (scrapy.contrib.exporter.BaseItemExporter method)
set() (scrapy.settings.Settings method)
set_stats() (scrapy.statscol.StatsCollector method)
set_value() (scrapy.statscol.StatsCollector method)
setdict() (scrapy.settings.Settings method)
setmodule() (scrapy.settings.Settings method)
setting
AJAXCRAWL_ENABLED
AUTOTHROTTLE_DEBUG
AUTOTHROTTLE_ENABLED
AUTOTHROTTLE_MAX_DELAY
AUTOTHROTTLE_START_DELAY
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
BOT_NAME
CLOSESPIDER_ERRORCOUNT
CLOSESPIDER_ITEMCOUNT
CLOSESPIDER_PAGECOUNT
CLOSESPIDER_TIMEOUT
COMMANDS_MODULE
COMPRESSION_ENABLED
CONCURRENT_ITEMS
CONCURRENT_REQUESTS
CONCURRENT_REQUESTS_PER_DOMAIN
CONCURRENT_REQUESTS_PER_IP
COOKIES_DEBUG
COOKIES_ENABLED
DEFAULT_ITEM_CLASS
DEFAULT_REQUEST_HEADERS
DEPTH_LIMIT
DEPTH_PRIORITY
DEPTH_STATS
DEPTH_STATS_VERBOSE
DNSCACHE_ENABLED
DOWNLOADER
DOWNLOADER_MIDDLEWARES
DOWNLOADER_MIDDLEWARES_BASE
DOWNLOADER_STATS
DOWNLOAD_DELAY
DOWNLOAD_HANDLERS
DOWNLOAD_HANDLERS_BASE
DOWNLOAD_TIMEOUT
DUPEFILTER_CLASS
DUPEFILTER_DEBUG
EDITOR
EXTENSIONS
EXTENSIONS_BASE
FEED_EXPORTERS
FEED_EXPORTERS_BASE
FEED_FORMAT
FEED_STORAGES
FEED_STORAGES_BASE
FEED_STORE_EMPTY
FEED_URI
HTTPCACHE_DBM_MODULE
HTTPCACHE_DIR
HTTPCACHE_ENABLED
HTTPCACHE_EXPIRATION_SECS
HTTPCACHE_IGNORE_HTTP_CODES
HTTPCACHE_IGNORE_MISSING
HTTPCACHE_IGNORE_SCHEMES
HTTPCACHE_POLICY
HTTPCACHE_STORAGE
HTTPERROR_ALLOWED_CODES
HTTPERROR_ALLOW_ALL
IMAGES_EXPIRES
IMAGES_MIN_HEIGHT
IMAGES_MIN_WIDTH
IMAGES_STORE
IMAGES_THUMBS
ITEM_PIPELINES
ITEM_PIPELINES_BASE
LOG_ENABLED
LOG_ENCODING
LOG_FILE
LOG_LEVEL
LOG_STDOUT
MAIL_FROM
MAIL_HOST
MAIL_PASS
MAIL_PORT
MAIL_SSL
MAIL_TLS
MAIL_USER
MEMDEBUG_ENABLED
MEMDEBUG_NOTIFY
MEMUSAGE_ENABLED
MEMUSAGE_LIMIT_MB
MEMUSAGE_NOTIFY_MAIL
MEMUSAGE_REPORT
MEMUSAGE_WARNING_MB
METAREFRESH_ENABLED
NEWSPIDER_MODULE
RANDOMIZE_DOWNLOAD_DELAY
REDIRECT_ENABLED
REDIRECT_MAX_METAREFRESH_DELAY, [1]
REDIRECT_MAX_TIMES, [1]
REDIRECT_PRIORITY_ADJUST
REFERER_ENABLED
RETRY_ENABLED
RETRY_HTTP_CODES
RETRY_TIMES
ROBOTSTXT_OBEY
SCHEDULER
SPIDER_CONTRACTS
SPIDER_CONTRACTS_BASE
SPIDER_MANAGER_CLASS
SPIDER_MIDDLEWARES
SPIDER_MIDDLEWARES_BASE
SPIDER_MODULES
STATSMAILER_RCPTS
STATS_CLASS
STATS_DUMP
TELNETCONSOLE_ENABLED
TELNETCONSOLE_HOST
TELNETCONSOLE_PORT, [1]
TEMPLATES_DIR
URLLENGTH_LIMIT
USER_AGENT
settings
command
settings (scrapy.crawler.Crawler attribute)
Settings (class in scrapy.settings)
settings (scrapy.spider.Spider attribute)
SETTINGS_PRIORITIES() (in module scrapy.settings)
shell
command
signal
engine_started
engine_stopped
item_dropped
item_scraped
request_scheduled
response_downloaded
response_received
spider_closed
spider_error
spider_idle
spider_opened
update_telnet_vars
SignalManager (class in scrapy.signalmanager)
signals (scrapy.crawler.Crawler attribute)
sitemap_alternate_links (scrapy.contrib.spiders.SitemapSpider attribute)
sitemap_follow (scrapy.contrib.spiders.SitemapSpider attribute)
sitemap_rules (scrapy.contrib.spiders.SitemapSpider attribute)
sitemap_urls (scrapy.contrib.spiders.SitemapSpider attribute)
SitemapSpider (class in scrapy.contrib.spiders)
spider (scrapy.crawler.Crawler attribute)
Spider (class in scrapy.spider)
spider_closed
signal
spider_closed() (in module scrapy.signals)
SPIDER_CONTRACTS
setting
SPIDER_CONTRACTS_BASE
setting
spider_error
signal
spider_error() (in module scrapy.signals)
spider_idle
signal
spider_idle() (in module scrapy.signals)
SPIDER_MANAGER_CLASS
setting
SPIDER_MIDDLEWARES
setting
SPIDER_MIDDLEWARES_BASE
setting
SPIDER_MODULES
setting
spider_opened
signal
spider_opened() (in module scrapy.signals)
spider_stats (scrapy.statscol.MemoryStatsCollector attribute)
SpiderManager (class in scrapy.spidermanager)
SpiderMiddleware (class in scrapy.contrib.spidermiddleware)
start() (in module scrapy.log)
start_exporting() (scrapy.contrib.exporter.BaseItemExporter method)
start_requests() (scrapy.spider.Spider method)
start_urls (scrapy.spider.Spider attribute)
startproject
command
stats (scrapy.crawler.Crawler attribute)
STATS_CLASS
setting
STATS_DUMP
setting
StatsCollector (class in scrapy.statscol)
STATSMAILER_RCPTS
setting
status (scrapy.http.Response attribute)
stop() (scrapy.crawler.CrawlerRunner method)
T
TakeFirst (class in scrapy.contrib.loader.processor)
TELNETCONSOLE_ENABLED
setting
TELNETCONSOLE_HOST
setting
TELNETCONSOLE_PORT
setting, [1]
TEMPLATES_DIR
setting
TextResponse (class in scrapy.http)
U
update_telnet_vars
signal
update_telnet_vars() (in module scrapy.telnet)
url (scrapy.http.Request attribute)
(scrapy.http.Response attribute)
UrlContract (class in scrapy.contracts.default)
URLLENGTH_LIMIT
setting
UrlLengthMiddleware (class in scrapy.contrib.spidermiddleware.urllength)
USER_AGENT
setting
UserAgentMiddleware (class in scrapy.contrib.downloadermiddleware.useragent)
V
version
command
view
command
W
WARNING() (in module scrapy.log)
X
XMLFeedSpider (class in scrapy.contrib.spiders)
XmlItemExporter (class in scrapy.contrib.exporter)
XmlResponse (class in scrapy.http)
xpath() (scrapy.http.TextResponse method)
(scrapy.selector.Selector method)
(scrapy.selector.SelectorList method)