Response Objects - CFspider API Documentation

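Overview

CFspider's response objects, CFSpiderResponse and AsyncCFSpiderResponse, wrap an HTTP response. They expose an interface compatible with requests/httpx and add Cloudflare-specific information and data-extraction helpers.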

CFSpiderResponse

The response object for synchronous requests. It exposes the same interface as requests.Response.

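Properties

Property	Type	Description
`text`	`str`	Response body as text (decoded automatically)
`content`	`bytes`	Raw response body as bytes
`status_code`	`int`	HTTP status code (e.g. 200, 404, 500)
`headers`	`dict`	Dictionary of response headers
`cookies`	`RequestsCookieJar`	Response cookies
`url`	`str`	Final URL of the request (after following redirects)
`encoding`	`str`	Response encoding; readable and writable
`cf_colo`	`str`	Cloudflare node code (available when using Workers), identifying the CF node the request passed through. Examples: NRT: Tokyo (Japan); SIN: Singapore; LAX: Los Angeles (USA); LHR: London (UK)
`cf_ray`	`str`	Cloudflare Ray ID, a unique identifier for each request that can be used for debugging and request tracing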
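
The Ray ID used as an example in this document, `8a1b2c3d4e5f-NRT`, ends with the same node code that `cf_colo` reports. The following minimal sketch splits such a value into its two parts. It assumes the `<id>-<colo>` shape holds; `split_cf_ray` is a hypothetical helper written for illustration, not part of CFspider.

```python
from typing import Optional, Tuple


def split_cf_ray(cf_ray: str) -> Tuple[str, Optional[str]]:
    """Split a Ray ID of the assumed form '<id>-<colo>' into (id, colo).

    Hypothetical helper: returns (cf_ray, None) when no '-' is present.
    """
    ray_id, sep, colo = cf_ray.rpartition("-")
    if not sep:
        # No separator found, so there is no node code to report.
        return cf_ray, None
    return ray_id, colo.upper()


ray_id, colo = split_cf_ray("8a1b2c3d4e5f-NRT")
print(ray_id, colo)
```

A helper like this can be handy when only the Ray ID has been logged and the node code is needed later for debugging.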

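Methods

Method	Description
`json(**kwargs)`	Parse the response as JSON and return a dict or list. Keyword arguments are passed to `json.loads()`
`raise_for_status()`	Raise `requests.HTTPError` when the status code is not 2xx
`find(selector, attr=None, strip=True, regex=None, parser=None)`	Find the first matching element (data extraction; see the data extraction documentation for details)
`find_all(selector, attr=None, strip=True)`	Find all matching elements
`css(selector, attr=None, html=False, strip=True)`	Extract the first match using a CSS selector
`css_all(selector, attr=None, html=False, strip=True)`	Extract all matches using a CSS selector
`css_one(selector)`	Return the first match as an Element object
`xpath(expression)`	Extract the first match using an XPath expression
`xpath_all(expression)`	Extract all matches using an XPath expression
`xpath_one(expression)`	Return the first match as an Element object
`jpath(expression)`	Extract the first match using a JSONPath expression
`jpath_all(expression)`	Extract all matches using a JSONPath expression
`pick(**fields)`	Extract several fields in one call and return an ExtractResult
`extract(rules)`	Extract data using a dictionary of rules
`save(filepath, encoding='utf-8')`	Save the response content to a file and return the absolute path of the output file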
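
Because `json(**kwargs)` is documented as forwarding its keyword arguments to `json.loads()`, standard options such as `parse_float` should apply. The sketch below calls `json.loads()` directly to show the effect. The `body` string is a made-up payload, and the equivalent call `response.json(parse_float=Decimal)` is an assumption based on the table above.

```python
import json
from decimal import Decimal

# Made-up JSON payload standing in for a response body.
body = '{"price": 19.99, "qty": 3}'

# Default parsing: JSON numbers with a fractional part become float.
default = json.loads(body)

# With parse_float, those numbers are built from their literal text instead,
# which avoids binary floating-point rounding for values such as prices.
exact = json.loads(body, parse_float=Decimal)

print(type(default["price"]).__name__)
print(type(exact["price"]).__name__)
```

Integers are unaffected by `parse_float`, so `qty` stays an `int` in both results.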

Usage example

python

import cfspider

response = cfspider.get("https://httpbin.org/ip")

# Basic properties
print(response.status_code)  # 200
print(response.text)         # response text
print(response.content)      # response bytes
print(response.headers)      # response headers
print(response.url)          # final URL

# Cloudflare information
print(response.cf_colo)      # e.g. NRT
print(response.cf_ray)       # e.g. 8a1b2c3d4e5f-NRT

# Parse JSON
data = response.json()
print(data)

# Check the status code
response.raise_for_status()  # raises an exception when not 2xx

# Data extraction
title = response.find("h1")
links = response.find_all("a", attr="href")

# Save the response
response.save("response.html")
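
The `save()` call that closes the example above is documented as writing the content to a file and returning the absolute path of that file. This document does not show how CFspider implements it, so the following is only a stand-in that mirrors the documented contract. `save_text` is a hypothetical function, and treating the content as text is an assumption.

```python
import os
import tempfile


def save_text(text: str, filepath: str, encoding: str = "utf-8") -> str:
    """Hypothetical stand-in for the documented save() contract:
    write the content, then return the absolute path of the file."""
    with open(filepath, "w", encoding=encoding) as fh:
        fh.write(text)
    return os.path.abspath(filepath)


# Write into a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_text("<h1>hello</h1>", os.path.join(tmp, "response.html"))
    saved_is_absolute = os.path.isabs(saved)
    with open(saved, encoding="utf-8") as fh:
        round_trip = fh.read()

print(saved_is_absolute, round_trip)
```

Returning the absolute path lets the caller log or reopen the file without depending on the current working directory.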

AsyncCFSpiderResponse

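The response object for asynchronous requests, based on httpx.Response.

Properties

Has the same properties as CFSpiderResponse, plus:

Property	Type	Description
`http_version`	`str`	HTTP protocol version (e.g. "HTTP/1.1" or "HTTP/2")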

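Methods

Includes all CFSpiderResponse methods, plus these async iteration methods:

Method	Description
`async aiter_bytes(chunk_size=None)`	Asynchronously iterate over the response bytes; useful for streaming large files
`async aiter_text(chunk_size=None)`	Asynchronously iterate over the response text
`async aiter_lines()`	Asynchronously iterate over the response lines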
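
The value of these iterators is that the body is handled one chunk at a time instead of being held in memory all at once. The sketch below shows that consumption pattern without any network access. `fake_aiter_bytes` is a hypothetical stand-in for `response.aiter_bytes()`, and `digest_stream` is an illustrative consumer. Neither is part of CFspider.

```python
import asyncio
import hashlib
from typing import AsyncIterator, Tuple


async def fake_aiter_bytes(data: bytes, chunk_size: int = 4) -> AsyncIterator[bytes]:
    """Hypothetical stand-in for response.aiter_bytes(): yields small chunks."""
    for start in range(0, len(data), chunk_size):
        await asyncio.sleep(0)  # hand control back to the event loop, as real I/O would
        yield data[start:start + chunk_size]


async def digest_stream(chunks: AsyncIterator[bytes]) -> Tuple[int, str]:
    """Consume chunks one at a time, tracking total size and a SHA-256 digest."""
    hasher = hashlib.sha256()
    total = 0
    async for chunk in chunks:
        hasher.update(chunk)
        total += len(chunk)
    return total, hasher.hexdigest()


total, digest = asyncio.run(digest_stream(fake_aiter_bytes(b"hello, cloudflare")))
print(total, digest)
```

With a real response, passing `response.aiter_bytes()` in place of `fake_aiter_bytes(...)` should follow the same pattern, assuming it behaves like an async iterator of `bytes` as the table describes.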

Usage example

python

import asyncio
import cfspider

async def main():
    response = await cfspider.aget("https://httpbin.org/ip")
    
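    # Basic properties
    print(response.status_code)
    print(response.http_version)  # e.g. HTTP/2

    # Stream the body chunk by chunk
    total = 0
    async for chunk in response.aiter_bytes():
        total += len(chunk)  # replace with your own per-chunk processing
    print(total)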

asyncio.run(main())

Element Object

A wrapper class for HTML elements that supports chained calls.

Properties

Property	Type	Description
`text`	`str`	Text content of the element
`html`	`str`	HTML content of the element
`attrs`	`dict`	Dictionary of all attributes

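Methods

Method	Description
`__getitem__(key)`	Get an attribute value, e.g. `element["href"]`
`get(key, default=None)`	Get an attribute value, with an optional default
`find(selector, attr=None, strip=True)`	Find the first matching element inside the current element
`find_all(selector, attr=None, strip=True)`	Find all matching elements inside the current element
`css_one(selector)`	Return the first match as an Element object; supports chained calls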

Usage example

python

# `response` is a CFSpiderResponse obtained from an earlier request
# Get an Element object
product = response.css_one(".product")

# Access properties
print(product.text)        # text content
print(product.html)        # HTML content
print(product["href"])     # value of the href attribute
print(product.get("src"))  # value of the src attribute, or None if it is missing

# Chained calls
title = product.find("h1")
price = product.find(".price")
link = product.find("a", attr="href")

# Nested lookups
product = response.css_one("#main")
items = product.find_all(".item")
for item in items:
    print(item.find("h2"))

Error Handling

python

import cfspider
import requests  # needed for requests.RequestException below

try:
    response = cfspider.get("https://example.com")
    response.raise_for_status()  # check the status code
    data = response.json()
except cfspider.CFSpiderError as e:
    print(f"CFspider error: {e}")
except requests.RequestException as e:
    print(f"Request error: {e}")
except ValueError as e:
    print(f"JSON parsing error: {e}")