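Making HTTP Requests in Python: From curl to Requests

Introduction: HTTP Requests, the Foundation of the Web

In modern software development, talking to the network is unavoidable. Whether you are fetching a web page, calling a remote API, uploading data, or communicating with another service, the core operation is the same: sending an HTTP request and receiving a response. HTTP (Hypertext Transfer Protocol) is the most widely used application-layer protocol on the internet. It defines how a client (such as a browser or an application) sends a request to a server and how the server returns a response.

For developers, understanding HTTP requests and being able to issue them from their own programs is an essential skill. Many of us first learn and test HTTP with the command-line tool curl, because it is direct and powerful. When writing larger applications, however, we need the same capability inside a programming language. Python offers several ways to make HTTP requests. This article starts from curl, moves on to urllib in Python's standard library, and ends with requests, the third-party library widely regarded as the de facto standard for HTTP in Python. Along the way it covers how each one is used, its strengths and weaknesses, and how requests simplifies network programming.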

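Stop 1: Understanding HTTP Requests, Starting with `curl`

Before turning to Python, let's review curl. curl is a command-line tool and library (libcurl) for transferring data over many protocols, of which HTTP is the most commonly used. It is widely used to test APIs, download files, and explore how HTTP works. With curl you can inspect every part of an HTTP request (method, URL, headers, body) and of the server's response (status code, headers, body). By default it prints only the response body; add -i or -v to see the headers as well.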

Example 1: Sending a simple GET request with curl

```bash
curl https://www.example.com
```

Running this command sends a GET request to https://www.example.com and prints the HTML returned by the server to the terminal.

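Example 2: Sending a GET request with query parameters using curl

```bash
curl "https://api.example.com/users?id=123&format=json"
```

Here the URL itself carries the query parameters (id=123&format=json). The quotes matter: without them the shell would treat & as its background operator and cut the URL short.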
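
To make the structure of a query string concrete, here is a minimal sketch that builds the same kind of URL with Python's standard library and then parses it back. The host is the placeholder from the example above, and the extra q parameter is hypothetical, added only to show how special characters are escaped.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "https://api.example.com/users"  # placeholder host from the curl example
# "q" is a hypothetical parameter, included to show escaping of spaces and "&"
params = {"id": 123, "format": "json", "q": "a b&c"}

# urlencode() joins key=value pairs with "&" and escapes unsafe characters
query = urlencode(params)
full_url = f"{base_url}?{query}"
print(full_url)

# Parsing the query string recovers the original values (as lists of strings)
parsed = parse_qs(urlsplit(full_url).query)
print(parsed)
```

The space becomes + and the literal & becomes %26, which is why a value containing & does not get mistaken for a parameter separator.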

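Example 3: Sending a POST request with data using curl

Sending form data:
```bash
curl -X POST -d "username=testuser&password=testpass" https://api.example.com/login
```

Sending JSON data:
```bash
curl -X POST -H "Content-Type: application/json" -d '{"username": "testuser", "password": "testpass"}' https://api.example.com/login
```

In these POST requests, -X POST sets the method and -d (or --data) supplies the request body. Since -d already implies POST, -X POST is explicit rather than required. With -d, curl sends Content-Type: application/x-www-form-urlencoded by default, so when sending JSON we also use -H (or --header) to set the Content-Type header to application/json and tell the server that the body is JSON.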
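
The two curl commands differ only in how the body is encoded and which Content-Type describes it. As a rough sketch using only the standard library, the same credentials can be turned into either body format like this (the variable names are illustrative):

```python
import json
from urllib.parse import urlencode

# Same credentials as in the curl examples above
credentials = {"username": "testuser", "password": "testpass"}

# Form encoding: key=value pairs joined by "&"
# (goes with Content-Type: application/x-www-form-urlencoded)
form_body = urlencode(credentials).encode("utf-8")

# JSON encoding: a serialized object
# (goes with Content-Type: application/json)
json_body = json.dumps(credentials).encode("utf-8")

print(form_body)
print(json_body)
```

Either way, what travels over the wire is a sequence of bytes; the Content-Type header is what tells the server how to interpret them.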

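HTTP basics learned from curl:

- Request method: GET, POST, PUT, DELETE, and so on.
- URL (Uniform Resource Locator): the address of the resource.
- Request headers: metadata about the request, such as Content-Type (type of the request body), User-Agent (client information), and Authorization (credentials).
- Request body: the data carried by POST, PUT, and similar requests, such as form data or JSON.
- Response status code: the outcome of the request on the server, such as 200 OK, 404 Not Found, or 500 Internal Server Error.
- Response headers: metadata about the response, such as Content-Type (type of the response body) and Set-Cookie (sets a cookie).
- Response body: the actual content returned by the server, such as HTML, JSON, or an image.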
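
To see how these parts fit together on the wire, the following sketch splits a raw HTTP/1.1 response into status code, headers, and body. It is a teaching aid under simplifying assumptions (a text body, no chunked encoding, no repeated headers), and parse_http_response is a hypothetical helper, not a function from any library.

```python
def parse_http_response(raw: str):
    """Split a raw HTTP/1.1 response into (status_code, reason, headers, body)."""
    # A blank line (CRLF CRLF) separates the head from the body
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Status line looks like: HTTP/1.1 404 Not Found
    _version, status_code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lower case
        headers[name.strip().lower()] = value.strip()
    return int(status_code), reason, headers, body


# A hand-written sample response, not captured from a real server
raw_response = (
    "HTTP/1.1 404 Not Found\r\n"
    "Content-Type: application/json\r\n"
    "Content-Length: 22\r\n"
    "\r\n"
    '{"error": "not found"}'
)

status, reason, headers, body = parse_http_response(raw_response)
print(status, reason)
print(headers)
print(body)
```

Real clients handle many more cases, which is exactly the work that curl and the Python libraries below do for you.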

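curl is an excellent tool for manual testing and debugging. But when these tasks need to be automated inside a program, with requests generated from program logic, responses processed in code, and network calls woven into a larger workflow, we need a library in our programming language.

Stop 2: Python's Standard Library and the `urllib` Era

Python's standard library includes modules for working with URLs, collectively known as urllib. These modules changed as Python evolved: in Python 3, the functionality of Python 2's urllib and urllib2 was reorganized into submodules of the urllib package, with urllib.request responsible for sending HTTP requests.

urllib.request provides functions and classes for opening URLs (mostly HTTP URLs) and for handling basic authentication, redirects, cookies, and similar concerns.

Example 4: Sending a simple GET request with urllib.request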

```python
import urllib.request
import urllib.error

try:
    # Open the URL
    with urllib.request.urlopen('https://www.example.com') as response:
        # Read the response body (bytes)
        html = response.read()
        # Decode the body (assumed to be UTF-8 here)
        print(html.decode('utf-8'))
        # Get the status code
        print(f"Status Code: {response.getcode()}")
        # Get the response headers
        print("Headers:")
        for header, value in response.getheaders():
            print(f"  {header}: {value}")

# HTTPError is a subclass of URLError, so it must be caught first
except urllib.error.HTTPError as e:
    print(f"HTTP Error: {e.code} - {e.reason}")
except urllib.error.URLError as e:
    print(f"URL Error: {e.reason}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
```

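This simple GET request shows the basic usage of urllib.request: urlopen() opens a URL and returns a file-like response object whose content you then read. Error handling is manual. HTTPError (raised for 4xx and 5xx responses) is a subclass of URLError (raised for connection-level failures), so HTTPError has to be caught before URLError or its handler will never run.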
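
The relationship between the two exception classes can be checked without touching the network. The sketch below constructs both exceptions by hand; classify is a hypothetical helper, and the URL and messages are made up for illustration.

```python
import io
import urllib.error
from email.message import Message

# HTTPError inherits from URLError, so an "except URLError" clause also matches it
print(issubclass(urllib.error.HTTPError, urllib.error.URLError))


def classify(exc):
    """Return a short label; the more specific class is tested first."""
    if isinstance(exc, urllib.error.HTTPError):
        return f"HTTP {exc.code}"
    if isinstance(exc, urllib.error.URLError):
        return f"network: {exc.reason}"
    return "other"


# Hand-built exceptions standing in for what urlopen() would raise
http_err = urllib.error.HTTPError(
    "https://www.example.com/missing", 404, "Not Found", Message(), io.BytesIO(b"")
)
url_err = urllib.error.URLError("connection refused")

print(classify(http_err))
print(classify(url_err))
```

If the two isinstance checks were swapped, the 404 would be reported as a generic network error, which is the same mistake as putting except URLError first.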

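Example 5: Sending a POST request with data using urllib.request

Sending a POST request takes more steps:

1. Prepare the request body data.
2. If the data is a dictionary or similar, URL-encode it.
3. Convert the encoded data to bytes.
4. Create a urllib.request.Request object, specifying the URL, the data, and the request method.
5. (Optional) Add request headers such as Content-Type.
6. Open the Request object with urlopen().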

```python
import urllib.request
import urllib.parse
import urllib.error

# Assume this is the target URL for the POST
post_url = 'https://httpbin.org/post'  # httpbin.org is a site for testing HTTP requests

# Data to send (as a dictionary)
data = {'key1': 'value1', 'key2': 'value2'}

try:
    # 1. URL-encode the data
    # urllib.parse.urlencode({'key': 'value'}) -> 'key=value'
    encoded_data = urllib.parse.urlencode(data)

    # 2. Convert the encoded string to bytes (UTF-8)
    data_bytes = encoded_data.encode('utf-8')

    # 3. Create the Request object
    # Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
    # For POST, the body goes in the data parameter
    req = urllib.request.Request(post_url, data=data_bytes, method='POST')

    # 4. (Optional) Add request headers
    # When data is present and no Content-Type is set, urllib sends
    # Content-Type: application/x-www-form-urlencoded
    # To send JSON, import json, serialize the data, and set Content-Type yourself:
    # json_data = json.dumps({'user': 'test'})
    # json_data_bytes = json_data.encode('utf-8')
    # req = urllib.request.Request(post_url, data=json_data_bytes, method='POST')
    # req.add_header('Content-Type', 'application/json')

    print(f"Sending POST request to: {post_url}")
    # 5. Open the Request object with urlopen
    with urllib.request.urlopen(req) as response:
        # Read the response body
        response_body = response.read().decode('utf-8')
        print("\nResponse Body:")
        print(response_body)

        # Get the status code
        print(f"\nStatus Code: {response.getcode()}")
        # Get the response headers
        print("\nResponse Headers:")
        for header, value in response.getheaders():
            print(f"  {header}: {value}")

# HTTPError is a subclass of URLError, so it is caught first
except urllib.error.HTTPError as e:
    print(f"HTTP Error: {e.code} - {e.reason}")
except urllib.error.URLError as e:
    print(f"URL Error: {e.reason}")
```

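As you can see, even a simple POST request with urllib.request involves manual encoding, creating a Request object, and converting to bytes. Compared with curl's concise -d option, this is tedious. File uploads, cookies, sessions, redirects, proxies, and other advanced features add further complexity.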
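
The JSON variant that the example only hints at in comments can be sketched as follows. Nothing is sent here: the code only builds the Request object and inspects it. build_json_post is a hypothetical helper name, and the URL is the httpbin.org test endpoint used above.

```python
import json
import urllib.request


def build_json_post(url, payload):
    """Build (but do not send) a POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")  # serialize, then convert to bytes
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_json_post("https://httpbin.org/post", {"user": "test"})

print(req.get_method())
# Request stores header names capitalized, so the lookup key is "Content-type"
print(req.get_header("Content-type"))
print(req.data)
```

To actually send it, you would pass req to urllib.request.urlopen() inside the same kind of try/except block as in Example 5.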

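Advantages of urllib.request:

- It is part of the Python standard library, so nothing extra needs to be installed.
- It provides the basic functionality needed for HTTP requests.

Disadvantages of urllib.request:

- The API is relatively low-level and not very intuitive or convenient.
- Common tasks (posting a form, sending JSON, handling cookies, managing sessions, retries, and so on) require a lot of boilerplate.
- Error handling is less friendly: you have to distinguish URLError from HTTPError.
- Each kind of data (form, JSON, file) is sent differently and needs manual encoding and header setup.
- There is no built-in connection pooling: urlopen() opens a new connection for each request unless you implement reuse yourself.

These shortcomings in convenience and usability led the community to look for a higher-level, friendlier HTTP client, which set the stage for the rise of requests.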

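Stop 3: A Modern Solution, the `requests` Library

requests is a popular Python HTTP library designed to make HTTP requests simple and intuitive. Its motto is "HTTP for Humans": it aims to provide a clean, high-level API that hides the low-level details. requests is not part of the standard library and must be installed separately.

Installing requests:

```bash
pip install requests
```

Once it is installed, you can start using it.

Example 6: Sending a simple GET request with requests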

```python
import requests

try:
    # Send a GET request
    response = requests.get('https://www.example.com')

    # Check the response status code
    if response.status_code == 200:
        # Get the response body as text
        print(response.text)
        # Get the status code
        print(f"\nStatus Code: {response.status_code}")
        # Get the response headers (dictionary-like)
        print("\nHeaders:")
        for header, value in response.headers.items():
            print(f"  {header}: {value}")
    else:
        print(f"Request failed with status code: {response.status_code}")
        print(response.text)  # print the error message or content

# requests provides a common base exception class, RequestException
except requests.exceptions.RequestException as e:
    print(f"An error occurred during the request: {e}")
```

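Compared with urllib.request, requests.get() is very concise and reads much like curl https://url. Getting the status code, body, and headers is equally direct.

Example 7: Sending a GET request with query parameters using requests

requests takes the query string through the params parameter, which accepts a dictionary. It URL-encodes the keys and values and builds the query string for you.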

```python
import requests

url = 'https://api.example.com/users'
params = {'id': 123, 'format': 'json'}

try:
    # Send a GET request with query parameters
    response = requests.get(url, params=params)

    if response.status_code == 200:
        print(f"Request URL: {response.url}")  # the full URL actually requested
        print("\nResponse JSON:")
        # If the response is JSON, .json() parses it directly
        try:
            print(response.json())
        except requests.exceptions.JSONDecodeError:  # available in requests 2.27+
            print("Response is not valid JSON.")
            print(response.text)
    else:
        print(f"Request failed with status code: {response.status_code}")

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```

requests handles URL encoding and parameter joining automatically, which is very convenient.
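
You can check the URL that requests would build without sending anything, by preparing a Request object yourself. This is a small sketch that assumes requests is installed; the host is the placeholder from the example above.

```python
import requests

# Build a request description; nothing is sent over the network
req = requests.Request(
    "GET",
    "https://api.example.com/users",  # placeholder host from the example above
    params={"id": 123, "format": "json"},
)

# prepare() applies the same encoding that requests.get() would use
prepared = req.prepare()
print(prepared.method)
print(prepared.url)
```

This is handy for debugging: if an API call misbehaves, inspecting prepared.url shows exactly what would go on the wire.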

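Example 8: Sending a POST request with data using requests

requests offers several ways to send a POST body:

- data parameter: sends application/x-www-form-urlencoded form data when given a dictionary, or arbitrary bytes or a string as-is.
- json parameter: sends application/json data (a dictionary, list, or other serializable object). requests serializes the object to a JSON string and sets the Content-Type header to application/json.

Sending form data: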
```python
import requests

post_url = 'https://httpbin.org/post'
payload = {'username': 'testuser', 'password': 'testpass'}

try:
    # Use the data parameter to send form data
    response = requests.post(post_url, data=payload)

    if response.status_code == 200:
        print("Response JSON (Form data sent):")
        print(response.json())
    else:
        print(f"Request failed with status code: {response.status_code}")

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```

Sending JSON data:
```python
import requests

post_url = 'https://httpbin.org/post'
json_payload = {'username': 'testuser', 'password': 'testpass', 'action': 'login'}

try:
    # Use the json parameter to send JSON data
    response = requests.post(post_url, json=json_payload)

    if response.status_code == 200:
        print("Response JSON (JSON data sent):")
        print(response.json())  # httpbin.org echoes the data that was sent
    else:
        print(f"Request failed with status code: {response.status_code}")

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```

Compared with the urllib.request POST example, the requests version is much shorter and easier to read. The data and json parameters cover the most common POST body formats and take care of encoding and header setup.
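
The difference between data and json can be seen offline by preparing both requests and comparing what would be sent. This is a sketch assuming requests is installed; the URL is the httpbin.org test endpoint used above, and form_req and json_req are illustrative names.

```python
import json
import requests

url = "https://httpbin.org/post"
payload = {"username": "testuser", "password": "testpass"}

# Prepare (but do not send) the same payload in both styles
form_req = requests.Request("POST", url, data=payload).prepare()
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"])
print(form_req.body)

print(json_req.headers["Content-Type"])
# Decode the JSON body again to confirm it round-trips to the original dict
print(json.loads(json_req.body))
```

The form request carries key=value pairs, while the JSON request carries a serialized object, mirroring the two curl commands in Example 3.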

Example 9: Setting request headers

Pass a dictionary through the headers parameter to set request headers.

```python
import requests

url = 'https://httpbin.org/headers'  # this endpoint returns the request headers you sent
custom_headers = {
    'User-Agent': 'MyPythonApp/1.0',
    'Accept': 'application/json'
}

try:
    response = requests.get(url, headers=custom_headers)

    if response.status_code == 200:
        print("Sent and received headers:")
        print(response.json()['headers'])  # httpbin.org puts the request headers under 'headers'
    else:
        print(f"Request failed with status code: {response.status_code}")

except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```

This does the same job as curl -H 'Header-Name: Value', but in a more structured form inside Python code.

More powerful features of requests:

File uploads (files parameter): upload one or more files with little code.
```python
import requests

upload_url = 'https://httpbin.org/post'

try:
    # Open the file in binary mode; the with block closes it even if the upload fails
    with open('report.txt', 'rb') as f:
        response = requests.post(upload_url, files={'file': f})
    if response.status_code == 200:
        print("File upload response:")
        print(response.json()['files'])  # httpbin.org echoes the uploaded file content
    else:
        print(f"Upload failed with status code: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred during upload: {e}")
```
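Handling response content: besides .text (decoded Unicode text) and .json() (parsed JSON), there is .content (raw bytes, for images, audio, and other binary data) and .raw (the underlying urllib3 response object, usable for streaming downloads when the request is made with stream=True).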
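Cookie handling: requests parses the Set-Cookie headers it receives and exposes them through the response's .cookies attribute. With the top-level functions (requests.get, requests.post, and so on), cookies are not carried over to later requests; a Session is needed for that.
```python
import requests

# This request sets a cookie. The endpoint answers with a redirect, so redirects
# are disabled here to inspect the response that carries the Set-Cookie header.
response1 = requests.get('https://httpbin.org/cookies/set/mycookie/myvalue',
                         allow_redirects=False)
print(f"Response 1 Cookies: {response1.cookies.get_dict()}")

# A second call to requests.get() would NOT send this cookie back:
# with the top-level functions (requests.get/post, etc.), cookies live on the
# response object and are not shared across requests.
# To share cookies across requests, use a Session.
```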
Session objects: requests.Session() provides a more powerful way to manage a persistent session. It keeps cookies and pooled connections across requests and lets you set default headers, authentication, and so on. This is very useful when you need a logged-in state or make many requests to the same site.
```python
import requests

# Create a Session object
with requests.Session() as session:
    # Requests sent through the session share cookies and connections for its lifetime
    response1 = session.get('https://httpbin.org/cookies/set/mycookie/myvalue')
    print(f"Session Cookies after set: {session.cookies.get_dict()}")

    # Another request in the same session automatically sends mycookie
    response2 = session.get('https://httpbin.org/cookies')  # returns all cookies it received
    print("Response 2 received cookies:")
    print(response2.json()['cookies'])

    # A session can also carry default headers, authentication, etc.
    session.headers.update({'X-Custom-Header': 'Awesome'})
    response3 = session.get('https://httpbin.org/headers')
    print("Response 3 headers:")
    print(response3.json()['headers'])
```
Using a Session is the standard approach for multi-step interactions, for keeping a login state, and for better performance through connection reuse.
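Redirects: requests follows 3xx redirects automatically by default (except for HEAD requests). Pass allow_redirects=False to disable this. The response object has a .history attribute containing every response encountered along the redirect chain.
Timeouts: the timeout parameter sets a timeout in seconds. There is no timeout by default, so a request made without one can wait indefinitely.
```python
import requests

try:
    # Raise Timeout if connecting, or waiting for data from the server,
    # takes longer than 5 seconds
    response = requests.get('https://www.example.com', timeout=5)
    print("Request successful within timeout.")
except requests.exceptions.Timeout:
    print("Request timed out.")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```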
Error handling: requests has a clear exception hierarchy (requests.exceptions.RequestException is the base class of all requests errors), which makes it easy to catch specific kinds of failure such as ConnectionError, Timeout, and HTTPError. A 4xx or 5xx status code does not raise by itself; call response.raise_for_status() to turn it into a requests.exceptions.HTTPError.
```python
import requests

try:
    # Request a page that returns 404
    response = requests.get('https://httpbin.org/status/404')
    # raise_for_status() raises an exception for 4xx and 5xx status codes
    response.raise_for_status()
    print("Request successful (should not happen for 404).")  # this line does not run
except requests.exceptions.HTTPError as e:
    print(f"HTTP error occurred: {e}")  # catches the HTTP error
except requests.exceptions.RequestException as e:
    print(f"Other request error occurred: {e}")
```
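
Which status codes make raise_for_status() raise can be explored without a network call by filling in a bare Response object by hand. This is only an illustration, assuming requests is installed: real code receives Response objects from requests.get() and friends, and describe is a hypothetical helper.

```python
import requests


def describe(status_code):
    """Return 'error' if raise_for_status() raises for this code, else 'ok'."""
    response = requests.Response()      # empty response, built by hand for the demo
    response.status_code = status_code
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError:
        return "error"
    return "ok"


for code in (200, 301, 404, 500):
    print(code, describe(code))
```

Only the 4xx and 5xx codes raise; a 3xx response that was not followed passes through silently, so check status_code yourself if that matters.
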
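SSL certificate verification: requests verifies the server's SSL certificate by default, which is essential for security. If you need to turn it off (only for testing or special cases, and not recommended in production), pass verify=False.
```python
import requests
import urllib3

# Access an HTTPS site; requests verifies the certificate automatically
try:
    response = requests.get('https://www.google.com')
    response.raise_for_status()  # make sure the request succeeded
    print("SSL certificate verified successfully.")
except requests.exceptions.SSLError as e:
    print(f"SSL certificate verification failed: {e}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")

# Disable certificate verification (not recommended)
try:
    # Suppress the InsecureRequestWarning (optional); this must run before the request
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    response = requests.get('https://self-signed.badssl.com/', verify=False)
    response.raise_for_status()
    print("SSL certificate verification disabled (warning suppressed).")
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
```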
Proxy settings: use the proxies parameter to route requests through a proxy.
```python
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
    # 'http': 'http://user:pass@host:port/',  # proxy with authentication
}

try:
    response = requests.get('http://example.com', proxies=proxies)
    print(f"Request via proxy successful: {response.status_code}")
except requests.exceptions.RequestException as e:
    print(f"An error occurred while using proxy: {e}")
```

With features this rich and this easy to use, requests makes HTTP in Python simple and pleasant, which is why it is so widely adopted.
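
Several of these features are often combined in one reusable Session. The sketch below sets a default header and mounts an adapter that retries failed requests, then checks the configuration without sending anything. It assumes requests (and the urllib3 it depends on) is installed; make_session is a hypothetical helper, and the retry values are arbitrary examples rather than recommendations.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session():
    """Build a Session with a default User-Agent and automatic retries."""
    session = requests.Session()
    session.headers.update({"User-Agent": "MyPythonApp/1.0"})

    # Retry up to 3 times, with backoff, on these gateway-type status codes
    retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


session = make_session()

# prepare_request() merges the session defaults into the request without sending it
prepared = session.prepare_request(
    requests.Request("GET", "https://httpbin.org/headers")
)
print(prepared.headers["User-Agent"])
print(session.get_adapter("https://httpbin.org/headers").max_retries.total)
```

In real use you would call session.get(url, timeout=...) on the returned object; the timeout still has to be passed per request.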

Stop 4: Why Requests Wins (Compared with `urllib.request`)

Looking back over the path from curl to urllib.request to requests, the advantages of requests are clear. A direct comparison with urllib.request shows why it became the de facto standard:

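API friendliness and simplicity:
- urllib.request: you create a Request object, handle data encoding (urlencode) and byte conversion (encode) by hand, and set headers such as Content-Type yourself when sending JSON.
- requests: you call requests.get(), requests.post(), and similar functions with the params, data, and json parameters, and the library handles encoding, headers, and data format. Less code, clearer intent.
Data handling:
- urllib.request: parameters must be encoded by hand into an application/x-www-form-urlencoded string; sending JSON requires manual serialization and header setup.
- requests: the data parameter handles form encoding, and the json parameter handles JSON serialization and the Content-Type header. The files parameter supports file uploads.
Response handling:
- urllib.request: the response body is bytes and must be decoded by hand (decode). Getting JSON means reading the body and then parsing it with the json module.
- requests: the body is available as .text (Unicode text) or .content (bytes), and .json() parses a JSON response in one call.
Cookie and session management:
- urllib.request: cookies and sessions are managed by hand with HTTPCookieProcessor and related classes.
- requests: cookies are parsed automatically. The Session object offers clean session management and keeps cookies and pooled connections across requests.
Error handling:
- urllib.request: you catch URLError and HTTPError and pull the error details out of the exception object.
- requests: a common RequestException base class plus more specific subclasses (HTTPError, ConnectionError, Timeout, and so on) give clearer errors, and response.raise_for_status() offers a convenient status check.
Other advanced features:
- urllib.request: proxies, authentication, and similar features require configuring an OpenerDirector, which takes several steps.
- requests: simple parameters (proxies, auth, timeout, verify, and so on) configure proxies, authentication, timeouts, and SSL verification. Redirects are followed automatically.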

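Overall, requests wraps many low-level details and common tasks of HTTP in a high-level interface that is more intuitive and more "Pythonic". Its goal is to let network code read like the concepts it expresses, instead of making developers wrestle with protocol details and verbose APIs. This greatly improves development speed and code readability.

Conclusion: Embrace Requests

From curl, a powerful command-line tool, we saw the basic anatomy of an HTTP request. With urllib.request from the standard library, we saw the "native" way to make these requests in Python and also how low-level and verbose it can be. Finally we arrived at requests, which combines simplicity and power to make HTTP in Python far more pleasant.

requests is not a thin wrapper around urllib.request. It is built on top of urllib3 and was designed from the ground up around ease of use. It abstracts away much of the complexity so that developers can focus on business logic instead of protocol details. Whether you are scraping a simple web page or working with a complex RESTful API, requests is the first choice for HTTP in Python.

For any developer who needs network communication in Python, knowing requests is close to a required skill. It helps you get work done faster and write clearer, more maintainable code. Understanding urllib.request is still valuable, since it is part of the standard library, but in day-to-day development requests is strongly recommended as the default.

So the next time you need to make an HTTP request in Python, forget the tedium of urllib, remember pip install requests, and finish the job with a single requests.get(...) or requests.post(...).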
