<!DOCTYPE html>
<html lang="zh"><head><title>urllib URL 处理模块</title><meta charset="utf-8"/><link rel="preconnect" href="https://fonts.googleapis.com"/><link rel="preconnect" href="https://fonts.gstatic.com"/><link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=IBM Plex Mono&family=Noto Serif Simplified Chinese:wght@400;700&family=Source Sans Pro:ital,wght@0,400;0,600;1,400;1,600&display=swap"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><meta property="og:title" content="urllib URL 处理模块"/><meta property="og:description" content="urllib URL 处理模块."/><meta property="og:image" content="https://wiki.7wate.com/static/og-image.png"/><meta property="og:width" content="1200"/><meta property="og:height" content="675"/><link rel="icon" href="../../../../../static/icon.png"/><meta name="description" content="urllib URL 处理模块."/><meta name="generator" content="Quartz"/><link href="../../../../../index.css" rel="stylesheet" type="text/css" spa-preserve/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.9/katex.min.css" rel="stylesheet" type="text/css" spa-preserve/><script src="../../../../../prescript.js" type="application/javascript" spa-preserve></script><script type="application/javascript" spa-preserve>const fetchData = fetch("../../../../../static/contentIndex.json").then(data => data.json())</script></head><body data-slug="Technology/ProgrammingLanguage/Python/模块/网络处理/urllib-URL-处理模块"><div id="quartz-root" class="page"><div id="quartz-body"><div class="left sidebar"><h2 class="page-title"><a href="../../../../..">🪴 X·Eden</a></h2><div class="spacer mobile-only"></div><div class="search"><button class="search-button" id="search-button"><p>搜索</p><svg role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 19.9 19.7"><title>Search</title><g class="search-path" fill="none"><path stroke-linecap="square" d="M18.5 18.3l-5.4-5.4"></path><circle cx="8" cy="8" r="7"></circle></g></svg></button><div id="search-container"><div 
id="search-space"><input autocomplete="off" id="search-bar" name="search" type="text" aria-label="搜索些什么" placeholder="搜索些什么"/><div id="search-layout" data-preview="true"></div></div></div></div><button class="darkmode" id="darkmode"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" id="dayIcon" x="0px" y="0px" viewBox="0 0 35 35" style="enable-background:new 0 0 35 35" xml:space="preserve" aria-label="暗色模式"><title>暗色模式</title><path d="M6,17.5C6,16.672,5.328,16,4.5,16h-3C0.672,16,0,16.672,0,17.5 S0.672,19,1.5,19h3C5.328,19,6,18.328,6,17.5z M7.5,26c-0.414,0-0.789,0.168-1.061,0.439l-2,2C4.168,28.711,4,29.086,4,29.5 C4,30.328,4.671,31,5.5,31c0.414,0,0.789-0.168,1.06-0.44l2-2C8.832,28.289,9,27.914,9,27.5C9,26.672,8.329,26,7.5,26z M17.5,6 C18.329,6,19,5.328,19,4.5v-3C19,0.672,18.329,0,17.5,0S16,0.672,16,1.5v3C16,5.328,16.671,6,17.5,6z M27.5,9 c0.414,0,0.789-0.168,1.06-0.439l2-2C30.832,6.289,31,5.914,31,5.5C31,4.672,30.329,4,29.5,4c-0.414,0-0.789,0.168-1.061,0.44 l-2,2C26.168,6.711,26,7.086,26,7.5C26,8.328,26.671,9,27.5,9z M6.439,8.561C6.711,8.832,7.086,9,7.5,9C8.328,9,9,8.328,9,7.5 c0-0.414-0.168-0.789-0.439-1.061l-2-2C6.289,4.168,5.914,4,5.5,4C4.672,4,4,4.672,4,5.5c0,0.414,0.168,0.789,0.439,1.06 L6.439,8.561z M33.5,16h-3c-0.828,0-1.5,0.672-1.5,1.5s0.672,1.5,1.5,1.5h3c0.828,0,1.5-0.672,1.5-1.5S34.328,16,33.5,16z M28.561,26.439C28.289,26.168,27.914,26,27.5,26c-0.828,0-1.5,0.672-1.5,1.5c0,0.414,0.168,0.789,0.439,1.06l2,2 C28.711,30.832,29.086,31,29.5,31c0.828,0,1.5-0.672,1.5-1.5c0-0.414-0.168-0.789-0.439-1.061L28.561,26.439z M17.5,29 c-0.829,0-1.5,0.672-1.5,1.5v3c0,0.828,0.671,1.5,1.5,1.5s1.5-0.672,1.5-1.5v-3C19,29.672,18.329,29,17.5,29z M17.5,7 C11.71,7,7,11.71,7,17.5S11.71,28,17.5,28S28,23.29,28,17.5S23.29,7,17.5,7z M17.5,25c-4.136,0-7.5-3.364-7.5-7.5 c0-4.136,3.364-7.5,7.5-7.5c4.136,0,7.5,3.364,7.5,7.5C25,21.636,21.636,25,17.5,25z"></path></svg><svg xmlns="http://www.w3.org/2000/svg" xmlns:x
<p><code>urllib</code> is the Python standard-library package for working with URLs (Uniform Resource Locators). It consists of several submodules that cover sending network requests, parsing URLs, handling errors, and parsing <code>robots.txt</code> files:</p>
<h3 id="子模块">Submodules<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#子模块" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<ul>
<li><strong><code>urllib.request</code></strong>: opens and reads URLs. It supports several schemes, including HTTP, HTTPS, FTP, and local <code>file:</code> URLs.</li>
<li><strong><code>urllib.error</code></strong>: defines the exception classes raised by <code>urllib.request</code>, used to detect and handle failed requests.</li>
<li><strong><code>urllib.parse</code></strong>: parses and builds URLs, with operations for splitting, joining, and percent-encoding or decoding their parts.</li>
<li><strong><code>urllib.robotparser</code></strong>: parses a site's <code>robots.txt</code> file to decide which pages a crawler may fetch.</li>
</ul>
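<p>As a quick illustration of the two submodules that need no network access, the sketch below splits, joins, and percent-encodes URLs with <code>urllib.parse</code>, and checks crawl permissions with <code>urllib.robotparser</code>. The URLs, the crawler name <code>MyCrawler</code>, and the <code>robots.txt</code> rules are made-up examples; the rules are passed to <code>RobotFileParser.parse()</code> as a list of lines instead of being downloaded with <code>read()</code>.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import urllib.parse
import urllib.robotparser

# urllib.parse: split a URL into its components
parts = urllib.parse.urlsplit("https://www.example.com/search?q=python#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Resolve a relative link against the URL of the page that contains it
absolute = urllib.parse.urljoin("https://www.example.com/docs/index.html", "../img/logo.png")

# Percent-encode a path containing a space and a non-ASCII character, then decode it again
encoded = urllib.parse.quote("/files/café menu.pdf")
decoded = urllib.parse.unquote(encoded)

# urllib.robotparser: parse() takes the lines of a robots.txt file directly,
# so these made-up rules are checked without downloading anything
robots = urllib.robotparser.RobotFileParser()
robots.parse([
    "User-agent: *",
    "Disallow: /private/",
])
public_ok = robots.can_fetch("MyCrawler", "https://www.example.com/index.html")
private_ok = robots.can_fetch("MyCrawler", "https://www.example.com/private/data")</code></pre></figure>
<p>For a live site, <code>set_url()</code> followed by <code>read()</code> downloads and parses the real <code>robots.txt</code> before <code>can_fetch()</code> is called.</p>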
<h3 id="优点">Advantages<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#优点" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<ul>
<li><strong>Built in</strong>: it ships with the Python standard library, so nothing needs to be installed.</li>
<li><strong>Broad coverage</strong>: besides fetching resources over several protocols, it also handles URL parsing, error handling, and <code>robots.txt</code> parsing.</li>
<li><strong>Highly customizable</strong>: request behavior such as authentication, proxies, cookies, and redirects can be changed by combining handlers into a custom opener.</li>
</ul>
<h3 id="缺点">Disadvantages<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#缺点" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<ul>
<li><strong>Lower-level API</strong>: compared with third-party libraries such as <code>requests</code>, the <code>urllib</code> API is lower level, so common tasks take more code; for example, request bodies must be encoded to bytes by hand.</li>
<li><strong>Verbose error handling</strong>: HTTP error status codes are raised as exceptions that must be caught explicitly, which makes error handling more involved than with a library like <code>requests</code>.</li>
</ul>
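<p>To make the first point concrete, here is a minimal sketch of what a JSON POST request involves with <code>urllib</code>: the payload is serialized and encoded by hand, and the <code>Content-Type</code> header and method are set explicitly. The helper <code>build_json_post()</code> and the endpoint URL are hypothetical; the request is only built, not sent.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import json
import urllib.request


def build_json_post(url, payload):
    """Hypothetical helper: build (but do not send) a POST request with a JSON body."""
    # The body has to be serialized and encoded to bytes by hand
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Without this, urllib would default to application/x-www-form-urlencoded
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_json_post("https://api.example.com/items", {"name": "urllib"})

# Sending it requires network access and a real endpoint, for example:
# with urllib.request.urlopen(req, timeout=10) as response:
#     result = json.loads(response.read().decode("utf-8"))</code></pre></figure>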
<h3 id="同类产品对比">Comparison with Similar Libraries<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#同类产品对比" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<div class="table-container"><table><thead><tr><th>Library</th><th>Strengths</th><th>Weaknesses</th><th>Typical use</th><th>Community support</th></tr></thead><tbody><tr><td>urllib</td><td>Standard library, broad coverage</td><td>Lower-level API</td><td>Network requests, URL manipulation</td><td>Python community</td></tr><tr><td>requests</td><td>Simple API</td><td>Must be installed separately</td><td>HTTP requests</td><td>Python community</td></tr><tr><td>httplib2</td><td>Feature-rich</td><td>More complex to use</td><td>HTTP requests</td><td>Python community</td></tr></tbody></table></div>
<h2 id="urllibrequest"><code>urllib.request</code><a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#urllibrequest" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h2>
<div class="table-container"><table><thead><tr><th>Name</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td><code>urlopen()</code></td><td>Opens a URL and returns a response object whose content can be read</td><td><code>urllib.request.urlopen(url)</code></td></tr><tr><td><code>urlretrieve()</code></td><td>Downloads the resource at a URL to a local file (part of the legacy interface)</td><td><code>urllib.request.urlretrieve(url, filename)</code></td></tr><tr><td><code>build_opener()</code></td><td>Builds a customizable opener (<code>OpenerDirector</code>) from handlers</td><td><code>opener = urllib.request.build_opener()</code></td></tr><tr><td><code>install_opener()</code></td><td>Installs an opener as the global default used by <code>urlopen()</code></td><td><code>urllib.request.install_opener(opener)</code></td></tr><tr><td><code>HTTPBasicAuthHandler()</code></td><td>Handler for HTTP Basic authentication</td><td><code>handler = urllib.request.HTTPBasicAuthHandler()</code></td></tr><tr><td><code>HTTPCookieProcessor()</code></td><td>Handler that stores and sends HTTP cookies</td><td><code>handler = urllib.request.HTTPCookieProcessor()</code></td></tr><tr><td><code>ProxyHandler()</code></td><td>Handler that routes requests through a proxy</td><td><code>proxy = urllib.request.ProxyHandler({'http': 'http://www.example.com:8080'})</code></td></tr><tr><td><code>Request()</code></td><td>Creates a request object for customizing headers, the request body, or the HTTP method</td><td><code>req = urllib.request.Request(url, headers={...})</code></td></tr></tbody></table></div>
<h3 id="urlopen-打开-url"><code>urlopen()</code> Opening a URL<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#urlopen-打开-url" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Open a web page; the with statement closes the connection when the block ends</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">with</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.urlopen(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#E36209;--shiki-dark:#FFAB70;">timeout</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">10</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">) </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">as</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> response:</span></span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;">    # read() returns bytes; decode them to text (assuming the page is UTF-8)</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">    data </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> response.read().decode(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'utf-8'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Print the page content</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(data)</span></span></code></pre></figure>
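<p><code>response.read()</code> returns raw bytes. Rather than hard-coding an encoding, a slightly more robust pattern reads the charset from the response's <code>Content-Type</code> header with <code>response.headers.get_content_charset()</code>. The sketch below assumes UTF-8 is an acceptable fallback when no charset is declared; the helper name <code>fetch_text()</code> is hypothetical, and the demo uses a <code>data:</code> URL, which <code>urllib.request</code> serves without network access.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import urllib.request


def fetch_text(url, timeout=10):
    """Hypothetical helper: fetch a URL and decode the body to str.

    Uses the charset declared in the Content-Type header and falls back to
    UTF-8 when none is given (the fallback is an assumption of this sketch).
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# A data: URL is handled locally by urllib.request, so this line needs no network
text = fetch_text("data:text/plain;charset=utf-8,Hello%2C%20urllib")
print(text)</code></pre></figure>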
<h3 id="urlretrieve-下载文件"><code>urlretrieve()</code> Downloading a File<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#urlretrieve-下载文件" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Download the file at the URL and save it locally; returns a (filename, headers) tuple</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">filename, headers </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.urlretrieve(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/file.txt'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'local_file.txt'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span></code></pre></figure>
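<p><code>urlretrieve()</code> also accepts a <code>reporthook</code> callable, which it calls once before the transfer starts and once after each block, passing the block count so far, the block size, and the total size (<code>-1</code> when the size is unknown). A minimal progress-tracking sketch follows; <code>download_with_progress()</code> is a hypothetical helper, and the <code>data:</code> URL only keeps the demo offline.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import os
import tempfile
import urllib.request


def download_with_progress(url, filename):
    """Hypothetical helper: download url to filename, recording progress.

    Returns the list of (bytes_so_far, total_size) pairs seen by the hook.
    """
    progress = []

    def report(block_num, block_size, total_size):
        # block_num * block_size overshoots on the last, usually partial, block
        downloaded = block_num * block_size
        if total_size != -1:
            downloaded = min(downloaded, total_size)
        progress.append((downloaded, total_size))

    urllib.request.urlretrieve(url, filename, reporthook=report)
    return progress


# A data: URL keeps the demo offline; a real download would use an http(s) URL
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "hello.txt")
    steps = download_with_progress("data:text/plain,hello%20world", target)
    with open(target, "rb") as f:
        content = f.read()

print(steps)</code></pre></figure>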
<h3 id="build_opener-和-install_opener"><code>build_opener()</code> and <code>install_opener()</code><a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#build_opener-和-install_opener" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<p><code>build_opener()</code> accepts any number of handlers, each of which defines how one aspect of a request is processed, such as redirects, Basic authentication, or cookies. It returns an <code>OpenerDirector</code> object (an "opener") that combines these handlers with the default ones. You can call the opener's <code>open()</code> method directly, or pass it to <code>install_opener()</code> to make it the global default, so that every later call to <code>urlopen()</code> goes through it.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Create a Basic authentication handler</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">auth_handler </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.HTTPBasicAuthHandler()</span></span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># The realm must match the realm name the server sends in its WWW-Authenticate header</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">auth_handler.add_password(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'realm'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'username'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'password'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Create a proxy handler</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">proxy_handler </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.ProxyHandler({</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">: </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.proxy.com:8080'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">})</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Build an opener from both handlers; the default handlers are added automatically</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">opener </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.build_opener(auth_handler, proxy_handler)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Install the opener as the global default</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">urllib.request.install_opener(opener)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># urlopen() now applies every handler configured above</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">response </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.urlopen(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span></code></pre></figure>
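<p>A handler does not have to be one of the built-in classes. The sketch below defines a hypothetical <code>DefaultHeadersHandler</code> that subclasses <code>urllib.request.BaseHandler</code>; methods named <code>http_request</code> and <code>https_request</code> act as pre-processors that the opener calls before each request of that protocol is sent. The header values are illustrative, and the demo calls the pre-processor directly so that nothing goes over the network.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import urllib.request


class DefaultHeadersHandler(urllib.request.BaseHandler):
    """Hypothetical handler that adds default headers to HTTP(S) requests.

    The opener calls methods named PROTOCOL_request before sending a request
    of that protocol. Headers already set on the Request are left unchanged.
    """

    def __init__(self, headers):
        self.default_headers = headers

    def http_request(self, req):
        for name, value in self.default_headers.items():
            # Request stores header names in str.capitalize() form, e.g. "Accept-language"
            if not req.has_header(name.capitalize()):
                req.add_unredirected_header(name, value)
        return req

    https_request = http_request


# Combine the custom handler with the defaults, as with any built-in handler
opener = urllib.request.build_opener(DefaultHeadersHandler({"Accept-Language": "en"}))

# Call the pre-processor directly to see its effect without network access
handler = DefaultHeadersHandler({"Accept-Language": "en"})
plain = handler.http_request(urllib.request.Request("http://www.example.com/"))
explicit = handler.http_request(
    urllib.request.Request("http://www.example.com/", headers={"Accept-Language": "fr"})
)</code></pre></figure>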
<h3 id="http-基础认证-httpbasicauthhandler">HTTP Basic Authentication (<code>HTTPBasicAuthHandler</code>)<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#http-基础认证-httpbasicauthhandler" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<p><code>HTTPBasicAuthHandler</code> sends credentials only after the server answers with <code>401 Unauthorized</code> and a <code>WWW-Authenticate</code> header. Pairing it with <code>HTTPPasswordMgrWithDefaultRealm</code> means the realm name does not have to be known in advance.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># A password manager that looks up credentials by URL and accepts None as a catch-all realm</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">password_mgr </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.HTTPPasswordMgrWithDefaultRealm()</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Add credentials: realm None matches whatever realm the server names</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">password_mgr.add_password(</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">None</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'username'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'password'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Create an HTTPBasicAuthHandler that uses the password manager</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">auth_handler </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.HTTPBasicAuthHandler(password_mgr)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Build and install the opener</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">opener </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.build_opener(auth_handler)</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">urllib.request.install_opener(opener)</span></span></code></pre></figure>
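<p>Some APIs never send the <code>401</code> challenge and expect credentials on the first request. For that case the standard library provides <code>HTTPPasswordMgrWithPriorAuth</code> (Python 3.5+): credentials added with <code>is_authenticated=True</code> are sent up front. This is a minimal sketch with placeholder credentials and URL; it inspects the header the handler would add rather than contacting a server.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import urllib.request

# Credentials marked is_authenticated=True are sent without waiting for a 401
password_mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()
password_mgr.add_password(
    None, "http://www.example.com/", "username", "password", is_authenticated=True
)
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)

# Use this opener (or install it) to send real requests
opener = urllib.request.build_opener(auth_handler)

# Call the handler's pre-processor directly to see the header it adds (no network access)
req = auth_handler.http_request(urllib.request.Request("http://www.example.com/private/data"))
auth_header = req.get_header("Authorization")
print(auth_header)</code></pre></figure>
<p>The header value is simply <code>Basic</code> followed by the Base64 encoding of <code>username:password</code>, so it should only be sent over HTTPS in practice.</p>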
<h3 id="httpcookieprocessor-处理-cookies"><code>HTTPCookieProcessor</code> Handling Cookies<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#httpcookieprocessor-处理-cookies" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> http.cookiejar</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Create a CookieJar that holds the cookies</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">cookie_jar </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> http.cookiejar.CookieJar()</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Create an HTTPCookieProcessor that stores received cookies in the jar and sends them back</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">cookie_handler </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.HTTPCookieProcessor(cookie_jar)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Build an opener and send a request through it</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">opener </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.build_opener(cookie_handler)</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">response </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> opener.open(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Cookies set by the server are now in the jar</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">for</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> cookie </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">in</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> cookie_jar:</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">    print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(cookie.name, cookie.value)</span></span></code></pre></figure>
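<p>To see what <code>HTTPCookieProcessor</code> does between requests, the sketch below drives it by hand: its <code>http_response()</code> method stores the cookies from a response in the jar, and its <code>http_request()</code> method adds a matching <code>Cookie</code> header to a later request. The server response is simulated with <code>urllib.response.addinfourl</code> so the example runs offline; the URLs and the cookie value are made up.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import email
import http.cookiejar
import io
import urllib.request
import urllib.response

jar = http.cookiejar.CookieJar()
processor = urllib.request.HTTPCookieProcessor(jar)

# Simulated server response that sets a cookie (made-up header, no network involved)
login_req = urllib.request.Request("http://www.example.com/login")
fake_headers = email.message_from_string("Set-Cookie: session=abc123; Path=/\n")
fake_resp = urllib.response.addinfourl(io.BytesIO(b""), fake_headers, login_req.full_url, 200)

# After a response: the processor copies Set-Cookie headers into the jar
processor.http_response(login_req, fake_resp)

# Before the next request: the processor adds the matching Cookie header
next_req = processor.http_request(urllib.request.Request("http://www.example.com/profile"))
cookie_header = next_req.get_header("Cookie")
stored_names = [c.name for c in jar]
print(stored_names, cookie_header)</code></pre></figure>
<p>When the processor is part of an opener, the opener makes these two calls automatically for every HTTP and HTTPS request.</p>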
<h3 id="proxyhandler-设置代理"><code>ProxyHandler</code> Setting a Proxy<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#proxyhandler-设置代理" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Create a ProxyHandler that maps each URL scheme to a proxy; schemes without an entry are not proxied</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">proxy_handler </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.ProxyHandler({</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">: </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.proxy.com:8080'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'https'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">: </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.proxy.com:8080'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">})</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Build and install the opener</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">opener </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.build_opener(proxy_handler)</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">urllib.request.install_opener(opener)</span></span></code></pre></figure>
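<p>An installed opener affects every <code>urlopen()</code> call. To send just one request through a proxy, <code>Request.set_proxy()</code> can be used instead, and <code>ProxyHandler({})</code> builds an opener that ignores proxy settings from the environment. The proxy address below is a placeholder, and the sketch only inspects the request object without sending it.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import urllib.request

# Route a single request through a proxy (proxy.example.com:8080 is a placeholder)
req = urllib.request.Request("http://www.example.com/index.html")
req.set_proxy("proxy.example.com:8080", "http")

# The connection now goes to the proxy, and the request line carries the full URL
proxy_host = req.host
request_target = req.selector
print(proxy_host, request_target)
# urllib.request.urlopen(req, timeout=10) would send it via the proxy (network required)

# An empty mapping disables proxies, including any set in *_proxy environment variables
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))</code></pre></figure>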
<h3 id="request-自定义请求"><code>Request()</code> Customizing a Request<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#request-自定义请求" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Create a Request object with a custom User-Agent header</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">req </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.Request(</span><span style="--shiki-light:#E36209;--shiki-dark:#FFAB70;">url</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#E36209;--shiki-dark:#FFAB70;">headers</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">{</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'User-Agent'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">: </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'MyApp/1.0'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">})</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Open the customized request with urlopen()</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">response </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.urlopen(req)</span></span></code></pre></figure>
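<p>Besides headers, a <code>Request</code> can carry a query string, a request body, or an explicit HTTP method. The sketch below (placeholder URLs and form fields, nothing is sent) encodes GET parameters with <code>urllib.parse.urlencode()</code>, turns a request into a POST by passing form data as bytes, and overrides the method with <code>method=</code>.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark">import urllib.parse
import urllib.request

# GET: encode query parameters and append them to the URL
params = urllib.parse.urlencode({"q": "python urllib", "page": 2})
get_req = urllib.request.Request(
    "http://www.example.com/search?" + params,
    headers={"User-Agent": "MyApp/1.0"},
)

# POST: passing data makes urllib send a POST; form data must be encoded to bytes
form = urllib.parse.urlencode({"user": "alice", "lang": "en"}).encode("ascii")
post_req = urllib.request.Request("http://www.example.com/login", data=form)

# Any other method can be set explicitly with method= (Python 3.3+)
delete_req = urllib.request.Request("http://www.example.com/items/42", method="DELETE")

for req in (get_req, post_req, delete_req):
    print(req.get_method(), req.full_url)</code></pre></figure>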
<h2 id="urlliberror"><code>urllib.error</code><a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#urlliberror" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h2>
<div class="table-container"><table><thead><tr><th>Exception</th><th>Description</th></tr></thead><tbody><tr><td><code>URLError</code></td><td>Base exception class of <code>urllib.error</code>, raised by handlers when they run into a problem (a subclass of <code>OSError</code>)</td></tr><tr><td><code>HTTPError</code></td><td>Raised for HTTP error status codes; a subclass of <code>URLError</code></td></tr><tr><td><code>ContentTooShortError</code></td><td>Raised by <code>urlretrieve()</code> when less data is downloaded than the <code>Content-Length</code> header announced</td></tr></tbody></table></div>
<h3 id="urlerror"><code>URLError</code><a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#urlerror" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<p><code>urllib.request</code> raises <code>URLError</code> when it cannot open a URL at all, for example because the host name cannot be resolved or the connection is refused. The <code>reason</code> attribute describes the cause.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.error</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">try</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">:</span></span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;">    # .invalid is a reserved top-level domain, so this host name should never resolve</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">    response </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.urlopen(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.nonexistentwebsite.invalid'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">except</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.error.URLError </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">as</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> e:</span></span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;">    # reason is either a message string or the underlying exception, such as a socket error</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">    print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(e.reason)</span></span></code></pre></figure>
|
|||
|
<h3 id="httperror"><code>HTTPError</code><a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#httperror" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
|
|||
|
<p>当服务器返回 HTTP 错误状态码(如 404、500 等)时,会抛出 <code>HTTPError</code>。</p>
|
|||
|
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
|
|||
|
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.error</span></span>
|
|||
|
<span data-line> </span>
|
|||
|
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">try</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">:</span></span>
|
|||
|
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> response </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.urlopen(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/404'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">except</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.error.HTTPError </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">as</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> e:</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;"> print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">f</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'HTTP Error Code: </span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">{</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">e.code</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">}</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;"> print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">f</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'Reason: </span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">{</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">e.reason</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">}</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span></code></pre></figure>
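<p>Because <code>HTTPError</code> is a subclass of <code>URLError</code>, the order of the <code>except</code> clauses matters when you handle both: a <code>URLError</code> clause placed first would also swallow every <code>HTTPError</code>. The following is a minimal sketch that runs without network access; the <code>describe()</code> helper and the hand-built exception instances are illustrative assumptions, not part of <code>urllib</code>.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python"><code data-language="python">import urllib.error


def describe(exc):
    """Raise exc and report which except clause handles it.

    HTTPError is listed first: as a subclass of URLError it would
    otherwise be caught by the URLError clause.
    """
    try:
        raise exc
    except urllib.error.HTTPError as e:
        return f"HTTP error {e.code}: {e.reason}"
    except urllib.error.URLError as e:
        return f"URL error: {e.reason}"


# Hand-built exceptions so the sketch runs without network access.
http_err = urllib.error.HTTPError(
    'http://www.example.com/404', 404, 'Not Found', None, None)
url_err = urllib.error.URLError('Name or service not known')

print(issubclass(urllib.error.HTTPError, urllib.error.URLError))  # True
print(describe(http_err))  # HTTP error 404: Not Found
print(describe(url_err))   # URL error: Name or service not known
</code></pre></figure>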
<h3 id="contenttooshorterror"><code>ContentTooShortError</code><a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#contenttooshorterror" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<p><code>urlretrieve()</code> raises <code>ContentTooShortError</code> when it receives fewer bytes than the <code>Content-Length</code> header announced, for example because the connection was interrupted mid-download.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.error</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">try</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">:</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.request.urlretrieve(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/file'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'local_file.txt'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">except</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.error.ContentTooShortError </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">as</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> e:</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;"> print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'The downloaded data is less than expected.'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span></code></pre></figure>
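<p><code>urlretrieve()</code> writes data to the target file as it arrives and only raises <code>ContentTooShortError</code> afterwards, so the partial file stays on disk; a common reaction is simply to try again. A sketch under assumptions: <code>download_with_retry()</code> is a hypothetical helper, and its <code>retrieve</code> parameter exists only so the retry logic can be exercised with a stand-in function instead of a real download.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python"><code data-language="python">import urllib.error
import urllib.request


def download_with_retry(url, filename, attempts=3,
                        retrieve=urllib.request.urlretrieve):
    """Call retrieve(url, filename), retrying after ContentTooShortError.

    urlretrieve() opens filename in write mode on every call, so each
    retry overwrites the partial file left by the previous attempt.
    At least one attempt is always made.
    """
    last_error = None
    for attempt in range(1, max(attempts, 1) + 1):
        try:
            return retrieve(url, filename)
        except urllib.error.ContentTooShortError as e:
            last_error = e
            print(f"Attempt {attempt} incomplete: {e.reason}")
    raise last_error
</code></pre></figure>
<p>Usage, with the same placeholder URL as above: <code>filename, headers = download_with_retry('http://www.example.com/file', 'local_file.txt')</code>. If every attempt comes up short, the last <code>ContentTooShortError</code> is re-raised.</p>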
<h2 id="urllibparse"><code>urllib.parse</code><a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#urllibparse" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h2>
<div class="table-container"><table><thead><tr><th>Method</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td><code>urlparse()</code></td><td>Parses a URL into a <code>ParseResult</code> named tuple (scheme, netloc, path, params, query, fragment)</td><td><code>urllib.parse.urlparse(url)</code></td></tr><tr><td><code>urlunparse()</code></td><td>Reassembles a <code>ParseResult</code> into a URL</td><td><code>urllib.parse.urlunparse(parse_result)</code></td></tr><tr><td><code>urlsplit()</code></td><td>Like <code>urlparse()</code>, but does not split params out of the path; returns a five-field <code>SplitResult</code></td><td><code>urllib.parse.urlsplit(url)</code></td></tr><tr><td><code>urlunsplit()</code></td><td>Reassembles the result of <code>urlsplit()</code> into a URL</td><td><code>urllib.parse.urlunsplit(split_result)</code></td></tr><tr><td><code>urljoin()</code></td><td>Resolves a (possibly relative) URL against a base URL</td><td><code>urllib.parse.urljoin(base, url)</code></td></tr><tr><td><code>urlencode()</code></td><td>Converts a dict or a sequence of key/value pairs into a URL query string</td><td><code>urllib.parse.urlencode(query_dict)</code></td></tr><tr><td><code>quote()</code></td><td>Percent-encodes a string for use in a URL</td><td><code>urllib.parse.quote(string)</code></td></tr><tr><td><code>unquote()</code></td><td>Decodes a percent-encoded string</td><td><code>urllib.parse.unquote(encoded_string)</code></td></tr></tbody></table></div>
<h3 id="解析和构建-url">Parsing and building URLs<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#解析和构建-url" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">from</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.parse </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urlparse, urlunparse, urlsplit, urlunsplit, urljoin</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Parse the URL into a ParseResult named tuple with the fields</span></span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># scheme='http', netloc='www.example.com', path='/path', params='', query='query=arg', fragment=''</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">parsed_url </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urlparse(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/path?query=arg'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Reassemble the ParseResult into a URL: 'http://www.example.com/path?query=arg'</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">new_url </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urlunparse(parsed_url)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Like urlparse(), but params are not split out of the path (returns a five-field SplitResult)</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">split_result </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urlsplit(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/path?query=arg'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Reassemble the SplitResult into a URL</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">original_url </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urlunsplit(split_result)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Resolve a relative URL against a base URL; because '/anotherpath.html' starts</span></span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># with '/', it replaces the whole base path: 'http://www.example.com/anotherpath.html'</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">joined_url </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urljoin(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/path/'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'/anotherpath.html'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span></code></pre></figure>
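<p>The difference between <code>urlparse()</code> and <code>urlsplit()</code> only shows up when the last path segment carries <code>;params</code>. The sketch below uses a made-up URL to show this, and also uses the named-tuple method <code>_replace()</code> together with <code>geturl()</code> to build a modified URL from a parse result.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python"><code data-language="python">from urllib.parse import urlparse, urlsplit

# A made-up URL whose last path segment carries ";params"
url = 'http://www.example.com/path;type=a?query=arg#top'

# urlparse() moves ";type=a" out of the path into .params
parsed = urlparse(url)
print(parsed.path, parsed.params)  # /path type=a

# urlsplit() has no params field, so it stays in .path
split = urlsplit(url)
print(split.path)  # /path;type=a

# Both results are named tuples: _replace() returns a modified copy
# and geturl() turns it back into a URL string.
next_page = split._replace(query='page=2', fragment='')
print(next_page.geturl())  # http://www.example.com/path;type=a?page=2
</code></pre></figure>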
<h3 id="转换查询字符串">Building query strings<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#转换查询字符串" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">from</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.parse </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urlencode</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Convert a dict (or a sequence of (key, value) pairs) into a URL query string</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">query_dict </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> {</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'key1'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">: </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'value1'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'key2'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">: </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'value2'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">}</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">query_string </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urlencode(query_dict)  </span><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># 'key1=value1&amp;key2=value2'</span></span></code></pre></figure>
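<p><code>urlencode()</code> has two behaviours worth knowing: spaces are encoded as <code>+</code> (it uses <code>quote_plus()</code> by default), and list values are only expanded into repeated keys when <code>doseq=True</code> is passed. A short sketch with illustrative parameter names; the companion function <code>parse_qs()</code> reverses the encoding.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python"><code data-language="python">from urllib.parse import urlencode, parse_qs

params = {'q': 'urllib tutorial', 'tag': ['python', 'http']}

# doseq=True expands the list into repeated keys; without it the whole
# list would be passed through str() and encoded as a single value.
# Spaces become "+" because urlencode() uses quote_plus() by default.
query = urlencode(params, doseq=True)
print(query)  # q=urllib+tutorial&amp;tag=python&amp;tag=http

url = 'http://www.example.com/search?' + query
print(url)

# parse_qs() reverses the encoding; every value comes back as a list.
print(parse_qs(query))  # {'q': ['urllib tutorial'], 'tag': ['python', 'http']}
</code></pre></figure>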
<h3 id="url-编码和解码">URL encoding and decoding<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#url-编码和解码" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">from</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.parse </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> quote, unquote</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Percent-encode a string; "/" is in the default safe set, so it is kept:</span></span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># 'a%20string%20with%20/%20and%20%3F'</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">encoded </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> quote(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'a string with / and ?'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Decode the percent-encoded string back to 'a string with / and ?'</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">decoded </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> unquote(encoded)</span></span></code></pre></figure>
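<p>The <code>safe</code> parameter of <code>quote()</code> controls which characters are left as-is (the default is <code>'/'</code>), and <code>quote_plus()</code>/<code>unquote_plus()</code> are the variants used for form-encoded query strings. A small sketch with made-up input strings:</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python"><code data-language="python">from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = 'a/b c'

print(quote(text))            # a/b%20c    ("/" is safe by default)
print(quote(text, safe=''))   # a%2Fb%20c  (encode "/" too, e.g. for one path segment)
print(quote_plus(text))       # a%2Fb+c    (spaces become "+", "/" is encoded)
print(quote('中文'))          # %E4%B8%AD%E6%96%87  (non-ASCII is encoded as UTF-8)

# Decode each form with the matching function.
print(unquote('a%2Fb%20c'))     # a/b c
print(unquote_plus('a%2Fb+c'))  # a/b c
</code></pre></figure>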
<h2 id="urllibrobotparser"><code>urllib.robotparser</code><a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#urllibrobotparser" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h2>
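<p><code>urllib.robotparser</code> reads a site's <code>robots.txt</code> and answers whether a given user agent may fetch a given URL. Using it lets your crawler <strong>respect the site's crawling policy</strong>, which is what a responsible crawler should do.</p>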
<div class="table-container"><table><thead><tr><th>Method</th><th>Description</th><th>Example</th></tr></thead><tbody><tr><td><code>RobotFileParser()</code></td><td>Creates a <code>RobotFileParser</code> object; the <code>robots.txt</code> URL can optionally be passed here</td><td><code>rp = urllib.robotparser.RobotFileParser()</code></td></tr><tr><td><code>set_url()</code></td><td>Sets the URL of the <code>robots.txt</code> file</td><td><code>rp.set_url('http://www.example.com/robots.txt')</code></td></tr><tr><td><code>read()</code></td><td>Fetches <code>robots.txt</code> from the configured URL and parses it</td><td><code>rp.read()</code></td></tr><tr><td><code>parse()</code></td><td>Parses <code>robots.txt</code> content supplied as a list of lines</td><td><code>rp.parse(robots_txt_body.split("\n"))</code></td></tr><tr><td><code>can_fetch()</code></td><td>Returns whether the given User-Agent may fetch the given URL</td><td><code>rp.can_fetch('*', 'http://www.example.com/page')</code></td></tr><tr><td><code>mtime()</code></td><td>Returns the time <code>robots.txt</code> was last fetched (Unix timestamp)</td><td><code>rp.mtime()</code></td></tr><tr><td><code>modified()</code></td><td>Sets the last-fetched time to the current time</td><td><code>rp.modified()</code></td></tr></tbody></table></div>
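<p>A <code>RobotFileParser</code> only knows any rules after <code>read()</code> or <code>parse()</code> has been called. Below is a minimal offline sketch of that lifecycle, using hard-coded rules instead of a real <code>robots.txt</code>. The <code>False</code> answer from <code>can_fetch()</code> before any rules are loaded reflects current CPython behaviour and is worth double-checking on older interpreters.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python"><code data-language="python">import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()

# No rules loaded yet: mtime() is 0 and can_fetch() conservatively says False.
print(rp.mtime())                                        # 0
print(rp.can_fetch('*', 'http://www.example.com/page'))  # False

# parse() takes an iterable of lines and also updates the last-fetched time.
rp.parse(['User-agent: *', 'Disallow: /private/'])
print(rp.mtime() > 0)                                         # True
print(rp.can_fetch('*', 'http://www.example.com/page'))       # True
print(rp.can_fetch('*', 'http://www.example.com/private/x'))  # False
</code></pre></figure>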
<h3 id="创建和设置-robotfileparser">Creating and configuring a RobotFileParser<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#创建和设置-robotfileparser" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<p>First, create a <code>RobotFileParser</code> object, set the URL of the <code>robots.txt</code> file to parse, and call <code>read()</code> to fetch it.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.robotparser</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Create a RobotFileParser object</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">rp </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.robotparser.RobotFileParser()</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Set the URL of robots.txt (it can also be passed to the constructor)</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">rp.set_url(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/robots.txt'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Fetch robots.txt from that URL and parse it (this makes a network request)</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">rp.read()</span></span></code></pre></figure>
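<p><code>read()</code> takes no timeout argument, so a slow server can stall it. One workaround, sketched here under assumptions, is to download the file with <code>urllib.request.urlopen()</code> and an explicit timeout, then hand the lines to <code>parse()</code>. The helper name <code>load_robots()</code> is hypothetical, and unlike <code>read()</code> it does not turn 401/403 responses into "disallow all" or other 4xx responses into "allow all"; an <code>HTTPError</code> simply propagates.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python"><code data-language="python">import urllib.request
import urllib.robotparser


def load_robots(robots_url, timeout=10):
    """Fetch robots.txt with an explicit timeout and feed it to parse().

    Hypothetical helper: errors from urlopen() (including HTTPError
    and timeouts) are not handled here and reach the caller.
    """
    rp = urllib.robotparser.RobotFileParser(robots_url)
    with urllib.request.urlopen(robots_url, timeout=timeout) as response:
        text = response.read().decode('utf-8', errors='replace')
    rp.parse(text.splitlines())
    return rp
</code></pre></figure>
<p>Usage: <code>rp = load_robots('http://www.example.com/robots.txt', timeout=5)</code>, after which <code>rp.can_fetch()</code> works as usual.</p>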
<h3 id="检查爬虫是否可以访问特定页面">Checking whether a crawler may fetch a page<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#检查爬虫是否可以访问特定页面" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<p>Use <code>can_fetch()</code> to check whether a given User-Agent is allowed to crawl a specific URL.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Check whether '*' (any User-Agent) may fetch http://www.example.com/page</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">allowed </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> rp.can_fetch(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'*'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'http://www.example.com/page'</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">if</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> allowed:</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">    print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">"I can crawl this page."</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">else</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">:</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">    print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">"I cannot crawl this page."</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span></code></pre></figure>
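<p>When crawling, the user-agent string passed to <code>can_fetch()</code> should be the same one your requests actually send. A sketch of a hypothetical helper, <code>polite_request()</code>, that returns a <code>urllib.request.Request</code> carrying that User-Agent header, or <code>None</code> if <code>robots.txt</code> disallows the URL; the crawler name <code>ExampleCrawler/1.0</code> and the rules are made up for illustration.</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python"><code data-language="python">import urllib.request
import urllib.robotparser

USER_AGENT = 'ExampleCrawler/1.0'  # hypothetical crawler name


def polite_request(rp, url, user_agent=USER_AGENT):
    """Return a Request for url, or None if robots.txt disallows it.

    The same string is used for the robots.txt check and for the
    User-Agent header of the request.
    """
    if not rp.can_fetch(user_agent, url):
        return None
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


rp = urllib.robotparser.RobotFileParser()
rp.parse(['User-agent: *', 'Disallow: /private/'])

print(polite_request(rp, 'http://www.example.com/private/1'))  # None
req = polite_request(rp, 'http://www.example.com/page')
# Request stores header names via str.capitalize(), hence 'User-agent'.
print(req.get_header('User-agent'))  # ExampleCrawler/1.0
</code></pre></figure>
<p>The returned object can then be opened with <code>urllib.request.urlopen(req)</code>.</p>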
<h3 id="手动解析-robotstxt">Parsing robots.txt manually<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#手动解析-robotstxt" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h3>
<p>If you already have the contents of <code>robots.txt</code> (for example, obtained some other way), pass its lines to <code>parse()</code>.</p>
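<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.robotparser</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># A fresh parser: set_url() and read() are not needed when parsing manually</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">rp </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> urllib.robotparser.RobotFileParser()</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Assume robots_txt_body holds the text of a robots.txt file</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">robots_txt_body </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;"> '''</span></span>
<span data-line><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">User-agent: *</span></span>
<span data-line><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">Disallow: /private/</span></span>
<span data-line><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'''</span></span>
<span data-line> </span>
<span data-line><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># Parse these rules manually (no network request is made)</span></span>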
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">rp.parse(robots_txt_body.split(</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">"</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">\n</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">"</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">))</span></span></code></pre></figure></article><hr/><div class="page-footer"></div></div><div class="right sidebar"><div class="graph"><h3>关系图谱</h3><div class="graph-outer"><div id="graph-container" data-cfg="{"drag":true,"zoom":true,"depth":1,"scale":1.1,"repelForce":0.5,"centerForce":0.3,"linkDistance":30,"fontSize":0.6,"opacityScale":1,"showTags":true,"removeTags":[],"focusOnHover":false}"></div><button id="global-graph-icon" aria-label="Global Graph"><svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" viewBox="0 0 55 55" fill="currentColor" xml:space="preserve"><path d="M49,0c-3.309,0-6,2.691-6,6c0,1.035,0.263,2.009,0.726,2.86l-9.829,9.829C32.542,17.634,30.846,17,29,17
s-3.542,0.634-4.898,1.688l-7.669-7.669C16.785,10.424,17,9.74,17,9c0-2.206-1.794-4-4-4S9,6.794,9,9s1.794,4,4,4
c0.74,0,1.424-0.215,2.019-0.567l7.669,7.669C21.634,21.458,21,23.154,21,25s0.634,3.542,1.688,4.897L10.024,42.562
C8.958,41.595,7.549,41,6,41c-3.309,0-6,2.691-6,6s2.691,6,6,6s6-2.691,6-6c0-1.035-0.263-2.009-0.726-2.86l12.829-12.829
c1.106,0.86,2.44,1.436,3.898,1.619v10.16c-2.833,0.478-5,2.942-5,5.91c0,3.309,2.691,6,6,6s6-2.691,6-6c0-2.967-2.167-5.431-5-5.91
v-10.16c1.458-0.183,2.792-0.759,3.898-1.619l7.669,7.669C41.215,39.576,41,40.26,41,41c0,2.206,1.794,4,4,4s4-1.794,4-4
s-1.794-4-4-4c-0.74,0-1.424,0.215-2.019,0.567l-7.669-7.669C36.366,28.542,37,26.846,37,25s-0.634-3.542-1.688-4.897l9.665-9.665
C46.042,11.405,47.451,12,49,12c3.309,0,6-2.691,6-6S52.309,0,49,0z M11,9c0-1.103,0.897-2,2-2s2,0.897,2,2s-0.897,2-2,2
S11,10.103,11,9z M6,51c-2.206,0-4-1.794-4-4s1.794-4,4-4s4,1.794,4,4S8.206,51,6,51z M33,49c0,2.206-1.794,4-4,4s-4-1.794-4-4
s1.794-4,4-4S33,46.794,33,49z M29,31c-3.309,0-6-2.691-6-6s2.691-6,6-6s6,2.691,6,6S32.309,31,29,31z M47,41c0,1.103-0.897,2-2,2
s-2-0.897-2-2s0.897-2,2-2S47,39.897,47,41z M49,10c-2.206,0-4-1.794-4-4s1.794-4,4-4s4,1.794,4,4S51.206,10,49,10z"></path></svg></button></div><div id="global-graph-outer"><div id="global-graph-container" data-cfg="{"drag":true,"zoom":true,"depth":-1,"scale":0.9,"repelForce":0.5,"centerForce":0.3,"linkDistance":30,"fontSize":0.6,"opacityScale":1,"showTags":true,"removeTags":[],"focusOnHover":true}"></div></div></div><div class="toc desktop-only"><button type="button" id="toc" class aria-controls="toc-content" aria-expanded="true"><h3>目录</h3><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="fold"><polyline points="6 9 12 15 18 9"></polyline></svg></button><div id="toc-content" class><ul class="overflow"><li class="depth-0"><a href="#概述" data-for="概述">概述</a></li><li class="depth-1"><a href="#子模块" data-for="子模块">子模块</a></li><li class="depth-1"><a href="#优点" data-for="优点">优点</a></li><li class="depth-1"><a href="#缺点" data-for="缺点">缺点</a></li><li class="depth-1"><a href="#同类产品对比" data-for="同类产品对比">同类产品对比</a></li><li class="depth-0"><a href="#urllibrequest" data-for="urllibrequest">urllib.request</a></li><li class="depth-1"><a href="#urlopen-打开-url" data-for="urlopen-打开-url">urlopen() 打开 URL</a></li><li class="depth-1"><a href="#urlretrieve-下载文件" data-for="urlretrieve-下载文件">urlretrieve() 下载文件</a></li><li class="depth-1"><a href="#build_opener-和-install_opener" data-for="build_opener-和-install_opener">build_opener() 和 install_opener()</a></li><li class="depth-1"><a href="#http-基础认证-httpbasicauthhandler" data-for="http-基础认证-httpbasicauthhandler">HTTP 基础认证 (HTTPBasicAuthHandler)</a></li><li class="depth-1"><a href="#httpcookieprocessor-处理-cookies" data-for="httpcookieprocessor-处理-cookies">HTTPCookieProcessor 处理 Cookies</a></li><li class="depth-1"><a href="#proxyhandler-设置代理" data-for="proxyhandler-设置代理">ProxyHandler 设置代理</a></li><li 
class="depth-1"><a href="#request-自定义请求" data-for="request-自定义请求">Request() 自定义请求</a></li><li class="depth-0"><a href="#urlliberror" data-for="urlliberror">urllib.error</a></li><li class="depth-1"><a href="#urlerror" data-for="urlerror">URLError</a></li><li class="depth-1"><a href="#httperror" data-for="httperror">HTTPError</a></li><li class="depth-1"><a href="#contenttooshorterror" data-for="contenttooshorterror">ContentTooShortError</a></li><li class="depth-0"><a href="#urllibparse" data-for="urllibparse">urllib.parse</a></li><li class="depth-1"><a href="#解析和构建-url" data-for="解析和构建-url">解析和构建 URL</a></li><li class="depth-1"><a href="#转换查询字符串" data-for="转换查询字符串">转换查询字符串</a></li><li class="depth-1"><a href="#url-编码和解码" data-for="url-编码和解码">URL 编码和解码</a></li><li class="depth-0"><a href="#urllibrobotparser" data-for="urllibrobotparser">urllib.robotparser</a></li><li class="depth-1"><a href="#创建和设置-robotfileparser" data-for="创建和设置-robotfileparser">创建和设置 RobotFileParser</a></li><li class="depth-1"><a href="#检查爬虫是否可以访问特定页面" data-for="检查爬虫是否可以访问特定页面">检查爬虫是否可以访问特定页面</a></li><li class="depth-1"><a href="#手动解析-robotstxt" data-for="手动解析-robotstxt">手动解析 robots.txt</a></li></ul></div></div><div class="explorer mobile-only"><button type="button" id="explorer" data-behavior="collapse" data-collapsed="collapsed" data-savestate="true" data-tree="[{"path":"Personal","collapsed":true},{"path":"Personal/Blog","collapsed":true},{"pat
</script><script type="module">
let mermaidImport = undefined
document.addEventListener('nav', async () => {
if (document.querySelector("code.mermaid")) {
mermaidImport ||= await import('https://cdnjs.cloudflare.com/ajax/libs/mermaid/10.7.0/mermaid.esm.min.mjs')
const mermaid = mermaidImport.default
const darkMode = document.documentElement.getAttribute('saved-theme') === 'dark'
mermaid.initialize({
startOnLoad: false,
securityLevel: 'loose',
theme: darkMode ? 'dark' : 'default'
})
await mermaid.run({
querySelector: '.mermaid'
})
}
});
</script><script src="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.9/contrib/copy-tex.min.js" type="application/javascript"></script><script src="../../../../../postscript.js" type="module"></script></html>