1
0
wiki/Technology/ProgrammingLanguage/Python/模块/文本处理/re-----正则表达式操作.html

59 lines
458 KiB
HTML
Raw Normal View History

2024-10-08 11:37:55 +08:00
<!DOCTYPE html>
<html lang="zh"><head><title>re --- 正则表达式操作</title><meta charset="utf-8"/><link rel="preconnect" href="https://fonts.googleapis.com"/><link rel="preconnect" href="https://fonts.gstatic.com"/><link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=IBM Plex Mono&amp;family=Noto Serif Simplified Chinese:wght@400;700&amp;family=Source Sans Pro:ital,wght@0,400;0,600;1,400;1,600&amp;display=swap"/><meta name="viewport" content="width=device-width, initial-scale=1.0"/><meta property="og:title" content="re --- 正则表达式操作"/><meta property="og:description" content="re --- 正则表达式操作."/><meta property="og:image" content="https://wiki.7wate.com/static/og-image.png"/><meta property="og:width" content="1200"/><meta property="og:height" content="675"/><link rel="icon" href="../../../../../static/icon.png"/><meta name="description" content="re --- 正则表达式操作."/><meta name="generator" content="Quartz"/><link href="../../../../../index.css" rel="stylesheet" type="text/css" spa-preserve/><link href="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.9/katex.min.css" rel="stylesheet" type="text/css" spa-preserve/><script src="../../../../../prescript.js" type="application/javascript" spa-preserve></script><script type="application/javascript" spa-preserve>const fetchData = fetch("../../../../../static/contentIndex.json").then(data => data.json())</script></head><body data-slug="Technology/ProgrammingLanguage/Python/模块/文本处理/re-----正则表达式操作"><div id="quartz-root" class="page"><div id="quartz-body"><div class="left sidebar"><h2 class="page-title"><a href="../../../../..">🪴 X·Eden</a></h2><div class="spacer mobile-only"></div><div class="search"><button class="search-button" id="search-button"><p>搜索</p><svg role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 19.9 19.7"><title>Search</title><g class="search-path" fill="none"><path stroke-linecap="square" d="M18.5 18.3l-5.4-5.4"></path><circle cx="8" cy="8" r="7"></circle></g></svg></button><div id="search-container"><div id="search-space"><input autocomplete="off" id="search-bar" name="search" type="text" aria-label="搜索些什么" placeholder="搜索些什么"/><div id="search-layout" data-preview="true"></div></div></div></div><button class="darkmode" id="darkmode"><svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" id="dayIcon" x="0px" y="0px" viewBox="0 0 35 35" style="enable-background:new 0 0 35 35" xml:space="preserve" aria-label="暗色模式"><title>暗色模式</title><path d="M6,17.5C6,16.672,5.328,16,4.5,16h-3C0.672,16,0,16.672,0,17.5 S0.672,19,1.5,19h3C5.328,19,6,18.328,6,17.5z M7.5,26c-0.414,0-0.789,0.168-1.061,0.439l-2,2C4.168,28.711,4,29.086,4,29.5 C4,30.328,4.671,31,5.5,31c0.414,0,0.789-0.168,1.06-0.44l2-2C8.832,28.289,9,27.914,9,27.5C9,26.672,8.329,26,7.5,26z M17.5,6 C18.329,6,19,5.328,19,4.5v-3C19,0.672,18.329,0,17.5,0S16,0.672,16,1.5v3C16,5.328,16.671,6,17.5,6z M27.5,9 c0.414,0,0.789-0.168,1.06-0.439l2-2C30.832,6.289,31,5.914,31,5.5C31,4.672,30.329,4,29.5,4c-0.414,0-0.789,0.168-1.061,0.44 l-2,2C26.168,6.711,26,7.086,26,7.5C26,8.328,26.671,9,27.5,9z M6.439,8.561C6.711,8.832,7.086,9,7.5,9C8.328,9,9,8.328,9,7.5 c0-0.414-0.168-0.789-0.439-1.061l-2-2C6.289,4.168,5.914,4,5.5,4C4.672,4,4,4.672,4,5.5c0,0.414,0.168,0.789,0.439,1.06 L6.439,8.561z M33.5,16h-3c-0.828,0-1.5,0.672-1.5,1.5s0.672,1.5,1.5,1.5h3c0.828,0,1.5-0.672,1.5-1.5S34.328,16,33.5,16z M28.561,26.439C28.289,26.168,27.914,26,27.5,26c-0.828,0-1.5,0.672-1.5,1.5c0,0.414,0.168,0.789,0.439,1.06l2,2 C28.711,30.832,29.086,31,29.5,31c0.828,0,1.5-0.672,1.5-1.5c0-0.414-0.168-0.789-0.439-1.061L28.561,26.439z M17.5,29 c-0.829,0-1.5,0.672-1.5,1.5v3c0,0.828,0.671,1.5,1.5,1.5s1.5-0.672,1.5-1.5v-3C19,29.672,18.329,29,17.5,29z M17.5,7 C11.71,7,7,11.71,7,17.5S11.71,28,17.5,28S28,23.29,28,17.5S23.29,7,17.5,7z M17.5,25c-4.136,0-7.5-3.364-7.5-7.5 c0-4.136,3.364-7.5,7.5-7.5c4.136,0,7.5,3.364,7.5,7.5C25,21.636,21.636,25,17.5,25z"></path></svg><svg xmlns="http://www
<h2 id="正则表达式">正则表达式<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#正则表达式" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h2>
<p>正则表达式是一种特殊的文本字符串,用于描述搜索或匹配文本的一个模式。例如,你可以用它来查找所有的电子邮件地址、电话号码或特定单词的出现。</p>
<p>编写正则表达式需要学习其特殊的语法,其中一些常用的元素包括:</p>
<ul>
<li><strong>字面字符</strong>:大多数字符在正则表达式中表示它们自己。</li>
<li><strong>元字符</strong>:如 <code>.</code>(匹配任何单个字符)、<code>^</code>(匹配字符串的开始)、<code>$</code>(匹配字符串的结尾)。</li>
<li><strong>字符类</strong>:用方括号定义,如 <code>[a-z]</code> 匹配任何小写字母。</li>
<li><strong>特殊序列</strong>:如 <code>\d</code> 匹配任何数字,<code>\s</code> 匹配任何空白字符。</li>
</ul>
<h2 id="python-中的-re-模块">Python 中的 <code>re</code> 模块<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#python-中的-re-模块" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h2>
<p>在 Python 中,<code>re</code> 模块提供了一系列函数和语法用于在字符串中应用正则表达式。首先,你需要导入模块:</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">import</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> re</span></span></code></pre></figure>
<h2 id="基本函数">基本函数<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#基本函数" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h2>
<p><code>re</code> 模块提供了几个基本函数来处理正则表达式:</p>
<ul>
<li><strong><code>re.search(pattern, string)</code></strong>:在字符串中搜索第一个与模式匹配的部分。如果找到匹配,它返回一个匹配对象;否则返回 <code>None</code></li>
<li><strong><code>re.match(pattern, string)</code></strong>:如果字符串的开始处匹配模式,返回一个匹配对象。</li>
<li><strong><code>re.findall(pattern, string)</code></strong>:返回字符串中所有与模式匹配的部分的列表。</li>
<li><strong><code>re.sub(pattern, repl, string)</code></strong>:替换字符串中的模式匹配项。</li>
</ul>
<h2 id="分组捕获">分组捕获<a role="anchor" aria-hidden="true" tabindex="-1" data-no-popover="true" href="#分组捕获" class="internal"><svg width="18" height="18" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M10 13a5 5 0 0 0 7.54.54l3-3a5 5 0 0 0-7.07-7.07l-1.72 1.71"></path><path d="M14 11a5 5 0 0 0-7.54-.54l-3 3a5 5 0 0 0 7.07 7.07l1.71-1.71"></path></svg></a></h2>
<p>使用圆括号可以在正则表达式中创建“组”。组可以帮助你提取正则表达式匹配的部分。例如:</p>
<figure data-rehype-pretty-code-figure><pre tabindex="0" data-language="python" data-theme="github-light github-dark"><code data-language="python" data-theme="github-light github-dark" style="display:grid;"><span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">pattern </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;"> r</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">(\b\w</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">+</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">)</span><span style="--shiki-light:#032F62;--shiki-dark:#DBEDFF;">@</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">(\w</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">+</span><span style="--shiki-light:#22863A;--shiki-dark:#85E89D;--shiki-light-font-weight:bold;--shiki-dark-font-weight:bold;">\.</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">\w</span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">+</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">)</span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">'</span></span>
<span data-line><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">match </span><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">=</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> re.search(pattern, </span><span style="--shiki-light:#032F62;--shiki-dark:#9ECBFF;">&quot;username@example.com&quot;</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)</span></span>
<span data-line><span style="--shiki-light:#D73A49;--shiki-dark:#F97583;">if</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;"> match:</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;"> print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(match.group()) </span><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># 整个匹配</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;"> print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(match.group(</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">1</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)) </span><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># 第一个括号内的匹配</span></span>
<span data-line><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;"> print</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">(match.group(</span><span style="--shiki-light:#005CC5;--shiki-dark:#79B8FF;">2</span><span style="--shiki-light:#24292E;--shiki-dark:#E1E4E8;">)) </span><span style="--shiki-light:#6A737D;--shiki-dark:#6A737D;"># 第二个括号内的匹配</span></span></code></pre></figure></article><hr/><div class="page-footer"></div></div><div class="right sidebar"><div class="graph"><h3>关系图谱</h3><div class="graph-outer"><div id="graph-container" data-cfg="{&quot;drag&quot;:true,&quot;zoom&quot;:true,&quot;depth&quot;:1,&quot;scale&quot;:1.1,&quot;repelForce&quot;:0.5,&quot;centerForce&quot;:0.3,&quot;linkDistance&quot;:30,&quot;fontSize&quot;:0.6,&quot;opacityScale&quot;:1,&quot;showTags&quot;:true,&quot;removeTags&quot;:[],&quot;focusOnHover&quot;:false}"></div><button id="global-graph-icon" aria-label="Global Graph"><svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" x="0px" y="0px" viewBox="0 0 55 55" fill="currentColor" xml:space="preserve"><path d="M49,0c-3.309,0-6,2.691-6,6c0,1.035,0.263,2.009,0.726,2.86l-9.829,9.829C32.542,17.634,30.846,17,29,17
s-3.542,0.634-4.898,1.688l-7.669-7.669C16.785,10.424,17,9.74,17,9c0-2.206-1.794-4-4-4S9,6.794,9,9s1.794,4,4,4
c0.74,0,1.424-0.215,2.019-0.567l7.669,7.669C21.634,21.458,21,23.154,21,25s0.634,3.542,1.688,4.897L10.024,42.562
C8.958,41.595,7.549,41,6,41c-3.309,0-6,2.691-6,6s2.691,6,6,6s6-2.691,6-6c0-1.035-0.263-2.009-0.726-2.86l12.829-12.829
c1.106,0.86,2.44,1.436,3.898,1.619v10.16c-2.833,0.478-5,2.942-5,5.91c0,3.309,2.691,6,6,6s6-2.691,6-6c0-2.967-2.167-5.431-5-5.91
v-10.16c1.458-0.183,2.792-0.759,3.898-1.619l7.669,7.669C41.215,39.576,41,40.26,41,41c0,2.206,1.794,4,4,4s4-1.794,4-4
s-1.794-4-4-4c-0.74,0-1.424,0.215-2.019,0.567l-7.669-7.669C36.366,28.542,37,26.846,37,25s-0.634-3.542-1.688-4.897l9.665-9.665
C46.042,11.405,47.451,12,49,12c3.309,0,6-2.691,6-6S52.309,0,49,0z M11,9c0-1.103,0.897-2,2-2s2,0.897,2,2s-0.897,2-2,2
S11,10.103,11,9z M6,51c-2.206,0-4-1.794-4-4s1.794-4,4-4s4,1.794,4,4S8.206,51,6,51z M33,49c0,2.206-1.794,4-4,4s-4-1.794-4-4
s1.794-4,4-4S33,46.794,33,49z M29,31c-3.309,0-6-2.691-6-6s2.691-6,6-6s6,2.691,6,6S32.309,31,29,31z M47,41c0,1.103-0.897,2-2,2
s-2-0.897-2-2s0.897-2,2-2S47,39.897,47,41z M49,10c-2.206,0-4-1.794-4-4s1.794-4,4-4s4,1.794,4,4S51.206,10,49,10z"></path></svg></button></div><div id="global-graph-outer"><div id="global-graph-container" data-cfg="{&quot;drag&quot;:true,&quot;zoom&quot;:true,&quot;depth&quot;:-1,&quot;scale&quot;:0.9,&quot;repelForce&quot;:0.5,&quot;centerForce&quot;:0.3,&quot;linkDistance&quot;:30,&quot;fontSize&quot;:0.6,&quot;opacityScale&quot;:1,&quot;showTags&quot;:true,&quot;removeTags&quot;:[],&quot;focusOnHover&quot;:true}"></div></div></div><div class="toc desktop-only"><button type="button" id="toc" class aria-controls="toc-content" aria-expanded="true"><h3>目录</h3><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="fold"><polyline points="6 9 12 15 18 9"></polyline></svg></button><div id="toc-content" class><ul class="overflow"><li class="depth-0"><a href="#正则表达式" data-for="正则表达式">正则表达式</a></li><li class="depth-0"><a href="#python-中的-re-模块" data-for="python-中的-re-模块">Python 中的 re 模块</a></li><li class="depth-0"><a href="#基本函数" data-for="基本函数">基本函数</a></li><li class="depth-0"><a href="#分组捕获" data-for="分组捕获">分组捕获</a></li></ul></div></div><div class="explorer mobile-only"><button type="button" id="explorer" data-behavior="collapse" data-collapsed="collapsed" data-savestate="true" data-tree="[{&quot;path&quot;:&quot;Personal&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Blog&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Blog/2018&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Blog/2020&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Blog/2021&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Blog/2022&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Blog/2023&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Blog/2024&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/个人成长&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/医学健康&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/历史&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/哲学宗教&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/心理&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/政治军事&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/教育学习&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/文学&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/生活百科&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/社会文化&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/科学技术&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/经济理财&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/艺术&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Book/计算机&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal/2022&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal/2022/W34&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal/2022/W35&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal/2022/W36&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal/2022/W37&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal/2022/W38&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal/2022/W39&quot;,&quot;collapsed&quot;:true},{&quot;path&quot;:&quot;Personal/Journal/
</script><script type="module">
let mermaidImport = undefined
document.addEventListener('nav', async () => {
if (document.querySelector("code.mermaid")) {
mermaidImport ||= await import('https://cdnjs.cloudflare.com/ajax/libs/mermaid/10.7.0/mermaid.esm.min.mjs')
const mermaid = mermaidImport.default
const darkMode = document.documentElement.getAttribute('saved-theme') === 'dark'
mermaid.initialize({
startOnLoad: false,
securityLevel: 'loose',
theme: darkMode ? 'dark' : 'default'
})
await mermaid.run({
querySelector: '.mermaid'
})
}
});
</script><script src="https://cdnjs.cloudflare.com/ajax/libs/KaTeX/0.16.9/contrib/copy-tex.min.js" type="application/javascript"></script><script src="../../../../../postscript.js" type="module"></script></html>