A Roundup of Practical Regular Expression Techniques in Golang

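A regular expression (regex for short) is a powerful text-processing tool that uses a sequence of characters to describe, match, and manipulate strings. In Golang, the standard library's regexp package provides regular expression support. Whether you are validating, searching, replacing, or splitting text, a working command of regular expressions can noticeably improve your productivity. This article walks through practical regexp techniques in Golang and illustrates each one with example code.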

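1. Basics: An Overview of the `regexp` Package

Golang's regexp package implements RE2-style regular expressions. A key property of the RE2 engine is that matching is guaranteed to run in time linear in the size of the input (O(n), where n is the length of the input string), which avoids the catastrophic backtracking that some other regex engines can exhibit. The trade-off is that constructs which require backtracking, such as backreferences (\1) and lookaround assertions, are not supported.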
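One practical consequence of the RE2 design is that some Perl-style constructs are rejected at compile time. The sketch below probes a few patterns; the helper name `compiles` is made up for illustration and is not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, used only for this demonstration.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`(\w+) (\w+)`)) // true: ordinary capture groups
	fmt.Println(compiles(`(\w+) \1`))    // false: backreferences are not supported
	fmt.Println(compiles(`foo(?=bar)`))  // false: lookahead is not supported
}
```

If a pattern ported from another language fails to compile, an unsupported construct of this kind is a likely cause.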

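1.1. Compiling a Regular Expression: `Compile` and `MustCompile`

Before you can use a regular expression, you compile it into a *regexp.Regexp value. The regexp package provides two main compile functions:

- Compile(expr string) (*regexp.Regexp, error): compiles the expression expr. On success it returns a *regexp.Regexp; on failure (for example, a syntax error in the pattern) it returns a non-nil error.
- MustCompile(expr string) *regexp.Regexp: like Compile, but panics instead of returning an error when compilation fails. It suits patterns that are known to be valid and are compiled during program initialization, because no error handling is needed.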

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Compile returns an error that must be handled.
	re1, err := regexp.Compile(`^[a-z]+\[[0-9]+\]$`)
	if err != nil {
		fmt.Println("regexp compile error:", err)
		return
	}

	// MustCompile panics if compilation fails.
	re2 := regexp.MustCompile(`\d{4}-\d{2}-\d{2}`) // matches a date in YYYY-MM-DD form

	fmt.Println(re1.MatchString("abc[123]"))   // true
	fmt.Println(re2.MatchString("2023-10-27")) // true
}
```

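Best practices:

- For patterns that are built dynamically at run time, use Compile and handle the error properly.
- For patterns that are fixed at program start-up and known to be valid, use MustCompile to keep the code simple.
- Compile frequently used patterns once and reuse the result instead of recompiling (especially inside loops). A common approach is to keep the *regexp.Regexp in a package-level variable; the matching methods of a compiled Regexp are safe for concurrent use by multiple goroutines.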
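The caching advice above might look like the following minimal sketch. The names `datePattern` and `countDates` and the pattern itself are illustrative assumptions, not something prescribed by the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// datePattern is compiled once, when the package is initialized.
// (Illustrative name and pattern.)
var datePattern = regexp.MustCompile(`^\d{4}-\d{2}-\d{2}$`)

// countDates returns how many inputs look like YYYY-MM-DD.
func countDates(inputs []string) int {
	n := 0
	for _, s := range inputs {
		if datePattern.MatchString(s) { // reuses the compiled pattern on every iteration
			n++
		}
	}
	return n
}

func main() {
	inputs := []string{"2023-10-27", "27/10/2023", "1999-01-01"}
	fmt.Println(countDates(inputs)) // 2
}
```

Calling regexp.MustCompile inside the loop would give the same result but would recompile the pattern on every iteration.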

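1.2. Basic Matching Methods

A *regexp.Regexp offers several methods that report whether the input contains a match for the pattern:

- Match(b []byte) bool: reports whether the byte slice b contains a match.
- MatchString(s string) bool: reports whether the string s contains a match.
- MatchReader(r io.RuneReader) bool: reports whether the text read from an io.RuneReader contains a match (useful when processing large text streams).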

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	re := regexp.MustCompile(`Go(lang)?`)

	fmt.Println(re.MatchString("Go"))     // true
	fmt.Println(re.MatchString("Golang")) // true
	fmt.Println(re.MatchString("Python")) // false

	fmt.Println(re.Match([]byte("Golang"))) // true

	// strings.Reader implements io.RuneReader.
	reader := strings.NewReader("Go is awesome!")
	fmt.Println(re.MatchReader(reader)) // true
}
```
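Note that these methods report a match anywhere in the input, not a match of the whole input. When the goal is validation, anchoring the pattern with ^ and $ is the usual remedy. A small sketch, with variable names chosen only for illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	loose  = regexp.MustCompile(`Go(lang)?`)   // matches anywhere in the input
	strict = regexp.MustCompile(`^Go(lang)?$`) // must match the entire input
)

func main() {
	fmt.Println(loose.MatchString("I like Go a lot"))  // true
	fmt.Println(strict.MatchString("I like Go a lot")) // false
	fmt.Println(strict.MatchString("Golang"))          // true
}
```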

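2. Intermediate: Capture Groups and Submatches

A capture group is the part of a regular expression enclosed in parentheses (). Capture groups let you extract specific parts of the matched text.

2.1. Extracting Capture Groups: `FindStringSubmatch` and `FindAllStringSubmatch`

- FindStringSubmatch(s string) []string: returns a slice of strings whose first element is the text of the whole match and whose remaining elements are the text matched by each capture group (with no capture groups, the slice holds only the whole match). It returns nil if there is no match.
- FindAllStringSubmatch(s string, n int) [][]string: returns a slice of slices, one per match, each structured like the result of FindStringSubmatch. The n parameter caps the number of matches returned; if n is -1, all matches are returned.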

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`(\w+)=(\w+)`) // matches key=value pairs
	s := "name=John,age=30,city=New York"

	// FindStringSubmatch: finds only the first match.
	match := re.FindStringSubmatch(s)
	if match != nil {
		fmt.Println("Full match:", match[0]) // Full match: name=John
		fmt.Println("Key:", match[1])        // Key: name
		fmt.Println("Value:", match[2])      // Value: John
	}

	// FindAllStringSubmatch: finds every match.
	allMatches := re.FindAllStringSubmatch(s, -1)
	for _, m := range allMatches {
		fmt.Printf("Full match: %s, Key: %s, Value: %s\n", m[0], m[1], m[2])
	}
	// Output:
	// Full match: name=John, Key: name, Value: John
	// Full match: age=30, Key: age, Value: 30
	// Full match: city=New, Key: city, Value: New
	// (\w does not match a space, so the last value stops at "New".)
}
```

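2.2. Named Capture Groups: `(?P<Name>...)`

Golang supports named capture groups, which let you refer to a group by name rather than by position. The syntax is (?P<Name>pattern), where Name is the name of the group and pattern is the pattern to match.

- SubexpNames() []string: returns the names of all capture groups, indexed by group number. Element 0 (the whole match) and unnamed groups have the empty string as their name.
- SubexpIndex(name string) int: returns the index of the first group with the given name, or -1 if there is no such group (available since Go 1.15).

In the results of FindStringSubmatch and FindAllStringSubmatch, named groups are still accessed by index; use SubexpIndex or SubexpNames to translate a name into that index. The standard regexp package has no FindStringSubmatchMap method, but a map keyed by group name is easy to build from SubexpNames.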

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
	s := "2023-10-27"

	match := re.FindStringSubmatch(s)
	if match == nil {
		fmt.Println("no match")
		return
	}

	fmt.Println("Full match:", match[0])                  // Full match: 2023-10-27
	fmt.Println("Year:", match[1])                        // Year: 2023 (by index)
	fmt.Println("Year:", match[re.SubexpIndex("Year")])   // Year: 2023 (by name)
	fmt.Println("Month:", match[2])                       // Month: 10 (by index)
	fmt.Println("Month:", match[re.SubexpIndex("Month")]) // Month: 10 (by name)
	fmt.Println("Day:", match[3])                         // Day: 27 (by index)
	fmt.Println("Day:", match[re.SubexpIndex("Day")])     // Day: 27 (by name)

	fmt.Printf("names: %q\n", re.SubexpNames()) // names: ["" "Year" "Month" "Day"]

	// regexp has no FindStringSubmatchMap; build the map from SubexpNames.
	matchMap := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // skip the whole match and unnamed groups
			matchMap[name] = match[i]
		}
	}
	fmt.Printf("matchMap: %v\n", matchMap)     // matchMap: map[Day:27 Month:10 Year:2023]
	fmt.Printf("Year: %v\n", matchMap["Year"]) // Year: 2023
}
```

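Note: the standard regexp package does not provide FindStringSubmatchMap or any other *Map method. A map built from SubexpNames should skip the empty names, so it holds only the named groups; the whole match and unnamed groups remain available by index.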

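3. Advanced: Replacing, Splitting, and Finding

Beyond matching and extraction, the regexp package also supports replacing, splitting, and locating text.

3.1. Replacing: `ReplaceAllString` and `ReplaceAllStringFunc`

- ReplaceAllString(src, repl string) string: replaces every substring of src that matches the pattern with repl. Inside repl, $n or ${name} refers to a capture group ($0 is the whole match). A name is read as long as possible, so write ${1}x rather than $1x when a letter, digit, or underscore follows the reference.
- ReplaceAllStringFunc(src string, repl func(string) string) string: like ReplaceAllString, but calls a function to produce each replacement. The function receives each matched substring and returns the text to substitute; the returned text is used literally, without $ expansion.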

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

func main() {
	re := regexp.MustCompile(`(\d+)\s*(°C|°F)`)
	s := "It's 25°C outside, which is 77°F."

	// ReplaceAllString: simple replacement using group references.
	s1 := re.ReplaceAllString(s, "$1 degrees $2")
	fmt.Println(s1) // It's 25 degrees °C outside, which is 77 degrees °F.

	// ReplaceAllStringFunc: computed replacement (here, swapping Celsius and Fahrenheit).
	s2 := re.ReplaceAllStringFunc(s, func(match string) string {
		// The callback only receives the matched text, so re-run the
		// pattern on it to recover the capture groups.
		submatches := re.FindStringSubmatch(match)
		value, err := strconv.Atoi(submatches[1])
		if err != nil {
			return match // number too large to parse; leave unchanged
		}
		unit := submatches[2]

		// Integer arithmetic: results are truncated toward zero.
		if unit == "°C" {
			fahrenheit := (value * 9 / 5) + 32
			return fmt.Sprintf("%d°F", fahrenheit)
		} else if unit == "°F" {
			celsius := (value - 32) * 5 / 9
			return fmt.Sprintf("%d°C", celsius)
		}
		return match // neither °C nor °F: leave unchanged
	})
	fmt.Println(s2) // It's 77°F outside, which is 25°C.
}
```
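The ${name} form and the "longest name" rule for group references are easy to get wrong, so here is a small sketch of both. The function names `reorderDate`, `suffixWrong`, and `suffixRight` are invented for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	isoDate = regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
	word    = regexp.MustCompile(`(\w+)`)
)

// reorderDate rewrites YYYY-MM-DD dates as DD/MM/YYYY using named references.
func reorderDate(s string) string {
	return isoDate.ReplaceAllString(s, "${Day}/${Month}/${Year}")
}

// suffixWrong uses "$1_id", which is parsed as ${1_id}: a group that does
// not exist, so every match is replaced with the empty string.
func suffixWrong(s string) string {
	return word.ReplaceAllString(s, "$1_id")
}

// suffixRight uses braces to end the group reference before the suffix.
func suffixRight(s string) string {
	return word.ReplaceAllString(s, "${1}_id")
}

func main() {
	fmt.Println(reorderDate("Due 2023-10-27.")) // Due 27/10/2023.
	fmt.Printf("%q\n", suffixWrong("user"))     // ""
	fmt.Printf("%q\n", suffixRight("user"))     // "user_id"
}
```

Using braces consistently in replacement templates avoids this pitfall altogether.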

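3.2. Splitting: `Split`

Split(s string, n int) []string: splits the string s into substrings, using matches of the regular expression as separators. The n parameter controls the maximum number of substrings returned:
- n > 0: at most n substrings; the last substring is the unsplit remainder.
- n == 0: the result is nil (no substrings).
- n < 0: all substrings.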

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`\s*,\s*`) // a comma with optional whitespace on either side
	s := "apple, banana,orange, grape"

	parts := re.Split(s, -1)
	fmt.Println(parts) // [apple banana orange grape]
}
```
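To make the effect of the n parameter concrete, the following sketch splits the same input with different limits. The wrapper `splitN` is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

var comma = regexp.MustCompile(`\s*,\s*`)

// splitN splits s on commas (with optional surrounding whitespace),
// returning at most n substrings as described for Regexp.Split.
func splitN(s string, n int) []string {
	return comma.Split(s, n)
}

func main() {
	s := "a, b, c, d"
	fmt.Printf("%q\n", splitN(s, 2))  // ["a" "b, c, d"]
	fmt.Printf("%q\n", splitN(s, -1)) // ["a" "b" "c" "d"]
	fmt.Println(len(splitN(s, 0)))    // 0
}
```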

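3.3. Finding the Indexes of All Matches: `FindAllIndex`

FindAllIndex(b []byte, n int) [][]int: finds all matches and returns, for each one, a two-element slice holding its start and end index, so that the match itself is b[start:end]. The indexes are byte offsets. This is useful when you need precise control over match positions.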

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`\w+`)
	text := []byte("Hello, world!")

	indexes := re.FindAllIndex(text, -1)
	for _, index := range indexes {
		start, end := index[0], index[1]
		fmt.Printf("Match: %s, Start: %d, End: %d\n", text[start:end], start, end)
	}
	// Output:
	// Match: Hello, Start: 0, End: 5
	// Match: world, Start: 7, End: 12
}
```

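4. Other Practical Tips and Caveats

4.1. Ignoring Case: `(?i)`

To make a regular expression case-insensitive, add the (?i) flag at the start of the pattern.

```go
re := regexp.MustCompile(`(?i)hello`) // matches "hello", "Hello", "HELLO", and so on
```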
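A quick way to see the flag at work is to collect every match from a mixed-case input. The variable and function names here are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

var helloRe = regexp.MustCompile(`(?i)hello`)

// findHellos returns every case-insensitive occurrence of "hello" in s.
func findHellos(s string) []string {
	return helloRe.FindAllString(s, -1)
}

func main() {
	fmt.Println(findHellos("Hello, HELLO, hello, help")) // [Hello HELLO hello]
}
```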

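4.2. Multi-line Mode: `(?m)`

By default, ^ and $ match only at the beginning and end of the whole text. In multi-line mode (the (?m) flag), they match at the beginning and end of each line.

```go
re := regexp.MustCompile(`(?m)^line \d+$`) // matches whole lines of the form "line " followed by digits
```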
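The difference shows up clearly when the same pattern is run with and without the flag over a multi-line input. A minimal sketch with made-up names and sample text:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	wholeText = regexp.MustCompile(`^line \d+$`)     // ^ and $ anchor to the whole text
	perLine   = regexp.MustCompile(`(?m)^line \d+$`) // ^ and $ anchor to each line
)

const sample = "line 1\nnot a line\nline 22\n"

func main() {
	fmt.Println(len(wholeText.FindAllString(sample, -1))) // 0
	fmt.Printf("%q\n", perLine.FindAllString(sample, -1)) // ["line 1" "line 22"]
}
```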

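4.3. Single-line Mode (Dot Matches Newline): `(?s)`

By default, the dot . matches any character except the newline \n. In single-line mode (the (?s) flag), the dot matches any character, including a newline.

```go
re := regexp.MustCompile(`(?s)start.*end`) // matches everything between "start" and "end", newlines included
```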
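As a small check of this behavior, the sketch below runs the pattern with and without (?s) against text that spans several lines. Names and sample text are assumptions for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	dotDefault = regexp.MustCompile(`start.*end`)     // . stops at a newline
	dotAll     = regexp.MustCompile(`(?s)start.*end`) // . also matches a newline
)

const block = "start\nmiddle\nend"

func main() {
	fmt.Println(dotDefault.MatchString(block)) // false
	fmt.Println(dotAll.MatchString(block))     // true
}
```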

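4.4. Predefined Character Classes

Golang regular expressions support a number of predefined character classes that simplify patterns. The Perl-style classes below are ASCII-only:

Class	Description
`\d`	Digit (equivalent to `[0-9]`)
`\D`	Non-digit (equivalent to `[^0-9]`)
`\s`	Whitespace (equivalent to `[\t\n\f\r ]`)
`\S`	Non-whitespace
`\w`	Word character (equivalent to `[0-9A-Za-z_]`)
`\W`	Non-word character
`.`	Any character except a newline (by default)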
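Since \w and \d cover ASCII only, text in other scripts needs the Unicode classes (\p{...}) that the RE2 syntax also provides. A brief sketch, using illustrative variable names:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	asciiWord = regexp.MustCompile(`^\w+$`)      // ASCII letters, digits, underscore
	anyLetter = regexp.MustCompile(`^\p{L}+$`)   // letters from any script
	hanOnly   = regexp.MustCompile(`^\p{Han}+$`) // Han (Chinese) characters
)

func main() {
	fmt.Println(asciiWord.MatchString("hello")) // true
	fmt.Println(asciiWord.MatchString("héllo")) // false: é is not ASCII
	fmt.Println(anyLetter.MatchString("héllo")) // true
	fmt.Println(asciiWord.MatchString("你好"))    // false
	fmt.Println(hanOnly.MatchString("你好"))      // true
}
```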

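4.5. Greedy vs. Non-greedy

By default, the quantifiers (*, +, ?, {n,m}) are greedy: they match as many characters as possible. To make a quantifier non-greedy (match as few as possible), append ? to it.

```go
re := regexp.MustCompile(`<.*?>`) // non-greedy match of an HTML tag
```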
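Running both forms against the same input makes the difference visible. The variable names and the sample HTML fragment are chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	greedyTag = regexp.MustCompile(`<.*>`)  // grabs as much as possible
	lazyTag   = regexp.MustCompile(`<.*?>`) // stops at the first closing bracket
)

const fragment = "<b>bold</b>"

func main() {
	fmt.Println(greedyTag.FindString(fragment))      // <b>bold</b>
	fmt.Println(lazyTag.FindString(fragment))        // <b>
	fmt.Println(lazyTag.FindAllString(fragment, -1)) // [<b> </b>]
}
```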

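4.6. Avoiding Unnecessary Groups

If you do not need the text matched by a group, use a non-capturing group (?:...). The engine then has no submatch to record for it, which can save a little work and keeps the submatch results free of entries you do not need.

```go
re := regexp.MustCompile(`(?:\d{4})-(?:\d{2})-(?:\d{2})`) // matches a date without capturing year, month, or day
```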
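The effect on the results can be checked with NumSubexp and the length of the submatch slice. A minimal sketch with illustrative variable names:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	capturing    = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
	nonCapturing = regexp.MustCompile(`(?:\d{4})-(?:\d{2})-(?:\d{2})`)
)

func main() {
	fmt.Println(capturing.NumSubexp())    // 3
	fmt.Println(nonCapturing.NumSubexp()) // 0

	// The submatch slice holds the whole match plus one entry per capture group.
	fmt.Println(len(capturing.FindStringSubmatch("2023-10-27")))    // 4
	fmt.Println(len(nonCapturing.FindStringSubmatch("2023-10-27"))) // 1
}
```

Both patterns match exactly the same strings; only the recorded submatches differ.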

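4.7. String Literals

When writing a regular expression in Go code, use a raw string literal (delimited by backquotes, `) so that backslashes do not need to be escaped a second time.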

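```go
// Bad: in an interpreted string every backslash must be doubled.
// (Writing "\d+" does not even compile: \d is not a valid Go string escape.)
re1 := regexp.MustCompile("\\d+")

// Good: a raw string literal passes backslashes through unchanged.
re2 := regexp.MustCompile(`\d+`)
```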

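Summary

Golang's regexp package provides capable and flexible regular expression support. With the techniques covered in this article, you can handle a wide range of text-processing tasks efficiently, including:

- Validating user input
- Extracting data from text
- Finding and replacing text
- Splitting strings
- Parsing log files
- Processing configuration files
- ... and more

I hope this article helps you better understand and apply regular expressions in Golang. Practice is the key to mastering them: the more patterns you write, the more fluent you will become with this tool.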
