数学重学 - 13 排列组合与计数

lucy

2026-04-04

数学重学

数学, 编程数学

这是数学重学路线图阶段二的子页面

📌 排列组合是概率论的前置知识，也是安全领域（密码强度、暴力破解）和后端领域（测试用例、API设计）的基础工具。

一、两大基本原理

1.1 加法原理（分类计数）

做一件事有 多种方案，方案之间互斥（选了A就不选B），总数 = 各方案数之和

例子：从北京到上海，飞机3班、高铁5班、大巴2班 → 总共 3+5+2=10 种走法

关键词：”或”、”分类”

1.2 乘法原理（分步计数）

做一件事分 多个步骤，每步必须都完成，总数 = 各步骤方案数之积

例子：从北京到上海再到杭州，北京→上海10种，上海→杭州4种 → 总共 10×4=40 种走法

关键词：”且”、”分步”

1.3 区分加法和乘法

加法：一步完成，多种选择（选其一）

乘法：多步完成，每步都要做（全都做）

# 加法原理
flights = 3
trains = 5
buses = 2
total_or = flights + trains + buses  # 10种（选一种）

# 乘法原理
bj_to_sh = 10
sh_to_hz = 4
total_and = bj_to_sh * sh_to_hz  # 40种（都要走）

print(f"加法（分类）: {total_or}")
print(f"乘法（分步）: {total_and}")

二、排列：有顺序

2.1 公式

从n个不同元素中取k个，有顺序地排列

P(n,k) = n! / (n-k)! = n × (n-1) × … × (n-k+1)

特别地，P(n,n) = n!（全排列）

2.2 直觉

从10人中选3人排队（第一名、第二名、第三名）

第一名：10种选择

第二名：9种选择（少了一个人）

第三名：8种选择

P(10,3) = 10 × 9 × 8 = 720 种

2.3 Python 计算

import math

# P(n,k) = n! / (n-k)!
# Python 3.8+ 有 math.perm
print(f"P(10,3) = {math.perm(10, 3)}")  # 720

# 手动计算
def permutation(n, k):
return math.factorial(n) // math.factorial(n - k)

print(f"P(10,3) = {permutation(10, 3)}")  # 720

# 全排列
print(f"P(5,5) = 5! = {math.perm(5, 5)}")  # 120

2.4 枚举排列

from itertools import permutations

# 从 [A, B, C, D] 中取2个排列
items = ['A', 'B', 'C', 'D']
perms = list(permutations(items, 2))

print(f"P(4,2) = {len(perms)} 种")  # 12
for p in perms:
print(f"  {''.join(p)}", end="")
# AB AC AD BA BC BD CA CB CD DA DB DC
print()

# 注意：AB 和 BA 是不同的排列（顺序不同）

三、组合：无顺序

3.1 公式

从n个不同元素中取k个，不考虑顺序

C(n,k) = n! / [k! × (n-k)!] = P(n,k) / k!

直觉：先排列，再除以k个元素自身的排列数（因为顺序不重要）

3.2 直觉

从10人中选3人组队（没有先后之分）

先排列：P(10,3) = 720

3人之间的排列数：3! = 6（ABC, ACB, BAC, BCA, CAB, CBA 算同一组）

C(10,3) = 720 / 6 = 120 种

3.3 对称性：C(n,k) = C(n, n-k)

从10人中选3人 = 从10人中选7人不去（确定了去的就确定了不去的）

C(10,3) = C(10,7) = 120

实用意义：C(100,98) = C(100,2) = 4950，从大的那边算更快

3.4 Python 计算

import math

# C(n,k)
# Python 3.8+ 有 math.comb
print(f"C(10,3) = {math.comb(10, 3)}")  # 120

# 验证对称性
print(f"C(10,7) = {math.comb(10, 7)}")  # 120
print(f"C(100,98) = {math.comb(100, 98)}")  # 4950
print(f"C(100,2) = {math.comb(100, 2)}")    # 4950

# 杨辉三角（帕斯卡三角）：C(n,k) = C(n-1,k-1) + C(n-1,k)
def pascal_triangle(rows):
for n in range(rows):
    row = [math.comb(n, k) for k in range(n+1)]
    print(f"  n={n}: {row}")

pascal_triangle(6)
# n=0: [1]
# n=1: [1, 1]
# n=2: [1, 2, 1]
# n=3: [1, 3, 3, 1]
# n=4: [1, 4, 6, 4, 1]
# n=5: [1, 5, 10, 10, 5, 1]

3.5 枚举组合

from itertools import combinations

# 从 [A, B, C, D] 中取2个组合
items = ['A', 'B', 'C', 'D']
combs = list(combinations(items, 2))

print(f"C(4,2) = {len(combs)} 种")  # 6
for c in combs:
print(f"  {''.join(c)}", end="")
# AB AC AD BC BD CD
print()

# 注意：AB 和 BA 是同一个组合（不区分顺序）

四、重复排列：n^k

4.1 概念

每次选择都有n种选项，选k次，每次可以重复选

总数 = n^k

4.2 密码空间

密码类型	字符集大小n	位数k	空间n^k	数量级
纯数字	10	4	10,000	1万
纯数字	10	6	1,000,000	100万
纯小写字母	26	8	208,827,064,576	2千亿
大小写+数字	62	8	218,340,105,584,896	218万亿
可打印ASCII	95	8	6,634,204,312,890,625	6.6千万亿
可打印ASCII	95	12	~5.4×10^23	天文数字

4.3 Python 计算密码空间

import math

def password_space(charset_size, length):
"""计算密码空间大小"""
return charset_size ** length

# 各种密码的空间
configs = [
("4位纯数字(如PIN)", 10, 4),
("6位纯数字", 10, 6),
("8位纯小写", 26, 8),
("8位大小写+数字", 62, 8),
("8位全ASCII可打印", 95, 8),
("12位全ASCII可打印", 95, 12),
("16位全ASCII可打印", 95, 16),
]

for name, n, k in configs:
space = password_space(n, k)
print(f"  {name}: {space:,} 种 ({space:.2e})")

五、安全应用：密码强度与暴力破解

5.1 密码强度计算器

import math
import string

def analyze_password_strength(password):
"""分析密码强度"""
charset = 0
has_lower = any(c in string.ascii_lowercase for c in password)
has_upper = any(c in string.ascii_uppercase for c in password)
has_digit = any(c in string.digits for c in password)
has_special = any(c in string.punctuation for c in password)

if has_lower: charset += 26
if has_upper: charset += 26
if has_digit: charset += 10
if has_special: charset += 33  # 常见特殊字符约33个

length = len(password)
space = charset ** length
entropy = math.log2(space) if space > 0 else 0

print(f"  密码长度: {length}")
print(f"  字符集大小: {charset}")
print(f"    小写: {'✓' if has_lower else '✗'}")
print(f"    大写: {'✓' if has_upper else '✗'}")
print(f"    数字: {'✓' if has_digit else '✗'}")
print(f"    特殊: {'✓' if has_special else '✗'}")
print(f"  密码空间: {space:.2e}")
print(f"  信息熵: {entropy:.1f} bits")

if entropy < 28:
    print(f"  强度: 极弱")
elif entropy < 36:
    print(f"  强度: 弱")
elif entropy < 60:
    print(f"  强度: 中等")
elif entropy < 80:
    print(f"  强度: 强")
else:
    print(f"  强度: 极强")

return space, entropy

print("=== 密码强度分析 ===")
print("\n密码: 123456")
analyze_password_strength("123456")
print("\n密码: Password1")
analyze_password_strength("Password1")
print("\n密码: C0mpl3x!P@ss")
analyze_password_strength("C0mpl3x!P@ss")

5.2 暴力破解时间估算器

def estimate_crack_time(password_space, attempts_per_second):
"""估算暴力破解时间"""
seconds = password_space / attempts_per_second

if seconds < 1:
    return f"{seconds*1000:.2f} 毫秒"
elif seconds < 60:
    return f"{seconds:.1f} 秒"
elif seconds < 3600:
    return f"{seconds/60:.1f} 分钟"
elif seconds < 86400:
    return f"{seconds/3600:.1f} 小时"
elif seconds < 86400 * 365:
    return f"{seconds/86400:.1f} 天"
elif seconds < 86400 * 365 * 1000:
    return f"{seconds/(86400*365):.1f} 年"
else:
    return f"{seconds/(86400*365):.2e} 年"

# 不同攻击速度
speeds = {
"在线攻击(限速)": 100,             # 100次/秒
"离线攻击(MD5)": 10_000_000_000,   # 100亿次/秒
"离线攻击(bcrypt)": 50_000,         # 5万次/秒
"GPU集群(SHA256)": 100_000_000_000, # 1000亿次/秒
}

passwords = [
("6位纯数字", 10**6),
("8位纯数字", 10**8),
("8位复杂密码", 95**8),
("12位复杂密码", 95**12),
("16位复杂密码", 95**16),
]

print("=== 暴力破解时间估算（平均尝试一半空间）===")
for pwd_name, space in passwords:
print(f"\n{pwd_name} (空间: {space:.2e}):")
for attack_name, speed in speeds.items():
    avg_space = space / 2  # 平均需要尝试一半
    time_str = estimate_crack_time(avg_space, speed)
    print(f"  {attack_name}: {time_str}")

5.3 验证码碰撞概率

import math

# 生日悖论：n个人中至少两人同一天生日的概率
# 同理：验证码碰撞、哈希碰撞

def birthday_collision_prob(n, space):
"""n次随机选择在space大小的空间中碰撞的概率"""
if n > space:
    return 1.0
# P(无碰撞) = space/space × (space-1)/space × ... × (space-n+1)/space
log_no_collision = sum(math.log(1 - i/space) for i in range(n))
return 1 - math.exp(log_no_collision)

# 4位数字验证码(空间10000)，发了多少个后有50%概率碰撞？
space = 10000
for n in [10, 50, 100, 118, 200, 500]:
prob = birthday_collision_prob(n, space)
print(f"  {n}个验证码在{space}空间中碰撞概率: {prob:.2%}")

# 近似公式：约 sqrt(pi/2 × space) 次达到50%碰撞
n_50 = math.sqrt(math.pi / 2 * space)
print(f"\n  50%碰撞约需: {n_50:.0f} 次 (4位验证码空间)")

# 6位验证码空间
space_6 = 1_000_000
n_50_6 = math.sqrt(math.pi / 2 * space_6)
print(f"  50%碰撞约需: {n_50_6:.0f} 次 (6位验证码空间)")

5.4 彩虹表大小估算

# 彩虹表：预计算 密码→哈希 的映射表
# 大小 ≈ 密码空间 × 每条记录大小

def rainbow_table_size(charset_size, max_length, record_bytes=40):
"""估算彩虹表大小
record_bytes: 密码(~20B) + 哈希(~20B MD5/SHA1)
"""
total_space = sum(charset_size**l for l in range(1, max_length+1))
size_bytes = total_space * record_bytes

units = ["B", "KB", "MB", "GB", "TB", "PB"]
size = float(size_bytes)
for unit in units:
    if size < 1024:
        return f"{size:.1f} {unit}", total_space
    size /= 1024
return f"{size:.1f} EB", total_space

configs = [
("纯数字 1-8位", 10, 8),
("小写字母 1-8位", 26, 8),
("大小写+数字 1-8位", 62, 8),
("全ASCII 1-8位", 95, 8),
("全ASCII 1-10位", 95, 10),
]

print("=== 彩虹表大小估算 ===")
for name, charset, max_len in configs:
size_str, space = rainbow_table_size(charset, max_len)
print(f"  {name}: 空间{space:.2e}, 表大小约{size_str}")

# 结论：加盐(salt)让彩虹表失效，因为每个用户需要单独的表
print("\n加盐后：每个用户都需要独立的彩虹表，预计算不再可行")

六、后端应用

6.1 测试用例组合爆炸

from itertools import product
import math

# 场景：API有多个参数，每个参数有多个取值
params = {
"method": ["GET", "POST", "PUT", "DELETE"],
"auth": ["none", "token", "cookie"],
"content_type": ["json", "form", "xml"],
"encoding": ["utf-8", "gbk"],
}

# 全组合 = 乘法原理
total = math.prod(len(v) for v in params.values())
print(f"全组合测试用例: {total} 种")  # 4×3×3×2 = 72种

# 如果参数更多
# 5个参数各10种 = 10^5 = 100,000 种
# 10个参数各5种 = 5^10 = 9,765,625 种 → 不可能全测

# 实际做法：Pairwise Testing（两两组合覆盖）
# 不需要测全部，只需覆盖任意两个参数的所有组合
from itertools import combinations

# 两两组合数
param_names = list(params.keys())
pair_count = math.comb(len(param_names), 2)
print(f"参数对数: {pair_count}")  # C(4,2) = 6

# 每对的组合数
for p1, p2 in combinations(param_names, 2):
pairs = len(params[p1]) * len(params[p2])
print(f"  {p1} × {p2}: {pairs} 种")

6.2 API 参数组合验证

from itertools import product

# 场景：支付接口参数验证
# 金额: [0, 1, 100, -1, 99999999]
# 币种: ["CNY", "USD", "INVALID"]
# 渠道: ["alipay", "wechat", "unknown"]

amounts = [0, 1, 100, -1, 99999999]
currencies = ["CNY", "USD", "INVALID"]
channels = ["alipay", "wechat", "unknown"]

test_cases = list(product(amounts, currencies, channels))
print(f"测试用例总数: {len(test_cases)}")  # 5×3×3 = 45

# 标记预期结果
for amount, currency, channel in test_cases[:10]:
expected = "PASS"
if amount <= 0:
    expected = "FAIL: invalid amount"
elif currency == "INVALID":
    expected = "FAIL: invalid currency"
elif channel == "unknown":
    expected = "FAIL: invalid channel"
print(f"  金额={amount}, 币种={currency}, 渠道={channel} → {expected}")
print(f"  ... (还有{len(test_cases)-10}条)")

6.3 分页组合

import math

# 场景：列表页排序选项的组合数
sort_fields = ["create_time", "update_time", "price", "name", "popularity"]
sort_orders = ["asc", "desc"]
page_sizes = [10, 20, 50, 100]

# 单字段排序
single_sort = len(sort_fields) * len(sort_orders) * len(page_sizes)
print(f"单字段排序组合: {single_sort} 种")  # 5×2×4 = 40

# 多字段排序（最多3个字段）
# 选3个字段的排列 × 每个字段2种排序 × 分页
multi_sort = math.perm(5, 3) * (2**3) * len(page_sizes)
print(f"3字段排序组合: {multi_sort} 种")  # P(5,3)×8×4 = 1920

# 这就是为什么要限制排序字段数量

七、大数据应用

7.1 抽样方法

import math
import random

# 从1000万用户中抽5000人调研
N = 10_000_000  # 总体
n = 5000         # 样本

# 所有可能的样本数 = C(N, n)，天文数字
# 直接算会溢出，用对数
log_combinations = sum(math.log10(N-i) - math.log10(i+1) for i in range(n))
print(f"C(10000000, 5000) ≈ 10^{log_combinations:.0f}")
# 约10^几万，比宇宙中的原子还多

# 实际抽样方法
# 1. 简单随机抽样
population = list(range(N))
# sample = random.sample(population, n)  # 内存友好版本见下

# 2. 蓄水池抽样（Reservoir Sampling）：数据流场景
def reservoir_sampling(stream, k):
"""从未知大小的流中均匀抽k个样本"""
reservoir = []
for i, item in enumerate(stream):
    if i < k:
        reservoir.append(item)
    else:
        j = random.randint(0, i)
        if j < k:
            reservoir[j] = item
return reservoir

# 从生成器中抽样（不需要全部加载到内存）
stream = (x for x in range(10_000_000))
sample = reservoir_sampling(stream, 10)
print(f"蓄水池抽样结果: {sorted(sample)}")

7.2 特征选择中的组合数

import math

# 机器学习中从100个特征中选最优的10个
n_features = 100
k_select = 10

combinations_count = math.comb(n_features, k_select)
print(f"C(100,10) = {combinations_count:.2e}")
# ≈ 1.7 × 10^13，穷举不现实

# 所以需要启发式方法：
# 1. 前向选择：逐个加入最优特征
# 2. 后向淘汰：逐个移除最差特征
# 3. 随机森林特征重要性
# 4. L1正则化（自动特征选择）

# 前向选择的复杂度
# 第1轮：100个特征选1个 → 100次
# 第2轮：99个特征选1个 → 99次
# ...第10轮：91个特征选1个 → 91次
forward_steps = sum(n_features - i for i in range(k_select))
print(f"前向选择步数: {forward_steps}")  # 955，可以接受
print(f"穷举 vs 前向: {combinations_count/forward_steps:.2e}倍差距")

7.3 A/B测试分组

import math

# 场景：1000个用户分成2组做A/B测试
# 所有分法 = C(1000, 500)

# 用Stirling近似估算大阶乘
def stirling_log10(n):
"""Stirling近似：log10(n!)"""
if n <= 1:
    return 0
return n * math.log10(n) - n * math.log10(math.e) + 0.5 * math.log10(2 * math.pi * n)

log_c = stirling_log10(1000) - 2 * stirling_log10(500)
print(f"C(1000, 500) ≈ 10^{log_c:.0f}")
# 约10^299，可能的分组方式极多

# 所以随机分组在统计上是可靠的：
# 随机选的分组极不可能恰好偏向某一方
print("随机分组的统计可靠性：因为可能的分法极多，偏颇的分法占比极小")

八、排列组合常用公式速查

8.1 公式汇总

名称	公式	例子
排列 P(n,k)	n!/(n-k)!	10人选3人排队=720
组合 C(n,k)	n!/[k!(n-k)!]	10人选3人组队=120
重复排列	n^k	4位数字密码=10000
重复组合	C(n+k-1,k)	3种水果买5个=21
全排列	n!	5人排队=120
圆排列	(n-1)!	5人围桌=24

8.2 Python 工具汇总

import math
from itertools import (
permutations,          # 排列
combinations,          # 组合
combinations_with_replacement,  # 可重复组合
product,               # 笛卡尔积（重复排列）
)

# 计算
print(f"P(10,3) = {math.perm(10, 3)}")       # 720
print(f"C(10,3) = {math.comb(10, 3)}")       # 120
print(f"10! = {math.factorial(10)}")           # 3628800

# 枚举
print(f"排列: {list(permutations('ABC', 2))}")
print(f"组合: {list(combinations('ABC', 2))}")
print(f"可重复组合: {list(combinations_with_replacement('AB', 2))}")
print(f"笛卡尔积: {list(product('AB', repeat=2))}")

# 输出:
# 排列: [('A','B'),('A','C'),('B','A'),('B','C'),('C','A'),('C','B')]
# 组合: [('A','B'),('A','C'),('B','C')]
# 可重复组合: [('A','A'),('A','B'),('B','B')]
# 笛卡尔积: [('A','A'),('A','B'),('B','A'),('B','B')]

九、练习题

练习1：基础排列组合

(a) 从5本不同的书中选3本排在书架上，有多少种排法？

(b) 从5本不同的书中选3本送给朋友（不区分顺序），有多少种选法？

答案：

import math

# (a) 排列：P(5,3)
print(f"排在书架: P(5,3) = {math.perm(5, 3)} 种")  # 60

# (b) 组合：C(5,3)
print(f"送朋友: C(5,3) = {math.comb(5, 3)} 种")    # 10

练习2：密码强度

计算以下密码的空间大小和暴力破解时间（假设每秒100亿次MD5）

(a) 6位纯数字

(b) 8位大小写字母+数字

答案：

speed = 10_000_000_000  # 10B/s

configs = [
("6位纯数字", 10, 6),
("8位大小写+数字", 62, 8),
("12位全ASCII", 95, 12),
]

for name, charset, length in configs:
space = charset ** length
avg_time = space / 2 / speed

if avg_time < 1:
    t = f"{avg_time*1000:.3f} 毫秒"
elif avg_time < 3600:
    t = f"{avg_time:.1f} 秒"
elif avg_time < 86400*365:
    t = f"{avg_time/86400:.1f} 天"
else:
    t = f"{avg_time/(86400*365):.1f} 年"

print(f"  {name}: 空间={space:.2e}, 破解约{t}")
# 6位纯数字: 0.050 毫秒 → 瞬间破解
# 8位大小写+数字: 约 10918 秒 ≈ 3小时
# 12位全ASCII: 约 8570年

练习3：组合爆炸

你的API有以下参数，计算全组合测试用例数：

method: 4种，status: 5种，role: 3种，format: 2种

如果再加一个 locale: 10种，总数变成多少？这说明了什么？

答案：

import math

# 原始
original = 4 * 5 * 3 * 2
print(f"原始: {original} 种")  # 120

# 加locale
with_locale = original * 10
print(f"加locale: {with_locale} 种")  # 1200

# 组合爆炸：每增加一个维度，测试量乘以该维度的取值数
# 10个参数各10种 = 10^10 = 100亿种
print(f"10参数各10种: {10**10:,} 种")
print("→ 这就是为什么需要 Pairwise Testing 而不是全组合测试")

练习4：安全实战 - 验证码强度

你的系统发送4位数字验证码，有效期5分钟

(a) 攻击者每秒能尝试10次，5分钟内能尝试多少次？能覆盖多少比例的空间？

(b) 如果改成6位数字，情况如何？

答案：

# (a) 4位数字
space_4 = 10**4  # 10000
attempts_5min = 10 * 60 * 5  # 3000次
coverage_4 = attempts_5min / space_4
print(f"4位: 空间{space_4}, 5分钟尝试{attempts_5min}次, 覆盖{coverage_4:.0%}")
# 覆盖30%，非常不安全

# (b) 6位数字
space_6 = 10**6  # 1000000
coverage_6 = attempts_5min / space_6
print(f"6位: 空间{space_6}, 5分钟尝试{attempts_5min}次, 覆盖{coverage_6:.2%}")
# 覆盖0.30%，好多了

# (c) 限制5次
coverage_limited = 5 / space_4
print(f"4位+限制5次: 命中概率{coverage_limited:.2%}")  # 0.05%
coverage_limited_6 = 5 / space_6
print(f"6位+限制5次: 命中概率{coverage_limited_6:.4%}")  # 0.0005%

print("\n结论：限流比增加位数更有效，两者结合最佳")

练习5：概率思维

一个系统有100个API接口，每个接口有3个可能的漏洞点

(a) 总共有多少个潜在漏洞点？

(b) 如果安全测试只能覆盖60%的接口，每个被测接口能发现80%的漏洞，预期能发现多少漏洞？

答案：

apis = 100
vuln_points_per_api = 3

# (a)
total_points = apis * vuln_points_per_api
print(f"潜在漏洞点: {total_points}")  # 300

# (b)
tested_apis = apis * 0.6  # 60个接口
tested_points = tested_apis * vuln_points_per_api  # 180个点
found_points = tested_points * 0.8  # 发现144个点
print(f"测试覆盖: {tested_points:.0f}个点, 能发现: {found_points:.0f}个点")

# (c)
real_vulns = total_points * 0.05  # 15个真实漏洞
expected_found = real_vulns * 0.6 * 0.8  # 7.2个
print(f"真实漏洞约: {real_vulns:.0f}个")
print(f"预期发现: {expected_found:.1f}个")
print(f"漏网: {real_vulns - expected_found:.1f}个")
print("→ 这就是为什么安全不能只靠测试，需要多层防御")

十、本页小结

加法原理（分类/或）vs 乘法原理（分步/且）是一切计数的基础

排列 = 有顺序选择：P(n,k) = n!/(n-k)!

组合 = 无顺序选择：C(n,k) = n!/[k!(n-k)!]

重复排列 n^k 直接决定密码空间大小

安全核心：密码强度 = 字符集^长度，破解时间 = 空间/速度，限流比复杂度更有效

后端核心：测试用例组合爆炸，需要用 Pairwise Testing 等方法降低

大数据：特征选择的组合数天文级别，需要启发式方法

Python工具：math.perm/comb/factorial + itertools四件套

上一篇：12-递归与数学归纳法

下一篇：继续阶段二的后续内容

上一章	目录	下一章
12-递归与数学归纳法	数学重学路线图	14-函数增长与大O