1000ms

Tried: 22

Accepted: 4

Difficulty: 5

所属公司 : 美团

时间 :2025年5月31日-算法岗

算法标签>

暴力枚举

题目描述

实现一个基于关联规则的市场篮子分析系统，具体要求如下：

读取输入的交易数据，每个交易包含若干商品。
计算每个商品对的支持度（ $support$ ）。
根据支持度阈值筛选频繁商品对。
输出每个频繁商品对及其支持度。

输入为一个二维列表，表示若干交易，每个交易包含若干商品。
支持度阈值为 $0.5$ 。
输出为一个包含频繁商品对及其支持度的列表，每个元素是一个三元组 $(\text{商品}_1,\ \text{商品}_2,\ \text{支持度})$ ，使用 $\text{round}(x,2)$ 保留两位小数。
输出按照支持度降序排列。

思路

统计事务总数
设有 $N$ 条交易，记为
$N = \text{len}(\text{transactions})$
枚举所有可能的商品对
对于每一条交易 $T$ ，其中包含若干商品，记为 $\{i_1, i_2, \dots\}$ ，使用组合函数枚举所有长度为 $2$ 的商品对。累积所有交易中出现的商品对，并对每个商品对计数。
计算支持度
对于某个商品对 $(a,b)$ ，在所有 $N$ 条交易中出现的次数记为 $\text{count}(a,b)$ ，其支持度定义为
筛选频繁商品对
只保留满足
$\text{support}(a,b) \;\ge\; 0.5$
的商品对，即 $\text{support}(a,b)\ge 0.5$ 。
整理输出
将所有频繁商品对及对应的支持度组成三元组，并按支持度从大到小排序。支持度使用
$\text{round}(\text{support},\,2)$
保留两位小数。

Python 实现

from itertools import combinations
def parse_transactions_from_stdin():
    import sys
    lines = []
    for line in sys.stdin:
        # 读取所有行直到 EOF
        lines.append(line.rstrip())
    transactions = []
    for raw in lines:
        s = raw.strip()
        # 跳过空行和只包含左右中括号的行
        if s == "" or s == "[" or s == "]":
            continue
        # 有可能行尾带逗号，也去掉
        if s.endswith(","):
            s = s[:-1].strip()
        # 此时 s 应该是类似 ["milk","bread","butter"] 这样的格式
        if s.startswith("[") and s.endswith("]"):
            inner = s[1:-1].strip()  # 去掉最外层中括号
            if inner == "":
                # 如果是空的子列表，就加一个空列表
                transactions.append([])
            else:
                # 将 item 按逗号分割，再去掉引号
                parts = inner.split(",")
                items = []
                for token in parts:
                    t = token.strip()
                    # 去掉单/双引号
                    if (t.startswith("'") and t.endswith("'")) or (t.startswith('"') and t.endswith('"')):
                        t = t[1:-1]
                    items.append(t)
                transactions.append(items)
        else:
            # 如果格式不对，就跳过（也可以报错）
            continue
    return transactions
def find_frequent_pairs(transactions, min_support=0.5):
    """
    transactions: 二维列表，每个子列表是一次交易包含的商品
    min_support: 支持度阈值，默认为 0.5
    """
    # 交易总数
    num_transactions = len(transactions)
    # 用于计数每个商品对出现的次数，key = (item1, item2)，value = 出现次数
    pair_counts = {}
    # 遍历每一条交易，枚举其中所有可能的商品对并计数
    for transaction in transactions:
        # 对本次交易中的商品排序，用于保证(key)的一致性
        sorted_items = sorted(transaction)
        for item1, item2 in combinations(sorted_items, 2):
            pair = (item1, item2)
            if pair not in pair_counts:
                pair_counts[pair] = 0
            pair_counts[pair] += 1
    # 计算支持度并筛选频繁商品对
    frequent_pairs = []
    for pair, count in pair_counts.items():
        support = count / num_transactions  # 计算支持度
        if support >= min_support:
            # 保留两位小数
            frequent_pairs.append((pair[0], pair[1], round(support, 2)))
    # 按支持度降序排序
    frequent_pairs.sort(key=lambda x: x[2], reverse=True)
    return frequent_pairs
if __name__ == "__main__":
    # 先从 stdin 里把所有输入读完并解析成 transactions
    transactions = parse_transactions_from_stdin()
    # 如果读到的 transactions 为空，可以提示用户输入格式错误
    if not transactions:
        print("没有读取到合法的交易列表，请检查输入格式。")
    else:
        # 计算频繁商品对，支持度阈值设为 0.5
        result = find_frequent_pairs(transactions, min_support=0.5)
        # 输出结果
        for item1, item2, support in result:
            print(f'("{item1}", "{item2}", {support})')

题目内容

实现一个基于关联规则的市场篮子分析系统，具体要求如下：

1.读取输入的交易数据，每个交易包含若干商品。

2.计算每个商品对的支持度（ $support$ ）。

3.根据支持度阈值筛选频繁商品对。

4.输出每个频繁商品对及其支持度。

输入描述

输入为一个二维列表，表示若干交易，每个交易包含若干商品。

输出描述

输出为一个包含频繁商品对及其支持度的列表，每个元素是一个三元组(商品 $1$ ，商品 $2$ ,支持度)，使用 $round(x,2)$ 保留小数位。

补充说明

支持度阈值 $=0.5$

输出按照支持度降序排列。

样例1

输入

[
["milk","bread","butter"]
["bread","butter"]
["milk","bread"]
["milk","butter"]
["bread","butter","jam"]
]

输出

("bread","butter",0.6)

#P3283. 第2题-市场篮子分析系统

第2题-市场篮子分析系统

Python 实现

题目内容

输入描述

输出描述

补充说明

样例1

Status

Development

Support

About