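make_multiplexer_dataset: A function for creating multiplexer data

A function that creates a dataset generated by an n-bit Boolean multiplexer, for evaluating supervised learning algorithms.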

> `from mlxtend.data import make_multiplexer_dataset`

Overview

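The make_multiplexer_dataset function creates a dataset generated by an n-bit Boolean multiplexer. Such a dataset consists of samples produced by a simple rule, based on the behavior of an electric multiplexer. Nevertheless, it poses a relatively challenging classification problem for supervised learning algorithms, because it involves interactions between features (epistasis) of the kind encountered in many real-world scenarios [1].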

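The following illustration depicts a 6-bit multiplexer that consists of 2 address bits and 4 register bits. The address bits, converted to their decimal representation, point to a position in the register bits. For example, if the address bits are "00" (0 in decimal), they point to register position 0. The value at the indicated register position determines the class label: if the register bit at position 0 is 0, the class label is 0; if it is 1, the class label is 1.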

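In the illustrated example, the address bits "10" (2 in decimal) point to the third register position (counting starts at index 0), which has a bit value of 1. Hence, the class label is 1.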

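Below are a few more examples:

Address bits: [0, 1], register bits: [1, 0, 1, 1], class label: 0
Address bits: [0, 1], register bits: [1, 1, 1, 0], class label: 1
Address bits: [1, 0], register bits: [1, 0, 0, 1], class label: 0
Address bits: [1, 1], register bits: [1, 1, 1, 0], class label: 0
Address bits: [0, 1], register bits: [0, 1, 1, 0], class label: 1
Address bits: [0, 1], register bits: [1, 0, 0, 1], class label: 0
Address bits: [0, 1], register bits: [0, 1, 1, 1], class label: 1
Address bits: [0, 1], register bits: [0, 0, 0, 0], class label: 0
Address bits: [1, 0], register bits: [1, 0, 1, 1], class label: 1
Address bits: [0, 1], register bits: [1, 1, 1, 1], class label: 1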
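
The rule behind these examples fits in a few lines of Python. The following is a minimal sketch, not mlxtend's implementation. The helper name `multiplexer_label` is hypothetical, and the address bits are assumed to be read most significant bit first, which agrees with the examples listed here.

```python
def multiplexer_label(address_bits, register_bits):
    """Return the register bit that the address bits point to.

    Hypothetical helper for illustration; not part of mlxtend.
    """
    # Interpret the address bits as a binary number, most significant bit first.
    position = 0
    for bit in address_bits:
        position = position * 2 + bit
    return register_bits[position]


# Some of the examples listed above: (address bits, register bits, class label)
examples = [
    ([0, 1], [1, 0, 1, 1], 0),
    ([1, 0], [1, 0, 0, 1], 0),
    ([1, 1], [1, 1, 1, 0], 0),
    ([1, 0], [1, 0, 1, 1], 1),
]

for address, register, expected in examples:
    assert multiplexer_label(address, register) == expected
```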

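Note that in the implementation of the multiplexer function, setting the number of address bits to 2 results in a 6-bit multiplexer, because two address bits can address 2^2=4 register positions (2 bits + 4 bits = 6 bits). Likewise, choosing 3 address bits covers 2^3=8 positions, which yields an 11-bit multiplexer (3 bits + 8 bits = 11 bits), and so forth.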
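
The relationship between the number of address bits and the total number of bits can be tabulated with a short sketch. The helper name `multiplexer_width` is hypothetical and only serves this illustration.

```python
def multiplexer_width(address_bits):
    """Total number of bits: the address bits plus 2**address_bits register bits."""
    return address_bits + 2 ** address_bits


for k in range(1, 5):
    print(k, "address bits ->", multiplexer_width(k), "bits in total")
# 1 address bits -> 3 bits in total
# 2 address bits -> 6 bits in total
# 3 address bits -> 11 bits in total
# 4 address bits -> 20 bits in total
```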

References

[1] Urbanowicz, R. J., & Browne, W. N. (2017). Introduction to Learning Classifier Systems. Springer.

Example 1 -- 6-bit multiplexer

This simple example illustrates how to create a dataset from a 6-bit multiplexer.

import numpy as np
from mlxtend.data import make_multiplexer_dataset


X, y = make_multiplexer_dataset(address_bits=2, 
                                sample_size=10,
                                positive_class_ratio=0.5, 
                                shuffle=False,
                                random_seed=123)

print('Features:\n', X)
print('\nClass labels:\n', y)

Features:
 [[0 1 0 1 0 1]
 [1 0 0 0 1 1]
 [0 1 1 1 0 0]
 [0 1 1 1 0 0]
 [0 0 1 1 0 0]
 [0 1 0 0 0 0]
 [0 1 1 0 1 1]
 [1 0 1 0 0 0]
 [1 0 0 1 0 1]
 [1 0 1 0 0 1]]

Class labels:
 [1 1 1 1 1 0 0 0 0 0]
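
The printed output can be checked against the multiplexer rule. The sketch below copies the arrays shown above into plain Python lists. It assumes that the first address_bits columns of each row hold the address bits and the remaining columns hold the register bits. The helper name `label_from_row` is hypothetical.

```python
def label_from_row(row, n_address_bits):
    """Recompute the class label of one feature row (hypothetical helper)."""
    address, register = row[:n_address_bits], row[n_address_bits:]
    # Read the address bits as a binary number, most significant bit first.
    position = int("".join(str(bit) for bit in address), 2)
    return register[position]


# Features and class labels copied from the output above.
X = [[0, 1, 0, 1, 0, 1],
     [1, 0, 0, 0, 1, 1],
     [0, 1, 1, 1, 0, 0],
     [0, 1, 1, 1, 0, 0],
     [0, 0, 1, 1, 0, 0],
     [0, 1, 0, 0, 0, 0],
     [0, 1, 1, 0, 1, 1],
     [1, 0, 1, 0, 0, 0],
     [1, 0, 0, 1, 0, 1],
     [1, 0, 1, 0, 0, 1]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

recomputed = [label_from_row(row, n_address_bits=2) for row in X]
print(recomputed == y)  # True
```

Every row is consistent with the rule under that column layout. With shuffle=False, the five class-1 samples in this output come before the five class-0 samples.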

API

make_multiplexer_dataset(address_bits=2, sample_size=100, positive_class_ratio=0.5, shuffle=False, random_seed=None)

Function to create a binary n-bit multiplexer dataset.

New in mlxtend v0.9

Parameters

address_bits : int (default: 2)

A positive integer that determines the number of address bits in the multiplexer, which in turn determines the n-bit capacity of the multiplexer and therefore the number of features. For example, 2 address bits result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). With address_bits=3, the result is an 11-bit multiplexer with 11 features (3 + 2^3 = 11).
sample_size : int (default: 100)

The total number of samples generated.
positive_class_ratio : float (default: 0.5)

The fraction (a float between 0 and 1) of samples in the generated dataset that have class label 1. If positive_class_ratio=0.5 (default), the dataset contains equal numbers of class 0 and class 1 samples.
shuffle : bool (default: False)

Whether or not to shuffle the features and labels. If False (default), the samples are returned sorted by class label. In Example 1 above (positive_class_ratio=0.5), the sample_size/2 samples with class label 1 come first, followed by the sample_size/2 samples with class label 0.
random_seed : int (default: None)

Random seed used for generating the multiplexer samples and shuffling.

Returns

X, y : [n_samples, n_features], [n_samples]

X is the feature matrix with the number of samples equal to sample_size. The number of features is determined by the number of address bits. For instance, 2 address bits result in a 6-bit multiplexer and consequently 6 features (2 + 2^2 = 6). All features are binary (values in {0, 1}). y is a 1-dimensional array of class labels in {0, 1}, with one label per sample.
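
To make the interplay of the parameters concrete, here is a pure-Python sketch of one way such a dataset could be generated: draw random bit strings, compute each label with the multiplexer rule, and keep a sample only while its class still needs samples. This is a hypothetical stand-in and should not be read as mlxtend's implementation. The function name `sketch_multiplexer_dataset`, the rounding of the positive-class count, and the class-1-first ordering (chosen to match the Example 1 output) are all assumptions.

```python
import random


def sketch_multiplexer_dataset(address_bits=2, sample_size=100,
                               positive_class_ratio=0.5, shuffle=False,
                               random_seed=None):
    """Hypothetical pure-Python stand-in; not mlxtend's implementation."""
    rng = random.Random(random_seed)
    n_features = address_bits + 2 ** address_bits
    # Assumption: the positive-class count is rounded to the nearest integer.
    n_pos = round(sample_size * positive_class_ratio)
    n_neg = sample_size - n_pos

    positives, negatives = [], []
    while len(positives) < n_pos or len(negatives) < n_neg:
        bits = [rng.randint(0, 1) for _ in range(n_features)]
        # Read the leading address bits as a binary number.
        position = int("".join(map(str, bits[:address_bits])), 2)
        label = bits[address_bits + position]
        # Keep the sample only if its class still needs samples.
        if label == 1 and len(positives) < n_pos:
            positives.append(bits)
        elif label == 0 and len(negatives) < n_neg:
            negatives.append(bits)

    # Assumption: class-1 samples first, as in the Example 1 output.
    X = positives + negatives
    y = [1] * n_pos + [0] * n_neg
    if shuffle:
        order = list(range(sample_size))
        rng.shuffle(order)
        X = [X[i] for i in order]
        y = [y[i] for i in order]
    return X, y


X, y = sketch_multiplexer_dataset(address_bits=3, sample_size=20,
                                  positive_class_ratio=0.25, random_seed=1)
print(len(X), len(X[0]), sum(y))  # 20 11 5
```

Rejecting samples whose class quota is already full keeps the class ratio exact, at the cost of drawing more random bit strings than are finally returned.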

Examples

For usage examples, please see https://rasbt.github.io/mlxtend/user_guide/data/make_multiplexer_dataset