scipy.sparse.

`csr_matrix`

class scipy.sparse.csr_matrix(arg1, shape=None, dtype=None, copy=False)

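Compressed Sparse Row matrix.

This can be instantiated in several ways:

csr_matrix(D): where D is a 2-D ndarray
csr_matrix(S): with another sparse array or matrix S (equivalent to S.tocsr())
csr_matrix((M, N), [dtype]): to construct an empty matrix with shape (M, N); dtype is optional, defaulting to dtype='d'.
csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)]): where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].
csr_matrix((data, indices, indptr), [shape=(M, N)]): the standard CSR representation, where the column indices for row i are stored in `indices[indptr[i]:indptr[i+1]]` and their corresponding values are stored in `data[indptr[i]:indptr[i+1]]`. If the shape parameter is not supplied, the matrix dimensions are inferred from the index arrays.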
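
The `indptr` slicing rule in the last form can be sketched as follows. This is a minimal illustration, not part of the SciPy API: `row_entries` is a hypothetical helper name introduced here only to show how the stored entries of one row are located.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Raw CSR arrays for the 3x3 matrix [[1, 0, 2], [0, 0, 3], [4, 5, 6]].
data = np.array([1, 2, 3, 4, 5, 6])
indices = np.array([0, 2, 2, 0, 1, 2])
indptr = np.array([0, 2, 3, 6])

A = csr_matrix((data, indices, indptr), shape=(3, 3))


def row_entries(mat, i):
    """Return (column indices, values) stored for row i (hypothetical helper)."""
    start, end = mat.indptr[i], mat.indptr[i + 1]
    return mat.indices[start:end], mat.data[start:end]


# Row 2 occupies positions indptr[2]:indptr[3], i.e. 3:6, of indices and data.
cols, vals = row_entries(A, 2)
```

Here `cols` holds the column indices 0, 1, 2 and `vals` holds the values 4, 5, 6, which matches the last row of the dense matrix.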

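Attributes:

dtype (dtype): Data type of the matrix
shape (2-tuple): Shape of the matrix
ndim (int): Number of dimensions (this is always 2)
nnz: Number of stored values, including explicit zeros.
size: Number of stored values.
data: CSR format data array of the matrix
indices: CSR format index array of the matrix
indptr: CSR format index pointer array of the matrix
has_sorted_indices: Whether the indices are sorted
has_canonical_format: Whether the array/matrix has sorted indices and no duplicates
T: Transpose.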
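
As a small sketch of how these attributes behave, the snippet below builds a matrix that stores one explicit zero. The variable names are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A 3x3 diagonal matrix whose middle stored value is an explicit zero.
data = np.array([1.0, 0.0, 3.0])
indices = np.array([0, 1, 2])
indptr = np.array([0, 1, 2, 3])
A = csr_matrix((data, indices, indptr), shape=(3, 3))

shape = A.shape                      # (3, 3)
ndim = A.ndim                        # always 2
stored = A.nnz                       # 3: the explicit zero is counted
truly_nonzero = A.count_nonzero()    # 2: only non-zero values are counted

# Dropping explicit zeros brings nnz in line with count_nonzero().
A.eliminate_zeros()
stored_after = A.nnz                 # 2
```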

Methods

`__getitem__`(key)
`__len__`()
`__mul__`(other)
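`arcsin`()	Element-wise arcsin.
`arcsinh`()	Element-wise arcsinh.
`arctan`()	Element-wise arctan.
`arctanh`()	Element-wise arctanh.
`argmax`([axis, out])	Return indices of maximum elements along an axis.
`argmin`([axis, out])	Return indices of minimum elements along an axis.
`asformat`(format[, copy])	Return this array/matrix in the passed format.
`asfptype`()	Upcast the matrix to a floating point format (if necessary).
`astype`(dtype[, casting, copy])	Cast the array/matrix elements to a specified type.
`ceil`()	Element-wise ceil.
`check_format`([full_check])	Check whether the array/matrix respects the CSR or CSC format.
`conj`([copy])	Element-wise complex conjugation.
`conjugate`([copy])	Element-wise complex conjugation.
`copy`()	Return a copy of this array/matrix.
`count_nonzero`()	Number of non-zero entries.
`deg2rad`()	Element-wise deg2rad (degrees to radians).
`diagonal`([k])	Return the k-th diagonal of the array/matrix.
`dot`(other)	Ordinary dot product.
`eliminate_zeros`()	Remove zero entries from the array/matrix.
`expm1`()	Element-wise expm1.
`floor`()	Element-wise floor.
`getH`()	Return the Hermitian transpose of this matrix.
`get_shape`()	Get the shape of the matrix.
`getcol`(j)	Return a copy of column j of the matrix, as an (m x 1) sparse matrix (column vector).
`getformat`()	Matrix storage format.
`getmaxprint`()	Maximum number of elements to display when printed.
`getnnz`([axis])	Number of stored values, including explicit zeros.
`getrow`(i)	Return a copy of row i of the matrix, as a (1 x n) sparse matrix (row vector).
`log1p`()	Element-wise log1p.
`max`([axis, out])	Return the maximum of the array/matrix or maximum along an axis.
`maximum`(other)	Element-wise maximum between this and another array/matrix.
`mean`([axis, dtype, out])	Compute the arithmetic mean along the specified axis.
`min`([axis, out])	Return the minimum of the array/matrix or minimum along an axis.
`minimum`(other)	Element-wise minimum between this and another array/matrix.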
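`multiply`(other)	Point-wise multiplication by an array/matrix, vector, or scalar.
`nanmax`([axis, out])	Return the maximum of the array/matrix or maximum along an axis, ignoring any NaNs.
`nanmin`([axis, out])	Return the minimum of the array/matrix or minimum along an axis, ignoring any NaNs.
`nonzero`()	Nonzero indices of the array/matrix.
`power`(n[, dtype])	Element-wise power.
`prune`()	Remove empty space after all non-zero elements.
`rad2deg`()	Element-wise rad2deg (radians to degrees).
`reshape`(self, shape[, order, copy])	Give a new shape to a sparse array/matrix without changing its data.
`resize`(*shape)	Resize the array/matrix in place to the dimensions given by `shape`.
`rint`()	Element-wise rint (round to nearest integer).
`set_shape`(shape)	Set the shape of the matrix in place.
`setdiag`(values[, k])	Set diagonal or off-diagonal elements of the array/matrix.
`sign`()	Element-wise sign.
`sin`()	Element-wise sin.
`sinh`()	Element-wise sinh.
`sort_indices`()	Sort the indices of this array/matrix in place.
`sorted_indices`()	Return a copy of this array/matrix with sorted indices.
`sqrt`()	Element-wise sqrt.
`sum`([axis, dtype, out])	Sum the array/matrix elements over a given axis.
`sum_duplicates`()	Eliminate duplicate entries by adding them together.
`tan`()	Element-wise tan.
`tanh`()	Element-wise tanh.
`toarray`([order, out])	Return a dense ndarray representation of this sparse array/matrix.
`tobsr`([blocksize, copy])	Convert this array/matrix to Block Sparse Row format.
`tocoo`([copy])	Convert this array/matrix to COOrdinate format.
`tocsc`([copy])	Convert this array/matrix to Compressed Sparse Column format.
`tocsr`([copy])	Convert this array/matrix to Compressed Sparse Row format.
`todense`([order, out])	Return a dense representation of this sparse array/matrix.
`todia`([copy])	Convert this array/matrix to sparse DIAgonal format.
`todok`([copy])	Convert this array/matrix to Dictionary Of Keys format.
`tolil`([copy])	Convert this array/matrix to List of Lists format.
`trace`([offset])	Return the sum along diagonals of the sparse array/matrix.
`transpose`([axes, copy])	Reverse the dimensions of the sparse array/matrix.
`trunc`()	Element-wise trunc.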
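
A few of the methods listed above, shown together as a rough sketch. The matrix and variable names are chosen for illustration only.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 6]]))

# Reductions: sum(axis=1) returns a dense (3, 1) result, flattened here.
row_sums = np.asarray(A.sum(axis=1)).ravel()

# Main diagonal as a 1-D ndarray.
diag = A.diagonal()

# Sum of the main diagonal.
tr = A.trace()

# Format conversion and dense export.
A_csc = A.tocsc()
dense = A.toarray()

# Element-wise product with itself (squares every stored value).
squares = A.multiply(A).toarray()
```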

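Notes

Sparse matrices can be used in arithmetic operations: they support addition, subtraction, multiplication, division, and matrix power.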
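
The operations named above can be sketched as follows for two small `csr_matrix` instances. Note that `@` here is the matrix product and `**` the matrix power, while `multiply` is the element-wise product; the example matrices are arbitrary.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 2],
                         [0, 3]]))
B = csr_matrix(np.array([[0, 1],
                         [4, 0]]))

total = (A + B).toarray()              # element-wise sum
diff = (A - B).toarray()               # element-wise difference
product = (A @ B).toarray()            # matrix product
elementwise = A.multiply(B).toarray()  # element-wise product
halved = (A / 2).toarray()             # division by a scalar
squared = (A ** 2).toarray()           # matrix power, same as A @ A
```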

Advantages of the CSR format

Efficient arithmetic operations: CSR + CSR, CSR * CSR, etc.
Efficient row slicing
Fast matrix-vector products
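
A brief sketch of the two access patterns that CSR is well suited to, row slicing and matrix-vector products. The values are illustrative only.

```python
import numpy as np
from scipy.sparse import csr_matrix

A = csr_matrix(np.array([[1, 0, 2],
                         [0, 0, 3],
                         [4, 5, 6]]))

# Row slicing: rows are contiguous in the CSR arrays, so this stays in CSR.
last_two_rows = A[1:3]

# Matrix-vector product with a dense 1-D vector gives a dense 1-D result.
x = np.array([1, 1, 1])
y = A @ x
```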

Disadvantages of the CSR format

Slow column slicing operations (consider CSC)
Changes to the sparsity structure are expensive (consider LIL or DOK)
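
One common way to work around both points, sketched under the assumption that the matrix is filled entry by entry and later needs column access: build it in LIL format, convert to CSR for arithmetic, and convert to CSC before slicing columns.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Assigning individual entries changes the sparsity structure,
# which is cheap in LIL but expensive in CSR.
builder = lil_matrix((3, 3), dtype=np.int64)
builder[0, 0] = 1
builder[1, 2] = 3
builder[2, 1] = 5

# Convert once the structure is final.
A = builder.tocsr()

# Column slicing is better served by CSC.
column_1 = A.tocsc()[:, 1]
```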

Canonical format

Within each row, indices are sorted by column.
There are no duplicate entries.
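
The two conditions above can be checked with `has_canonical_format`. The sketch below deliberately builds a matrix from raw arrays with unsorted indices and a duplicate entry, then normalizes it with `sum_duplicates()`; the input arrays are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Row 0 lists columns 2, 0, 0: unsorted, and column 0 appears twice.
data = np.array([1, 2, 3, 4])
indices = np.array([2, 0, 0, 1])
indptr = np.array([0, 3, 4])
A = csr_matrix((data, indices, indptr), shape=(2, 3))

canonical_before = A.has_canonical_format

# Sorts the indices within each row and adds duplicates together, in place.
A.sum_duplicates()

canonical_after = A.has_canonical_format
```

After the call, row 0 stores a single entry for column 0 (the sum 2 + 3) followed by the entry for column 2.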

Examples

>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> csr_matrix((3, 4), dtype=np.int8).toarray()
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]], dtype=int8)

>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

>>> indptr = np.array([0, 2, 3, 6])
>>> indices = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, indices, indptr), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])

Duplicate entries are summed together:

>>> row = np.array([0, 1, 2, 0])
>>> col = np.array([0, 1, 1, 0])
>>> data = np.array([1, 2, 4, 8])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[9, 0, 0],
       [0, 2, 0],
       [0, 4, 0]])

As an example of how to construct a CSR matrix incrementally, the following snippet builds a term-document matrix from texts:

>>> docs = [["hello", "world", "hello"], ["goodbye", "cruel", "world"]]
>>> indptr = [0]
>>> indices = []
>>> data = []
>>> vocabulary = {}
>>> for d in docs:
...     for term in d:
...         index = vocabulary.setdefault(term, len(vocabulary))
...         indices.append(index)
...         data.append(1)
...     indptr.append(len(indices))
...
>>> csr_matrix((data, indices, indptr), dtype=int).toarray()
array([[2, 1, 0, 0],
       [0, 1, 1, 1]])
