File I/O (scipy.io)#

MATLAB files#

loadmat(file_name[, mdict, appendmat])

Load MATLAB file.

savemat(file_name, mdict[, appendmat, ...])

Save a dictionary of names and arrays into a MATLAB-style .mat file.

whosmat(file_name[, appendmat])

List variables inside a MATLAB file.

Basic functions#

We'll start by importing scipy.io and calling it sio for convenience:

>>> import scipy.io as sio

If you are using IPython, try tab-completing on sio. Among the many options, you will find:

sio.loadmat
sio.savemat
sio.whosmat

These are the high-level functions you will most likely use when working with MATLAB files. You'll also find:

sio.matlab

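This is the package from which loadmat, savemat, and whosmat are imported. Within sio.matlab, you will find the mio module. This module contains the machinery that loadmat and savemat use. From time to time you may find yourself re-using this machinery.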

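How do I start?#

You may have a .mat file that you want to read into SciPy. Or, you want to pass some variables from SciPy / NumPy into MATLAB.

To save us using a MATLAB license, let's start in Octave. Octave has MATLAB-compatible save and load functions. Start Octave (octave at the command line for me):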

octave:1> a = 1:12
a =

   1   2   3   4   5   6   7   8   9  10  11  12

octave:2> a = reshape(a, [1 3 4])
a =

ans(:,:,1) =

   1   2   3

ans(:,:,2) =

   4   5   6

ans(:,:,3) =

   7   8   9

ans(:,:,4) =

   10   11   12

octave:3> save -6 octave_a.mat a % MATLAB 6 compatible
octave:4> ls octave_a.mat
octave_a.mat

Now, to Python:

>>> mat_contents = sio.loadmat('octave_a.mat')
>>> mat_contents
{'__header__': b'MATLAB 5.0 MAT-file, written by Octave 3.2.3, 2010-05-30 02:13:40 UTC',
 '__version__': '1.0',
 '__globals__': [],
 'a': array([[[ 1.,  4.,  7., 10.],
              [ 2.,  5.,  8., 11.],
              [ 3.,  6.,  9., 12.]]])}

>>> oct_a = mat_contents['a']
>>> oct_a
array([[[  1.,   4.,   7.,  10.],
        [  2.,   5.,   8.,  11.],
        [  3.,   6.,   9.,  12.]]])
>>> oct_a.shape
(1, 3, 4)
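
The dictionary returned by loadmat mixes the file's variables with metadata entries such as __header__, __version__, and __globals__. A minimal sketch of separating the two follows. The file name example_a.mat and the name variables are hypothetical, and the file is written from Python with savemat as a stand-in for the Octave-generated octave_a.mat.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Column-major (order='F') reshape mimics how Octave fills a 1x3x4 array.
a = np.arange(1.0, 13.0).reshape((1, 3, 4), order='F')

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for octave_a.mat, written from Python.
    path = os.path.join(tmp, 'example_a.mat')
    sio.savemat(path, {'a': a})
    mat_contents = sio.loadmat(path)

# Keep only real variables; metadata keys start with a double underscore.
variables = {name: value for name, value in mat_contents.items()
             if not name.startswith('__')}

print(sorted(variables))
print(variables['a'].shape)
```

Filtering on the double-underscore prefix is a convention chosen for this sketch, based on the metadata key names shown in the output above.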

Now let's try the other way round:

>>> import numpy as np
>>> vect = np.arange(10)
>>> vect.shape
(10,)
>>> sio.savemat('np_vector.mat', {'vect':vect})

Then back to Octave:

octave:8> load np_vector.mat
octave:9> vect
vect =

  0  1  2  3  4  5  6  7  8  9

octave:10> size(vect)
ans =

    1   10
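
The 1-D NumPy vector arrives in Octave as a 1 x 10 row because MATLAB arrays are at least 2-D. The sketch below checks the same thing from Python by reading the file back, and also shows the oned_as parameter of savemat and the squeeze_me parameter of loadmat. The file names are hypothetical and the files are written to a temporary directory.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

vect = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    row_path = os.path.join(tmp, 'vector_row.mat')  # hypothetical names
    col_path = os.path.join(tmp, 'vector_col.mat')

    sio.savemat(row_path, {'vect': vect})                    # 1-D saved as a row
    sio.savemat(col_path, {'vect': vect}, oned_as='column')  # 1-D saved as a column

    row = sio.loadmat(row_path)['vect']
    col = sio.loadmat(col_path)['vect']
    flat = sio.loadmat(row_path, squeeze_me=True)['vect']    # length-1 axes removed

print(row.shape, col.shape, flat.shape)
```

If the 1-D shape matters on the Python side, squeeze_me=True when loading is one way to recover it.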

If you want to inspect the contents of a MATLAB file without reading the data into memory, use the whosmat command:

>>> sio.whosmat('octave_a.mat')
[('a', (1, 3, 4), 'double')]

whosmat returns a list of tuples, one for each array (or other object) in the file. Each tuple contains the name, shape and data type of the array.
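
As a small illustration of the tuple layout, the sketch below writes two variables to a hypothetical file and turns the whosmat result into a dictionary keyed by variable name. The file name and the name info are assumptions made for this example.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'two_vars.mat')  # hypothetical file name
    sio.savemat(path, {
        'a': np.zeros((1, 3, 4)),              # float64 is stored as 'double'
        'vect': np.arange(10, dtype=np.int32),
    })
    listing = sio.whosmat(path)

# Each entry is (name, shape, data type); index the listing by name.
info = {name: (shape, kind) for name, shape, kind in listing}
for name, (shape, kind) in info.items():
    print(name, shape, kind)
```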

MATLAB structs#

MATLAB structs are a little bit like Python dicts, except the field names must be strings. Any MATLAB object can be a value of a field. As for all objects in MATLAB, structs are, in fact, arrays of structs, where a single struct is an array of shape (1, 1).

octave:11> my_struct = struct('field1', 1, 'field2', 2)
my_struct =
{
  field1 =  1
  field2 =  2
}

octave:12> save -6 octave_struct.mat my_struct

We can load this in Python:

>>> mat_contents = sio.loadmat('octave_struct.mat')
>>> mat_contents
 {'__header__': b'MATLAB 5.0 MAT-file, written by Octave 3.2.3, 2010-05-30 02:00:26 UTC',
  '__version__': '1.0',
  '__globals__': [],
  'my_struct': array([[(array([[1.]]), array([[2.]]))]], dtype=[('field1', 'O'), ('field2', 'O')])
 }
>>> oct_struct = mat_contents['my_struct']
>>> oct_struct.shape
(1, 1)
>>> val = oct_struct[0,0]
>>> val
([[1.0]], [[2.0]])
>>> val['field1']
array([[ 1.]])
>>> val['field2']
array([[ 2.]])
>>> val.dtype
dtype([('field1', 'O'), ('field2', 'O')])

In the SciPy versions from 0.12.0, MATLAB structs come back as NumPy structured arrays, with fields named for the struct fields. You can see the field names in the dtype output above. Note also:

>>> val = oct_struct[0,0]

and:

octave:13> size(my_struct)
ans =

   1   1

So, in MATLAB, the struct array must be at least 2-D, and we replicate that when we read into SciPy. If you want all length 1 dimensions squeezed out, try this:

>>> mat_contents = sio.loadmat('octave_struct.mat', squeeze_me=True)
>>> oct_struct = mat_contents['my_struct']
>>> oct_struct.shape
()
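
The difference between the two loading modes can be sketched side by side. The file here is a hypothetical stand-in for octave_struct.mat, written from Python by passing a dict to savemat, and np.asarray(...).item() is just one way of unwrapping the 0-d field value.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for octave_struct.mat.
    path = os.path.join(tmp, 'struct_example.mat')
    sio.savemat(path, {'my_struct': {'field1': 1.0, 'field2': 2.0}})

    plain = sio.loadmat(path)['my_struct']
    squeezed = sio.loadmat(path, squeeze_me=True)['my_struct']

# Default: a (1, 1) struct array whose fields hold 2-D arrays.
print(plain.shape, plain[0, 0]['field1'].shape)

# squeeze_me=True: a 0-d struct; unwrap the field to get the scalar.
field1 = np.asarray(squeezed['field1']).item()
print(squeezed.shape, field1)
```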

Sometimes, it’s more convenient to load the MATLAB structs as Python objects rather than NumPy structured arrays - it can make the access syntax in Python a bit more similar to that in MATLAB. In order to do this, use the struct_as_record=False parameter setting to loadmat.

>>> mat_contents = sio.loadmat('octave_struct.mat', struct_as_record=False)
>>> oct_struct = mat_contents['my_struct']
>>> oct_struct[0,0].field1
array([[ 1.]])

struct_as_record=False works nicely with squeeze_me:

>>> mat_contents = sio.loadmat('octave_struct.mat', struct_as_record=False, squeeze_me=True)
>>> oct_struct = mat_contents['my_struct']
>>> oct_struct.shape # but no - it's a scalar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'mat_struct' object has no attribute 'shape'
>>> type(oct_struct)
<class 'scipy.io.matlab._mio5_params.mat_struct'>
>>> oct_struct.field1
1.0
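
If plain Python containers are preferred over mat_struct objects, loadmat also accepts a simplify_cells parameter. The sketch below assumes SciPy 1.5 or later, where that parameter is available, and uses a hypothetical file written from Python; it is one possible approach, not a replacement for the options above.

```python
import os
import tempfile

import scipy.io as sio

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'struct_as_dict.mat')  # hypothetical file name
    sio.savemat(path, {'a_dict': {'field1': 0.5, 'field2': 'a string'}})

    # simplify_cells=True returns structs as nested Python dicts.
    loaded = sio.loadmat(path, simplify_cells=True)

a_dict = loaded['a_dict']
print(type(a_dict).__name__)
print(a_dict['field1'], a_dict['field2'])
```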

Saving struct arrays can be done in various ways. One simple method is to use dicts:

>>> a_dict = {'field1': 0.5, 'field2': 'a string'}
>>> sio.savemat('saved_struct.mat', {'a_dict': a_dict})

loaded as:

octave:21> load saved_struct
octave:22> a_dict
a_dict =

  scalar structure containing the fields:

    field2 = a string
    field1 =  0.50000

You can also save structs back again to MATLAB (or Octave in our case) like this:

>>> dt = [('f1', 'f8'), ('f2', 'S10')]
>>> arr = np.zeros((2,), dtype=dt)
>>> arr
array([(0.0, ''), (0.0, '')],
      dtype=[('f1', '<f8'), ('f2', 'S10')])
>>> arr[0]['f1'] = 0.5
>>> arr[0]['f2'] = 'python'
>>> arr[1]['f1'] = 99
>>> arr[1]['f2'] = 'not perl'
>>> sio.savemat('np_struct_arr.mat', {'arr': arr})
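
To check what such a file contains without leaving Python, the struct array can be read back with loadmat. This sketch is a variant of the example above: it uses a Unicode string field ('U10') instead of 'S10', and the file name is hypothetical.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Variant of the array above, with a Unicode string field (an assumption
# made for this sketch).
dt = [('f1', 'f8'), ('f2', 'U10')]
arr = np.zeros((2,), dtype=dt)
arr['f1'] = [0.5, 99.0]
arr['f2'] = ['python', 'not perl']

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'struct_arr_example.mat')  # hypothetical name
    sio.savemat(path, {'arr': arr})

    plain = sio.loadmat(path)['arr']
    loaded = sio.loadmat(path, squeeze_me=True)['arr']

print(plain.shape)   # the 1-D array is stored as a 2-D struct array
print(loaded.shape)  # squeeze_me removes the length-1 axis again

f1_values = [float(np.asarray(v).item()) for v in loaded['f1']]
f2_values = [str(v) for v in loaded['f2']]
print(f1_values, f2_values)
```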

MATLAB cell arrays#

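Cell arrays in MATLAB are rather like Python lists, in the sense that the elements in the arrays can contain any type of MATLAB object. In fact, they are most similar to NumPy object arrays, and that is how we load them into NumPy.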

octave:14> my_cells = {1, [2, 3]}
my_cells =
{
  [1,1] =  1
  [1,2] =

     2   3

}

octave:15> save -6 octave_cells.mat my_cells

Back to Python:

>>> mat_contents = sio.loadmat('octave_cells.mat')
>>> oct_cells = mat_contents['my_cells']
>>> print(oct_cells.dtype)
object
>>> val = oct_cells[0,0]
>>> val
array([[ 1.]])
>>> print(val.dtype)
float64

Saving to a MATLAB cell array just involves making a NumPy object array:

>>> obj_arr = np.zeros((2,), dtype=object)
>>> obj_arr[0] = 1
>>> obj_arr[1] = 'a string'
>>> obj_arr
array([1, 'a string'], dtype=object)
>>> sio.savemat('np_cells.mat', {'obj_arr': obj_arr})

octave:16> load np_cells.mat
octave:17> obj_arr
obj_arr =
{
  [1,1] = 1
  [2,1] = a string
}
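
The same round trip can be checked from Python alone. The sketch below saves an object array and reads it back with squeeze_me=True so that each cell comes back as a simple value. The file name is hypothetical.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

obj_arr = np.zeros((2,), dtype=object)
obj_arr[0] = 1
obj_arr[1] = 'a string'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cells_example.mat')  # hypothetical file name
    sio.savemat(path, {'obj_arr': obj_arr})
    cells = sio.loadmat(path, squeeze_me=True)['obj_arr']

# The cell array is loaded as a NumPy object array.
print(cells.dtype, cells.shape)
first, second = int(cells[0]), str(cells[1])
print(first, second)
```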

IDL files#

readsav(file_name[, idict, python_dict, ...])

Read an IDL .sav file.

Matrix Market files#

mminfo(source)

Return size and storage parameters from Matrix Market file-like 'source'.

mmread(source)

Reads the contents of a Matrix Market file-like 'source' into a matrix.

mmwrite(target, a[, comment, field, ...])

Writes the sparse or dense array a to Matrix Market file-like target.
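
A minimal sketch of how the three functions fit together: write a small sparse matrix, inspect the file header with mminfo, and read the matrix back. The matrix values and the file name example.mtx are made up for illustration.

```python
import os
import tempfile

import numpy as np
from scipy.io import mminfo, mmread, mmwrite
from scipy.sparse import coo_matrix

# A small non-symmetric matrix with four stored entries.
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 0.0, 0.0]])
sparse = coo_matrix(dense)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.mtx')  # hypothetical file name
    mmwrite(path, sparse)
    info = mminfo(path)      # (rows, cols, entries, format, field, symmetry)
    restored = mmread(path)  # sparse input comes back in sparse form

restored_dense = restored.toarray()
print(info)
print(restored_dense)
```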

Wav sound files (scipy.io.wavfile)#

read(filename[, mmap])

Open a WAV file.

write(filename, rate, data)

Write a NumPy array as a WAV file.
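
As an illustration, the sketch below writes one second of a 440 Hz tone as 16-bit samples and reads it back. The sample rate, tone, and file name are arbitrary choices for this example.

```python
import os
import tempfile

import numpy as np
from scipy.io import wavfile

rate = 8000                   # samples per second (arbitrary choice)
t = np.arange(rate) / rate    # one second of sample times
amplitude = 0.5 * np.iinfo(np.int16).max
tone = (amplitude * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tone.wav')  # hypothetical file name
    wavfile.write(path, rate, tone)
    read_rate, read_data = wavfile.read(path)

print(read_rate, read_data.dtype, read_data.shape)
```

read returns the sample rate together with the data, so both need to be kept when the file is processed further.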

Arff files (scipy.io.arff)#

loadarff(f)

Read an arff file.
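
loadarff accepts a file-like object, so a small example can be run without a file on disk. The ARFF content below is made up for illustration.

```python
from io import StringIO

from scipy.io import arff

# A tiny, made-up ARFF document with two numeric and one nominal attribute.
content = """
@relation foo
@attribute width  numeric
@attribute height numeric
@attribute color  {red,green,blue,yellow,black}
@data
5.0,3.25,blue
4.5,3.75,green
3.0,4.00,red
"""

data, meta = arff.loadarff(StringIO(content))

# data is a NumPy structured array; meta describes the attributes.
print(meta.names())
print(data['width'])
```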

Netcdf#

netcdf_file(filename[, mode, mmap, version, ...])

A file object for NetCDF data.

Allows reading of NetCDF files (version of the pupynere package).
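
A minimal sketch of writing and then reading a NetCDF file with netcdf_file follows. The file name, the time variable, and its attributes are made up for illustration, and mmap=False is passed when reading so the data can be copied out safely before the file is closed.

```python
import os
import tempfile

import numpy as np
from scipy.io import netcdf_file

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'simple.nc')  # hypothetical file name

    # Write: one dimension, one integer variable, and two attributes.
    with netcdf_file(path, 'w') as f:
        f.history = 'Created for a test'
        f.createDimension('time', 10)
        time = f.createVariable('time', 'i', ('time',))
        time[:] = np.arange(10)
        time.units = 'days since 2008-01-01'

    # Read: copy values out before the file is closed.
    with netcdf_file(path, 'r', mmap=False) as f:
        history = f.history
        units = f.variables['time'].units
        values = f.variables['time'][:].copy()

print(history, units)
print(values)
```

Attribute values such as history and units come back as bytes, so decode them if text is needed.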