pandas.json_normalize#

pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)[源代码][源代码]#

将半结构化的 JSON 数据规范化成一个扁平的表格。

参数:
数据字典、字典列表或字典系列

未序列化的 JSON 对象。

record_pathstr 或 str 列表,默认 None

每个对象中的路径到记录列表。如果没有传递,数据将被假定为记录数组。

meta路径列表(str 或 str 列表),默认为 None

用于作为结果表中每个记录的元数据的字段。

meta_prefixstr, 默认为 None

如果为真,则使用带点的路径作为记录前缀,例如如果 meta 是 [‘foo’, ‘bar’],则为 foo.bar.field。

record_prefixstr, 默认为 None

如果为真,则使用带点的路径作为记录前缀,例如如果记录路径是 [‘foo’, ‘bar’],则前缀为 foo.bar.field。

错误{‘raise’, ‘ignore’}, 默认 ‘raise’

配置错误处理。

  • ‘ignore’ : 如果meta中列出的键并不总是存在,将会忽略KeyError。

  • ‘raise’ : 如果meta中列出的键不总是存在,将引发KeyError。

sepstr, 默认 ‘.’

嵌套记录将生成由 sep 分隔的名称。例如,对于 sep=’.’,{‘foo’: {‘bar’: 0}} -> foo.bar。

max_levelint, 默认为 None

最大归一化层级数(字典深度)。如果为 None,则归一化所有层级。

返回:
frameDataFrame
将半结构化的 JSON 数据规范化成一个扁平的表格。

例子

>>> data = [
...     {"id": 1, "name": {"first": "Coleen", "last": "Volk"}},
...     {"name": {"given": "Mark", "family": "Regner"}},
...     {"id": 2, "name": "Faye Raker"},
... ]
>>> pd.json_normalize(data)
    id name.first name.last name.given name.family        name
0  1.0     Coleen      Volk        NaN         NaN         NaN
1  NaN        NaN       NaN       Mark      Regner         NaN
2  2.0        NaN       NaN        NaN         NaN  Faye Raker
>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> pd.json_normalize(data, max_level=0)
    id        name                        fitness
0  1.0   Cole Volk  {'height': 130, 'weight': 60}
1  NaN    Mark Reg  {'height': 130, 'weight': 60}
2  2.0  Faye Raker  {'height': 130, 'weight': 60}

规范化嵌套数据到第1级。

>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> pd.json_normalize(data, max_level=1)
    id        name  fitness.height  fitness.weight
0  1.0   Cole Volk             130              60
1  NaN    Mark Reg             130              60
2  2.0  Faye Raker             130              60
>>> data = [
...     {
...         "id": 1,
...         "name": "Cole Volk",
...         "fitness": {"height": 130, "weight": 60},
...     },
...     {"name": "Mark Reg", "fitness": {"height": 130, "weight": 60}},
...     {
...         "id": 2,
...         "name": "Faye Raker",
...         "fitness": {"height": 130, "weight": 60},
...     },
... ]
>>> series = pd.Series(data, index=pd.Index(["a", "b", "c"]))
>>> pd.json_normalize(series)
    id        name  fitness.height  fitness.weight
a  1.0   Cole Volk             130              60
b  NaN    Mark Reg             130              60
c  2.0  Faye Raker             130              60
>>> data = [
...     {
...         "state": "Florida",
...         "shortname": "FL",
...         "info": {"governor": "Rick Scott"},
...         "counties": [
...             {"name": "Dade", "population": 12345},
...             {"name": "Broward", "population": 40000},
...             {"name": "Palm Beach", "population": 60000},
...         ],
...     },
...     {
...         "state": "Ohio",
...         "shortname": "OH",
...         "info": {"governor": "John Kasich"},
...         "counties": [
...             {"name": "Summit", "population": 1234},
...             {"name": "Cuyahoga", "population": 1337},
...         ],
...     },
... ]
>>> result = pd.json_normalize(
...     data, "counties", ["state", "shortname", ["info", "governor"]]
... )
>>> result
         name  population    state shortname info.governor
0        Dade       12345   Florida    FL    Rick Scott
1     Broward       40000   Florida    FL    Rick Scott
2  Palm Beach       60000   Florida    FL    Rick Scott
3      Summit        1234   Ohio       OH    John Kasich
4    Cuyahoga        1337   Ohio       OH    John Kasich
>>> data = {"A": [1, 2]}
>>> pd.json_normalize(data, "A", record_prefix="Prefix.")
    Prefix.0
0          1
1          2

返回带有给定字符串前缀的列的规范化数据。