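Available settings

A set of options is available to customize the behaviour of ydata-profiling and the appearance of the generated report. This depth of customization makes it possible to tailor the behaviour closely to the specific dataset being analysed. The available settings are listed below. To learn how to change them, see :doc:changing_settings.

General settings

Global report settings: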

Parameter	Type	Default	Description
`title`	string	Pandas Profiling Report	Title for the report, shown in the header and title bar.
`pool_size`	integer	0	Number of workers in thread pool. When set to zero, it is set to the number of CPUs available.
`progress_bar`	boolean	`True`	If `True`, `ydata-profiling` will display a progress bar.
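
The `pool_size` rule above (zero means "use every available CPU") can be sketched as follows. This is an illustration only: `resolve_pool_size` is a hypothetical helper, not part of `ydata-profiling`, and the fallback to 1 when the CPU count cannot be determined is an assumption.

```python
import os


def resolve_pool_size(pool_size: int) -> int:
    """Return the worker count implied by a `pool_size` setting.

    Hypothetical helper: mirrors the documented rule that 0 means
    "use the number of CPUs available".
    """
    if pool_size < 0:
        raise ValueError("pool_size must be zero or a positive integer")
    if pool_size == 0:
        # os.cpu_count() may return None; falling back to 1 is an assumption.
        return os.cpu_count() or 1
    return pool_size


print(resolve_pool_size(0))  # number of CPUs on this machine
print(resolve_pool_size(4))  # an explicit value is used unchanged
```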

Variable summary settings

Settings related to the information displayed for each variable.

Parameter	Type	Default	Description
`sort`	None, asc or desc	`None`	Sort the variables asc(ending), desc(ending) or None (keeps the original order).
`variables.descriptions`	dict	{}	Ability to display a description alongside the descriptive statistics of each variable ({'var_name': 'Description'}).
`vars.num.quantiles`	list[float]	[0.05,0.25,0.5,0.75,0.95]	The quantiles to calculate. Note that .25, .5 and .75 are required for the computation of other metrics (median and IQR).
`vars.num.skewness_threshold`	integer	20	Warn if the skewness is above this threshold.
`vars.num.low_categorical_threshold`	integer	5	If the number of distinct values is smaller than this number, then the series is considered to be categorical. Set to 0 to disable.
`vars.num.chi_squared_threshold`	float	0.999	Set to 0 to disable chi-squared calculation.
`vars.cat.length`	boolean	`True`	Check the string length and aggregate values (min, max, mean, median).
`vars.cat.characters`	boolean	`False`	Check the distribution of characters and their Unicode properties. Often informative, but may be computationally expensive.
`vars.cat.words`	boolean	`False`	Check the distribution of words. Often informative, but may be computationally expensive.
`vars.cat.cardinality_threshold`	integer	50	Warn if the number of distinct values is above this threshold.
`vars.cat.imbalance_threshold`	float	0.5	Warn if the imbalance score is above this threshold.
`vars.cat.n_obs`	integer	5	Display this number of observations.
`vars.cat.chi_squared_threshold`	float	0.999	Same as `vars.num.chi_squared_threshold`, but for categorical variables.
`vars.bool.n_obs`	integer	3	Same as `vars.cat.n_obs`, but for boolean variables.
`vars.bool.imbalance_threshold`	float	0.5	Warn if the imbalance score is above this threshold.

Configuration example
  profile = df.profile_report(
      sort="ascending",
      vars={
          "num": {"low_categorical_threshold": 0},
          "cat": {
              "length": True,
              "characters": False,
              "words": False,
              "n_obs": 5,
          },
      },
  )

  profile.config.variables.descriptions = {
      "files": "Files in the filesystem",
      "datec": "Creation date",
      "datem": "Modification date",
  }

  profile.to_file("report.html")
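
The `imbalance_threshold` settings above are compared against an imbalance score. A minimal sketch of one such score, assuming it is defined as one minus the normalized Shannon entropy of the category counts. Both this definition and the `imbalance_score` name are assumptions made for illustration; check the library source for the exact formula.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def imbalance_score(values: Iterable[Hashable]) -> float:
    """Return 0.0 for perfectly balanced classes, approaching 1.0 as one class dominates."""
    counts = Counter(values)
    n_classes = len(counts)
    if n_classes <= 1:
        # With zero or one class there is nothing to balance.
        return 0.0
    total = sum(counts.values())
    # Shannon entropy (base 2) of the observed class frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Normalize by the maximum possible entropy for this number of classes.
    return 1.0 - entropy / math.log2(n_classes)


balanced = ["a", "b"] * 50
skewed = ["a"] * 90 + ["b"] * 10
print(imbalance_score(balanced))
print(imbalance_score(skewed))
```

Under this definition, and with the default threshold of 0.5, the 90/10 column would trigger a warning while the balanced one would not.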

Setting dataset schema type

Configure the schema type for a given dataset.

Set the variable type schema to generate the profile report
  import pandas as pd

  from ydata_profiling import ProfileReport
  from ydata_profiling.utils.cache import cache_file

  file_name = cache_file(
      "titanic.csv",
      "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv",
  )
  df = pd.read_csv(file_name)

  type_schema = {"Survived": "categorical", "Embarked": "categorical"}

  # Set type_schema only for the variables whose types we are certain of.
  # All other variables are inferred automatically.
  report = ProfileReport(df, title="Titanic EDA", type_schema=type_schema)

  report.to_file("report.html")

Missing data overview plots

Settings related to the missing data section and the visualizations it can include.

Parameter	Type	Default	Description
`missing_diagrams.bar`	boolean	`True`	Display a bar chart with counts of missing values for each column.
`missing_diagrams.matrix`	boolean	`True`	Display a matrix of missing values. Similar to the bar chart, but might provide overview of the co-occurrence of missing values in rows.
`missing_diagrams.heatmap`	boolean	`True`	Display a heatmap of missing values, that measures nullity correlation (i.e. how strongly the presence or absence of one variable affects the presence of another).

Configuration example: disable heatmap for large datasets
  profile = df.profile_report(
      missing_diagrams={
          "heatmap": False,
      }
  )
  profile.to_file("report.html")
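
The heatmap's nullity correlation can be illustrated without any plotting: it is the correlation between per-column missingness indicators. The sketch below uses plain Python and a Pearson correlation on 0/1 indicators. The function name and the sample columns are hypothetical, and the library's own computation may differ in detail.

```python
import math
from typing import Optional, Sequence


def nullity_correlation(a: Sequence[Optional[object]], b: Sequence[Optional[object]]) -> float:
    """Pearson correlation between the missingness indicators of two columns."""
    if len(a) != len(b) or not a:
        raise ValueError("columns must be non-empty and of equal length")
    # 1.0 where a value is missing (None), 0.0 where it is present.
    xa = [1.0 if v is None else 0.0 for v in a]
    xb = [1.0 if v is None else 0.0 for v in b]
    n = len(xa)
    mean_a, mean_b = sum(xa) / n, sum(xb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(xa, xb))
    var_a = sum((p - mean_a) ** 2 for p in xa)
    var_b = sum((q - mean_b) ** 2 for q in xb)
    if var_a == 0 or var_b == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


age = [22, None, 35, None]
cabin = ["C85", None, "E46", None]  # missing in the same rows as `age`
fare = [None, 7.25, None, 8.05]     # missing exactly where `age` is present
print(nullity_correlation(age, cabin))
print(nullity_correlation(age, fare))
```

A value near 1 means the two columns tend to be missing together, and a value near -1 means one is missing where the other is present.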

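Correlations

Settings regarding correlation metrics and thresholds. The default value is `auto`, but the following correlation matrices are available: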

Parameter	Description
`auto`	Calculates the column pairwise correlation depending on the type schema:
	- numerical to numerical variable: Spearman correlation coefficient
	- categorical to categorical variable: Cramer's V association coefficient
	- numerical to categorical: Cramer's V association coefficient with the numerical variable discretized automatically
`spearman`	Spearman's correlation measures the strength and direction of monotonic association between two variables. Well suited to evaluating the strength of the relation between ordinal or continuous variables.
`pearson`	The Pearson correlation coefficient is the most common way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.
`kendall`	The Kendall rank correlation coefficient is a statistic used to measure the ordinal association between two measured quantities. Kendall's tau is often used when data doesn't meet one of the requirements of Pearson's correlation.
`phi_k`	Phi K is especially suitable for working with mixed-type variables. Using this coefficient we can find (un)expected correlations and evaluate their statistical significance.
`cramers`	Cramér's V is an association measure commonly used to examine the association between categorical variables when the contingency table is larger than 2x2.

For each correlation matrix, the following configuration options are available:

Parameter	Type	Default	Description
`correlations.auto.calculate`	boolean	`True`	Whether to compute 'auto' correlation
`correlations.auto.warn_high_correlations`	boolean	`True`	Show warning for correlations higher than the threshold
`correlations.auto.threshold`	float	0.9	Warning threshold
`correlations.pearson.calculate`	boolean	`False`	Whether to calculate Pearson correlation
`correlations.pearson.warn_high_correlations`	boolean	`True`	Show warning for correlations higher than the threshold
`correlations.pearson.threshold`	float	0.9	Warning threshold
`correlations.spearman.calculate`	boolean	`False`	Whether to calculate Spearman correlation
`correlations.spearman.warn_high_correlations`	boolean	`False`	Show warning for correlations higher than the threshold
`correlations.spearman.threshold`	float	0.9	Warning threshold
`correlations.kendall.calculate`	boolean	`False`	Whether to calculate Kendall rank correlation
`correlations.kendall.warn_high_correlations`	boolean	`False`	Show warning for correlations higher than the threshold
`correlations.kendall.threshold`	float	0.9	Warning threshold
`correlations.phi_k.calculate`	boolean	`False`	Whether to calculate Phi K correlation
`correlations.phi_k.warn_high_correlations`	boolean	`False`	Show warning for correlations higher than the threshold
`correlations.phi_k.threshold`	float	0.9	Warning threshold
`correlations.cramers.calculate`	boolean	`False`	Whether to calculate Cramer's V association coefficient
`correlations.cramers.warn_high_correlations`	boolean	`True`	Show warning for correlations higher than the threshold
`correlations.cramers.threshold`	float	0.9	Warning threshold

For instance, to disable all correlation computations (which may be relevant for large datasets):

Disable all correlation matrices
    profile = df.profile_report(
        title="Report without correlations",
        correlations={
            "auto": {"calculate": False},
            "pearson": {"calculate": False},
            "spearman": {"calculate": False},
            "kendall": {"calculate": False},
            "phi_k": {"calculate": False},
            "cramers": {"calculate": False},
        },
    )

    # or use the shorthand that is available for correlations
    profile = df.profile_report(
        title="Report without correlations",
        correlations=None,
    )
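
Rather than spelling out every matrix by hand, the dictionary can be generated. A small sketch that enables only the chosen matrices, using the keys from the table above. `correlations_config` is a hypothetical helper, not a `ydata-profiling` function.

```python
from typing import Dict, Iterable

# Matrix names as listed in the correlations table above.
CORRELATION_NAMES = ("auto", "pearson", "spearman", "kendall", "phi_k", "cramers")


def correlations_config(enabled: Iterable[str], threshold: float = 0.9) -> Dict[str, Dict[str, object]]:
    """Build a `correlations` dict that computes only the matrices in `enabled`."""
    enabled = set(enabled)
    unknown = enabled - set(CORRELATION_NAMES)
    if unknown:
        raise ValueError(f"unknown correlation(s): {sorted(unknown)}")
    return {
        name: {"calculate": name in enabled, "threshold": threshold}
        for name in CORRELATION_NAMES
    }


config = correlations_config({"pearson", "cramers"}, threshold=0.8)
print(config["pearson"])
print(config["spearman"])

# Intended use, assuming `df` is a pandas DataFrame with ydata-profiling imported:
# profile = df.profile_report(correlations=config)
```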

Interactions

Settings related to the interactions section.

Parameter	Type	Default	Description
`interactions.continuous`	boolean	`True`	Generate a 2D scatter plot (or hexagonal binned plot) for all continuous variable pairs.
`interactions.targets`	list	[]	When a list of variable names is given, only interactions between these and all other variables are computed.
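
To see what `interactions.targets` changes, the sketch below enumerates the variable pairs that would be plotted. It illustrates the rule described in the table and is not the library's code: `interaction_pairs` is a hypothetical name, and both treating an empty list as "all continuous variables" and keeping self-pairs are assumptions.

```python
from typing import List, Sequence, Tuple


def interaction_pairs(continuous: Sequence[str], targets: Sequence[str] = ()) -> List[Tuple[str, str]]:
    """List the (x, y) variable pairs for which an interaction plot would be drawn."""
    # Assumption: an empty `targets` means every continuous variable is a target.
    chosen = list(targets) if targets else list(continuous)
    return [(x, y) for x in chosen for y in continuous]


columns = ["Age", "Fare", "SibSp"]
print(len(interaction_pairs(columns)))
print(interaction_pairs(columns, targets=["Age"]))
```

Restricting the targets reduces the number of plots from quadratic to linear in the number of columns, which is the main reason to set it on wide datasets.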

Report's appearance

Settings related to the appearance and style of the report.

Parameter	Type	Default	Description
`html.minify_html`	boolean	`True`	If `True`, the output HTML is minified using the `htmlmin` package.
`html.use_local_assets`	boolean	`True`	If `True`, all assets (stylesheets, scripts, images) are stored locally. If `False`, a CDN is used for some stylesheets and scripts.
`html.inline`	boolean	`True`	If `True`, all assets are contained in the report. If `False`, then a web export is created, where all assets are stored in the '[REPORT_NAME]_assets/' directory.
`html.navbar_show`	boolean	`True`	Whether to include a navigation bar in the report
`html.style.theme`	string	`None`	Select a bootswatch theme. Available options: flatly (dark) and united (orange)
`html.style.logo`	string	`None`	A base64 encoded logo, to display in the navigation bar.
`html.style.primary_color`	string	#337ab7	The primary color to use in the report.
`html.style.full_width`	boolean	`False`	By default, the width of the report is fixed. If set to `True`, the full width of the screen is used.
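
`html.style.logo` expects a base64-encoded image. A minimal sketch of producing such a string with the standard library follows. `encode_logo` is a hypothetical helper, and the bytes below are a placeholder rather than a real image. Whether the setting also needs a `data:` URI prefix is not stated in the table above, so verify against your installed version.

```python
import base64


def encode_logo(image_bytes: bytes) -> str:
    """Return the base64 text representation of raw image bytes."""
    return base64.b64encode(image_bytes).decode("ascii")


# Placeholder input: the PNG file signature only, not a complete image.
placeholder = b"\x89PNG\r\n\x1a\n"
logo = encode_logo(placeholder)
print(logo)

# Intended use, assuming `df` is a pandas DataFrame with ydata-profiling imported:
# profile = df.profile_report(html={"style": {"logo": logo}})
```

In practice the bytes would come from reading the logo file in binary mode.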