In C++
Setup
In [ ]:
pip install ydf -U
Serving in C++
YDF models can be served directly with the C++ library. Since the Python API and the C++ API share the same serving code, models are fully interoperable.
Benefits of serving in C++
- Optimized inference speed: the C++ API gives full control over the serving code, making it possible to get the most out of YDF's performance.
- Optimized binary size: since the C++ serving code does not depend on the training code, only a small part of YDF needs to be linked.
When not to use the C++ API
- The C++ API is not as simple to use as the Python API.
- Any preprocessing, if present, must be re-implemented in C++.
Train a small model
The next cell creates a very small YDF model.
In [2]:
# Load the libraries
import ydf  # Yggdrasil Decision Forests
import pandas as pd  # We use Pandas to load small datasets.
# Download a classification dataset and load it as a Pandas DataFrame.
ds_path = "https://raw.githubusercontent.com/google/yggdrasil-decision-forests/main/yggdrasil_decision_forests/test_data/dataset"
train_ds = pd.read_csv(f"{ds_path}/iris.csv")
label = "class"
model = ydf.RandomForestLearner(label=label, num_trees=10).train(train_ds)
model.describe()
Train model on 150 examples
Model trained in 0:00:00.003721
Out[2]:
Name : RANDOM_FOREST
Task : CLASSIFICATION
Label : class
Features (4) : Sepal.Length Sepal.Width Petal.Length Petal.Width
Weights : None
Trained with tuner : No
Model size : 29 kB
Number of records: 150
Number of columns: 5

Number of columns by type:
    NUMERICAL: 4 (80%)
    CATEGORICAL: 1 (20%)

Columns:

NUMERICAL: 4 (80%)
    1: "Sepal.Length" NUMERICAL mean:5.84333 min:4.3 max:7.9 sd:0.825301
    2: "Sepal.Width" NUMERICAL mean:3.05733 min:2 max:4.4 sd:0.434411
    3: "Petal.Length" NUMERICAL mean:3.758 min:1 max:6.9 sd:1.7594
    4: "Petal.Width" NUMERICAL mean:1.19933 min:0.1 max:2.5 sd:0.759693

CATEGORICAL: 1 (20%)
    0: "class" CATEGORICAL has-dict vocab-size:4 zero-ood-items most-frequent:"setosa" 50 (33.3333%)

Terminology:
    nas: Number of non-available (i.e. missing) values.
    ood: Out of dictionary.
    manually-defined: Attribute whose type is manually defined by the user, i.e., the type was not automatically inferred.
    tokenized: The attribute value is obtained through tokenization.
    has-dict: The attribute is attached to a string dictionary e.g. a categorical attribute stored as a string.
    vocab-size: Number of unique values.
The following evaluation is computed on the validation or out-of-bag dataset.
Number of predictions (without weights): 149
Number of predictions (with weights): 149
Task: CLASSIFICATION
Label: class

Accuracy: 0.919463  CI95[W][0.872779 0.952873]
LogLoss: 0.798053
ErrorRate: 0.0805369

Default Accuracy: 0.33557
Default LogLoss: 1.09857
Default ErrorRate: 0.66443

Confusion Table:
truth\prediction  setosa  versicolor  virginica
setosa                50           0          0
versicolor             0          47          3
virginica              0           9         40
Total: 149
Variable importances measure the importance of an input feature for a model.
1. "Petal.Length"  0.595238 ################
2. "Petal.Width"   0.578035 ###############
3. "Sepal.Width"   0.280786
4. "Sepal.Length"  0.279107

1. "Petal.Length"  5.000000
2. "Petal.Width"   5.000000

1. "Petal.Length" 18.000000 ################
2. "Petal.Width"  15.000000 ############
3. "Sepal.Width"   5.000000 ##
4. "Sepal.Length"  3.000000

1. "Petal.Length" 870.339292 ################
2. "Petal.Width"  676.225185 ############
3. "Sepal.Width"   12.636705
4. "Sepal.Length"  12.459391
Those variable importances are computed during training. More, and possibly more informative, variable importances are available when analyzing a model on a test dataset.
Num trees : 10
Only printing the first tree.
Tree #0:
    "Petal.Length">=2.6 [s:0.673012 n:150 np:90 miss:1] ; val:"setosa" prob:[0.4, 0.266667, 0.333333]
    ├─(pos)─ "Petal.Width">=1.75 [s:0.512546 n:90 np:45 miss:0] ; val:"virginica" prob:[0, 0.444444, 0.555556]
    |        ├─(pos)─ val:"virginica" prob:[0, 0, 1]
    |        └─(neg)─ "Petal.Length">=4.95 [s:0.139839 n:45 np:7 miss:0] ; val:"versicolor" prob:[0, 0.888889, 0.111111]
    |                 ├─(pos)─ val:"virginica" prob:[0, 0.428571, 0.571429]
    |                 └─(neg)─ "Sepal.Length">=5.55 [s:0.0505512 n:38 np:32 miss:1] ; val:"versicolor" prob:[0, 0.973684, 0.0263158]
    |                          ├─(pos)─ val:"versicolor" prob:[0, 1, 0]
    |                          └─(neg)─ val:"versicolor" prob:[0, 0.833333, 0.166667]
    └─(neg)─ val:"setosa" prob:[1, 0, 0]
Generate the C++ code
With model.to_cpp()
, YDF creates a working C++ file that can be imported into an existing C++ project. The namespace of the generated C++ code is controlled with the key=
argument.
In [3]:
# Save the generated model code to ydf_tutorial_model.h and display it
with open("ydf_tutorial_model.h", "w") as f:
f.write(model.to_cpp(key="ydf_tutorial"))
!cat ydf_tutorial_model.h
// Automatically generated code running an Yggdrasil Decision Forests model in
// C++. This code was generated with "model.to_cpp()".
//
// Date of generation: 2023-11-01 13:06:59.075973
// YDF Version: 0.0.3
//
// How to use this code:
//
// 1. Copy this code in a new .h file.
// 2. If you use Bazel/Blaze, use the following dependencies:
//      //third_party/absl/status:statusor
//      //third_party/absl/strings
//      //external/ydf_cc/yggdrasil_decision_forests/api:serving
// 3. In your existing code, include the .h file and do:
//      // Load the model (to do only once).
//      namespace ydf = yggdrasil_decision_forests;
//      const auto model = ydf::exported_model_123::Load(<path to model>);
//      // Run the model
//      predictions = model.Predict();
// 4. By default, the "Predict" function takes no inputs and creates fake
//    examples. In practice, you want to add your input data as arguments to
//    "Predict" and call "examples->Set..." functions accordingly.
// 4. (Bonus)
//    Allocate one "examples" and "predictions" per thread and reuse them to
//    speed-up the inference.

#ifndef YGGDRASIL_DECISION_FORESTS_GENERATED_MODEL_ydf_tutorial
#define YGGDRASIL_DECISION_FORESTS_GENERATED_MODEL_ydf_tutorial

#include <memory>
#include <vector>

#include "third_party/absl/status/statusor.h"
#include "third_party/absl/strings/string_view.h"
#include "external/ydf_cc/yggdrasil_decision_forests/api/serving.h"

namespace yggdrasil_decision_forests {
namespace exported_model_ydf_tutorial {

struct ServingModel {
  std::vector<float> Predict() const;

  // Compiled model.
  std::unique_ptr<serving_api::FastEngine> engine;

  // Index of the input features of the model.
  //
  // Non-owning pointer. The data is owned by the engine.
  const serving_api::FeaturesDefinition* features;

  // Number of output predictions for each example.
  // Equal to 1 for regression, ranking and binary classification with compact
  // format. Equal to the number of classes for classification.
  int NumPredictionDimension() const {
    return engine->NumPredictionDimension();
  }

  // Indexes of the input features.
  serving_api::NumericalFeatureId feature_Sepal_Length;
  serving_api::NumericalFeatureId feature_Sepal_Width;
  serving_api::NumericalFeatureId feature_Petal_Length;
  serving_api::NumericalFeatureId feature_Petal_Width;
};

// TODO: Pass input feature values to "Predict".
inline std::vector<float> ServingModel::Predict() const {
  // Allocate memory for 2 examples. Alternatively, for speed-sensitive code,
  // an "examples" object can be allocated for each thread and reused. It is
  // okay to allocate more examples than needed.
  const int num_examples = 2;
  auto examples = engine->AllocateExamples(num_examples);

  // Set all the values to be missing. The values may then be overridden by the
  // "Set*" methods. If all the values are set with "Set*" methods,
  // "FillMissing" can be skipped.
  examples->FillMissing(*features);

  // Example #0
  examples->SetNumerical(/*example_idx=*/0, feature_Sepal_Length, 1.f, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Sepal_Width, 1.f, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Petal_Length, 1.f, *features);
  examples->SetNumerical(/*example_idx=*/0, feature_Petal_Width, 1.f, *features);

  // Example #1
  examples->SetNumerical(/*example_idx=*/1, feature_Sepal_Length, 2.f, *features);
  examples->SetNumerical(/*example_idx=*/1, feature_Sepal_Width, 2.f, *features);
  examples->SetNumerical(/*example_idx=*/1, feature_Petal_Length, 2.f, *features);
  examples->SetNumerical(/*example_idx=*/1, feature_Petal_Width, 2.f, *features);

  // Run the model on the two examples.
  //
  // For speed-sensitive code, reuse the same predictions.
  std::vector<float> predictions;
  engine->Predict(*examples, num_examples, &predictions);
  return predictions;
}

inline absl::StatusOr<ServingModel> Load(absl::string_view path) {
  ServingModel m;

  // Load the model
  ASSIGN_OR_RETURN(auto model, serving_api::LoadModel(path));

  // Compile the model into an inference engine.
  ASSIGN_OR_RETURN(m.engine, model->BuildFastEngine());

  // Index the input features of the model.
  m.features = &m.engine->features();

  // Index the input features.
  ASSIGN_OR_RETURN(m.feature_Sepal_Length,
                   m.features->GetNumericalFeatureId("Sepal.Length"));
  ASSIGN_OR_RETURN(m.feature_Sepal_Width,
                   m.features->GetNumericalFeatureId("Sepal.Width"));
  ASSIGN_OR_RETURN(m.feature_Petal_Length,
                   m.features->GetNumericalFeatureId("Petal.Length"));
  ASSIGN_OR_RETURN(m.feature_Petal_Width,
                   m.features->GetNumericalFeatureId("Petal.Width"));

  return m;
}

}  // namespace exported_model_ydf_tutorial
}  // namespace yggdrasil_decision_forests

#endif  // YGGDRASIL_DECISION_FORESTS_GENERATED_MODEL_ydf_tutorial
Use the C++ code
To use the C++ code in your project, follow these steps.
- If you use Bazel/Blaze, create a rule with the following dependencies:
//third_party/absl/status:statusor,
//third_party/absl/strings,
//third_party/yggdrasil_decision_forests/api:serving,
- In your C++ code, include the .h file and call the model as follows:
// Load the model (to do only once).
namespace ydf = yggdrasil_decision_forests;
const auto model = ydf::exported_model_ydf_tutorial::Load(<path to model>);
// Run the model
predictions = model.Predict();
- The generated "Predict" function does not take any inputs; instead, it fills the input features with placeholder values. You therefore need to add your inputs as arguments to "Predict" and use them to fill in the "examples->Set..." calls accordingly.
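As an illustration, such a variant could look like the sketch below. It assumes the ServingModel struct and feature indices generated above (same field names), and uses only calls that already appear in the generated header; the function name `PredictOne` is made up for this example.

```cpp
// Sketch: a variant of the generated "Predict" that takes one example's
// feature values as arguments instead of filling in placeholder values.
// Assumes the ServingModel struct generated above.
inline std::vector<float> PredictOne(const ServingModel& model,
                                     float sepal_length, float sepal_width,
                                     float petal_length, float petal_width) {
  // One example is enough here; see the generated code's comments about
  // per-thread reuse for speed-sensitive code.
  auto examples = model.engine->AllocateExamples(/*num_examples=*/1);
  examples->FillMissing(*model.features);

  examples->SetNumerical(/*example_idx=*/0, model.feature_Sepal_Length,
                         sepal_length, *model.features);
  examples->SetNumerical(/*example_idx=*/0, model.feature_Sepal_Width,
                         sepal_width, *model.features);
  examples->SetNumerical(/*example_idx=*/0, model.feature_Petal_Length,
                         petal_length, *model.features);
  examples->SetNumerical(/*example_idx=*/0, model.feature_Petal_Width,
                         petal_width, *model.features);

  std::vector<float> predictions;
  model.engine->Predict(*examples, /*num_examples=*/1, &predictions);
  return predictions;
}
```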
Going further
You can further optimize inference speed by pre-allocating and reusing the "examples" and "predictions" objects for each thread running the model.
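The reuse pattern itself can be sketched generically: each worker thread owns one pre-allocated input buffer and one prediction buffer, and overwrites them on every call instead of allocating. This is a sketch of the pattern only; `FakePredict` is a placeholder for the engine, not the YDF API.

```cpp
#include <vector>

// Per-thread, reusable buffers: no allocation on the hot path.
struct WorkerBuffers {
  std::vector<float> example;      // reusable input slot
  std::vector<float> predictions;  // reusable output slot
};

// Placeholder standing in for engine->Predict(): copies the input as the
// "score" so the pattern is testable without YDF.
inline void FakePredict(const std::vector<float>& example,
                        std::vector<float>* predictions) {
  *predictions = example;
}

// Hot path: overwrites the pre-allocated buffers instead of allocating.
inline float PredictReusing(WorkerBuffers* buf, float feature) {
  buf->example.assign(1, feature);
  FakePredict(buf->example, &buf->predictions);
  return buf->predictions[0];
}
```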