google · rstz · Jun 18, 2024 · Jun 18, 2024 · Aug 22, 2024 · Sep 24, 2024
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -14,10 +14,11 @@ Changelog under `yggdrasil_decision_forests/port/python/CHANGELOG.md`.
 - Allow configuring the truncation of NDCG losses.
 - Add support for distributed training for ranking gradient boosted tree
  models.
+- Add support for Avro data files using the "avro:" prefix.
 
 ### Misc
 
-- Loss options are now defined 
+- Loss options are now defined in
  model/gradient_boosted_trees/gradient_boosted_trees.proto (previously
  learner/gradient_boosted_trees/gradient_boosted_trees.proto)
 

diff --git a/LICENSE b/LICENSE
@@ -223,7 +223,11 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 THE SOFTWARE.
 
-rapidjson
+nlohmann_json
+MIT License 
+
+Copyright (c) 2013-2022 Niels Lohmann
+
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
 in the Software without restriction, including without limitation the rights

diff --git a/WORKSPACE_WITH_TF b/WORKSPACE_WITH_TF
@@ -20,8 +20,8 @@ load("//third_party/tensorflow:workspace.bzl", tensorflow = "deps")
 #load("//third_party/farmhash:workspace.bzl", farmhash = "deps")
 load("//third_party/boost_math:workspace.bzl", boost_math = "deps")
 # load("//third_party/grpc:workspace.bzl", grpc = "deps")
-load("//third_party/rapidjson:workspace.bzl", rapidjson = "deps")
 load("//third_party/eigen3:workspace.bzl", eigen = "deps")
+load("//third_party/nlohmann_json:workspace.bzl", nlohmann_json = "deps")
 
 gtest()
 # absl() # We use the abseil linked in tensorflow to avoid package clashes
@@ -31,8 +31,8 @@ tensorflow()
 #farmhash()
 boost_math()
 # grpc() # We use the protobuf linked in tensorflow.
-rapidjson()
 # eigen()
+nlohmann_json()
 
 # The initialization of YDF dependencies is commented. TensorFlow
 # is in charge of initializing them.

diff --git a/documentation/public/docs/blog/.authors.yml b/documentation/public/docs/blog/.authors.yml
@@ -0,0 +1,5 @@
+authors:
+ gbm:
+ name: Mathieu Guillame-Bert
+ description: Creator
+ avatar: https://avatars.githubusercontent.com/u/52443
diff --git a/documentation/public/docs/blog/index.md b/documentation/public/docs/blog/index.md
@@ -0,0 +1,4 @@
+# ✒️ Blog
+
+Here, we share playful, surprising or interesting experiments, stories, and
+facts about machine learning in general and decision forests in particular.
diff --git a/documentation/public/docs/blog/posts/1_how_ml_models_generalize.md b/documentation/public/docs/blog/posts/1_how_ml_models_generalize.md
diff --git a/documentation/public/docs/cli_quickstart.md b/documentation/public/docs/cli_quickstart.md
@@ -222,7 +222,7 @@ A few remarks:
  provide reasonable results in most situations. We will discuss alternative
  default values (called hyperparameter templates) and automated tuning of
  hyperparameters later. The list of all hyperparameters and their default
- values is available in the [hyperparameters page](hyper_parameters).
+ values is available in the [hyperparameters page](hyperparameters.md).
 
 - No validation dataset was provided for the training. Not all learners
  require a validation dataset. However, the `GRADIENT_BOOSTED_TREES` learner
@@ -442,7 +442,7 @@ One vs other classes:
 - The test accuracy is 0.874399 with 95% confidence interval boundaries of
  [0.86875; 0.879882].
 - The test AUC is 0.929207 with 95% confidence interval boundaries of
- [0.924358 0.934056](when computed with a closed form) and [0.973397
+ [0.924358 0.934056] when computed with a closed form and [0.973397
  0.977947] when computed with bootstrapping.
 - The PR-AUC and AP metrics are also available.
 

diff --git a/documentation/public/docs/cli_user_manual.md b/documentation/public/docs/cli_user_manual.md
@@ -4,7 +4,7 @@ This page contains an in-depth introduction to Yggdrasil Decision Forests (YDF)
 CLI API. The content presented on this page is generally not necessary to use
 YDF, but it will help users improve their understanding and use advanced options.
 
-New users should first check out the [Quick start](cli_quick_start).
+New users should first check out the [Quick start](cli_quickstart.md).
 
 Most concepts presented here apply to the other APIs, notably the C++ API.
 
@@ -280,7 +280,7 @@ The **generic hyper-parameters** (GHPs) are an alternative representation to the
 quick configuration, and automated hyper-parameter optimization. GHPs are used
 by TensorFlow Decision Forests (TF-DF):
 
-The [hyper-parameter](hyper_parameters) page lists the learners and their
+The [hyper-parameter](hyperparameters.md) page lists the learners and their
 hyper-parameters.
 
 Optionally, a learner can be configured with a **deployment specification**. A
@@ -468,7 +468,7 @@ The available variable importances are:
 
 Optimizing the hyperparameters of a learner can improve the quality of a model.
 Selecting the optimal hyper-parameters can be done manually (see
-[how to improve a model](improve_model)) or using the automated hyper-parameter
+[how to improve a model](guide_how_to_improve_model.md)) or using the automated hyper-parameter
 optimizer (HPO). The HPO automatically selects the best hyper-parameters through
 a sequence of trial-and-error computations.
 

diff --git a/documentation/public/docs/guide_how_to_improve_model.md b/documentation/public/docs/guide_how_to_improve_model.md
@@ -15,7 +15,7 @@ Having a basic understanding of how decision forests work is useful to optimize
 them. For more information, please refer to
 [Google's Decision Forests class](https://developers.google.com/machine-learning/decision-forests).
 
-The [hyper-parameter page](hyperparameters) lists and explains the available
+The [hyper-parameter page](hyperparameters.md) lists and explains the available
 hyper-parameters.
 
 ## Random Forest or Gradient Boosted Trees?
@@ -45,7 +45,7 @@ Automated hyperparameter tuning is a simple but expensive solution to improve
 the quality of a model. When full hyper-parameter tuning is too expensive,
 combining hyper-parameter tuning and manual tuning is a good solution.
 
-See the [Tuning notebook](tutorial/tuning/) for details.
+See the [Tuning notebook](tutorial/tuning.ipynb) for details.
 
 ## Hyper-parameter templates
 
@@ -58,7 +58,7 @@ without having understood those hyper-parameters and without having to run
 hyper-parameter tuning, YDF has pre-configured **hyper-parameter templates**.
 
 The hyper-parameter templates are available by calling
-[hyperparameter_templates](py_api/GradientBoostedTreesLearner/#ydf.GradientBoostedTreesLearner.hyperparameter_templates)
+[hyperparameter_templates](py_api/GradientBoostedTreesLearner.md#ydf.GradientBoostedTreesLearner.hyperparameter_templates)
 on a learner.
 
 ```python

diff --git a/documentation/public/docs/index.md b/documentation/public/docs/index.md
@@ -133,10 +133,10 @@ model.save("/tmp/my_model")
 
 **Modeling**
 
-- Train [Random Forest](py_api/RandomForestLearner),
- [Gradient Boosted Trees](py_api/GradientBoostedTreesLearner),
- [Cart](py_api/CartLearner), and
- [Isolation Forest](py_api/IsolationForestLearner) models.
+- Train [Random Forest](py_api/RandomForestLearner.md),
+ [Gradient Boosted Trees](py_api/GradientBoostedTreesLearner.md),
+ [Cart](py_api/CartLearner.md), and
+ [Isolation Forest](py_api/IsolationForestLearner.md) models.
 - Train [classification](tutorial/classification.ipynb),
  [regression](tutorial/regression.ipynb), [ranking](tutorial/ranking.ipynb),
  [uplifting](tutorial/uplifting.ipynb), and
@@ -159,12 +159,12 @@ model.save("/tmp/my_model")
 
 **Serving**
 
-- [Benchmark](tutorial/getting_started/#benchmark-model-speed) model
+- [Benchmark](tutorial/getting_started.md#benchmark-model-speed) model
  inference.
 - Run models in Python, [C++](tutorial/cpp.ipynb),
  [Go](https://github.com/google/yggdrasil-decision-forests/tree/main/yggdrasil_decision_forests/port/go),
  [JavaScript](https://github.com/google/yggdrasil-decision-forests/tree/main/yggdrasil_decision_forests/port/javascript),
- and [CLI](cli_commands).
+ and [CLI](cli_commands.md).
 - Online inference with REST API with
  [TensorFlow Serving and Vertex AI](tutorial/tf_serving.ipynb).
 

diff --git a/documentation/public/docs/javascript.md b/documentation/public/docs/javascript.md
@@ -0,0 +1,104 @@
+# JavaScript
+
+YDF offers two different npm packages to run on the web:
+
+* [ydf-inference](https://www.npmjs.com/package/ydf-inference) only for
+ generating predictions using an existing model. Models can be trained with
+ ydf-training (see below), YDF python, or any other YDF API. If you only need
+ model predictions, use this package instead of ydf-training to save on
+ binary size.
+* [ydf-training](https://www.npmjs.com/package/ydf-training) for both training
+ models and generating predictions.
+
+Both packages are compatible with NodeJS+CommonJS, NodeJS+ES6 and Browser JS.
+
+## ydf-inference
+
+`ydf-inference` is YDF's interface for model inference on the Web.
+See the [Readme on npmjs.com](https://www.npmjs.com/package/ydf-inference) for
+information about downloading and testing the package.
+
+The following example shows how to download a YDF model and make predictions
+on a JavaScript dictionary of arrays.
+
+```html
+<script src="./node_modules/ydf-inference/dist/inference.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/jszip/3.10.0/jszip.min.js"></script>
+<script>
+YDFInference()
+ .then(ydf => ydf.loadModelFromUrl("http://localhost:3000/model.zip"))
+ .then(model => {
+ let examples = {
+ "age": [39, 40, 40, 35],
+ "workclass": ["State-gov", "Private", "Private", "Federal-gov"],
+ "fnlwgt": [77516, 121772, 193524, 76845],
+ "education": ["Bachelors", "Assoc-voc", "Doctorate", "9th"],
+ "education_num": ["13", "11", "16", "5"],
+ "marital_status": ["Never-married", "Married-civ-spouse", "Married-civ-spouse", "Married-civ-spouse"],
+ "occupation": ["Adm-clerical", "Craft-repair", "Prof-specialty", "Farming-fishing"],
+ "relationship": ["Not-in-family", "Husband", "Husband", "Husband"],
+ "race": ["White", "Asian-Pac-Islander", "White", "Black"],
+ "sex": ["Male", "Male", "Male", "Male"],
+ "capital_gain": [2174, 0, 0, 0],
+ "capital_loss": [0, 0, 0, 0],
+ "hours_per_week": [40, 40, 60, 40],
+ "native_country": ["United-States", null, "United-States", "United-States"]
+ };
+ const predictions = model.predict(examples);
+ model.unload();
+ });
+</script>
+```
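The `examples` object above is column-oriented: one array per feature, with `null` marking a missing value. Data often arrives as one object per row instead, so the sketch below shows one way to convert it. `rowsToColumns` is a hypothetical helper written for this illustration; it is not part of `ydf-inference`.

```javascript
// Hypothetical helper (not part of ydf-inference): converts an array of row
// objects into a dictionary of column arrays.
function rowsToColumns(rows) {
  const columns = {};
  // Register every key that appears in at least one row.
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (!(key in columns)) columns[key] = [];
    }
  }
  // Fill each column row by row; absent values become null, matching the
  // missing-value convention used in the example above.
  for (const row of rows) {
    for (const key of Object.keys(columns)) {
      columns[key].push(row[key] ?? null);
    }
  }
  return columns;
}

const examples = rowsToColumns([
  { age: 39, workclass: "State-gov" },
  { age: 40 },
]);
console.log(examples);
```

The resulting object has the same layout as the hand-written `examples` dictionary and could be passed to `model.predict` in the same way.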
+
+## ydf-training
+
+`ydf-training` is YDF's interface for training and inspecting models in
+JavaScript. It is implemented with JavaScript and WebAssembly.
+See the [Readme on npmjs.com](https://www.npmjs.com/package/ydf-training) for
+information about downloading and testing the package.
+
+The following example shows how to train a Gradient Boosted Trees model on a
+first CSV dataset, and then use this model to make predictions on a second CSV
+dataset.
+
+```html
+<script src="./node_modules/ydf-training/dist/training.js"></script>
+<script src="https://cdnjs.cloudflare.com/ajax/libs/jszip/3.10.0/jszip.min.js"></script>
+<script>
+YDFTraining()
+ .then( async (ydf) => {
+ // Download the datasets.
+ const rawTrain = await fetch("http://localhost:3000/train.csv");
+ const train = await rawTrain.text();
+ const rawTest = await fetch("http://localhost:3000/test.csv");
+ const test = await rawTest.text();
+
+ // Prepare the training configuration.
+ const task = "CLASSIFICATION";
+ const label = "label";
+
+ // Train the model.
+ const model = new ydf.GradientBoostedTreesLearner(label, task).train(train);
+
+ // Make predictions.
+ const predictions = model.predict(test);
+
+ // Print the description of the model.
+ console.log(model.describe());
+
+ // Save the model for later use. This model can also be run with ydf-inference
+ // or Python YDF.
+ const modelAsZipBlob = await model.save();
+ model.unload();
+ });
+</script>
+```
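The training example above passes the CSV text and a label column name to the learner. A quick check that the label column is present in the header can catch typos before training starts. The sketch below is a minimal illustration; `csvHasColumn` is a hypothetical helper, not part of `ydf-training`, and it assumes a comma-separated header without quoted commas.

```javascript
// Hypothetical helper (not part of ydf-training): returns true if the first
// line of a CSV text contains the given column name.
// Assumption: the header is comma-separated and has no quoted commas.
function csvHasColumn(csvText, column) {
  // Keep only the header line; handles both LF and CRLF line endings.
  const header = csvText.split(/\r?\n/, 1)[0];
  return header
    .split(",")
    .map((name) => name.trim())
    .includes(column);
}

const sample = "age,label\n39,yes\n40,no\n";
console.log(csvHasColumn(sample, "label"));
console.log(csvHasColumn(sample, "income"));
```

In the example above, such a check could run on `train` with the value of `label` before the learner is constructed.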
+
+### Known limitations
+
+`ydf-training` currently only supports a subset of the functionality of YDF's
+Python surface, namely **supervised learning** with Random Forests and 
+Gradient Boosted Trees. Hyperparameter configuration is not yet supported.
+Additionally, model evaluation and model analysis are not yet supported.
+
+For feature requests, please open an issue [on GitHub](https://github.com/google/yggdrasil-decision-forests).
diff --git a/documentation/public/docs/py_api/util.md b/documentation/public/docs/py_api/util.md
@@ -2,6 +2,6 @@
 
 [TOC]
 
-::: ydf.util.read_tf_record
+::: ydf.util.tf_example.read_tf_record
 
-::: ydf.util.write_tf_record
+::: ydf.util.tf_example.write_tf_record
diff --git a/documentation/public/docs/style/extra.css b/documentation/public/docs/style/extra.css
@@ -11,6 +11,10 @@ h1#_1 {
  margin: 0 auto;
 }
 
+.halfsize {
+ transform: scale(0.5);
+}
+
 .logo {
  width: 397px;
  margin-top: 20px;