Stable #141

Open
wants to merge 33 commits into
base: main
33 commits
7a64c83
Merge pull request #106 from google/main
achoum Jun 18, 2024
da18a55
Merge pull request #107 from google/main
achoum Jun 18, 2024
7a0415c
Merge branch 'main' into stable
achoum Aug 22, 2024
ddba189
Merge pull request #133 from google/main
rstz Sep 24, 2024
03ccea3
[YDF] Expose MRR
rstz Sep 24, 2024
4287f36
reverses definition order for data members
a-googler Sep 25, 2024
1708111
internal
maxwillzq Sep 26, 2024
961680d
Fix flaky tests
achoum Sep 30, 2024
2e5da2e
Improve `predict` documentation and create "model.predict_class"
achoum Oct 1, 2024
9aa09ac
Fix documentation build
achoum Oct 1, 2024
719267e
Create inference subfolder for Javascript build
rstz Oct 1, 2024
561a0d1
[YDF] Fix JS inference tests
rstz Oct 1, 2024
19b5f31
Blog post: Seeing the Forest Through the Trees
achoum Oct 4, 2024
f1d4415
[YDF] Breaking: Fix typo partial_depepence_plot --> partial_dependenc…
rstz Oct 5, 2024
b265958
Read Avro files without dependencies (part 1)
achoum Oct 8, 2024
919aa92
Read Avro fiels without dependencies (part 2)
achoum Oct 8, 2024
a009ffa
Read Avro files without dependencies (part 3)
achoum Oct 8, 2024
7c39c43
Internal change
achoum Oct 8, 2024
14780a0
Read avro files without dependencies (part 4)
achoum Oct 8, 2024
37fa463
Read avro files without dependencies (part 5)
achoum Oct 9, 2024
76b42c1
Read Avro files without dependencies (part 6)
achoum Oct 9, 2024
54e7283
[YDF] Install FastAPI dependencies
rstz Oct 10, 2024
1c59e9d
Read avro files without dependencies (part 7; last one!)
achoum Oct 10, 2024
17d6eca
Expose Avro file format in python.
achoum Oct 10, 2024
a9134b0
Replace rapidjson with nlohmannjson in Avro reader
achoum Oct 10, 2024
49c5b84
Remove rapidjson
achoum Oct 10, 2024
908b8fc
Improve / simplify the migration guide.
achoum Oct 10, 2024
d2d5030
[YDF] Add YDF training
rstz Oct 11, 2024
c4bf508
[YDF] JS: Release Training API
rstz Oct 15, 2024
9cde224
[YDF] JS: Rename inference package, fix README
rstz Oct 15, 2024
a3f2c03
[YDF] JS: Add Changelog
rstz Oct 15, 2024
84c1e3c
[YDF] Add Javascript to public documentation
rstz Oct 18, 2024
47afcdc
[YDF] Fix Javascript docs location
rstz Oct 18, 2024
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -14,10 +14,11 @@ Changelog under `yggdrasil_decision_forests/port/python/CHANGELOG.md`.
- Allow configuring the truncation of NDCG losses.
- Add support for distributed training for ranking gradient boosted tree
models.
- Add support for AVRO data file using the "avro:" prefix.

### Misc

- Loss options are now defined
- Loss options are now defined
model/gradient_boosted_trees/gradient_boosted_trees.proto (previously
learner/gradient_boosted_trees/gradient_boosted_trees.proto)

6 changes: 5 additions & 1 deletion LICENSE
@@ -223,7 +223,11 @@ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

rapidjson
nlohmann_json
MIT License

Copyright (c) 2013-2022 Niels Lohmann

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
4 changes: 2 additions & 2 deletions WORKSPACE_WITH_TF
@@ -20,8 +20,8 @@ load("//third_party/tensorflow:workspace.bzl", tensorflow = "deps")
#load("//third_party/farmhash:workspace.bzl", farmhash = "deps")
load("//third_party/boost_math:workspace.bzl", boost_math = "deps")
# load("//third_party/grpc:workspace.bzl", grpc = "deps")
load("//third_party/rapidjson:workspace.bzl", rapidjson = "deps")
load("//third_party/eigen3:workspace.bzl", eigen = "deps")
load("//third_party/nlohmann_json:workspace.bzl", nlohmann_json = "deps")

gtest()
# absl() # We use the abseil linked in tensorflow to avoid package clashes
@@ -31,8 +31,8 @@ tensorflow()
#farmhash()
boost_math()
# grpc() # We use the protobuf linked in tensorflow.
rapidjson()
# eigen()
nlohmann_json()

# The initialization of YDF dependencies is commented. TensorFlow
# is in charge of initializing them.
5 changes: 5 additions & 0 deletions documentation/public/docs/blog/.authors.yml
@@ -0,0 +1,5 @@
authors:
gbm:
name: Mathieu Guillame-Bert
description: Creator
avatar: https://avatars.githubusercontent.com/u/52443
4 changes: 4 additions & 0 deletions documentation/public/docs/blog/index.md
@@ -0,0 +1,4 @@
# ✒️ Blog

Here, we share playful, surprising or interesting experiments, stories, and
facts about machine learning in general and decision forests in particular.
253 changes: 253 additions & 0 deletions documentation/public/docs/blog/posts/1_how_ml_models_generalize.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions documentation/public/docs/cli_quickstart.md
@@ -222,7 +222,7 @@ A few remarks:
provide reasonable results in most situations. We will discuss alternative
default values (called hyperparameter templates) and automated tuning of
hyperparameters later. The list of all hyperparameters and their default
values is available in the [hyperparameters page](hyper_parameters).
values is available in the [hyperparameters page](hyperparameters.md).

- No validation dataset was provided for the training. Not all learners
require a validation dataset. However, the `GRADIENT_BOOSTED_TREES` learner
@@ -442,7 +442,7 @@ One vs other classes:
- The test accuracy is 0.874399 with 95% confidence interval boundaries of
[0.86875; 0.879882].
- The test AUC is 0.929207 with 95% confidence interval boundaries of
[0.924358 0.934056](when computed with a closed form) and [0.973397
[0.924358 0.934056] when computed with a closed form and [0.973397
0.977947] when computed with bootstrapping.
- The PR-AUC and AP metrics are also available.

6 changes: 3 additions & 3 deletions documentation/public/docs/cli_user_manual.md
@@ -4,7 +4,7 @@ This page contains an in-depth introduction to Yggdrasil Decision Forests (YDF)
CLI API. The content presented on this page is generally not necessary to use
YDF, but it will help users improve their understanding and use advance options.

New users should first check out the [Quick start](cli_quick_start).
New users should first check out the [Quick start](cli_quick_start.md).

Most concepts presented here apply to the other APIs, notably the C++ API.

@@ -280,7 +280,7 @@ The **generic hyper-parameters** (GHPs) are an alternative representation to the
quick configuration, and automated hyper-parameter optimization. GHPs are used
by TensorFlow Decision Forests (TF-DF):

The [hyper-parameter](hyper_parameters) page lists the learners and their
The [hyper-parameter](hyperparameters.md) page lists the learners and their
hyper-parameters.

Optionally, a learner can be configured with a **deployment specification**. A
@@ -468,7 +468,7 @@ The available variable importances are:

Optimizing the hyperparameters of a learner can improve the quality of a model.
Selecting the optimal hyper-parameters can be done manually (see
[how to improve a model](improve_model)) or using the automated hyper-parameter
[how to improve a model](guide_how_to_improve_model.md)) or using the automated hyper-parameter
optimizer (HPO). The HPO automatically selects the best hyper-parameters through
a sequence of trial-and-error computations.

6 changes: 3 additions & 3 deletions documentation/public/docs/guide_how_to_improve_model.md
@@ -15,7 +15,7 @@ Having a basic understanding of how decision forests work is useful to optimize
them. For more information, please refer to
[Google's Decision Forests class](https://developers.google.com/machine-learning/decision-forests).

The [hyper-parameter page](hyperparameters) lists and explains the available
The [hyper-parameter page](hyperparameters.md) lists and explains the available
hyper-parameters.

## Random Forest or Gradient Boosted Trees?
@@ -45,7 +45,7 @@ Automated hyperparameter tuning is a simple but expensive solution to improve
the quality of a model. When full hyper-parameter tuning is too expensive,
combining hyper-parameter tuning and manual tuning is a good solution.

See the [Tuning notebook](tutorial/tuning/) for details.
See the [Tuning notebook](tutorial/tuning.ipynb) for details.

## Hyper-parameter templates

@@ -58,7 +58,7 @@ without having understood those hyper-parameters and without having to run
hyper-parameter tuning, YDF have pre-configured **hyper-parameter templates**.

The hyper-parameter templates are available by calling
[hyperparameter_templates](py_api/GradientBoostedTreesLearner/#ydf.GradientBoostedTreesLearner.hyperparameter_templates)
[hyperparameter_templates](py_api/GradientBoostedTreesLearner.md#ydf.GradientBoostedTreesLearner.hyperparameter_templates)
on a learner.

```python
12 changes: 6 additions & 6 deletions documentation/public/docs/index.md
@@ -133,10 +133,10 @@ model.save("/tmp/my_model")

**Modeling**

- Train [Random Forest](py_api/RandomForestLearner),
[Gradient Boosted Trees](py_api/GradientBoostedTreesLearner),
[Cart](py_api/CartLearner), and
[Isolation Forest](py_api/IsolationForestLearner) models.
- Train [Random Forest](py_api/RandomForestLearner.md),
[Gradient Boosted Trees](py_api/GradientBoostedTreesLearner.md),
[Cart](py_api/CartLearner.md), and
[Isolation Forest](py_api/IsolationForestLearner.md) models.
- Train [classification](tutorial/classification.ipynb),
[regression](tutorial/regression.ipynb), [ranking](tutorial/ranking.ipynb),
[uplifting](tutorial/uplifting.ipynb), and
@@ -159,12 +159,12 @@ model.save("/tmp/my_model")

**Serving**

- [Benchmark](tutorial/getting_started/#benchmark-model-speed) model
- [Benchmark](tutorial/getting_started.md#benchmark-model-speed) model
inference.
- Run models in Python, [C++](tutorial/cpp.ipynb),
[Go](https://github.com/google/yggdrasil-decision-forests/tree/main/yggdrasil_decision_forests/port/go),
[JavaScript](https://github.com/google/yggdrasil-decision-forests/tree/main/yggdrasil_decision_forests/port/javascript),
and [CLI](cli_commands).
and [CLI](cli_commands.md).
- Online inference with REST API with
[TensorFlow Serving and Vertex AI](tutorial/tf_serving.ipynb).

104 changes: 104 additions & 0 deletions documentation/public/docs/javascript.md
@@ -0,0 +1,104 @@
# Javascript

YDF offers two different npm packages to run on the web:

* [ydf-inference](https://www.npmjs.com/package/ydf-inference) Only for
generating predictions using an existing model. Models can be trained with
ydf-training (see below), YDF python, or any other YDF API. If you only need
model predictions, use this package instead of ydf-training to save on
binary size.
* [ydf-training](https://www.npmjs.com/package/ydf-training) for both training
models and generating predictions.

Both packages are compatible with NodeJS+CommonJS, NodeJS+ES6 and Browser JS.

## ydf-inference

`ydf-inference` is YDF's interface for model inference on the Web.
See the [Readme on npmjs.com](https://www.npmjs.com/package/ydf-inference) for
information about downloading and testing the package.

The following example shows how to download a YDF model and make predictions
on a Javascript dictionary of arrays.

```html
<script src="./node_modules/ydf-inference/dist/inference.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jszip/3.10.0/jszip.min.js"></script>
<script>
  YDFInference()
    .then(ydf => ydf.loadModelFromUrl("http://localhost:3000/model.zip"))
    .then(model => {
      // One array per feature; all arrays have the same length.
      const examples = {
        "age": [39, 40, 40, 35],
        "workclass": ["State-gov", "Private", "Private", "Federal-gov"],
        "fnlwgt": [77516, 121772, 193524, 76845],
        "education": ["Bachelors", "Assoc-voc", "Doctorate", "9th"],
        "education_num": ["13", "11", "16", "5"],
        "marital_status": ["Never-married", "Married-civ-spouse", "Married-civ-spouse", "Married-civ-spouse"],
        "occupation": ["Adm-clerical", "Craft-repair", "Prof-specialty", "Farming-fishing"],
        "relationship": ["Not-in-family", "Husband", "Husband", "Husband"],
        "race": ["White", "Asian-Pac-Islander", "White", "Black"],
        "sex": ["Male", "Male", "Male", "Male"],
        "capital_gain": [2174, 0, 0, 0],
        "capital_loss": [0, 0, 0, 0],
        "hours_per_week": [40, 40, 60, 40],
        "native_country": ["United-States", null, "United-States", "United-States"]
      };
      const predictions = model.predict(examples);
      // Release the model once it is no longer needed.
      model.unload();
    });
</script>
```
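
Application data is often stored as one object per example rather than one array per feature. The following is a minimal sketch of a conversion into the dictionary-of-arrays layout used above. `rowsToColumns` is a hypothetical helper and not part of `ydf-inference`; the sketch also assumes that `null` denotes a missing value, as the `native_country` column in the example above suggests.

```javascript
// Hypothetical helper (not part of ydf-inference): converts row-oriented
// records into a column-oriented dictionary of arrays.
function rowsToColumns(rows, featureNames) {
  const columns = {};
  for (const name of featureNames) {
    // Assumption: a feature absent from a row is encoded as null.
    columns[name] = rows.map((row) =>
      row[name] === undefined ? null : row[name]
    );
  }
  return columns;
}

const rows = [
  { age: 39, workclass: "State-gov" },
  { age: 40 }, // "workclass" is missing for this example.
];
const examples = rowsToColumns(rows, ["age", "workclass"]);
console.log(examples);
```

The resulting object has the same shape as the `examples` object above and could then be passed to `model.predict`.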

## ydf-training

`ydf-training` is YDF's interface for training and inspecting models in
Javascript. It is implemented with Javascript and WebAssembly.
See the [Readme on npmjs.com](https://www.npmjs.com/package/ydf-training) for
information about downloading and testing the package.

The following example shows how to train a Gradient Boosted Trees model on a
first csv dataset, and then use this model to make predictions on a second csv
dataset.

```html
<script src="./node_modules/ydf-training/dist/training.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jszip/3.10.0/jszip.min.js"></script>
<script>
  YDFTraining()
    .then(async (ydf) => {
      // Download the datasets.
      const rawTrain = await fetch("http://localhost:3000/train.csv");
      const train = await rawTrain.text();
      const rawTest = await fetch("http://localhost:3000/test.csv");
      const test = await rawTest.text();

      // Prepare the training configuration.
      const task = "CLASSIFICATION";
      const label = "label";

      // Train the model on the training dataset.
      const model = new ydf.GradientBoostedTreesLearner(label, task).train(train);

      // Make predictions on the test dataset.
      const predictions = model.predict(test);

      // Print the description of the model.
      console.log(model.describe());

      // Save the model for later use. This model can also be run with
      // ydf-inference or Python YDF.
      const modelAsZipBlob = await model.save();
      model.unload();
    });
</script>
```

### Known limitations

`ydf-training` currently only supports a subset of the functionality of YDF's
Python surface, namely **supervised learning** with Random Forests and
Gradient Boosted Trees. Hyperparameter configuration is not yet supported.
Additionally, model evaluation and model analysis are not yet supported.

For feature requests, please open an issue [on GitHub](https://github.com/google/yggdrasil-decision-forests).
4 changes: 2 additions & 2 deletions documentation/public/docs/py_api/util.md
@@ -2,6 +2,6 @@

[TOC]

::: ydf.util.read_tf_record
::: ydf.util.tf_example.read_tf_record

::: ydf.util.write_tf_record
::: ydf.util.tf_example.write_tf_record
4 changes: 4 additions & 0 deletions documentation/public/docs/style/extra.css
@@ -11,6 +11,10 @@ h1#_1 {
margin: 0 auto;
}

.halfsize {
transform: scale(0.5);
}

.logo {
width: 397px;
margin-top: 20px;