From bc0669357857fe34908de28a2e2c72fcb18c9fb0 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Fri, 25 Oct 2024 03:34:06 +0000
Subject: [PATCH] build based on a1287dc

---
 dev/.documenter-siteinfo.json     | 2 +-
 dev/index.html                    | 2 +-
 dev/lib/bipartition/index.html    | 2 +-
 dev/lib/helper_methods/index.html | 2 +-
 dev/man/installation/index.html   | 2 +-
 dev/model/basic/index.html        | 2 +-
 dev/model/dbscan/index.html       | 2 +-
 dev/model/gmm/index.html          | 2 +-
 dev/model/hclust/index.html       | 2 +-
 dev/model/kmeans/index.html       | 2 +-
 10 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json
index cdefdd9..7083bee 100644
--- a/dev/.documenter-siteinfo.json
+++ b/dev/.documenter-siteinfo.json
@@ -1 +1 @@
-{"documenter":{"julia_version":"1.8.5","generation_timestamp":"2024-10-25T03:32:09","documenter_version":"1.7.0"}}
\ No newline at end of file
+{"documenter":{"julia_version":"1.7.3","generation_timestamp":"2024-10-25T03:33:58","documenter_version":"1.7.0"}}
\ No newline at end of file

diff --git a/dev/index.html b/dev/index.html
index 2db42b5..c316929 100644

Home · PhyloClustering.jl

PhyloClustering.jl

PhyloClustering.jl is a Julia package for performing unsupervised learning on phylogenetic trees. The algorithms currently included are K-means, Hierarchical Clustering, Gaussian Mixture Model (GMM), and Density-based Spatial Clustering of Applications with Noise (DBSCAN).

Citation

If you use PhyloClustering.jl in your work, we kindly ask that you cite the following paper:

  • Kong, Y., Tiley, G. P., Solís-Lemus, C. (2023). Unsupervised learning of phylogenetic trees via split-weight embedding. arXiv:2312.16074.
diff --git a/dev/lib/bipartition/index.html b/dev/lib/bipartition/index.html
index f555312..bac8103 100644

 4 => "P2"
 2 => "O"
 3 => "P1"
 1 => "HYB"
source

diff --git a/dev/lib/helper_methods/index.html b/dev/lib/helper_methods/index.html
index c2f91d8..507f708 100644

Helper Functions · PhyloClustering.jl

Pre-process data before input models

PhyloClustering.standardize_treeFunction
standardize_tree(tree::AbstractMatrix{<:Real})

Standardize a tree Matrix returned by split_weight. It is recommended to standardize the data before inputting it into a model.

Arguments

  • tree: an N * B Matrix containing trees (each row is a B-dimensional tree in bipartition format).

Output

A standardized B * N tree Matrix with a mean of about 0 and a standard deviation of about 1. This tree Matrix can be used as input to a model.

source
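As a minimal usage sketch (the matrix below is made-up toy data, not from the package):

```julia
using PhyloClustering

# Hypothetical toy data: N = 4 trees (rows) x B = 3 bipartitions (columns).
tree = [1.0 2.0 0.0;
        1.2 1.8 0.1;
        5.0 0.2 3.0;
        4.8 0.1 3.2]

# Note the shape change: input is N * B, output is B * N,
# with each bipartition scaled to roughly zero mean and unit standard deviation.
std_tree = standardize_tree(tree)
```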
PhyloClustering.distanceFunction
distance(tree::AbstractMatrix{<:Real})

Get the distance Matrix of a tree Matrix returned by split_weight.

Arguments

  • tree: a B * N tree Matrix (each column of the tree Matrix is a B-dimensional tree in bipartition format).

Output

A pairwise distance Matrix that can be the input of hc_label.

source
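A quick sketch of the call pattern (toy numbers, assuming the result is a pairwise distance Matrix over the N trees):

```julia
using PhyloClustering

# Hypothetical B * N input: B = 3 bipartitions (rows), N = 4 trees (columns).
tree = [1.0 1.2 5.0 4.8;
        2.0 1.8 0.2 0.1;
        0.0 0.1 3.0 3.2]

# Pairwise distances between trees, suitable as input to hc_label.
D = distance(tree)
```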
diff --git a/dev/man/installation/index.html b/dev/man/installation/index.html
index bcab2b9..b0e78d7 100644

Installation · PhyloClustering.jl

Installation

Installation of Julia

Julia is a high-level and interactive programming language (like R or Matlab), but it is also high-performance (like C). To install Julia, follow instructions here. For a quick & basic tutorial on Julia, see learn x in y minutes.

Editors:

  • Visual Studio Code provides an editor and an integrated development environment (IDE) for Julia: highly recommended!
  • You can also run Julia within a Jupyter notebook (formerly IPython notebook).

IMPORTANT: Julia code is just-in-time compiled. This means that the first time you run a function, it is compiled at that moment. So, please be patient! Subsequent calls to the function will be much faster. Trying out toy examples for the first calls is a good idea.

Installation of the PhyloClustering.jl package

To install the package, type inside Julia:

]
add PhyloClustering

The first step can take a few minutes, be patient.

The PhyloClustering.jl package has dependencies like Distributions and StatsBase (see the Project.toml file for the full list), but everything is installed automatically.

Loading the Package

To check that your installation worked, type this in Julia to load the package. This is something to type every time you start a Julia session:

using PhyloClustering

This step can also take a while, if Julia needs to pre-compile the code (after a package update for instance).

Press ? inside Julia to switch to help mode, followed by the name of a function (or type) to get more details about it.
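For example, a help-mode lookup for one of the functions documented later might look like this (REPL sketch; the exact docstring printed depends on your installed version):

```julia
julia> using PhyloClustering

help?> standardize_tree
```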

diff --git a/dev/model/basic/index.html b/dev/model/basic/index.html
index 98a5b41..56d093d 100644

Basics · PhyloClustering.jl

diff --git a/dev/model/dbscan/index.html b/dev/model/dbscan/index.html
index f6834ea..d154a30 100644

DBSCAN · PhyloClustering.jl

DBSCAN

Density-based Spatial Clustering of Applications with Noise (DBSCAN) is a data clustering algorithm that finds clusters through density-based expansion of seed points. The algorithm was proposed in Ester et al. (1996); see the reference at the bottom of this page.

PhyloClustering.dbscan_labelFunction
dbscan_label(tree::AbstractMatrix{<:Real}, radius::Real; min_neighbors::Int64 = 1, min_cluster_size::Int64 = 1)

Get predicted labels from density-based spatial clustering of applications with noise for a group of phylogenetic trees.

Arguments

  • tree: a B * N tree Matrix (each column of the tree Matrix is a B-dimensional tree in bipartition format), with B < N.
  • radius: neighborhood radius; points within this distance are considered neighbors.
  • min_neighbors: minimal number of neighbors required to assign a point to a cluster.
  • min_cluster_size: minimal number of points in a cluster.

Output

A Vector of length N containing the predicted label for each tree (the cluster it belongs to). 0 means the tree is noise.

source
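A minimal usage sketch (toy data; two tight groups of trees, with B = 2 < N = 6 as the docstring requires):

```julia
using PhyloClustering

# Hypothetical B * N input: B = 2 bipartitions, N = 6 trees in two groups.
tree = [0.0 0.1 0.2 5.0 5.1 5.2;
        0.0 0.1 0.0 3.0 3.1 3.0]

# Trees within distance 1.0 of each other are neighbors;
# a cluster must contain at least 2 trees, otherwise its points become noise (label 0).
labels = dbscan_label(tree, 1.0; min_neighbors = 1, min_cluster_size = 2)
```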

Reference

The implementation of DBSCAN is provided by Clustering.jl.

Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96). AAAI Press, 226–231.

diff --git a/dev/model/gmm/index.html b/dev/model/gmm/index.html
index 3724a5b..69c31f2 100644

Gaussian Mixture Model (GMM) · PhyloClustering.jl

Gaussian Mixture Model (GMM)

A Gaussian Mixture Model (GMM) is a model-based probabilistic method that assumes data points are generated from a mixture of several Gaussian distributions with unknown parameters. It uses the Expectation-Maximization (EM) algorithm to update the parameters iteratively, optimizing the log-likelihood of the data until convergence.

PhyloClustering.gmm_labelFunction
gmm_label(tree::AbstractMatrix{<:Real}, n::Int64; method::Symbol=:kmeans, kind::Symbol=:diag)

Get predicted labels from Gaussian mixture model for a group of phylogenetic trees.

Arguments

  • tree: a B * N tree Matrix (each column of the tree Matrix is a B-dimensional tree in bipartition format).
  • n: the number of clusters.
  • method (defaults to :kmeans): initialization method to find n starting centers:
      • :kmeans: use K-means clustering from Clustering.jl to initialize with n centers.
      • :split: initialize a single Gaussian with tree, then repeatedly split the Gaussians and retrain using the EM algorithm until n Gaussians are obtained.
  • kind (defaults to :diag): covariance type, :diag or :full.

Output

A Tuple{Vector{Int64}, Vector{Int64}} where the first Vector contains predicted labels for each tree based on the posterior probability and the second Vector contains predicted labels for each tree based on the log-likelihood.

source
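A minimal usage sketch (same made-up toy data shape as above):

```julia
using PhyloClustering

# Hypothetical B * N input: B = 2 bipartitions, N = 6 trees in two loose groups.
tree = [0.0 0.1 0.2 5.0 5.1 5.2;
        0.0 0.1 0.0 3.0 3.1 3.0]

# Fit a 2-component mixture; the result is a Tuple of two label Vectors:
# labels by posterior probability and labels by log-likelihood.
labels_post, labels_llk = gmm_label(tree, 2; method = :kmeans, kind = :diag)
```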

Reference

The implementation of GMM is provided by GaussianMixtures.jl.

diff --git a/dev/model/hclust/index.html b/dev/model/hclust/index.html
index 7c7fbfe..8a9d079 100644

 2
 1
 2
 2

Reference

The implementation of hierarchical clustering is provided by Clustering.jl.

Joe H. Ward Jr. (1963) Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, 58:301, 236-244, DOI: 10.1080/01621459.1963.10500845
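The hc_label function is only mentioned in this excerpt as taking the pairwise distance Matrix produced by distance; assuming it also takes the desired number of clusters as a second argument (an assumption, not confirmed here), a sketch might look like:

```julia
using PhyloClustering

# Hypothetical B * N input: B = 2 bipartitions, N = 4 trees.
tree = [0.0 0.1 5.0 5.1;
        0.0 0.1 3.0 3.1]

D = distance(tree)       # pairwise distances between trees
labels = hc_label(D, 2)  # assumed signature: (distance Matrix, number of clusters)
```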

diff --git a/dev/model/kmeans/index.html b/dev/model/kmeans/index.html
index d954ad3..43dcbeb 100644

 1
 1
 1
 1

Reference

The implementation of Yinyang K-means is provided by ParallelKMeans.jl.

Yufei Ding et al. 2015. Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France, 6-11 July 2015.
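The package's K-means entry point is not shown in this excerpt; assuming a kmeans_label helper analogous to dbscan_label and gmm_label (a hypothetical name, not confirmed here), usage might look like:

```julia
using PhyloClustering

# Hypothetical B * N input: B = 2 bipartitions, N = 6 trees.
tree = [0.0 0.1 0.2 5.0 5.1 5.2;
        0.0 0.1 0.0 3.0 3.1 3.0]

# `kmeans_label` is an assumed name, by analogy with the other *_label helpers.
labels = kmeans_label(tree, 2)
```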
