Deleted dependency Makie #51

Merged 2 commits on Oct 25, 2024
1 change: 0 additions & 1 deletion Project.toml
@@ -17,7 +17,6 @@ Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
-CairoMakie = "0.11.1"
Clustering = "0.15.5"
Combinatorics = "1.0.2"
Distances = "0.10.10"
5 changes: 0 additions & 5 deletions docs/src/lib/helper_methods.md
@@ -2,9 +2,4 @@
```@docs
standardize_tree
distance
```
-
-# Visualize results
-```@docs
-plot_clusters
-```
6 changes: 0 additions & 6 deletions docs/src/model/hclust.md
@@ -24,12 +24,6 @@ matrix = distance(tree);
label = hc_label(matrix, 2)
```

-We can visualize the result using build-in function [`plot_clusters`](@ref).
-
-```@example 1
-plot_clusters(trees', label)
-```
-
**Reference**

The implementation of *hierarchical clustering* is provided by [`Clustering.jl`](https://github.com/JuliaStats/Clustering.jl).
6 changes: 0 additions & 6 deletions docs/src/model/kmeans.md
@@ -32,12 +32,6 @@ tree = standardize_tree(trees);
label = kmeans_label(tree, 2, rng=rng)
```

-We can visualize the result using build-in function [`plot_clusters`](@ref).
-
-```@example 1
-plot_clusters(trees', label)
-```
-
**Reference**

The implementation of *Yinyang K-means* is provided by [`ParallelKMeans.jl`](https://github.com/PyDataBlog/ParallelKMeans.jl).
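Since this PR drops the `plot_clusters` step from the docs, a comparable plot can still be produced on the caller's side. The following is a minimal sketch modeled on the body of the function removed from `src/helper.jl`, assuming `MultivariateStats` and `CairoMakie` are installed in the user's own environment. The names `pca_2d` and `plot_clusters_local` are hypothetical and are not part of the package.

```julia
using MultivariateStats, CairoMakie

# Hypothetical helper: project a B * N matrix (one tree per column)
# onto at most two principal components.
function pca_2d(tree::AbstractMatrix{<:Real})
    model = fit(PCA, tree; maxoutdim=2)
    return predict(model, tree)  # rows are components, columns are trees
end

# Hypothetical local stand-in for the removed `plot_clusters`.
# `label` holds one cluster label per tree (per column of `tree`).
# Note: indexing row 2 assumes PCA kept two components.
function plot_clusters_local(tree::AbstractMatrix{<:Real},
                             label::AbstractVector{<:Integer})
    proj = pca_2d(tree)
    return scatter(proj[1, :], proj[2, :]; markersize=5, color=label)
end
```

The removed docs called the function as `plot_clusters(trees', label)`, so a call such as `plot_clusters_local(trees', label)` would be the local equivalent under the same assumptions about the shape of `trees`.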
1 change: 0 additions & 1 deletion src/PhyloClustering.jl
@@ -11,7 +11,6 @@ export
hc_label,
dbscan_label,
standardize_tree,
-plot_clusters,
distance,
num_bipartitions,
show_bipartitions,
25 changes: 4 additions & 21 deletions src/helper.jl
@@ -1,4 +1,4 @@
-using MultivariateStats, StatsBase, CairoMakie, Distances
+using MultivariateStats, StatsBase, Distances

"""
standardize_tree(tree::AbstractMatrix{<:Real})
@@ -12,31 +12,14 @@ It is recommended to standardize the data before inputting it into the model.
A standardized B * N tree `Matrix` with a mean of about 0 and a standard deviation of about 1.
This tree `Matrix` can be the input of [model](@ref basics).
"""
-function standardize_tree(tree::AbstractMatrix{<:Real})
-data = collect(tree');
+function standardize_tree(tree::AbstractMatrix{<:Real})
+data = collect(tree')
dt = fit(ZScoreTransform, data, dims=2)
data = StatsBase.transform(dt, data)
-replace!(data, NaN=>0)
+replace!(data, NaN => 0)
return data
end

-"""
-plot_clusters(tree::AbstractMatrix{<:Real}, label::Vector{Int64})
-
-Visualize the result of models.
-
-# Arguments
-- `tree`: a B * N tree Matrix (each column of tree Matrix is a B-dimensional tree in bipartiton format).
-- `label`: an N-length Vector containing predicted labels for each tree. People can use the output of the models.
-# Output
-A scatter plot showing tree clusters.
-"""
-function plot_clusters(tree::AbstractMatrix{<:Real}, label::Vector{Int64})
-PCA_model = fit(PCA, tree, maxoutdim = 2);
-PCA_data = predict(PCA_model,tree)
-scatter(PCA_data[1,:], PCA_data[2,:], markersize = 5, color = label)
-end
-
"""
distance(tree::AbstractMatrix{<:Real})

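The tests in `test/test_helper.jl` expect each standardized row to have a mean of about 0 and a standard deviation of about 1, except for one row whose standard deviation is 0. A constant row has zero variance, so a plain z-score divides 0 by 0 and yields `NaN`; the `replace!(data, NaN => 0)` line in `standardize_tree` appears to guard against that case. The sketch below reproduces that behavior with the standard library only. `zscore_rows` is a hypothetical name, and it does not use the package's `ZScoreTransform` call.

```julia
using Statistics

# Hypothetical helper: z-score each row of `data`, mapping NaN to 0.
function zscore_rows(data::AbstractMatrix{<:Real})
    mu = mean(data, dims=2)       # per-row mean
    sigma = std(data, dims=2)     # per-row sample standard deviation
    z = (data .- mu) ./ sigma     # a constant row gives 0 / 0 = NaN
    replace!(z, NaN => 0.0)       # same guard as in `standardize_tree`
    return z
end

z = zscore_rows([1.0 2.0 3.0; 5.0 5.0 5.0])
```

Here the first row is rescaled to mean 0 and standard deviation 1, while the constant second row ends up as all zeros instead of `NaN`.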
47 changes: 23 additions & 24 deletions test/test_helper.jl
@@ -5,40 +5,39 @@ include("../src/helper.jl")
tree = split_weight(trees, 4)
N = size(tree)[1]
B = size(tree)[2]
-@test !(mean(tree[1,:]) ≈ 0)
+@test !(mean(tree[1, :]) ≈ 0)
tree = standardize_tree(tree)
-@test size(tree) == (B,N)
-@test [std(tree[i,:]) for i in 1:7] ≈ [1,1,1,1,1,0,1] atol=0.1
-@test [mean(tree[i,:]) for i in 1:7] ≈ [0,0,0,0,0,0,0] atol=0.1
+@test size(tree) == (B, N)
+@test [std(tree[i, :]) for i in 1:7] ≈ [1, 1, 1, 1, 1, 0, 1] atol = 0.1
+@test [mean(tree[i, :]) for i in 1:7] ≈ [0, 0, 0, 0, 0, 0, 0] atol = 0.1

trees = readMultiTopology("file/8-taxon-tree.trees")
tree = split_weight(trees, 8)
N = size(tree)[1]
B = size(tree)[2]
n = num_bipartitions(8)
-@test !(mean(tree[1,:]) ≈ 0)
+@test !(mean(tree[1, :]) ≈ 0)
tree = standardize_tree(tree)
-@test size(tree) == (B,N)
-@test [std(tree[i,:]) for i in 1:8] ≈ repeat([1], outer = 8) atol=0.1
-@test [mean(tree[i,:]) for i in 1:n] ≈ repeat([0], outer = n) atol=0.1
+@test size(tree) == (B, N)
+@test [std(tree[i, :]) for i in 1:8] ≈ repeat([1], outer=8) atol = 0.1
+@test [mean(tree[i, :]) for i in 1:n] ≈ repeat([0], outer=n) atol = 0.1

trees = readMultiTopology("file/16-taxon-tree.trees")
tree = split_weight(trees, 16)
N = size(tree)[1]
B = size(tree)[2]
n = num_bipartitions(16)
-@test !(mean(tree[1,:]) ≈ 0)
+@test !(mean(tree[1, :]) ≈ 0)
tree = standardize_tree(tree)
-@test size(tree) == (B,N)
-@test [std(tree[i,:]) for i in 1:16] ≈ repeat([1], outer = 16) atol=0.1
-@test [mean(tree[i,:]) for i in 1:n] ≈ repeat([0], outer = n) atol=0.1
+@test size(tree) == (B, N)
+@test [std(tree[i, :]) for i in 1:16] ≈ repeat([1], outer=16) atol = 0.1
+@test [mean(tree[i, :]) for i in 1:n] ≈ repeat([0], outer=n) atol = 0.1
end

@testset "visualization" begin
trees = readMultiTopology("file/4-taxon-tree.trees")
tree = split_weight(trees, 4)
-label = [1,2,1,2]
-@test_logs plot_clusters(tree, label)
+label = [1, 2, 1, 2]
end

@testset "Euclidean distance matrix" begin
@@ -47,23 +46,23 @@ include("../src/helper.jl")
tree = standardize_tree(tree)
matrix = distance(tree)
@test size(matrix)[1] == size(matrix)[2]
-@test [matrix[i,i] for i in 1:size(matrix)[1]] == repeat([0], outer = size(matrix)[1])
+@test [matrix[i, i] for i in 1:size(matrix)[1]] == repeat([0], outer=size(matrix)[1])
@test matrix ≈ [
-0.0 5.096169006818051 5.40722180738128 5.625211680819353;
-5.096169006818051 0.0 4.37996068613932 5.896362019739273;
-5.40722180738128 4.37996068613932 0.0 6.260740103673938;
-5.625211680819353 5.896362019739273 6.260740103673938 0.0] atol=0.001
+0.0 5.096169006818051 5.40722180738128 5.625211680819353;
+5.096169006818051 0.0 4.37996068613932 5.896362019739273;
+5.40722180738128 4.37996068613932 0.0 6.260740103673938;
+5.625211680819353 5.896362019739273 6.260740103673938 0.0] atol = 0.001

trees = readMultiTopology("file/16-taxon-tree.trees")
tree = split_weight(trees, 16)
tree = standardize_tree(tree)
matrix = distance(tree)
@test size(matrix)[1] == size(matrix)[2]
-@test [matrix[i,i] for i in 1:size(matrix)[1]] == repeat([0], outer = size(matrix)[1])
+@test [matrix[i, i] for i in 1:size(matrix)[1]] == repeat([0], outer=size(matrix)[1])
@test matrix ≈ [
-0.0 11.3979509761689 11.549754911257478 11.358576754022506;
-11.3979509761689 0.0 11.21122183718288 11.930804141875372;
-11.549754911257478 11.21122183718288 0.0 11.473318029384393;
-11.358576754022506 11.930804141875372 11.473318029384393 0.0] atol=0.001
+0.0 11.3979509761689 11.549754911257478 11.358576754022506;
+11.3979509761689 0.0 11.21122183718288 11.930804141875372;
+11.549754911257478 11.21122183718288 0.0 11.473318029384393;
+11.358576754022506 11.930804141875372 11.473318029384393 0.0] atol = 0.001
end
end
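
The "Euclidean distance matrix" testset checks that `distance` returns a square matrix with a zero diagonal. The body of `distance` is not part of this diff, so the following is only a sketch of a pairwise Euclidean distance over the columns of a B * N matrix, written with the standard library. `pairwise_euclidean` is a hypothetical name, and the package's own function may be implemented differently (for example via `Distances.jl`, which `src/helper.jl` imports).

```julia
using LinearAlgebra

# Hypothetical helper: Euclidean distance between every pair of columns.
function pairwise_euclidean(tree::AbstractMatrix{<:Real})
    n = size(tree, 2)
    d = zeros(Float64, n, n)          # diagonal stays at 0
    for i in 1:n, j in (i + 1):n
        d[i, j] = norm(tree[:, i] .- tree[:, j])
        d[j, i] = d[i, j]             # fill the symmetric entry
    end
    return d
end

# Three 2-dimensional points stored as columns: (0,0), (3,4), (0,1).
m = pairwise_euclidean([0.0 3.0 0.0; 0.0 4.0 1.0])
```

The result has the same structural properties the testset asserts: it is square, symmetric, and zero on the diagonal.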