Given an initial graph G₀ and 50 sequences G₁ᵢ, …, G₂₀ᵢ (i = 1, …, 50):
- We embed all of the graphs into ℝ²⁵⁰ using NetLSD.
- For each generation t ∈ {1, …, 20}, we compute a 1-dimensional PCA over all of the embedded points for that generation (one point per sequence).
- We fit a Gaussian distribution to the projected points of each generation.
- The idea is that a "good" model would produce points that are tightly clustered around the mean, while a bad model would produce a widely dispersed point cloud or several clusters.
- We can then summarize these distributions by computing a z-score for each generation.
- We can finally summarize the sequence of z-scores for a given graph-model pair by computing a weighted sum or weighted average of the z-scores across the generations.
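
The steps above can be sketched in NumPy as follows. This is a minimal sketch under several assumptions, not a definitive implementation. It assumes the NetLSD embeddings have already been computed and stacked into an array of shape (sequences, generations, 250). The list does not say which point is scored, so the sketch assumes the z-score measures how far the projection of G₀ lies from the fitted mean of each generation, in units of the fitted standard deviation. The function names `generation_z_scores` and `summarize`, and the linearly increasing weights in the usage lines, are hypothetical.

```python
import numpy as np


def generation_z_scores(emb0, emb):
    """Absolute z-score of G0 under each generation's fitted Gaussian.

    emb0: (d,) embedding of the initial graph (assumed precomputed).
    emb:  (n_seq, n_gen, d) embeddings of the generated graphs.
    Scoring G0 is an assumption; the procedure does not say which
    point the z-score refers to.
    """
    n_seq, n_gen, d = emb.shape
    z = np.empty(n_gen)
    for t in range(n_gen):
        points = emb[:, t, :]                  # one point per sequence
        center = points.mean(axis=0)
        # 1-D PCA: the first right singular vector of the centered data
        # is the direction of largest variance.
        _, _, vt = np.linalg.svd(points - center, full_matrices=False)
        axis = vt[0]
        proj = (points - center) @ axis        # 1-D coordinates
        # Gaussian fit by sample mean and sample standard deviation.
        mu, sigma = proj.mean(), proj.std(ddof=1)
        # The sign of a principal axis is arbitrary, so use |z|.
        z[t] = abs((emb0 - center) @ axis - mu) / sigma
    return z


def summarize(z, weights=None):
    """Weighted average of per-generation z-scores (uniform if weights is None)."""
    return float(np.average(z, weights=weights))


# Usage with synthetic placeholder embeddings (random numbers, not real data).
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 20, 250))
emb0 = rng.normal(size=250)
z = generation_z_scores(emb0, emb)
score = summarize(z, weights=np.arange(1, 21))  # hypothetical weighting
```

Taking the absolute value discards the direction of the deviation, which is reasonable here because the orientation of a principal axis is arbitrary and can flip between generations.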