Clustering Approaches for Global Minimum Variance Portfolio

Example

  • Using raw data with no scaling and no dimensionality reduction
python main.py --data_period test --max_cluster_size 75 --scaling_method none --dim_reduction_method none
  • Using PCA without scaling (if PCA_components is not specified, the default of 3 is used)
python main.py --data_period test --max_cluster_size 75 --scaling_method none --dim_reduction_method PCA
  • Using t-SNE with standard scaling and tsne_components = 10
python main.py --data_period test --max_cluster_size 75 --scaling_method standard_scale --dim_reduction_method tsne --tsne_components 10

Hyper-Parameters

  1. data_period: Which period's daily stock returns to use (validation or test)
    • The validation period is used to choose the parameters that produce the best portfolio optimization performance.
    • Portfolio performance on the test period is the true score of the proposed algorithm.
  2. max_cluster_size: Maximum size allowed for an individual cluster (integer)
  3. scaling_method: Whether to standardize the data to zero mean and unit variance before clustering (standard_scale or none)
  4. dim_reduction_method: Whether to reduce the dimensionality of the 252-long vectors of daily stock returns with PCA or t-SNE (PCA, tsne or none)
  5. PCA_components: Number of dimensions into which each 252-long vector is embedded using PCA. (If no value is specified, the default value of 3 is used.)
  6. tsne_components: Number of dimensions into which each 252-long vector is embedded using t-SNE. (If no value is specified, the default value of 3 is used.)
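
As a rough illustration of how the options above fit together, the following is a minimal argparse sketch. It is not the actual parser in main.py: the function name build_parser, the help strings, and the required/default settings for the first four options are assumptions made for this example. Only the default of 3 for PCA_components and tsne_components comes from the list above.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented options; main.py may differ.
    parser = argparse.ArgumentParser(
        description="Clustering approaches for the global minimum variance portfolio"
    )
    # Assumption: the period and cluster size must always be given explicitly.
    parser.add_argument("--data_period", choices=["validation", "test"], required=True,
                        help="period whose daily returns are used")
    parser.add_argument("--max_cluster_size", type=int, required=True,
                        help="maximum size allowed for an individual cluster")
    # Assumption: both methods default to 'none' when omitted.
    parser.add_argument("--scaling_method", choices=["standard_scale", "none"],
                        default="none")
    parser.add_argument("--dim_reduction_method", choices=["PCA", "tsne", "none"],
                        default="none")
    # Documented default: 3 components when no value is specified.
    parser.add_argument("--PCA_components", type=int, default=3)
    parser.add_argument("--tsne_components", type=int, default=3)
    return parser


# Parse the third command from the Example section.
args = build_parser().parse_args(
    "--data_period test --max_cluster_size 75 --scaling_method standard_scale "
    "--dim_reduction_method tsne --tsne_components 10".split()
)
print(args.tsne_components, args.PCA_components)  # prints: 10 3
```

Restricting each option with choices means a misspelled method name fails at startup instead of silently falling through to the unscaled, unreduced path.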

Datasets

  • Download and preprocess the datasets by following the instructions in 0. preparing_data.ipynb, located in the data folder.

Updates as of May 16th, 2021

  1. The code has been fixed and improved to prevent errors. (For example, global variables are no longer used.)
  2. The values of PCA_components and tsne_components can now be passed through argparse, which makes the dimensionality reduction methods easier to use.
  3. The library cudf has been replaced with sklearn, which is easier to use.
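
For readers who want to see what the scaling and dimensionality reduction options might look like with sklearn, here is a minimal sketch. It is not the repository's code: the function preprocess, its argument names, the row-per-stock layout, and the random demo data are all assumptions for illustration. One point worth noting is that scikit-learn's default Barnes-Hut t-SNE solver only supports fewer than 4 output dimensions, so the sketch switches to the exact solver for larger values such as 10.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def preprocess(returns, scaling_method="none", dim_reduction_method="none",
               pca_components=3, tsne_components=3, seed=0):
    """Hypothetical helper; `returns` has shape (n_stocks, 252), one row per stock."""
    x = np.asarray(returns, dtype=float)
    if scaling_method == "standard_scale":
        # Standardize each column (trading day) to zero mean and unit variance.
        x = StandardScaler().fit_transform(x)
    if dim_reduction_method == "PCA":
        x = PCA(n_components=pca_components, random_state=seed).fit_transform(x)
    elif dim_reduction_method == "tsne":
        # Barnes-Hut handles fewer than 4 components; fall back to the exact solver otherwise.
        method = "barnes_hut" if tsne_components < 4 else "exact"
        x = TSNE(n_components=tsne_components, method=method,
                 random_state=seed).fit_transform(x)
    return x


# Synthetic stand-in for real data: 60 stocks, 252 daily returns each.
demo_returns = np.random.default_rng(0).normal(0.0, 0.01, size=(60, 252))
embedded = preprocess(demo_returns, dim_reduction_method="PCA")
print(embedded.shape)  # prints: (60, 3)
```

The embedded rows would then be passed to whichever clustering step is in use; that step is outside the scope of this sketch.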