- This repository contains the source code for the experiments in the research paper "Clustering Approaches for Global Minimum Variance Portfolio".
- The paper uses 'constrained K-means clustering' to group stocks that show similar price movements, and then performs 'within-cluster portfolio optimization'.
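The two stages can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code in this repository. size_constrained_kmeans is a hypothetical greedy heuristic that caps every cluster at max_size members during the assignment step (an exact size-constrained assignment can instead be solved as a minimum-cost flow problem). gmv_weights uses the textbook closed-form global minimum variance weights, w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the sample covariance of the cluster's returns and short positions are allowed. Choosing the smallest number of clusters that can hold all stocks is also an assumption, and how the paper combines the within-cluster portfolios is not shown.

```python
import numpy as np


def size_constrained_kmeans(X, n_clusters, max_size, n_iter=50, seed=0):
    """Greedy size-constrained k-means (hypothetical sketch).

    X has one row per stock (e.g. a 252-long vector of daily returns).
    No cluster ever receives more than max_size rows.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n_clusters * max_size < n:
        raise ValueError("n_clusters * max_size must be >= number of stocks")
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=n_clusters, replace=False)].copy()
    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared Euclidean distance of every stock to every center: shape (n, k).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = np.full(n, -1)
        counts = np.zeros(n_clusters, dtype=int)
        # Visit (stock, center) pairs from closest to farthest and assign each
        # stock to the nearest center that still has room.
        for flat in np.argsort(dist, axis=None):
            i, j = divmod(int(flat), n_clusters)
            if new_labels[i] == -1 and counts[j] < max_size:
                new_labels[i] = j
                counts[j] += 1
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members (keep it if the cluster is empty).
        for j in range(n_clusters):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers


def gmv_weights(stock_returns):
    """Closed-form GMV weights for one cluster; rows are stocks, columns are days."""
    cov = np.atleast_2d(np.cov(stock_returns))
    x = np.linalg.solve(cov, np.ones(cov.shape[0]))  # x = inverse(cov) @ 1
    return x / x.sum()  # normalise so the weights sum to 1


rng = np.random.default_rng(42)
returns = rng.normal(0.0, 0.01, size=(200, 252))  # 200 synthetic stocks x 252 days
max_cluster_size = 75
k = int(np.ceil(len(returns) / max_cluster_size))  # assumption: fewest clusters that fit
labels, _ = size_constrained_kmeans(returns, k, max_cluster_size)
cluster_weights = {j: gmv_weights(returns[labels == j]) for j in range(k)}
```

Because each cluster holds at most 75 stocks while there are 252 daily observations, each cluster's sample covariance is normally invertible; with much larger clusters or shorter windows it can become singular.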
- Using raw data (no scaling and no dimensionality reduction)
python main.py --data_period test --max_cluster_size 75 --scaling_method none --dim_reduction_method none
- Using PCA without scaling (if PCA_components is not specified, the default of 3 is used)
python main.py --data_period test --max_cluster_size 75 --scaling_method none --dim_reduction_method PCA
- Using t-SNE with standard scaling and 10 t-SNE components
python main.py --data_period test --max_cluster_size 75 --scaling_method standard_scale --dim_reduction_method tsne --tsne_components 10
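For reference, the flags used in these commands could be declared with Python's argparse roughly as follows. This is a hypothetical reconstruction from this README, not the parser in main.py: the choices come from the option lists in this README and the default of 3 components is documented here, while making data_period and max_cluster_size required and defaulting both methods to none are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags documented in this README.
    parser = argparse.ArgumentParser(description="Clustering approaches for GMV portfolios (sketch)")
    parser.add_argument("--data_period", choices=["validation", "test"], required=True)
    parser.add_argument("--max_cluster_size", type=int, required=True)
    parser.add_argument("--scaling_method", choices=["standard_scale", "none"], default="none")
    parser.add_argument("--dim_reduction_method", choices=["PCA", "tsne", "none"], default="none")
    parser.add_argument("--PCA_components", type=int, default=3)  # default documented as 3
    parser.add_argument("--tsne_components", type=int, default=3)  # default documented as 3
    return parser


cmd = ("--data_period test --max_cluster_size 75 --scaling_method standard_scale "
       "--dim_reduction_method tsne --tsne_components 10")
args = build_parser().parse_args(cmd.split())
print(args.dim_reduction_method, args.tsne_components, args.PCA_components)  # tsne 10 3
```

Restricting the method flags with choices makes a typo such as --dim_reduction_method pca fail immediately with a usage message instead of silently falling through to a default.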
data_period
: Which period's daily stock returns to use (validation or test).
- We use the validation period to choose the parameters that produce the best portfolio optimization performance.
- Portfolio performance over the test period is the true score of the proposed algorithm.
max_cluster_size
: Maximum number of stocks allowed in a single cluster (integer).
scaling_method
: Whether to standardize the data to zero mean and unit variance (standard_scale or none).
dim_reduction_method
: Whether to reduce the dimensionality of each stock's 252-long vector of daily returns with PCA or t-SNE (PCA, tsne or none).
PCA_components
: Number of dimensions into which PCA embeds each 252-long vector (default: 3).
tsne_components
: Number of dimensions into which t-SNE embeds each 252-long vector (default: 3).
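One way to implement the scaling and dimensionality reduction options with sklearn is sketched below. It is an illustration under assumptions, not the repository's code: the function name preprocess is hypothetical, the rows of the input are the stocks' 252-long return vectors, and StandardScaler is applied with its default behaviour, so each column (each trading day) is standardized across stocks. One practical sklearn detail: TSNE uses the Barnes-Hut approximation by default, which only supports fewer than 4 output dimensions, so the sketch switches to method="exact" for larger values such as tsne_components = 10.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def preprocess(returns, scaling_method="none", dim_reduction_method="none",
               pca_components=3, tsne_components=3, seed=0):
    """Hypothetical: returns has one row per stock (a 252-long return vector)."""
    X = np.asarray(returns, dtype=float)
    if scaling_method == "standard_scale":
        # Zero mean and unit variance per column (per trading day).
        X = StandardScaler().fit_transform(X)
    elif scaling_method != "none":
        raise ValueError(f"unknown scaling_method: {scaling_method}")

    if dim_reduction_method == "PCA":
        X = PCA(n_components=pca_components).fit_transform(X)
    elif dim_reduction_method == "tsne":
        # Barnes-Hut t-SNE supports fewer than 4 components; use the exact method otherwise.
        method = "barnes_hut" if tsne_components < 4 else "exact"
        X = TSNE(n_components=tsne_components, method=method,
                 random_state=seed).fit_transform(X)
    elif dim_reduction_method != "none":
        raise ValueError(f"unknown dim_reduction_method: {dim_reduction_method}")
    return X


rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(60, 252))  # 60 synthetic stocks x 252 days
print(preprocess(returns, "none", "PCA").shape)  # (60, 3)
print(preprocess(returns, "standard_scale", "tsne", tsne_components=10).shape)  # (60, 10)
```

The exact t-SNE method scales quadratically with the number of stocks, so it is noticeably slower than Barnes-Hut on large universes. Note also that sklearn's TSNE requires the perplexity (30 by default) to be smaller than the number of samples.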
- Download and preprocess the datasets by following the instructions in "0. preparing_data.ipynb", located in the data folder.
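The clustering input is a 252-long vector of daily returns per stock (252 is roughly the number of trading days in a year). Assuming the preprocessed data is a table of daily closing prices with trading days as rows and stocks as columns, such vectors could be built as follows. The helper name return_vectors and the use of simple returns over the last 252 days of the given period are hypothetical; the notebook's actual steps (data source, handling of missing prices) are not reproduced here.

```python
import numpy as np


def return_vectors(prices, window=252):
    """Hypothetical helper: prices is (n_days, n_stocks); result is (n_stocks, window)."""
    prices = np.asarray(prices, dtype=float)
    if prices.shape[0] < window + 1:
        raise ValueError("need at least window + 1 days of prices")
    daily_returns = prices[1:] / prices[:-1] - 1.0  # simple daily returns
    return daily_returns[-window:].T  # one row per stock, one column per day


days = np.arange(300)
prices = np.column_stack([100 * 1.001 ** days, 50 * 0.999 ** days])  # 2 synthetic stocks
vectors = return_vectors(prices)
print(vectors.shape)  # (2, 252)
```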
- The code has been fixed and improved to prevent errors (for example, global variables are no longer used).
- The number of PCA_components and tsne_components can now be set through command-line arguments (argparse), which makes the dimensionality reduction methods easier to use.
- The cudf library has been replaced with sklearn for ease of use.