Universe data frames normalization #8385

jhonabreul · 2024-10-28T17:12:17Z

Description

Normalizes universe data frames by generalizing the Pandas DataFrame conversion logic (PandasData and PandasConverter) so that it handles multiple data types as well as base data collections, which are the containers for universe data.

Previous behaviour/structure:

Before these changes, universe data frames (and, more generally, data frames created from BaseDataCollection instances) held the list of values (universe constituents, for instance) in a single cell of the data frame. Users had to access that cell and handle the list and its items as regular instances, without any Pandas logic.

New behaviour/structure:

Data frames holding universe data now expand the constituents data into data frame rows and columns, so users can use full Pandas logic to manipulate the data.
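The difference between the two structures can be sketched with plain pandas. This is only an illustration built from hand-written data: the `constituents` dictionary and the shortened symbols are assumptions, and the code does not call Lean or reproduce its converter.

```python
import pandas as pd

# Hypothetical stand-in for universe constituents: (symbol, value) pairs per day.
constituents = {
    "2014-03-25": [("A", 55.26), ("AA", 12.01)],
    "2014-03-26": [("A", 55.16), ("AA", 12.02)],
}

# Previous structure: one row per date, with the whole list stored in one cell.
packed = pd.Series(
    list(constituents.values()),
    index=pd.Index(list(constituents), name="time"),
    name="data",
)

# New structure: one row per (time, symbol), one column per member.
rows = [
    {"time": time, "symbol": symbol, "value": value}
    for time, items in constituents.items()
    for symbol, value in items
]
expanded = pd.DataFrame(rows).set_index(["time", "symbol"])

# Regular pandas operations now apply directly to the constituents.
mean_by_symbol = expanded.groupby(level="symbol")["value"].mean()
print(expanded)
print(mean_by_symbol)
```

With the packed form, the same average would require looping over the list inside each cell by hand.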

Examples:

In master, universe data frames contain a list of constituents in a single cell. For instance, for Fundamental data:

symbol                                                           time      
FUNDAMENTALUNIVERSE-USA-A4D299C6-CE93-4274-81AB-18E65099970A 2T  2014-03-25    [A RPTMYV3VC57P: ¤55.26, AA R735QTJ8XC9X: ¤12....
                                                                 2014-03-26    [A RPTMYV3VC57P: ¤55.16, AA R735QTJ8XC9X: ¤12....
FUNDAMENTALUNIVERSE-USA-B1856E19-46E1-4D43-AE78-3D533A330D3C 2T  2014-03-25    [A RPTMYV3VC57P: ¤55.26, AA R735QTJ8XC9X: ¤12....
                                                                 2014-03-26    [A RPTMYV3VC57P: ¤55.16, AA R735QTJ8XC9X: ¤12....

With these changes, data frames look like the following:

Fundamental:

Sample generation code:

universe = qb.add_universe(selection_function)
df = qb.history(universe.data_type, [universe.symbol], timedelta(days=2))

                                adjustedprice  assetclassification  companyprofile  companyreference  dollarvolume  earningratios  earningreports  financialstatements  hasfundamentaldata market  marketcap  operationratios  pricefactor  pricescalefactor  securityreference  splitfactor  valuationratios    value    volume
time       symbol                                                                                                                                                                                                                                                                                                               
2014-03-25 A RPTMYV3VC57P          36.995144  AssetClassification  CompanyProfile  CompanyReference   109624898.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.669474          0.669474  SecurityReference       1.0000   ValuationRatios    55.26   1983802
           AA R735QTJ8XC9X         19.364249  AssetClassification  CompanyProfile  CompanyReference   343727136.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.537394          1.612344  SecurityReference       3.0003   ValuationRatios    12.01  28620078
           AAC RVY73T1OH55X      1200.000000  AssetClassification  CompanyProfile  CompanyReference       27454.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     1.000000       1000.000000  SecurityReference    1000.0000   ValuationRatios     1.20     22879
           ...
2014-03-26 A RPTMYV3VC57P          36.928197  AssetClassification  CompanyProfile  CompanyReference    95018340.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.669474          0.669474  SecurityReference       1.0000   ValuationRatios    55.16   1722595
           AA R735QTJ8XC9X         19.380373  AssetClassification  CompanyProfile  CompanyReference   255898419.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.537394          1.612344  SecurityReference       3.0003   ValuationRatios    12.02  21289386
           AAC RVY73T1OH55X      1200.000000  AssetClassification  CompanyProfile  CompanyReference        7730.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     1.000000       1000.000000  SecurityReference    1000.0000   ValuationRatios     1.20      6442
           ...

Sample generation code (multiple symbols):

universe = qb.add_universe(selection_function)
universe2 = qb.add_universe(selection_function2)
df = qb.history(universe.data_type, [universe.symbol, universe2.symbol], timedelta(days=2))

                                                                                                adjustedprice  assetclassification  companyprofile  companyreference  dollarvolume  earningratios  earningreports  financialstatements  hasfundamentaldata market  marketcap  operationratios  pricefactor  pricescalefactor  securityreference  splitfactor  valuationratios    value    volume
collection_symbol                                               time       symbol                                                                                                                                                                                                                                                                                                                         
FUNDAMENTALUNIVERSE-USA-372FA8E2-27FE-4025-B9E7-3D3D2966BAA8 2T 2014-03-25 A RPTMYV3VC57P          36.995144  AssetClassification  CompanyProfile  CompanyReference   109624898.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.669474          0.669474  SecurityReference       1.0000   ValuationRatios    55.26   1983802
                                                                           AA R735QTJ8XC9X         19.364249  AssetClassification  CompanyProfile  CompanyReference   343727136.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.537394          1.612344  SecurityReference       3.0003   ValuationRatios    12.01  28620078
                                                                           AAC RVY73T1OH55X      1200.000000  AssetClassification  CompanyProfile  CompanyReference       27454.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     1.000000       1000.000000  SecurityReference    1000.0000   ValuationRatios     1.20     22879
                                                                           AADR UOFRJAZTL0TH       35.936785  AssetClassification  CompanyProfile  CompanyReference       85159.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.960877          0.960877  SecurityReference       1.0000   ValuationRatios    37.40      2277
                                                                           ...
                                                                2014-03-26 A RPTMYV3VC57P          36.928197  AssetClassification  CompanyProfile  CompanyReference    95018340.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.669474          0.669474  SecurityReference       1.0000   ValuationRatios    55.16   1722595
                                                                           AA R735QTJ8XC9X         19.380373  AssetClassification  CompanyProfile  CompanyReference   255898419.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.537394          1.612344  SecurityReference       3.0003   ValuationRatios    12.02  21289386
                                                                           AAC RVY73T1OH55X      1200.000000  AssetClassification  CompanyProfile  CompanyReference        7730.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     1.000000       1000.000000  SecurityReference    1000.0000   ValuationRatios     1.20      6442
                                                                           AADR UOFRJAZTL0TH       36.167643  AssetClassification  CompanyProfile  CompanyReference      140060.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.961905          0.961905  SecurityReference       1.0000   ValuationRatios    37.60      3725
                                                                           AAIT V3Z1GERP1JZ9       30.585693  AssetClassification  CompanyProfile  CompanyReference      137754.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.987591          0.987591  SecurityReference       1.0000   ValuationRatios    30.97      4448
                                                                           ...
FUNDAMENTALUNIVERSE-USA-7642EE68-2A85-4E92-BE13-2C0BCA3DB6BA 2T 2014-03-25 A RPTMYV3VC57P          36.995144  AssetClassification  CompanyProfile  CompanyReference   109624898.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.669474          0.669474  SecurityReference       1.0000   ValuationRatios    55.26   1983802
                                                                           AA R735QTJ8XC9X         19.364249  AssetClassification  CompanyProfile  CompanyReference   343727136.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.537394          1.612344  SecurityReference       3.0003   ValuationRatios    12.01  28620078
                                                                           AAC RVY73T1OH55X      1200.000000  AssetClassification  CompanyProfile  CompanyReference       27454.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     1.000000       1000.000000  SecurityReference    1000.0000   ValuationRatios     1.20     22879
                                                                           AADR UOFRJAZTL0TH       35.936785  AssetClassification  CompanyProfile  CompanyReference       85159.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.960877          0.960877  SecurityReference       1.0000   ValuationRatios    37.40      2277
                                                                           ...
                                                                2014-03-26 A RPTMYV3VC57P          36.928197  AssetClassification  CompanyProfile  CompanyReference    95018340.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.669474          0.669474  SecurityReference       1.0000   ValuationRatios    55.16   1722595
                                                                           AA R735QTJ8XC9X         19.380373  AssetClassification  CompanyProfile  CompanyReference   255898419.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.537394          1.612344  SecurityReference       3.0003   ValuationRatios    12.02  21289386
                                                                           AAC RVY73T1OH55X      1200.000000  AssetClassification  CompanyProfile  CompanyReference        7730.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     1.000000       1000.000000  SecurityReference    1000.0000   ValuationRatios     1.20      6442
                                                                           AADR UOFRJAZTL0TH       36.167643  AssetClassification  CompanyProfile  CompanyReference      140060.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.961905          0.961905  SecurityReference       1.0000   ValuationRatios    37.60      3725
                                                                           AAIT V3Z1GERP1JZ9       30.585693  AssetClassification  CompanyProfile  CompanyReference      137754.0  EarningRatios  EarningReports  FinancialStatements                True    usa          0  OperationRatios     0.987591          0.987591  SecurityReference       1.0000   ValuationRatios    30.97      4448
                                                                           ...
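As a rough sketch of how a frame indexed by (collection_symbol, time, symbol) can be sliced, the snippet below rebuilds a tiny frame by hand. The keys `UNIVERSE-1` and `UNIVERSE-2` and the shortened symbols are placeholders chosen for readability, not real Lean identifiers, and only the `value` column is kept.

```python
import pandas as pd

# Hand-built frame with the same three index levels as the output above.
index = pd.MultiIndex.from_tuples(
    [
        ("UNIVERSE-1", "2014-03-25", "A"),
        ("UNIVERSE-1", "2014-03-25", "AA"),
        ("UNIVERSE-1", "2014-03-26", "A"),
        ("UNIVERSE-2", "2014-03-25", "A"),
    ],
    names=["collection_symbol", "time", "symbol"],
)
df = pd.DataFrame({"value": [55.26, 12.01, 55.16, 55.26]}, index=index)

# All rows that belong to one universe; the selected level is dropped.
first_universe = df.xs("UNIVERSE-1", level="collection_symbol")

# All rows for one constituent, across universes and dates.
a_history = df.xs("A", level="symbol")

# Number of constituents per universe and day.
per_day_counts = df.groupby(level=["collection_symbol", "time"]).size()
print(first_universe)
print(per_day_counts)
```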

Option universe:

Sample generation code (single option):

option = qb.add_option("GOOG").symbol
df = qb.history(option, 3)

                                                     close     delta     gamma     high  impliedvolatility      low     open  openinterest        rho      theta    value      vega  volume
time       symbol                                                                                                                                                                                     
2015-12-23 GOOCV W6NBKM1DB86E|GOOCV VP83T1ZUHROL   149.800  0.929556  0.001122  154.600           3.062114  146.000  147.600           0.0   0.014881  -8.121837  149.800  0.052964     0.0
           GOOCV W6NBKPDXGXUU|GOOCV VP83T1ZUHROL    17.000  0.999997  0.000000   17.000           0.000000   17.000   17.000         162.0   0.020136  -0.017799   17.000  0.000000     6.0
           GOOCV W6NBLJW298LI|GOOCV VP83T1ZUHROL    12.980  0.572801  0.002791   16.900           3.580014   12.980   16.900          62.0   0.010071 -27.574191   12.980  0.153996     4.0
           GOOCV W6NBKMCM3UDI|GOOCV VP83T1ZUHROL    10.980  0.999997  0.000000   14.850           0.000000    8.400   13.000         322.0   0.020273  -0.017936   10.980  0.000000   111.0
           GOOCV W6NBLK4BXZ2E|GOOCV VP83T1ZUHROL     8.500  0.999997  0.000000   12.970           0.000000    6.690   10.900         164.0   0.020342  -0.018005    8.500  0.000000    28.0
           GOOCV W6NBKPFL0ACM|GOOCV VP83T1ZUHROL     6.800  0.999997  0.000000   10.850           0.000000    5.200    9.000        1604.0   0.020410  -0.018073    6.800  0.000000   295.0
           GOOCV W6NBLKCLMPJA|GOOCV VP83T1ZUHROL     5.000  0.999997  0.000000    8.500           0.000000    3.700    7.100         428.0   0.020479  -0.018142    5.000  0.000000   128.0
           GOOCV W6NBKMCS26FA|GOOCV VP83T1ZUHROL     3.500  0.999997  0.000000    6.910           0.000000    2.600    5.410         555.0   0.020547  -0.018210    3.500  0.000000   531.0
           GOOCV W6NBLKKVBG06|GOOCV VP83T1ZUHROL     2.360  0.999997  0.000000    5.350           0.000000    1.700    4.200         321.0   0.020616  -0.018279    2.360  0.000000   167.0
           ...
2015-12-24 GOOCV W8Z0KNIXIEZQ|GOOCV VP83T1ZUHROL    39.400  0.577688  0.001540   39.920           0.701668   38.300   39.920          56.0   0.767563  -0.592345   39.400  1.416528     4.0
           GOOCV WBGM95TAH2LI|GOOCV VP83T1ZUHROL    57.000  0.563698  0.005025   58.000           0.150417   55.600   58.000         229.0   1.868329  -0.096925   57.000  2.050616   105.0
           GOOCV WHEA9G6X2SFA|GOOCV VP83T1ZUHROL    96.000  0.592247  0.005960   96.000           0.083629   96.000   96.000         130.0   4.435962  -0.042009   96.000  3.017564     0.0
           GOOCV W6NBLKCLMPJA|GOOCV VP83T1ZUHROL     3.500  0.000000  0.000000    5.180           0.000000    1.600    5.180         483.0   0.000000   0.000000    3.500  0.000000   411.0
           GOOCV W6U7Q7WS5ZNQ|GOOCV VP83T1ZUHROL     8.600  0.538894  0.003811   11.000           1.002770    6.740   10.200         146.0   0.069310  -2.963558    8.600  0.412546   121.0
           ...

Sample generation code (multiple symbols):

aapl = qb.add_option("AAPL").symbol
twx = qb.add_option("TWX").symbol
df = qb.history([aapl, twx], 3)

                                                             close     delta     gamma     high  impliedvolatility      low     open  openinterest       rho     theta   value      vega  volume
canonical time       symbol                                                                                                                                                                                
?AAPL     2014-06-05 AAPL VXBK532KDV3A|AAPL R735QTJ8XC9X    0.000  0.000000  0.000000    0.000           0.000000    0.000    0.000           0.0  0.000000  0.000000    0.000  0.000000     0.0
                     AAPL VXBK53EB3MLI|AAPL R735QTJ8XC9X    0.000  0.000000  0.000000    0.000           0.000000    0.000    0.000           0.0  0.000000  0.000000    0.000  0.000000     0.0
                     AAPL VXBK53Q7RQ5I|AAPL R735QTJ8XC9X    0.000  0.000000  0.000000    0.000           0.000000    0.000    0.000           0.0  0.000000  0.000000    0.000  0.000000     0.0
                     ...
          2014-06-06 AAPL VXBK4R62CXGM|AAPL R735QTJ8XC9X  451.950  0.762263  0.000134  451.950           2.088176  451.950  451.950         412.0  0.502097  0.145360  451.950  0.595591     1.0
                     AAPL VXBK4QA5EM92|AAPL R735QTJ8XC9X  431.450  0.824429  0.000000  431.450           0.000000  431.450  431.450          58.0  1.227190  0.453852  431.450  0.000000     0.0
                     AAPL VXBK4R7PW9YE|AAPL R735QTJ8XC9X  427.050  0.824429  0.000000  427.950           0.000000  427.050  427.950          24.0  1.257870  0.453750  427.050  0.000000     0.0
                     ...
          2014-06-07 AAPL VXBK532KDV3A|AAPL R735QTJ8XC9X    0.000  0.000000  0.000000    0.000           0.000000    0.000    0.000           0.0  0.000000  0.000000    0.000  0.000000     0.0
                     AAPL VXBK53EB3MLI|AAPL R735QTJ8XC9X    0.000  0.000000  0.000000    0.000           0.000000    0.000    0.000           0.0  0.000000  0.000000    0.000  0.000000     0.0
                     AAPL VXBK53Q7RQ5I|AAPL R735QTJ8XC9X    0.000  0.000000  0.000000    0.000           0.000000    0.000    0.000           0.0  0.000000  0.000000    0.000  0.000000     0.0
                     ...
?TWX      2014-06-05 AOL VXBK4QDMBCYU|AOL R735QTJ8XC9X     47.425  0.795412  0.001168   48.225           2.212209   46.625   47.775           0.0  0.053618  0.003416   47.425  0.068494     0.0
                     AOL VXBK4QDY812E|AOL R735QTJ8XC9X     45.550  0.786325  0.001384   46.250           2.059030   44.600   45.700           0.0  0.061260  0.002084   45.550  0.075528     0.0
                     AOL VXBK4QEG317Q|AOL R735QTJ8XC9X     42.650  0.772118  0.001749   43.250           1.851420   41.675   43.000           0.0  0.073013  0.000619   42.650  0.085860     0.0
                     ...
          2014-06-06 AOL VXBK4QDMBCYU|AOL R735QTJ8XC9X     47.975  0.771128  0.000932   48.775           2.580519   46.475   47.425           0.0  0.041194  0.010063   47.975  0.061716     0.0
                     AOL VXBK4QDY812E|AOL R735QTJ8XC9X     46.025  0.761106  0.001143   46.775           2.376114   44.500   45.550           0.0  0.048835  0.008022   46.025  0.069718     0.0
                     AOL VXBK4QEG317Q|AOL R735QTJ8XC9X     43.025  0.745576  0.001499   43.775           2.111186   41.500   42.650           0.0  0.060542  0.005759   43.025  0.081252     0.0
                     ...
          2014-06-07 AOL VXBK4QDMBCYU|AOL R735QTJ8XC9X     47.875  0.778429  0.000778   49.425           2.928487   46.950   47.975           0.0  0.031711  0.008914   47.875  0.053884     0.0
                     AOL VXBK4QDY812E|AOL R735QTJ8XC9X     45.875  0.767125  0.000997   47.425           2.667908   44.950   46.025           0.0  0.039258  0.006034   45.875  0.062863     0.0
                     AOL VXBK4QEG317Q|AOL R735QTJ8XC9X     42.875  0.750012  0.001355   44.425           2.352795   41.950   43.025           0.0  0.050512  0.002875   42.875  0.075372     0.0
                     ...

ETF Universe:

Sample generation code:

spy = qb.add_equity("SPY").symbol
universe = qb.add_universe(qb.universe.etf(spy, qb.universe_settings, filter_etfs))
history = qb.history(universe, 2)

                              lastupdate period   sharesheld  value    weight
time       symbol                                                           
2020-12-01 A RPTMYV3VC57P    2020-11-27 1 days    3313891.0    0.0  0.001170
           AAL VM9RIYHM8ACL  2020-11-27 1 days    5331000.0    0.0  0.000247
           AAP SA48O8J43YAT  2020-11-27 1 days     741951.0    0.0  0.000344
           AAPL R735QTJ8XC9X 2020-11-27 1 days  172212400.0    0.0  0.062138
           ABBV VCY032R250MD 2020-11-27 1 days   18911224.0    0.0  0.006139
           AAS R735QTJ8XC9X  2020-11-27 1 days    1575257.0    0.0  0.000503
           ABMD R735QTJ8XC9X 2020-11-27 1 days     480480.0    0.0  0.000404
           ABT R735QTJ8XC9X  2020-11-27 1 days   18974188.0    0.0  0.006320
           ACN S6HA7SVNXLYD  2020-11-27 1 days    6818158.0    0.0  0.005278
           ADBE R735QTJ8XC9X 2020-11-27 1 days    5140614.0    0.0  0.007589
           ...
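Once the ETF constituents are rows, ranking them is ordinary pandas. The sketch below copies four rows from the output above into a hand-built frame with shortened symbols (an assumption made for brevity); it does not query Lean.

```python
import pandas as pd

# Four constituents copied from the sample output, indexed like the real frame.
index = pd.MultiIndex.from_tuples(
    [
        ("2020-12-01", "A"),
        ("2020-12-01", "AAPL"),
        ("2020-12-01", "ABT"),
        ("2020-12-01", "ADBE"),
    ],
    names=["time", "symbol"],
)
history = pd.DataFrame(
    {
        "sharesheld": [3313891.0, 172212400.0, 18974188.0, 5140614.0],
        "weight": [0.001170, 0.062138, 0.006320, 0.007589],
    },
    index=index,
)

# Two largest holdings by weight.
top = history.sort_values("weight", ascending=False).head(2)
top_symbols = list(top.index.get_level_values("symbol"))

# Combined weight of the rows in this small sample.
total_weight = history["weight"].sum()
print(top)
```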

Lean now has new custom attributes that tell the Pandas converter how to handle certain classes or properties:

PandasColumn sets the name to use for the data frame column generated from the property or field.
PandasIgnore ensures that a property or field is not added to the data frame.
PandasIgnoreMembers ensures that, when a property or field is an instance of the marked class and the converter tries to expand its members into columns, all of those members are ignored.
PandasNonExpandable requests that the members of the property or field are not expanded into columns; instead, the instance itself is added as a single column cell.
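To make the four markers concrete, here is a toy Python model of one possible reading of the descriptions above. It is not Lean's C# implementation: the metadata keys (`pandas_column`, `pandas_ignore`, `pandas_non_expandable`), the class flag `pandas_ignore_members`, the sample classes, and the `to_row` helper are all hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field, fields, is_dataclass

@dataclass
class Greeks:
    delta: float
    gamma: float

@dataclass
class Profile:
    name: str

@dataclass
class Audit:
    pandas_ignore_members = True  # class-level flag, mimics PandasIgnoreMembers
    checked_by: str

@dataclass
class Contract:
    close: float = field(metadata={"pandas_column": "price"})           # mimics PandasColumn
    internal_id: int = field(metadata={"pandas_ignore": True})          # mimics PandasIgnore
    profile: Profile = field(metadata={"pandas_non_expandable": True})  # mimics PandasNonExpandable
    greeks: Greeks   # unmarked: members are expanded into columns
    audit: Audit     # its class ignores all members when expanded

def to_row(obj):
    """Flatten a dataclass instance into a {column: value} dict."""
    row = {}
    for f in fields(obj):
        value = getattr(obj, f.name)
        if f.metadata.get("pandas_ignore"):
            continue  # never becomes a column
        expandable = is_dataclass(value) and not f.metadata.get("pandas_non_expandable")
        if expandable:
            if getattr(type(value), "pandas_ignore_members", False):
                continue  # members would be expanded, but all are ignored
            row.update(to_row(value))  # one column per member
        else:
            row[f.metadata.get("pandas_column", f.name)] = value
    return row

contract = Contract(1.5, 7, Profile("x"), Greeks(0.5, 0.1), Audit("bot"))
row = to_row(contract)
print(row)
```

In this model the `profile` instance lands in a single cell, much like the `assetclassification` or `companyprofile` columns in the Fundamental output above, while `greeks` contributes one column per member.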

Other changes:

Generalization for new and custom data types.
Generalization for existing Lean common types (quotes, trades, ticks).

Related Issue

N/A

Motivation and Context

N/A

Requires Documentation Change

Requires documenting new data frames

How Has This Been Tested?

Unit tests
Regression algorithms
Existing test suite

Types of changes

Bug fix (non-breaking change which fixes an issue)
Refactor (non-breaking change which improves implementation)
Performance (non-breaking change which improves performance. Please add associated performance test and results)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Non-functional change (xml comments/documentation/etc)

Checklist:

My code follows the code style of this project.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.
My branch follows the naming convention bug-<issue#>-<description> or feature-<issue#>-<description>

jhonabreul requested a review from Martin-Molinero October 28, 2024 17:12

jhonabreul added 23 commits November 4, 2024 18:43

Normalize universe data frames

6bb24b0

Universe (and, more generally, BaseDataCollection) data frames are now normalized and unpacked into a data frame, instead of just creating data frames with the universe lists within them

Fix unit tests and algorithms to expect new universe dataframe format

6c15aad

Fixes

2b8690b

Add PandasConverter.DataFrameGenerator class

b64a614

Pandas data frame generator class fixes

4b2021f

Add comments

d396729

Housekeeping

b043dad

Add attributes to mark classes and properties for pandas processing

d160819

Improve pandas properties expanding

e30a70e

Allow and handle duplicate names

Use PandasData generalization for Lean common data types

b8f84b8

Add points time as column when converting base data collections to data frames

94a989a

Cleanup and minor changes

df82784

Minor change

8cd0b94

Pandas data to get type members on demand

dde82ca

Move Pandas helper classes to their own files

07373ee

Minor changes

5007792

Add flatten argument to python history api

edb87f3

This allows users to decide whether they want fully expanded dataframes for universe and other collection data types. Else, master behavior is kept

Adding missing changes to last commit

62080b5

Update Pythonnet version to 2.0.40

41cd3ff

Add flatten argument to algorithm's OptionChain api

b715882

Minor changes

8d14f5a

Housekeeping

eae30eb

Minor changes

37cb9d3

jhonabreul force-pushed the feature-universe-dataframes branch from 0263ecf to 37cb9d3 Compare November 5, 2024 14:25

jhonabreul added 3 commits November 5, 2024 12:43

Bug fix skipping data collection data points

a5f9988

Add comment

cbd50bb

Set correct exchange time to OptionUniverse instances

9b7e9f9
