Universe data frames normalization #8385
Open
jhonabreul wants to merge 26 commits into QuantConnect:master from jhonabreul:feature-universe-dataframes
+2,472 −855
Universe (and, generically, BaseDataCollection) data frames are now normalized and unpacked into a data frame, instead of just creating data frames with the universe lists within them
Allow and handle duplicate names
This allows users to decide whether they want fully expanded data frames for universe and other collection data types. Otherwise, master behavior is kept
jhonabreul force-pushed the feature-universe-dataframes branch from 0263ecf to 37cb9d3 (November 5, 2024 14:25)
Description
Normalizes universe data frames by generalizing the Pandas DataFrame conversion logic (`PandasData` and `PandasConverter`) so it handles multiple data types as well as base data collections, which are the containers for universe data.
Previous behaviour/structure:
Before these changes, universe data frames (and, more generally, data frames created from `BaseDataCollection` instances) just held the list of values (universe constituents, for instance) in a single cell of the data frame. Users had to access that cell and handle the list and its items as regular instances, without any Pandas logic.
New behaviour/structure:
Data frames holding universe data now expand the constituents' data into data frame rows and columns, so users can use full Pandas logic to manipulate the data.
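To make the difference between the two structures concrete, here is a minimal sketch in plain pandas. It does not use Lean or its converter: the `constituents` dictionary, its field names (`symbol`, `price`, `volume`), the index layout and all values are hypothetical and only illustrate the packed versus expanded shapes described above.

```python
import pandas as pd

# Hypothetical constituents keyed by time; names and values are illustrative only.
constituents = {
    "2024-01-02": [
        {"symbol": "AAA", "price": 10.0, "volume": 1000},
        {"symbol": "BBB", "price": 20.0, "volume": 2000},
    ],
    "2024-01-03": [
        {"symbol": "AAA", "price": 11.0, "volume": 1500},
    ],
}

# Previous structure: one row per time, with the whole list held in a single cell.
packed = pd.DataFrame(
    {"universe": list(constituents.values())},
    index=pd.Index(list(constituents), name="time"),
)

# Normalized structure: one row per (time, symbol) and one column per field.
rows = [
    {"time": time, **item}
    for time, items in constituents.items()
    for item in items
]
expanded = pd.DataFrame(rows).set_index(["time", "symbol"])

# With the expanded frame, ordinary pandas operations apply directly.
mean_price = expanded.groupby(level="symbol")["price"].mean()
```

With the packed frame, computing something like `mean_price` would first require pulling each list out of its cell and iterating over the items by hand; with the expanded frame it is a single `groupby`.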
Examples:
In master, universe data frames contained a list of constituents, for instance for Fundamental data:
With these changes, data frames look like the following:
Fundamental:
Sample generation code:
Sample generation code (multiple symbols):
Option universe:
Sample generation code (single option):
Sample generation code (multiple symbols):
ETF Universe:
Sample generation code:
Lean now has new custom attributes that tell the Pandas converter how to handle certain classes or properties:
- `PandasColumn` can be used to set the name of the data frame column for the property or field.
- `PandasIgnore` can be used to make sure a property or field is not added to the data frame.
- `PandasIgnoreMembers` can be used to make sure that, when a property or field is an instance of the class and its members would be expanded into columns, they are all ignored.
- `PandasNonExpandable` can be used to request that the property or field's members are not expanded into columns; instead, the instance itself is added as a single column cell.
Other changes:
Related Issue
N/A
Motivation and Context
N/A
Requires Documentation Change
Requires documenting new data frames
How Has This Been Tested?
Types of changes
Checklist:
`bug-<issue#>-<description>` or `feature-<issue#>-<description>`