-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add mzid plugin #102
Add mzid plugin #102
Conversation
WalkthroughThe pull request introduces updates across several workflow files and code modules for a Python application. Key changes include upgrading GitHub Actions to newer versions in multiple workflow files, adding a new command-line parameter Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR-Agent was enabled for this repository. To continue using it, please link your git user with your CodiumAI identity here. PR Code Suggestions ✨Explore these optional code suggestions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
.github/workflows/python-package.yml (1)
Line range hint 1-41
: Consider these workflow improvements.
While the current workflow is functional, here are some suggestions to enhance it:
- Add caching for pip dependencies to speed up builds
- Consider adding Python 3.10 to the test matrix for better version coverage
- Add test coverage reporting
Here's how you can implement these improvements:
strategy:
fail-fast: false
matrix:
- python-version: [3.8, 3.9, 3.11]
+ python-version: [3.8, 3.9, 3.10, 3.11]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
+ - name: Cache pip packages
+ uses: actions/cache@v3
+ with:
+ path: ~/.cache/pip
+ key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
+ restore-keys: |
+ ${{ runner.os }}-pip-
- name: Install dependencies
run: |
python -m pip install --upgrade pip
- python -m pip install flake8 pytest
+ python -m pip install flake8 pytest pytest-cov
if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
- name: Test with pytest
run: |
python setup.py install
- python tests/test_proteomicslfq.py
+ pytest tests/test_proteomicslfq.py --cov=. --cov-report=xml
+ - name: Upload coverage reports
+ uses: codecov/codecov-action@v3
+ with:
+ fail_ci_if_error: true
pmultiqc/main.py (1)
46-51
: Consider documentation and consistency improvements.
- Consider adding comments explaining the purpose of these new patterns, similar to how some other MultiQC modules document their patterns.
- For consistency with the
quantms/exp_design
andquantms/diann_report
patterns, consider explicitly setting theshared
flag.
Here's a suggested improvement:
if 'quantms/mgf' not in config.sp:
- config.update_dict(config.sp, {'quantms/mgf': {'fn': '*.mgf', 'num_lines': 0}})
+ config.update_dict(config.sp, {'quantms/mgf': {'fn': '*.mgf', 'num_lines': 0, 'shared': False}})
if 'quantms/mzid' not in config.sp:
- config.update_dict(config.sp, {'quantms/mzid': {'fn': '*.mzid', 'num_lines': 0}})
+ # Pattern for mzIdentML files containing peptide/protein identification results
+ config.update_dict(config.sp, {'quantms/mzid': {'fn': '*.mzid', 'num_lines': 0, 'shared': False}})
README.md (1)
36-36
: Enhance documentation for the mzid plugin feature.
The documentation for the new --mzid_plugin
parameter needs more detail to match the thoroughness of other features. Consider adding:
- A brief explanation of mzid/mgf file formats and their purpose
- Expected file structure and location requirements
- Example usage with sample command
- Any prerequisites or dependencies specific to mzid processing
- Description of the generated report contents and metrics
- Links to example reports (similar to other features)
This will help users understand and effectively use this new feature.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (8)
- .github/workflows/python-app.yml (4 hunks)
- .github/workflows/python-package.yml (1 hunks)
- .github/workflows/python-publish.yml (1 hunks)
- README.md (1 hunks)
- pmultiqc/cli.py (1 hunks)
- pmultiqc/main.py (1 hunks)
- pmultiqc/modules/quantms/quantms.py (14 hunks)
- setup.py (1 hunks)
🧰 Additional context used
🪛 Ruff
pmultiqc/modules/quantms/quantms.py
1151-1154: Use ternary operator enzyme_name = list(enzyme.keys())[0] if enzyme else 'Trypsin'
instead of if
-else
-block
Replace if
-else
-block with enzyme_name = list(enzyme.keys())[0] if enzyme else 'Trypsin'
(SIM108)
🔇 Additional comments (9)
.github/workflows/python-publish.yml (2)
24-24
: LGTM! Good job upgrading to the latest GitHub Actions versions.
The upgrades to actions/checkout@v4
and actions/setup-python@v4
improve security and reliability while maintaining backward compatibility.
Also applies to: 26-26
24-24
: Consider pinning exact versions for better reproducibility.
While using major versions (v4) works, consider pinning to exact versions (using SHA or full version number) to ensure consistent behavior across builds.
Example:
- - uses: actions/checkout@v4
+ - uses: actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 # v4.1.0
- uses: actions/setup-python@v4
+ uses: actions/setup-python@65d7f2d534ac1bc67fcd62888c5f4f3d2cb2b236 # v4.7.1
Also applies to: 26-26
✅ Verification successful
Action required: Update GitHub Actions to use pinned versions for better reproducibility
The verification confirms that both actions have recent releases. Here's the recommended update using the latest stable versions:
- - uses: actions/checkout@v4
+ - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.2.2
- uses: actions/setup-python@v4
+ uses: actions/setup-python@0a5c61591373683505ea898e09a3ea4f39ef2b9c # v5.3.0
• .github/workflows/python-publish.yml
: Update both actions to use SHA-pinned versions
actions/checkout
: Pin to v4.2.2 (released 2024-10-23)actions/setup-python
: Pin to v5.3.0 (released 2024-10-24)
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the latest versions and any known issues
# Test: Check the latest releases and their dates
gh release list --repo actions/checkout --limit 5
gh release list --repo actions/setup-python --limit 5
Length of output: 479
.github/workflows/python-package.yml (1)
22-22
: LGTM! Good job upgrading the GitHub Actions.
The upgrade to v4 for both actions/checkout
and actions/setup-python
actions is a positive change that brings:
- Enhanced security features
- Improved performance
- Better compatibility with GitHub's latest features
Also applies to: 24-24
pmultiqc/cli.py (1)
38-38
: LGTM! The implementation follows the established patterns.
The new --mzid_plugin
option is correctly implemented using the click library and follows the same pattern as other options in the file.
.github/workflows/python-app.yml (2)
18-18
: LGTM! Good upgrade to latest GitHub Actions versions.
The upgrade of actions/checkout
and actions/setup-python
to v4 brings important security fixes and performance improvements. The changes are also consistently applied across other workflow files.
Also applies to: 20-20
44-44
: Verify artifact upload size limits with v4.
The upgrade from actions/upload-artifact@v1
to v4
is a significant improvement. However, since this workflow handles multiple datasets (LFQ, TMT, DIA), please ensure that the generated artifacts stay within GitHub's storage limits.
Let's check the typical size of these artifacts:
Also applies to: 56-56, 68-68
pmultiqc/main.py (1)
46-51
: LGTM! The new search patterns are correctly implemented.
The addition of .mgf
and .mzid
file patterns follows the existing pattern structure and aligns well with the plugin's objectives.
setup.py (2)
Line range hint 22-40
: Consider updating package dependencies for mzID support.
While pyteomics is already included for general proteomics file handling, you might want to specify a minimum version requirement if specific features are needed for mzID processing.
Let's check the pyteomics version requirements in similar projects:
#!/bin/bash
# Description: Check pyteomics version requirements in similar projects
# Expected: Find common version constraints for mzID support
# Search for pyteomics in requirements files
rg -g "requirements*.txt" -g "setup.py" "pyteomics"
55-56
: LGTM! Entry point addition follows established patterns.
The new mzid_plugin entry point is correctly added to the MultiQC CLI options, following the same pattern as other CLI options in the file.
Let's verify the CLI implementation:
✅ Verification successful
CLI implementation for mzid_plugin is properly defined
The mzid_plugin entry point is correctly implemented in pmultiqc/cli.py
as a Click option with appropriate help text. The implementation matches the entry point defined in setup.py:
- Defined as
mzid_plugin = click.option('--mzid_plugin', 'mzid_plugin', is_flag=True, help="Extract mzIdentML")
- Follows the same pattern as other CLI options in the file
- Help text clearly describes its purpose
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the mzid_plugin CLI option implementation
# Expected: Find the corresponding CLI option implementation in cli.py
# Search for the mzid_plugin implementation
ast-grep --pattern 'def mzid_plugin'
# Verify it's properly decorated as a click option
rg -B 2 'def mzid_plugin'
Length of output: 249
Script:
#!/bin/bash
# Let's try a different approach to find the CLI implementation
# Search for mzid_plugin in Python files
rg -t python "mzid_plugin"
# Look for files in the pmultiqc directory
fd -t f . pmultiqc
# Check content of cli.py if it exists
fd -t f "cli.py" pmultiqc --exec cat {}
Length of output: 2875
@@ -35,3 +35,4 @@ def print_version(ctx, params, value): | |||
help='Prefix (default) or suffix') | |||
disable_plugin = click.option('--disable_plugin', 'disable_plugin', is_flag=True, | |||
help="Disable the pmultiqc plugin on this run") | |||
mzid_plugin = click.option('--mzid_plugin', 'mzid_plugin', is_flag=True, help="Extract mzIdentML") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider enhancing the help message and adding configuration options.
The current help message "Extract mzIdentML" could be more descriptive. Additionally, users might benefit from configuration options for the mzIdentML extraction process.
Consider updating the implementation:
-mzid_plugin = click.option('--mzid_plugin', 'mzid_plugin', is_flag=True, help="Extract mzIdentML")
+mzid_plugin = click.option(
+ '--mzid_plugin',
+ 'mzid_plugin',
+ is_flag=True,
+ help="Enable mzIdentML file processing to extract peptide/protein identifications and generate corresponding reports"
+)
You might also want to consider adding related configuration options such as:
--mzid-score-threshold
: Threshold for filtering identifications--mzid-confidence-level
: Confidence level for peptide/protein matches
Would you like me to propose a complete implementation with these additional options?
Committable suggestion was skipped due to low confidence.
if config.kwargs.get('mzid_plugin', False): | ||
|
||
self.mzid_peptide_map = dict() | ||
self.ms_without_psm = dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential Bug: Inconsistent Type for self.ms_without_psm
At line 117, self.ms_without_psm
is initialized as a dictionary, but it is later used as a list in methods such as parse_out_mgf()
and parse_mzml()
, where append()
is called on it. This will result in an AttributeError
since dictionaries do not have an append
method. Please ensure that self.ms_without_psm
is consistently used as a list throughout the code.
Apply this diff to fix the issue:
- self.ms_without_psm = dict()
+ self.ms_without_psm = []
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
self.ms_without_psm = dict() | |
self.ms_without_psm = [] |
enzyme = next(enzyme_iter).get("EnzymeName", None) | ||
if enzyme: | ||
enzyme_name = list(enzyme.keys())[0] | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Simplify enzyme_name
assignment using a ternary operator
At lines 1150-1153, the assignment of enzyme_name
can be condensed using a ternary operator for better readability.
Apply this diff to simplify the code:
- if enzyme:
- enzyme_name = list(enzyme.keys())[0]
- else:
- enzyme_name = "Trypsin"
+ enzyme_name = list(enzyme.keys())[0] if enzyme else "Trypsin"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
enzyme = next(enzyme_iter).get("EnzymeName", None) | |
if enzyme: | |
enzyme_name = list(enzyme.keys())[0] | |
else: | |
enzyme = next(enzyme_iter).get("EnzymeName", None) | |
enzyme_name = list(enzyme.keys())[0] if enzyme else "Trypsin" |
User description
Plugin Update:generate reports using mzid and mgf (mzML) data.
PR Type
enhancement, documentation
Description
--mzid_plugin
to enable mzid data extraction.--mzid_plugin
option in the README.PRDescriptionHeader.CHANGES_WALKTHROUGH
cli.py
Add `mzid_plugin` command-line option
pmultiqc/cli.py
--mzid_plugin
.main.py
Enhance plugin to support mzid and mgf data
pmultiqc/main.py
calculation.
quantms.py
Integrate mzid and mgf data processing in quantms module
pmultiqc/modules/quantms/quantms.py
setup.py
Add `mzid_plugin` entry point in setup
setup.py
mzid_plugin
entry point.python-app.yml
Update GitHub Actions to latest versions
.github/workflows/python-app.yml
python-package.yml
Update GitHub Actions to latest versions
.github/workflows/python-package.yml
python-publish.yml
Update GitHub Actions to latest versions
.github/workflows/python-publish.yml
README.md
Document `mzid_plugin` option in README
README.md
--mzid_plugin
option.Summary by CodeRabbit
Release Notes
New Features
--mzid_plugin
command-line option for thepmultiqc
tool to generate reports from mzid and mzML/mgf files.QuantMSModule
for processing additional file formats (MGF and mzid).Improvements
Documentation
README.md
to reflect the new command-line parameter for enhanced user guidance.