
refactor: block serialisation and addition of tests #47

Open. Wants to merge 4 commits into base: master.

Conversation

ogabrielides
Collaborator

@ogabrielides ogabrielides commented Nov 1, 2024

  • Refactored code with new functions: compact_size_len, write_compact_size, read_compact_size, fixed_bitset_len, write_fixed_bitset and read_fixed_bitset
  • Wrote tests for CompactSize and Dynamic Bitset serialisation/deserialisation
  • Fixed test block_test_2

Summary by CodeRabbit

  • New Features

    • Enhanced encoding capabilities with support for compact size integers and fixed-size bitsets.
    • Updated structures for CoinbasePayload and QuorumFinalizationCommitment to improve data representation and processing.
  • Bug Fixes

    • Adjusted test cases to align with new expected sizes and properties in various structures.
  • Documentation

    • Comprehensive test suite added to validate new encoding and decoding functionalities.


coderabbitai bot commented Nov 1, 2024

Walkthrough

The pull request introduces several modifications across multiple files, primarily focusing on the structures and encoding processes related to blockchain data. Key changes include updates to the Block, CoinbasePayload, and QuorumFinalizationCommitment structures, altering how certain fields are represented and processed. Notably, hexadecimal values are replaced with boolean types for specific fields. New functions for compact size handling and bitset encoding are also introduced, alongside adjustments to test cases reflecting these changes.

Changes

File Path | Change Summary
dash/src/blockdata/block.rs | Modified Block and Header structures; changed prevhash and merkle to new hexadecimal values; updated test values for block properties.
dash/src/blockdata/transaction/special_transaction/coinbase.rs | Updated CoinbasePayload structure; modified size calculation, encoding/decoding logic, and test cases for expected sizes.
dash/src/blockdata/transaction/special_transaction/quorum_commitment.rs | Changed QuorumFinalizationCommitment structure; altered signers and valid_members fields from Vec<u8> to Vec<bool>; updated encoding/decoding methods and test cases.
dash/src/consensus/encode.rs | Added functions for compact size and fixed bitset handling; includes new test suite for validation.
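
For reference, the CompactSize scheme used by these helpers is Bitcoin's variable-length integer encoding, restricted here to the u32 range. A minimal sketch of the length rule (the thresholds mirror the compact_size_len function named in the PR description; the body is an assumption, not the PR's exact code):

```rust
// CompactSize length rule for u32 values: values below 0xFD fit in a
// single byte; values up to u16::MAX take a 0xFD marker plus 2
// little-endian bytes; larger u32s take a 0xFE marker plus 4 bytes.
fn compact_size_len(value: u32) -> usize {
    if value < 0xFD {
        1
    } else if value <= 0xFFFF {
        3
    } else {
        5
    }
}

fn main() {
    assert_eq!(compact_size_len(0xFC), 1);
    assert_eq!(compact_size_len(0xFD), 3);
    assert_eq!(compact_size_len(0xFFFF), 3);
    assert_eq!(compact_size_len(0x10000), 5);
    println!("ok");
}
```

The boundary values 0xFC/0xFD and 0xFFFF/0x10000 are exactly where the encoded width changes, which is why the review below asks for test heights at those boundaries.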

Possibly related PRs

  • fix: further block deserialization fixes  #46: The changes in this PR involve modifications to the QuorumFinalizationCommitment structure, which is relevant to the quorum_commitment.rs file. This relates to the main PR's focus on block properties and structures, as both involve updates to data handling and encoding/decoding processes within the blockchain context.

Suggested reviewers

  • QuantumExplorer

🐇 In the land of blocks and chains,
A rabbit hops through data lanes.
With bytes and bits, we redefine,
Our structures sleek, our tests align.
Compact sizes, boolean cheer,
A brighter blockchain future near! 🌟




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (6)
dash/src/blockdata/transaction/special_transaction/coinbase.rs (1)

Line range hint 126-141: Consider adding edge cases to the test suite.

While the current tests verify basic functionality and the size changes, consider adding test cases for:

  • Maximum value for best_cl_height to verify compact size encoding handles large numbers correctly
  • Different best_cl_height values that would result in different compact size lengths

Example test cases to add:

// Test maximum height
let payload = CoinbasePayload {
    height: 1000,
    version: 3,
    // ... other fields same as current test ...
    best_cl_height: Some(0xFFFFFFFF), // Max u32 value
    // ... remaining fields ...
};

// Test height requiring different compact size lengths
let test_heights = vec![0xFC, 0xFD, 0xFFFF, 0xFFFFFF];
for height in test_heights {
    let payload = CoinbasePayload {
        // ... same fields ...
        best_cl_height: Some(height),
        // ... remaining fields ...
    };
    // ... size assertions ...
}
dash/src/blockdata/transaction/special_transaction/quorum_commitment.rs (2)

74-74: Simplify length calculation for signers

You can replace self.signers.iter().len() with self.signers.len() for clarity and potential performance benefits, as len() is more direct and avoids creating an iterator.

Apply this diff:

 len += write_compact_size(w, self.signers.len() as u32)?;
-len += write_fixed_bitset(w, self.signers.as_slice(), self.signers.iter().len())?;
+len += write_fixed_bitset(w, self.signers.as_slice(), self.signers.len())?;

76-76: Simplify length calculation for valid_members

Similarly, replace self.valid_members.iter().len() with self.valid_members.len() to make the code more concise.

Apply this diff:

 len += write_compact_size(w, self.valid_members.len() as u32)?;
-len += write_fixed_bitset(w, self.valid_members.as_slice(), self.valid_members.iter().len())?;
+len += write_fixed_bitset(w, self.valid_members.as_slice(), self.valid_members.len())?;
dash/src/consensus/encode.rs (3)

1417-1460: Expand Tests to Cover Non-Minimal Encodings in test_compact_size_round_trip

While the test_compact_size_round_trip function tests valid round-trip serialization and deserialization, it doesn't test handling of non-minimal encodings.

Consider adding test cases to verify that read_compact_size correctly rejects non-minimal encodings:

#[test]
fn test_compact_size_non_minimal_encodings() {
    // Non-minimal encoding: value 252 encoded with 3 bytes instead of 1
    let data = vec![0xFD, 0xFC, 0x00];
    let mut cursor = Cursor::new(&data);
    assert!(read_compact_size(&mut cursor).is_err());

    // Non-minimal encoding: value 65535 encoded with 5 bytes instead of 3
    let data = vec![0xFE, 0xFF, 0xFF, 0x00, 0x00];
    let mut cursor = Cursor::new(&data);
    assert!(read_compact_size(&mut cursor).is_err());
}
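
The rejection behaviour these tests expect can be sketched end to end. This is a self-contained reader for the u32 range, assumed from the review excerpts rather than copied from the PR; the u64 form (0xFF marker) is treated as out of range here:

```rust
use std::io::{self, Cursor, Read};

// Sketch of a CompactSize reader that rejects non-minimal encodings.
fn read_compact_size<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut marker = [0u8; 1];
    r.read_exact(&mut marker)?;
    match marker[0] {
        0xFD => {
            let mut buf = [0u8; 2];
            r.read_exact(&mut buf)?;
            let value = u16::from_le_bytes(buf) as u32;
            if value < 0xFD {
                // e.g. 252 written as FD FC 00: must use the 1-byte form
                return Err(io::Error::new(io::ErrorKind::InvalidData, "non-minimal compact size"));
            }
            Ok(value)
        }
        0xFE => {
            let mut buf = [0u8; 4];
            r.read_exact(&mut buf)?;
            let value = u32::from_le_bytes(buf);
            if value <= 0xFFFF {
                // e.g. 65535 written as FE FF FF 00 00: must use the 3-byte form
                return Err(io::Error::new(io::ErrorKind::InvalidData, "non-minimal compact size"));
            }
            Ok(value)
        }
        0xFF => Err(io::Error::new(io::ErrorKind::InvalidData, "u64 compact size out of u32 range")),
        b => Ok(b as u32),
    }
}

fn main() {
    assert_eq!(read_compact_size(&mut Cursor::new(vec![0xFC])).unwrap(), 252);
    assert!(read_compact_size(&mut Cursor::new(vec![0xFD, 0xFC, 0x00])).is_err());
    assert!(read_compact_size(&mut Cursor::new(vec![0xFE, 0xFF, 0xFF, 0x00, 0x00])).is_err());
    println!("ok");
}
```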

35-35: Remove Unused Import

The import statement use std::io::Write; appears redundant because the trait Write is already imported through use crate::io::{self, Cursor, Read};. This may lead to confusion or warnings about unused imports.

Consider removing the redundant import:

-use std::io::Write;

1417-1515: Add Edge Case Tests for test_fixed_bitset_round_trip

The test_fixed_bitset_round_trip function covers several cases but could include more edge cases, such as when size is not a multiple of 8, or when bits contains all true or all false values.

Consider adding the following test cases:

// Case where size is not a multiple of 8 and bits are all true
let bits = vec![true; 13];
let size = 13;
...

// Case where bits are all false
let bits = vec![false; 20];
let size = 20;
...
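
The suggested edge cases can be made concrete with a self-contained round trip. write_fixed_bitset here is re-derived from the code quoted later in this review; read_fixed_bitset is an assumed mirror image, and a Vec<u8> stands in for the generic writer:

```rust
// Round-trip sketch of the fixed bitset helpers.
fn write_fixed_bitset(out: &mut Vec<u8>, bits: &[bool], size: usize) -> usize {
    let num_bytes = (size + 7) / 8;
    let mut bytes = vec![0u8; num_bytes];
    // Bit p lives in byte p / 8 at position p % 8 (little-endian bit order).
    for p in 0..size.min(bits.len()) {
        if bits[p] {
            bytes[p / 8] |= 1 << (p % 8);
        }
    }
    out.extend_from_slice(&bytes);
    bytes.len()
}

fn read_fixed_bitset(data: &[u8], size: usize) -> Vec<bool> {
    (0..size).map(|p| (data[p / 8] & (1 << (p % 8))) != 0).collect()
}

fn main() {
    // Case where size is not a multiple of 8 and bits are all true
    let bits = vec![true; 13];
    let mut buf = Vec::new();
    assert_eq!(write_fixed_bitset(&mut buf, &bits, 13), 2); // ceil(13 / 8)
    assert_eq!(read_fixed_bitset(&buf, 13), bits);

    // Case where bits are all false
    let bits = vec![false; 20];
    let mut buf = Vec::new();
    assert_eq!(write_fixed_bitset(&mut buf, &bits, 20), 3);
    assert_eq!(read_fixed_bitset(&buf, 20), bits);
    println!("ok");
}
```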
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d224400 and e75dc6e.

📒 Files selected for processing (4)
  • dash/src/blockdata/block.rs (2 hunks)
  • dash/src/blockdata/transaction/special_transaction/coinbase.rs (5 hunks)
  • dash/src/blockdata/transaction/special_transaction/quorum_commitment.rs (6 hunks)
  • dash/src/consensus/encode.rs (3 hunks)
🔇 Additional comments (10)
dash/src/blockdata/transaction/special_transaction/coinbase.rs (4)

26-26: LGTM: Import addition aligns with serialization refactoring.

The addition of compact size utilities from the consensus module is appropriate for the serialization changes.


47-48: LGTM: Size calculation improved with compact size encoding.

The size calculation now correctly accounts for variable-length encoding of best_cl_height using compact_size_len, making the serialization more space-efficient.

Also applies to: 52-56


71-71: LGTM: Encoding logic updated consistently.

The encoding now uses write_compact_size for best_cl_height, maintaining consistency with the size calculation changes while preserving proper error handling.


99-99: LGTM: Decoding logic updated symmetrically.

The decoding now uses read_compact_size for best_cl_height, maintaining symmetry with the encoding changes.

dash/src/blockdata/block.rs (2)

483-483: LGTM!

The updated hex value for work_bytes is correctly formatted and matches the expected size.


496-498: Please verify the test values.

The updated test values raise some concerns:

  1. The time value (1730283725) corresponds to a date in 2024, which seems unusually far in the future.
  2. The difficulty values (0 and 0.0037...) are unusually low for a blockchain block.

Could you please verify these values against the actual block data to ensure they are correct?

Also applies to: 504-505

✅ Verification successful

Let me gather more context about this test case to understand if these values are from actual block data.


Let me check the test data source to understand if these values are from actual test vectors.


Let me check the actual test data being used in this specific test case.


Based on the gathered context, I can now provide a final response:

The test values appear to be from actual block data and are correct

The test is validating block deserialization using real block data, where:

  • The timestamp (1730283725) and other values are part of a complete test vector that includes matching prevhash, merkle root, and work values
  • The low difficulty is consistent with the provided work value (0x...f7b6f1) in the test
  • The test is properly structured to verify block header field serialization/deserialization
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for similar time and difficulty values in other test files
# to verify if these values are within expected ranges.

# Test: Search for time values in test files
rg -A 2 'time.*17[0-9]{8}' --type rust

# Test: Search for low difficulty values in test files
rg -A 2 'difficulty.*0\.00[0-9]+' --type rust

Length of output: 588


Script:

#!/bin/bash
# Look for the test function containing these assertions
rg -B 10 "assert_eq.*1730283725" --type rust

# Search for any block data or test vectors being used
rg -B 5 "real_decode.*header" --type rust

Length of output: 6978


Script:

#!/bin/bash
# Look for the test data/vectors being used in this test
rg -B 15 "some_block.*=" --type rust

# Also search for any constants or test data definitions
rg -B 5 "const.*BLOCK" --type rust

Length of output: 9360


Script:

#!/bin/bash
# Find the test function and its test data
rg -B 20 "1730283725" --type rust

Length of output: 3061

dash/src/blockdata/transaction/special_transaction/quorum_commitment.rs (3)

26-26: Imports for serialization utilities are appropriate

The added imports from consensus::encode are necessary for handling the encoding and decoding of compact sizes and fixed bitsets.


173-174: Verify lengths of signers and valid_members in test case

In this test case, signers has a length of 5, while valid_members has a length of 4. If these vectors are meant to be of the same length to represent quorum members consistently, this discrepancy might cause issues during serialization and deserialization.


195-196: Verify lengths of signers and valid_members in test case

Here, signers has a length of 7, and valid_members has a length of 6. Ensure that differing lengths are acceptable in the context of quorum commitments, or adjust the test cases to reflect intended usage.

dash/src/consensus/encode.rs (1)

908-931: Handle Unexpected EOF Gracefully in read_compact_size

If the reader does not contain enough bytes to fulfill the read requests, the function will return an UnexpectedEof error. Ensure that this behavior is acceptable or consider providing a more descriptive error message.

Verify how read_compact_size handles incomplete data and ensure that the function returns an appropriate error.

dash/src/blockdata/block.rs (outdated comment, resolved)
Comment on lines +39 to +40
pub signers: Vec<bool>,
pub valid_members: Vec<bool>,

⚠️ Potential issue

Potential issues with Vec<bool> usage

Note that, unlike C++'s std::vector<bool>, Rust's Vec<bool> is not bit-packed: each element occupies a full byte, so the semantics are unsurprising but the representation is memory-hungry for large quorums. Consider the bitvec crate's BitVec type for a compact, explicit bitset, or keep Vec<bool> and rely on write_fixed_bitset/read_fixed_bitset to pack only at the serialisation boundary.
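
As a quick check on this point (a property of Rust itself, not of this PR's code): Rust's Vec<bool> stores one byte per element and performs no bit-packing, unlike C++'s std::vector<bool>:

```rust
fn main() {
    // bool is a one-byte type in Rust; Vec<bool> stores elements unpacked.
    assert_eq!(std::mem::size_of::<bool>(), 1);
    let v = vec![true; 16];
    // Element storage is 16 bytes, not the 2 bytes a packed bitset would use.
    assert_eq!(v.len() * std::mem::size_of::<bool>(), 16);
    println!("ok");
}
```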

dash/src/consensus/encode.rs (outdated comment, resolved)
Comment on lines 985 to 1005
pub fn write_fixed_bitset<W: Write + ?Sized>(w: &mut W, bits: &[bool], size: usize) -> io::Result<usize> {
    // Calculate the number of bytes needed to represent 'size' bits
    let num_bytes = (size + 7) / 8;
    let mut bytes = vec![0u8; num_bytes];

    // Determine the minimum size to handle cases where bits.len() < size
    let ms = std::cmp::min(size, bits.len());

    // Pack the bits into the byte buffer
    for p in 0..ms {
        if bits[p] {
            bytes[p / 8] |= 1 << (p % 8);
        }
    }

    // Write the bytes to the writer
    w.write_all(&bytes)?;

    // Return the number of bytes written
    Ok(bytes.len())
}

⚠️ Potential issue

Validate size Parameter in write_fixed_bitset

Similarly, the write_fixed_bitset function uses the size parameter to determine buffer sizes without validation. If size is unreasonably large, it could cause memory issues.

Please add a validation check for size to ensure it's within acceptable bounds:

 pub fn write_fixed_bitset<W: Write + ?Sized>(w: &mut W, bits: &[bool], size: usize) -> io::Result<usize> {
+    // Define a reasonable maximum size to prevent excessive memory allocation
+    const MAX_BITSET_SIZE: usize = 1_000_000; // Adjust this value as appropriate
+    if size > MAX_BITSET_SIZE {
+        return Err(io::Error::new(
+            io::ErrorKind::InvalidInput,
+            "Bitset size exceeds maximum allowed value",
+        ));
+    }

     // Calculate the number of bytes needed to represent 'size' bits
     let num_bytes = (size + 7) / 8;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

pub fn write_fixed_bitset<W: Write + ?Sized>(w: &mut W, bits: &[bool], size: usize) -> io::Result<usize> {
    // Define a reasonable maximum size to prevent excessive memory allocation
    const MAX_BITSET_SIZE: usize = 1_000_000; // Adjust this value as appropriate
    if size > MAX_BITSET_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "Bitset size exceeds maximum allowed value",
        ));
    }

    // Calculate the number of bytes needed to represent 'size' bits
    let num_bytes = (size + 7) / 8;
    let mut bytes = vec![0u8; num_bytes];

    // Determine the minimum size to handle cases where bits.len() < size
    let ms = std::cmp::min(size, bits.len());

    // Pack the bits into the byte buffer
    for p in 0..ms {
        if bits[p] {
            bytes[p / 8] |= 1 << (p % 8);
        }
    }

    // Write the bytes to the writer
    w.write_all(&bytes)?;

    // Return the number of bytes written
    Ok(bytes.len())
}

⚠️ Potential issue

Ensure Consistent bits Length Handling in write_fixed_bitset

In write_fixed_bitset, if bits.len() is less than size, the remaining bits are considered false. However, there's no indication to the caller that additional false bits were appended. This could lead to unexpected behavior.

Consider enforcing that bits.len() must be at least size or documenting this behavior clearly. Alternatively, you could return an error if bits.len() < size.

 pub fn write_fixed_bitset<W: Write + ?Sized>(w: &mut W, bits: &[bool], size: usize) -> io::Result<usize> {
+    if bits.len() < size {
+        return Err(io::Error::new(
+            io::ErrorKind::InvalidInput,
+            "Bits length is less than the specified size",
+        ));
+    }

Committable suggestion skipped: line range outside the PR's diff.

dash/src/consensus/encode.rs (outdated comment, resolved)

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
dash/src/blockdata/transaction/special_transaction/quorum_commitment.rs (2)

73-76: Optimize length calculations in encoding.

The .iter().len() calls are redundant as Vec already implements the len() method directly.

Apply this diff to optimize the code:

-len += write_fixed_bitset(w, self.signers.as_slice(), self.signers.iter().len())?;
+len += write_fixed_bitset(w, self.signers.as_slice(), self.signers.len())?;
-len += write_fixed_bitset(w, self.valid_members.as_slice(), self.valid_members.iter().len())?;
+len += write_fixed_bitset(w, self.valid_members.as_slice(), self.valid_members.len())?;

163-206: Enhance test coverage.

While the size calculation tests are good, consider adding:

  1. Encoding/decoding roundtrip tests to verify data integrity
  2. Error case tests (e.g., invalid sizes, malformed input)

Would you like me to provide example test cases?

dash/src/consensus/encode.rs (3)

908-945: Consider using the custom Error type for consistency.

The function returns io::Result<u32> while other similar functions in this file use the custom Error type. Consider changing the return type to Result<u32, Error> for consistency.

-pub fn read_compact_size<R: Read + ?Sized>(r: &mut R) -> io::Result<u32> {
+pub fn read_compact_size<R: Read + ?Sized>(r: &mut R) -> Result<u32, Error> {
     let mut marker = [0u8; 1];
-    r.read_exact(&mut marker)?;
+    r.read_exact(&mut marker).map_err(Error::Io)?;
     match marker[0] {
         0xFD => {
             let mut buf = [0u8; 2];
-            r.read_exact(&mut buf)?;
+            r.read_exact(&mut buf).map_err(Error::Io)?;
             let value = u16::from_le_bytes(buf) as u32;
             if value < 0xFD {
-                return Err(io::Error::new(
-                    io::ErrorKind::InvalidData,
-                    "Non-minimal compact size encoding",
-                ));
+                return Err(Error::NonMinimalVarInt);
             }
             Ok(value)
         }

966-978: Style and conciseness improvements suggested.

The function has inconsistent spacing in else clauses and could be more concise.

 pub fn compact_size_len(value: u32) -> usize {
-    let mut size: usize = 0;
-    if value < 253 {
-        size += 1;
-    }
-    else if value < 65536 {
-        size += 3;
-    }
-    else {
-        size += 5;
-    }
-    size
+    if value < 253 {
+        1
+    } else if value < 65536 {
+        3
+    } else {
+        5
+    }
 }

1007-1035: Add documentation about bits.len() < size behavior.

The function correctly handles cases where bits.len() is less than size, but this behavior should be documented.

Add a doc comment explaining that remaining bits are implicitly false:

+/// Writes a fixed-size bitset to a writer.
+///
+/// # Arguments
+/// * `w` - The writer to write to
+/// * `bits` - The bits to write
+/// * `size` - The size of the bitset
+///
+/// If `bits.len()` is less than `size`, the remaining bits are treated as false.
+///
+/// # Returns
+/// The number of bytes written
 pub fn write_fixed_bitset<W: Write + ?Sized>(w: &mut W, bits: &[bool], size: usize) -> io::Result<usize> {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between e75dc6e and fb58431.

📒 Files selected for processing (3)
  • dash/src/blockdata/block.rs (2 hunks)
  • dash/src/blockdata/transaction/special_transaction/quorum_commitment.rs (6 hunks)
  • dash/src/consensus/encode.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • dash/src/blockdata/block.rs
🔇 Additional comments (7)
dash/src/blockdata/transaction/special_transaction/quorum_commitment.rs (3)

26-26: LGTM: Import changes are appropriate.

The new imports for compact size and fixed bitset functions align with the refactoring changes.


39-40: Consider using bitvec crate instead of Vec<bool>.


51-54: LGTM: Size calculation is accurate.

The size calculation correctly accounts for both the compact size and fixed bitset lengths of the boolean vectors.

dash/src/consensus/encode.rs (4)

947-964: LGTM!

The implementation correctly handles different size ranges with proper byte ordering and clear comments.


980-1005: LGTM!

The implementation correctly handles size validation and bit manipulation, with proper memory allocation limits.


1037-1043: LGTM!

The implementation correctly calculates the required bytes for the bitset with clear variable naming.


1447-1545: LGTM! Comprehensive test coverage.

The test suite provides thorough coverage of both normal cases and edge cases, with well-organized test cases and proper round-trip testing.
