Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

engine:enhanced byte order handling for timestamps #9196

Merged

Conversation

mirko-lazarevic
Copy link
Contributor

@mirko-lazarevic mirko-lazarevic commented Aug 13, 2024

This change ensures correct byte order conversions for timestamp fields within log event decoder and encoder.

Added FLB_TO_NATIVE_UINT32 in flb_endian.h. This function checks the host machine's byte order and applies the necessary conversion for 32-bit unsigned integers.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

Testing is performed on BigEndian machine using IBM Linux One

  • Example configuration file for the change
[SERVICE]
    Flush                   5
    Daemon                  Off
    Log_Level               debug
    Parsers_File            parsers.conf
    Plugins_File            plugins.conf
    HTTP_Server             On
    HTTP_Listen             0.0.0.0
    HTTP_Port               8084
 
[INPUT]
    Name   dummy
    Dummy {"message": "custom dummy"}

[OUTPUT]
    Name forward
    Match *
    Host 127.0.0.1
    Port 24284
  • Debug log output from testing the change

Using nc command we captured the output.

nc -4 -l -p 24284 > /tmp/capture.forward.dummy.fix.bin &

Using vim and %!xxd command we transform a file in Vim to hex representation.

Capture before fix:
image

We observe that the timestamp converts to a date in the past:
image

Capture after the fix:
image

We observe that the timestamp converts to the current date (the date when the test was performed):
image

  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@mirko-lazarevic
Copy link
Contributor Author

Hey @leonardo-albertovich @edsiper, this pr introduces a new fuction FLT_TO_NATIVE_UINT32 that dynamically determines the byte order of the host machine and apples the necessary byte swap operation for 32-bit unsigned integers. This approach accommodates both little-endian and big-endian architectures and should ensure consistent handling of data (timestamps) across platforms with different endianess.

My knowledge of the fluent-bit codebase is limited, therefore, I am uncertain about the function name and the location where I added the function, in this case, the flb_byteswap.h file. I'm open for suggestions. Thank you.

@@ -102,4 +103,13 @@ static inline uint64_t FLB_BSWAP_64(uint64_t value)

#endif

static inline uint32_t FLB_TO_NATIVE_UINT32(uint32_t value)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use FLB_BYTE_ORDER and FLB_BIG_ENDIAN instead of these.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this to FLB_UINT32_TO_HOST_BYTE_ORDER so its intention is clear.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

src/flb_log_event_encoder_primitives.c Outdated Show resolved Hide resolved
This change ensures correct byte order conversions for
timestamp fields within log event decoder and encoder.

Added FLB_TO_NATIVE_UINT32 in flb_endian.h. This function checks the
host machine's byte order and applies the necessary conversion for 32-bit
unsigned integers.

Co-authored-by: Bernhard Schmid <[email protected]>
Signed-off-by: Mirko Lazarevic <[email protected]>
@rightblank
Copy link

Hi, @edsiper @leonardo-albertovich @cosmo0920 @pwhelan,
Would you please help to review this PR again? This is a crucial part for fluent bit to work on s390x.

@rightblank
Copy link

rightblank commented Aug 21, 2024

@rightblank I see the unit tests in the CI are failing, would you please check them ?

Hi, @edsiper @leonardo-albertovich @cosmo0920 @pwhelan @fujimotos, these cases failed because the timestamps are dealt with in big endian system style because of an issue in the flb_endian.h file

  • On linux x86 platform, __BIG_ENDIAN is defined in /usr/include/x86_64-linux-gnu/bits/endian.h,

    #define __LITTLE_ENDIAN 1234
    #define __BIG_ENDIAN    4321
    #define __PDP_ENDIAN    3412
    
  • This makes the condition on line 60 of flb_endian.h become true and FLB_BYTE_ORDER is defined to FLB_BIG_ENDIAN on line 61.

    60    #elif defined(__BIG_ENDIAN__) || defined(__BIG_ENDIAN) || defined(_BIG_ENDIAN)
    61        #define FLB_BYTE_ORDER FLB_BIG_ENDIAN
    

@leonardo-albertovich
Copy link
Collaborator

@rightblank I see the unit tests in the CI are failing, would you please check them ?

Hi, @edsiper @leonardo-albertovich @cosmo0920 @pwhelan @fujimotos, these cases failed because the timestamps are dealt with in big endian system style because of an issue in the flb_endian.h file

  • On linux x86 platform, __BIG_ENDIAN is defined in /usr/include/x86_64-linux-gnu/bits/endian.h,
    #define __LITTLE_ENDIAN 1234
    #define __BIG_ENDIAN    4321
    #define __PDP_ENDIAN    3412
    
  • This makes the condition on line 60 of flb_endian.h become true and FLB_BYTE_ORDER is defined to FLB_BIG_ENDIAN on line 61.
    60    #elif defined(__BIG_ENDIAN__) || defined(__BIG_ENDIAN) || defined(_BIG_ENDIAN)
    61        #define FLB_BYTE_ORDER FLB_BIG_ENDIAN
    

Good catch, do you want to open a PR to fix it or would you rather have me do it?

@rightblank
Copy link

Good catch, do you want to open a PR to fix it or would you rather have me do it?

@leonardo-albertovich, I have a local patch but not sure if it can work for *BSD, windows, and macOS

diff --git a/include/fluent-bit/flb_endian.h b/include/fluent-bit/flb_endian.h
index b376ea842..5a6f73a09 100644
--- a/include/fluent-bit/flb_endian.h
+++ b/include/fluent-bit/flb_endian.h
@@ -55,9 +55,9 @@
 #define FLB_BIG_ENDIAN    1
 
 #ifndef FLB_BYTE_ORDER
-    #if defined(__BYTE_ORDER__) &&  __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
-        #define FLB_BYTE_ORDER FLB_BIG_ENDIAN
-    #elif defined(__BIG_ENDIAN__) || defined(__BIG_ENDIAN) || defined(_BIG_ENDIAN)
+    #if (defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || \
+          (defined(_BYTE_ORDER) && _BYTE_ORDER == _BIG_ENDIAN) || \
+          (defined(__BIG_ENDIAN__))
         #define FLB_BYTE_ORDER FLB_BIG_ENDIAN
     #else
         #define FLB_BYTE_ORDER FLB_LITTLE_ENDIAN

@leonardo-albertovich
Copy link
Collaborator

leonardo-albertovich commented Aug 21, 2024

Yeah, that's the thing, since my initial approach was clearly flawed I think we should do a more thorough check for the second one.

Another option would be using a cmake test like this :

check_c_source_runs("
  int main() {
    volatile uint64_t source_value;
    volatile uint8_t *test_value;

    source_value = 1;
    test_value = (uint8_t *) &source_value; 

    return (int) test_value[0];
  }
  }" FLB_ENDIANNESS_TEST_RESULT)

if(!FLB_ENDIANNESS_TEST_RESULT)
  FLB_DEFINITION(FLB_HAVE_BIG_ENDIAN_SYSTEM)
endif()

The only upside of this would be simplicity and the pressumption that it should work accross systems (unless the compiler is seriously dodgy about optimizations).

Which approach do you think would be better?

Side note : the volatile qualifier is more or less abused to prevent overly eager compilers from erroneously optimizing the code.

@rightblank
Copy link

rightblank commented Aug 21, 2024

@leonardo-albertovich I'm good with your solution, it's more generic across all the platforms, please go ahead and open the PR! Thanks in advance!

Also I remembered that the cmake file of msgpack-c implemented something very similar to your solution.
https://github.com/fluent/fluent-bit/blob/master/lib/msgpack-c/CMakeLists.txt#L12-L30

@leonardo-albertovich
Copy link
Collaborator

Well, given that it's already implemented I don't see a need to add a different version so I'll just copy and adapt that snippet in a PR.

I'll send an update as soon as it's up.

@leonardo-albertovich
Copy link
Collaborator

PR #9256 up, would you mind taking a look at it @rightblank?

@rightblank
Copy link

Hi, @cosmo0920, would you please help to trigger the CI for this PR again?

@rightblank
Copy link

@rightblank I see the unit tests in the CI are failing, would you please check them ?

Hi, @edsiper, the CI looks good now, would you please help to merge this PR?

@edsiper edsiper merged commit 8e5c213 into fluent:master Aug 27, 2024
43 checks passed
@mirko-lazarevic mirko-lazarevic deleted the timestamp-encode-decode-big-endian branch August 29, 2024 18:16
legendary-acp pushed a commit to ctrlb-hq/ctrlb-fluent-bit that referenced this pull request Oct 1, 2024
…r timestamps (fluent#9196)

This change ensures correct byte order conversions for
timestamp fields within log event decoder and encoder.

Added FLB_TO_NATIVE_UINT32 in flb_endian.h. This function checks the
host machine's byte order and applies the necessary conversion for 32-bit
unsigned integers.

Signed-off-by: Mirko Lazarevic <[email protected]>
Co-authored-by: Bernhard Schmid <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants