
dev ARQ (retransmit) #185
Draft · wants to merge 71 commits into main
Conversation

@olliw42 (Owner) commented May 24, 2024

this is a dev/test branch for working on retransmission

this one is simplified in two ways, to keep it from getting too complicated and to make testing easier:

  • retransmission handling only for the receiver->transmitter link direction. That's also the more complicated direction (it entangles with flow control). Once this is debugged, adding the tx->rx direction should be easy.
  • lossless transmission, i.e., a frame is sent for as long as it is not acked. For low LQ this is bad; eventually we should go to a scheme where only one retransmit attempt is done, but this way it's easier to get going. (A rough sketch of this simplified scheme follows below.)
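
roughly, the simplified rx->tx scheme looks like this (just a sketch with placeholder names, not the exact code):

// receiver side, called once per frame slot (sketch only, helpers are placeholders)
uint8_t tx_seq = 0;     // seq no of the payload currently in flight
bool    acked  = true;  // was the last payload acknowledged?

void rx_link_send_slot(void)
{
    if (acked) {
        fetch_new_payload_from_fifo();  // placeholder for the real payload handling
        tx_seq = (tx_seq + 1) & 0x07;   // small wrap-around seq counter
        acked = false;
    }
    // else: keep payload and seq unchanged => retransmit ("lossless")
    send_frame(tx_seq /*, payload */);
}

void rx_link_handle_response(bool frame_was_acked)
{
    if (frame_was_acked) acked = true;  // next slot may carry new data
}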

@brad112358 this might interest you, given you showed interest in this before. I would actually much appreciate you checking it out; your strength in finding the little loopholes/bugs would be useful :)

anyone else is of course also massively welcome !!!!!

@olliw42 (Owner, Author) commented May 24, 2024

a pic I made to help me
[diagram: mlrs-arq]

@brad112358 (Contributor) commented May 26, 2024

Thanks for including the very helpful and, I think, clear diagram. So far I've only looked at that, but I already have a few comments/observations/suggestions.

  1. You seem to be sending back only ack or nack and not the received sequence number. I believe this is not optimal as we observe below.

  2. Sequence number 5 shows sub-optimal behavior which is a direct result of 1). In the 8th response, we send a nack with no indication of which sequence number was lost, even though we have previously received and accepted the frame which was lost.

  3. Instead of responding with just ack or nack, it would be better to send back a received sequence number and to always just acknowledge the last successfully received data. In other words, it doesn't matter much that we haven't received the most recent message; what matters most is which message we did last successfully receive. This change would have eliminated an unnecessary retransmission for sequence number 5.

  4. The diagram doesn't specify the size of the sequence number field, but it seems to be at least 3 bits. I think the sequence numbers don't need to be so large; in fact, I think they can be 1 bit. This is because we have alternating transmission direction with no chance of out-of-order reception. This reduces the overhead for the sequence number and has the added advantage that responding with the last received sequence number rather than ack or nack still only requires 1 bit (see the sketch at the end of this comment).

  5. What I describe above is actually the retransmission method ELRS uses (last time I checked) and I think it is probably optimal in terms of overhead. They can only use it in one direction because they don't send an equal number of frames in both directions, but we can use this method in both directions.

  6. There might be some small utility in using sequence numbers larger than 1 bit, or in also sending the ack/nack flag, in that it would allow each side to more accurately estimate the loss in both directions.

  7. I agree we should limit the number of retransmissions. Even with 1 bit sequence numbers, I think we can limit the number of retransmissions to any value we like as long as both sides agree what the limit is (both sides would need a counter for each direction).

Does this make sense to you? Am I missing something?
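
To make 4) and 7) concrete, here is roughly what I have in mind (just an illustration; names and framing are made up, not existing mLRS code):

// 1-bit alternating seq; the response simply echoes the seq bit of the last good frame
uint8_t my_seq      = 0;  // toggles with every *new* payload we send
uint8_t last_rx_seq = 0;  // seq bit of the last frame we received correctly
uint8_t retry_cnt   = 0;
#define RETRY_MAX 1       // retransmission limit, both sides must agree on it

void send_slot(uint8_t echoed_seq)  // echoed_seq = seq bit the other side last received
{
    if (echoed_seq == my_seq || retry_cnt >= RETRY_MAX) {
        load_next_payload();          // acked (or we give up): move on
        my_seq ^= 1;                  // toggle the single seq bit
        retry_cnt = 0;
    } else {
        retry_cnt++;                  // not acked yet: keep the payload and resend it
    }
    send_frame(my_seq, last_rx_seq);  // our seq bit + echo of what we last received
    // (the other side needs a matching counter so both ends agree when a frame is given up on)
}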

@olliw42 (Owner, Author) commented May 26, 2024

many thx for the comments

concerning all points related to the seq not being 1 bit: YES, 1 bit is fully sufficient for a basic mechanism. Eventually we may want to change it. However, for purely historical reasons the seq number just happens to be 3 bits, and at this point I see no reason to change that. We know we could/will. Not a relevant point IMHO :)

there is btw always a situation which is not optimal, see e.g. seq 3

the protocol is just the standard protocol as described on any web page (there are various names for it, so no name here) (all 1 bit). There are two versions: those which send an ack, and those which send the next desired number. The main challenge is handling the various non-protocol-related states, like disconnects, frames which carry commands rather than serial data, etc. Hence these states, and I figured that the send-ack version makes handling these states easier.

it's possible that with > 1 bit one may avoid a few edge cases

the problem with 5 is probably solved by using the other method, i.e. sending the next desired number. Hm. Maybe I should convert to that.
EDIT: yes, I think that's what we should go to ...
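
for illustration, the difference on the responding side is roughly this (sketch only, names are made up):

typedef struct { uint8_t ack, last_seq, next_seq; } arq_response_t;  // hypothetical

arq_response_t make_response(uint8_t rx_seq, uint8_t expected_seq, uint8_t last_good_seq)
{
    arq_response_t r;
    r.ack      = (rx_seq == expected_seq);    // flavor A: plain ack/nack
    r.last_seq = last_good_seq;               // flavor B: echo the last received seq
    r.next_seq = (last_good_seq + 1) & 0x07;  // flavor C: the next desired seq
    return r;  // a real frame would carry only one of these fields
}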

@brad112358 (Contributor)

Can you elaborate a bit on the cases in mLRS where the ack/nack is better than sending the sequence number of the last received frame? (You are faster at edits than I am at replies.) If I understand your edit correctly, that was my point. Is there any practical difference between sending the next desired number vs the last received number?

@brad112358 (Contributor) commented May 26, 2024

Also, 3 is only sub-optimal if you allow sending ahead of acknowledged frames (whether by sequence number or ack), which requires more buffering and can result in out-of-order delivery.

I think such types of protocols are out of the question

@olliw42 (Owner, Author) commented May 26, 2024

sorry, I edited your post ... grrrr, this damned github, not the first time I've stepped into this trap

@olliw42 (Owner, Author) commented May 26, 2024

Is there any practical difference between sending the next desired number vs the last received number?

the bookkeeping and state handling looked easier to me

@brad112358 (Contributor)

the bookkeeping and state handling looked easier to me

To me, the opposite. Sending back the last received sequence number means you just check whether the acknowledged sequence number matches what you last sent; if so, move on, if not, retransmit. The other way seems to require you to add when you respond and subtract when you compare. But I suppose it's mostly a matter of how you think about it. Either way, there is not much state involved except for deciding when to give up on retransmission of a given frame.

Many of the algorithms found online and in textbooks are intended for more complex systems which don't just ping-pong messages at a constant interval like we do, so a very simple method can be optimal for us if we rule out buffering more than the most recent transmission, as you have.
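
In code, the difference I mean is roughly this (sketch with made-up names):

// variant "echo last received seq": a direct comparison
void handle_response_last_received(uint8_t resp_seq, uint8_t seq_last_sent)
{
    if (resp_seq == seq_last_sent) load_next_payload();   // acked: move on
    else                           retransmit_payload();  // not acked: resend
}

// variant "send next desired seq": the same check, but with the +1 offset
void handle_response_next_desired(uint8_t resp_seq, uint8_t seq_last_sent)
{
    if (resp_seq == ((seq_last_sent + 1) & 0x07)) load_next_payload();
    else                                          retransmit_payload();
}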

@olliw42 (Owner, Author) commented May 26, 2024

I was trying this initially, but then settled on the ack. I will retry; it could be that at the early stages I also had too much of how to abstract the code in mind, and I hadn't sorted out the states yet. Anyway, it has benefits, so that's what it's going to be :)

@olliw42 (Owner, Author) commented May 27, 2024

@brad112358
so, I changed it now to send the last received seq no as the ack, instead of ack/nack-ing reception

seemingly it works, in that it connects etc., but the symptom described by @jlpoltrack also exists, i.e. MP shows lost packets ... so it appears something is still not working as expected ...

here is the time plan for the changed protocol
(note: the ack is only 1 bit, the seq is 3 bits)

[diagram: grafik]
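
just for illustration, the per-frame overhead could be packed into a single status byte along these lines (the exact layout here is arbitrary, only meant to show that 3 bits of seq plus 1 bit of ack fit comfortably):

#define ARQ_SEQ_MASK  0x07  // bits 0..2: 3-bit seq of this frame's payload
#define ARQ_ACK_SHIFT 3     // bit 3: 1-bit ack field

uint8_t arq_pack(uint8_t seq, uint8_t ack) { return (seq & ARQ_SEQ_MASK) | ((ack & 0x01) << ARQ_ACK_SHIFT); }
uint8_t arq_seq(uint8_t status)            { return status & ARQ_SEQ_MASK; }
uint8_t arq_ack(uint8_t status)            { return (status >> ARQ_ACK_SHIFT) & 0x01; }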

@olliw42 (Owner, Author) commented May 28, 2024

with

#define USE_ARQ_TX_SIM_MISS 9 //9
#define USE_ARQ_RX_SIM_MISS 5 //5

I do see continuous packet losses in MP; MP reports a pretty stable 95-96% link quality, so around every 20th packet is lost

the mLRS LQ metric on the OLED display shows something around 75% ... not sure if that means the mechanism is helping

I do my tests btw in 19 Hz mode (with a 2.4 GHz system)
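
(just for illustration, a sim-miss of this kind can be as simple as a counter that drops every Nth frame, e.g.:)

// illustration only: one simple way to drop every Nth received frame before the ARQ logic sees it
bool sim_miss_rx(void)
{
    static uint8_t cnt = 0;
    if (++cnt >= USE_ARQ_RX_SIM_MISS) { cnt = 0; return true; }  // pretend this frame was lost
    return false;
}
// e.g. in the receive path:  if (sim_miss_rx()) return;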

@brad112358 (Contributor)

And you didn't add a retransmission limit yet? So something must be wrong if MP is correct. When I get some time, I'll try to reproduce the problem with QGC.

@brad112358 (Contributor)

Do you use Bluetooth, UDP or TCP WiFi, or a wired serial connection for the GCS?

@olliw42 (Owner, Author) commented May 28, 2024

And you didn't add a retransmission limit yet?

yes, no retransmission limit

So something must be wrong if MP is correct.

yes :)

Do you use Bluetooth or UDP or TCP WiFi or wired serial for the GCS connection?

wired serial connection, from tx serial via usb-ttl to PC

one potential source of the problem which I have not yet ruled out is that the stream flow control isn't good enough, so that AP sends too many messages and some are dropped every once in a while ... I'm using 19 Hz, so there is some restriction.
It's also curious that the ~5% loss is close to 1/19 ... though I can't see how that could be correlated.
Some more tests should clear up some of the speculation.

@brad112358 (Contributor)

Looks like that fixed it. QGC is now reporting 0 lost messages

@olliw42 (Owner, Author) commented May 28, 2024

Looks like that fixed it. QGC is now reporting 0 lost messages

YES :) @jlpoltrack made the relevant comment

not sure if you also follow the discussion on Discord

@brad112358 (Contributor) commented May 28, 2024

Well, it ran well for a while, and then it started dropping a lot of messages, and it seemed to get worse over time. I've power cycled both ends of the link (one at a time) and restarted the GCS, but it hasn't recovered. I'm not sure what happened.

@brad112358 (Contributor)

I had the baud rate too high for the crap R9M inverter with its weak pull-up. It's working fine at 115200 serial speed on the Tx.
