Initial J1939 changes #57

JohnLussmyer · 2024-02-11T02:47:36Z

Initial set of changes, test failing

pschichtel · 2024-02-11T05:02:30Z

If I understand the can-utils code below correctly, then you can only write to sockets after connect() (so you have a known destination address for write()) and and you can only read from sockets after bind() (so you have a known receive address for read())

https://github.com/linux-can/can-utils/blob/master/j1939sr.c (sender + receiver)
https://github.com/linux-can/can-utils/blob/master/j1939spy.c (pure receiver)

(can-utils is mostly using send, sendmsg, recv and recvmsg because they also process message attribute, they are equivalent to read and write for JavaCAN's purposes)

JohnLussmyer · 2024-02-11T13:44:30Z

Ahh ok, I hadn't quite made sense of the J1939 docs yet.
I'll try setting up tests with addresses.

JohnLussmyer · 2024-02-11T20:47:52Z

Well, I'm getting a bit closer. Now it fails during write with a:

tel.schich.javacan.platform.linux.LinuxNativeOperationException: Unable to get FD frame support - errorNumber=22, errorMessage='Invalid argument'

pschichtel · 2024-02-11T21:03:24Z

that exception does not happen during write, but during the setOption call of the FD_FRAMES option. not sure why it would be invalid, I guess J1939 might not support FD frames?

JohnLussmyer · 2024-02-11T21:07:57Z

It's happening on the test code line 202 - which is commented out.
Looking at what was being called, it's failing on
final CanFrame output = b.read();

pschichtel · 2024-02-11T21:12:22Z

your read() method does call getOption(FD_FRAMES) to allocate a correctly sized buffer, which I assume you copied over from the raw channel. When J1939 doesn't support FD, you can just hardcode MTU (the else-value in the ternary).

pschichtel · 2024-02-11T21:18:19Z

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/can/j1939/socket.c#n681

there you can see the supported socket options for J1939 sockets.

JohnLussmyer · 2024-02-11T21:28:51Z

Yeah, lots of copied code.
Your do realize that i'm new to the whole canbus programming thing? :-)
I'm completely un-surprised at missing odd interactions and restrictions.
I'll make some changes and try again!

JohnLussmyer · 2024-02-11T21:35:55Z

Whee!!! that test passed!

JohnLussmyer · 2024-02-11T22:27:12Z

In my hunt for actual documentation on the options, I did find this:
https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/
Sure would be nice to find whatever doc it's a patch for...

pschichtel · 2024-02-11T22:30:15Z

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/j1939.rst

I think that's what you are looking for

JohnLussmyer · 2024-02-11T22:47:59Z

Not really, that's pretty much the same as a couple of the other pages I've found, where it mentions the options, but gives no description of what they do, or what values are valid.
The patch link I gave has descriptions and values.

pschichtel · 2024-02-11T23:29:30Z

What are you looking for? Except for SO_J1939_ERRQUEUE the J1939 specific options seem to all be mentioned and explained. I think parts of the options you found in that patch never materialized upstream. From scrolling through it I guess some of the options are covered by existing generic CAN options

JohnLussmyer · 2024-02-11T23:54:53Z

I'm used to full docs on an API.
The link you provided mentions SO_J1939_PROMISC and SO_J1939_SEND_PRIO.
No specifics on what the values should be, almost no description of what they do.
Admittedly, this seems to be the normal case for Unix/Linux API's, where it's expected that you go look at the source code to figure out the details.
It's not what I consider a good API doc.

pschichtel · 2024-02-12T08:15:01Z

Their domain is actually documented, though not as explicitly as you'd hope. I think the "issue" is that they assume knowledge of the specs which are sadly not as accessible as I would prefer them to be. I learned that when I tried implementing ISO-TP in user space.

If you want to improve on this, the can subsystem mailing list is pretty responsive.

Given you are at the point that this is working for an initial set of tests, I'm thinking about merging this early and cleaning it up a bit on master. What do you think?

JohnLussmyer · 2024-02-12T16:16:23Z

Sure , I take it need to do another pull request?

pschichtel · 2024-02-12T17:32:16Z

I would merge it and refine it on master to be a little more in line with the rest of the codebase. Is there any major functionality left you'd like to add?

JohnLussmyer · 2024-02-12T17:43:38Z

Not that I can think of right now.
Once I start using it, maybe...

pschichtel · 2024-02-12T23:39:35Z

I've locally reworked your commits to remove noisy whitespace changes, I'll also do some other cleanups

JohnLussmyer · 2024-02-13T00:15:09Z

Not surprised at the whitespace changes. I'm so used to letting eclipse clean up the little things...

JohnLussmyer · 2024-02-13T03:39:46Z

I've setup a tiny test app, and getting a weird error:

john@raspberrypi4:~/PiCanTest $ java -cp PiCanTest_lib -jar PiCanTest.jar
Exception in thread "main" java.lang.IllegalArgumentException: length must be either MTU or FD_MTU, but was 8!
at tel.schich.javacan.CanFrame.createUnsafe(CanFrame.java:458)
at tel.schich.javacan.CanFrame.create(CanFrame.java:431)
at tel.schich.javacan.J1939CanChannelImpl.read(J1939CanChannelImpl.java:101)
at tel.schich.javacan.J1939CanChannelImpl.read(J1939CanChannelImpl.java:95)
at com.casadelgato.cantest.PiCANTest.main(PiCANTest.java:32)

I've even tried forcing it to 16, like this:

public CanFrame read() throws IOException {
	int length = 16; //MTU;

Same error.

pschichtel · 2024-02-13T14:53:01Z

have you checked how many bytes have actually been read? Also: you shouldn't be reading CAN frames from the J1939 socket. the read() syscall reads the data bytes only. I've already deleted the CanFrame read() methods on master as they don't make sense. CanFrame is meant for raw frames (I know, the name doesn't really reflect that.)

pschichtel · 2024-02-13T15:02:54Z

I think we might have to add send* and recv* style APIs as J1939 doesn't seem to be useful without metadata in certain situations. ISO-TP is only point-to-point (with a tiny exception), so that got away with just read and write, but J1939 supports multicasts more similar to raw.

JohnLussmyer · 2024-02-17T20:45:03Z

So, it looks like I need to add the Send and Recv calls?
Ok, will do that today.
(As you might have guessed, I don't get much free time during the work week...)

pschichtel · 2024-02-17T20:52:41Z

I was actually about to start on it

JohnLussmyer · 2024-02-17T20:57:37Z

If you want to, that would be great! You are likely far more experienced with these updates, and won't mess up the formatting like I do.

JohnLussmyer · 2024-03-09T18:51:13Z

Oh, and yes it's my fault for not properly reading the javadocs.
My only excuse is that my dev environment right now is spread of 3 computers, and I didn't HAVE the javadocs on the one that needed them.
As far as I can tell, JavaCAN is working just fine.

pschichtel · 2024-03-09T19:30:56Z

ok, now I realize that the send method is returning the buffer size, not a linux return code.
Not used to that style.

I think it's pretty common for read/write operations to return the bytes read/written. Java's various stream and channel APIs do the same, also all the posix IO functions.

Oh, and yes it's my fault for not properly reading the javadocs.

Don't worry

As far as I can tell, JavaCAN is working just fine.

great.

pschichtel · 2024-03-09T19:37:53Z

I considered adding some function to parse the PGN fields, would that be of interest to you?

JohnLussmyer · 2024-03-09T20:16:14Z

I don't think so. When I send, the code is simple. The PGN is a constant 0x00xxxx. (I have it in an enum)
J1939Address addr2 = new ImmutableJ1939Address(networkDevice, 0L, EPGN.DNC_Setpoint.getPGN(), bcvAddr);
When I receive, I just se the object methods, and don't need any finer grain parsing.
log.debug( "handleMessage: {}",
String.format( "name=%016X, PGN=%08X, from=%02X, data=%s",
hdr.getSourceAddress().getName(),
hdr.getSourceAddress().getParameterGroupNumber(),
hdr.getSourceAddress().getAddress(),
CanUtils.hexDump(buf)));

What it's looking like I DO need, is some way to write a log file in the Canoe .asc file format.

pschichtel · 2024-03-09T20:52:29Z

I've just split out the tools package into a separate module: https://github.com/pschichtel/JavaCAN/tree/master/tools/src/main/java/tel/schich/javacan/tools

feel free to add additional tools there.

JohnLussmyer · 2024-03-09T21:40:04Z

I am running into one issue - that I'm not sure WHO is causing it.
After working with my device for a while - all messages just stop.
When I send a message, i don't receive it. (normally I do)

Resetting the equipment doesn't help.
The only thing that fixes it is shutting down my applications, and re-starting it.

Hmm, if I do a JavaCan.allocateOrdered(), do I need to release it?
Same with the new J1939ReceiveMessageHeaderBuffer().

I just found a place in my code that assumes they get garbage collected like any other Java object.

pschichtel · 2024-03-09T21:43:21Z

ByteBuffers will be GCed, yes, even direct buffers. The APIs auf JavaCAN are also all designed to be able to re-use buffers across operations. I'd recommend to re-use buffers as much as possible instead of allocating for each operation.

pschichtel · 2024-03-16T13:41:10Z

@JohnLussmyer I just remembered that I have the TROUBLESHOOTING.md that documents some pitfalls and how to detect/resolve them. It doesn't currently have anything on J1939, maybe some of your experiences are worth documenting there?

JohnLussmyer · 2024-03-16T22:24:06Z

So far the only real problem I had was using ByteBuffer.get(byte[], off, len)
I had my offset 1 too large, and that would lockup the thread it was on.
As far as I can tell, it was overwriting something in memory.

I'm currently running into an issue where after running for 10 minutes or so, it just stops receiving any messages.
The send thread doesn't seem to be locked up, as the logging is showing that my sends are returning, just my receive thread stops receiving anything at all. The receive thread hasn't exited, as I have logging that would indicate that.
If I shutdown my app, and restart it - things start working again.

pschichtel · 2024-03-16T22:27:08Z

can you attach strace to the program when it is stuck ? also a thread dump might be interesting.

JohnLussmyer · 2024-03-16T22:38:47Z

Note that I am primarily (nowadays) a Java programmer. It's been many years since I had to debug native code.
If you can provide instructions on what I can do to help....
Note that this is being run on a Raspberry Pi 4, using java 17.

pschichtel · 2024-03-16T22:51:06Z

Simply install strace and run strace -p <pid of java process> when it's stuck. it should log any syscalls being invoked.

For the thread dump you can just use the jstack tool from the jdk: jstack -F -l -m <pid of java process>

The outputs might be very long, so paste them into github gist or pastebin or so.

JohnLussmyer · 2024-03-16T23:37:11Z

Well, it's obvious that I need to know more about those tools.
strace gave very little output - and redirecting the console to a file resulted in an empty file.
jstack failed, and suggested I run jhsdb in front of it. I then had to play around to get a command line it would accept.
The only thing it reported was this:

Attaching to process ID 2389, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 17.0.10+7-Debian-1deb12u1
Deadlock Detection:

pschichtel · 2024-03-16T23:55:22Z

strace gave very little output

can you provide what ever little output it produced?

JohnLussmyer · 2024-03-16T23:58:49Z

I'll try again, and copy/paste the console since redirection failed. (and I didn't notice until I got back to the desk.)

JohnLussmyer · 2024-03-17T00:21:43Z

probably tomorrow. Monitor out in the shop just failed, so I have to find another that will work with that unit.

pschichtel · 2024-03-17T00:23:24Z

get a pikvm if you have power and wifi or lan available: https://pikvm.org/

JohnLussmyer · 2024-03-17T00:45:42Z

I still need a real monitor out in the shop. It's 200' from my desk, and I have to be at the equipment to work with it. (power switches, etc...)

JohnLussmyer · 2024-03-17T00:46:03Z

I'll probably drag a spare laptop out there and SSh into the Pi.

JohnLussmyer · 2024-03-17T20:01:58Z

Ok, found the lockup issue.
My code received a msg that I hadn't received before - and once again I had a 1 byte offset error in the byteBuffer.get(byte[]) call.
I'm REALLY not used to java code that does NOT check array indices for being valid.
(and it didn't help that the docs I have on this message are wrong.)

pschichtel · 2024-03-18T00:20:04Z

I'm REALLY not used to java code that does NOT check array indices for being valid.

what do you mean? bytebuffers do perform range checks (actually several).

JohnLussmyer · 2024-03-18T03:13:39Z

int fldcnt = 0x0FF & buf.get(0);
int len = buf.remaining();
log.debug("parseSoftwareIdentification: fldcnt={}, len={}", fldcnt, len);

if (len > 0) {
	byte[] data = new byte[len];
	buf.get(data, 0, len);
}

If you make a mistake, and put a "1" for the middle param of the last line - you will hang the thread.
I've run into this twice now.

pschichtel · 2024-03-18T03:27:51Z

It would really surprise me if that actually hangs the thread, that would be a JVM bug worth reporting upstream. Sure there is no IndexOutOfBoundsException that somehow triggers a loop in your program?

JohnLussmyer · 2024-03-18T03:40:47Z

Hmm, possibly. My usual app framework has an unhandled exception handler.
Need to check if this one does.

splatch · 2024-03-18T06:04:21Z

If you make a mistake, and put a "1" for the middle param of the last line - you will hang the thread.
I've run into this twice now.

Your thread gets blocked until len bytes is available. And, given the specifics of JavaCAN, it never grows beyond 8 bytes or so, so your thread will never get released.

JohnLussmyer · 2024-03-18T13:48:05Z

Note: That I am receiving 56 byte messages just fine.
I also found that the Unhandled exception handler hadn't been added, so that was why the thread "hung" - it had been killed by an exception.

pschichtel · 2024-03-18T14:44:53Z

@splatch Java's ByteBuffer doesn't wait, it's pretty "dumb" and doesn't have any synchronization. If the buffer doesn't have enough bytes remaining to satisfy the request, then it just throws.

Ignore that I'm using scala:

That matches what the javadocs say. I'm wondering though, why didn't you program log the exception if there was no other default exception handler configured?

JohnLussmyer · 2024-03-18T15:00:03Z

Exceptions only get written to the log if you catch them.
Since this one was occurring in a helper thread, it didn't kill or stop the AWT thread.
With no Unhandled Exception Handler - the only result is the helper thread dies.

olerem · 2024-10-13T18:26:22Z

HI, thank you for your work!

I'm working on the kernel documentation for the CAN_J1939 socket,
i do not know about implementation state of this socket in this project, but may be new documentation may help :)

https://lore.kernel.org/all/[email protected]/

pschichtel · 2024-10-13T23:09:45Z

@olerem thanks for the hint! The first iteration of the CAN_J1939 support landed in release 3.4.0, but the additional documentation is very likely useful nonetheless

Initial J1939 changes

f2e8b36

Fixing bind/connect handling

d917d18

Tests working (only a few), added J1939 option support

075f08a

pschichtel merged commit f788145 into pschichtel:master Feb 12, 2024

Initial J1939 changes #57

Initial J1939 changes #57

Conversation

JohnLussmyer commented Feb 11, 2024 • edited by pschichtel Loading

pschichtel commented Feb 11, 2024 • edited Loading

JohnLussmyer commented Feb 11, 2024

JohnLussmyer commented Feb 11, 2024

pschichtel commented Feb 11, 2024

JohnLussmyer commented Feb 11, 2024

pschichtel commented Feb 11, 2024

pschichtel commented Feb 11, 2024

JohnLussmyer commented Feb 11, 2024

JohnLussmyer commented Feb 11, 2024

JohnLussmyer commented Feb 11, 2024

pschichtel commented Feb 11, 2024

JohnLussmyer commented Feb 11, 2024

pschichtel commented Feb 11, 2024

JohnLussmyer commented Feb 11, 2024

pschichtel commented Feb 12, 2024

JohnLussmyer commented Feb 12, 2024

pschichtel commented Feb 12, 2024

JohnLussmyer commented Feb 12, 2024

pschichtel commented Feb 12, 2024

JohnLussmyer commented Feb 13, 2024

JohnLussmyer commented Feb 13, 2024

pschichtel commented Feb 13, 2024

pschichtel commented Feb 13, 2024

JohnLussmyer commented Feb 17, 2024

pschichtel commented Feb 17, 2024

JohnLussmyer commented Feb 17, 2024

JohnLussmyer commented Mar 9, 2024

pschichtel commented Mar 9, 2024 • edited Loading

pschichtel commented Mar 9, 2024

JohnLussmyer commented Mar 9, 2024

pschichtel commented Mar 9, 2024

JohnLussmyer commented Mar 9, 2024

pschichtel commented Mar 9, 2024

pschichtel commented Mar 16, 2024

JohnLussmyer commented Mar 16, 2024

pschichtel commented Mar 16, 2024

JohnLussmyer commented Mar 16, 2024

pschichtel commented Mar 16, 2024

JohnLussmyer commented Mar 16, 2024

pschichtel commented Mar 16, 2024

JohnLussmyer commented Mar 16, 2024

JohnLussmyer commented Mar 17, 2024

pschichtel commented Mar 17, 2024

JohnLussmyer commented Mar 17, 2024

JohnLussmyer commented Mar 17, 2024

JohnLussmyer commented Mar 17, 2024

pschichtel commented Mar 18, 2024

JohnLussmyer commented Mar 18, 2024 • edited by pschichtel Loading

pschichtel commented Mar 18, 2024 • edited Loading

JohnLussmyer commented Mar 18, 2024

splatch commented Mar 18, 2024

JohnLussmyer commented Mar 18, 2024

pschichtel commented Mar 18, 2024

JohnLussmyer commented Mar 18, 2024

olerem commented Oct 13, 2024

pschichtel commented Oct 13, 2024

JohnLussmyer commented Feb 11, 2024 •

edited by pschichtel

Loading

pschichtel commented Feb 11, 2024 •

edited

Loading

pschichtel commented Mar 9, 2024 •

edited

Loading

JohnLussmyer commented Mar 18, 2024 •

edited by pschichtel

Loading

pschichtel commented Mar 18, 2024 •

edited

Loading