Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial J1939 changes #57

Merged
merged 3 commits into from
Feb 12, 2024
Merged

Initial J1939 changes #57

merged 3 commits into from
Feb 12, 2024

Conversation

JohnLussmyer
Copy link
Contributor

@JohnLussmyer JohnLussmyer commented Feb 11, 2024

Initial set of changes, test failing

see #54

@pschichtel
Copy link
Owner

pschichtel commented Feb 11, 2024

If I understand the can-utils code below correctly, then you can only write to sockets after connect() (so you have a known destination address for write()) and and you can only read from sockets after bind() (so you have a known receive address for read())

(can-utils is mostly using send, sendmsg, recv and recvmsg because they also process message attribute, they are equivalent to read and write for JavaCAN's purposes)

@JohnLussmyer
Copy link
Contributor Author

Ahh ok, I hadn't quite made sense of the J1939 docs yet.
I'll try setting up tests with addresses.

@JohnLussmyer
Copy link
Contributor Author

Well, I'm getting a bit closer. Now it fails during write with a:

tel.schich.javacan.platform.linux.LinuxNativeOperationException: Unable to get FD frame support - errorNumber=22, errorMessage='Invalid argument'

@pschichtel
Copy link
Owner

that exception does not happen during write, but during the setOption call of the FD_FRAMES option. not sure why it would be invalid, I guess J1939 might not support FD frames?

@JohnLussmyer
Copy link
Contributor Author

It's happening on the test code line 202 - which is commented out.
Looking at what was being called, it's failing on
final CanFrame output = b.read();

@pschichtel
Copy link
Owner

your read() method does call getOption(FD_FRAMES) to allocate a correctly sized buffer, which I assume you copied over from the raw channel. When J1939 doesn't support FD, you can just hardcode MTU (the else-value in the ternary).

@pschichtel
Copy link
Owner

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/can/j1939/socket.c#n681

there you can see the supported socket options for J1939 sockets.

@JohnLussmyer
Copy link
Contributor Author

Yeah, lots of copied code.
Your do realize that i'm new to the whole canbus programming thing? :-)
I'm completely un-surprised at missing odd interactions and restrictions.
I'll make some changes and try again!

@JohnLussmyer
Copy link
Contributor Author

Whee!!! that test passed!

@JohnLussmyer
Copy link
Contributor Author

In my hunt for actual documentation on the options, I did find this:
https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/
Sure would be nice to find whatever doc it's a patch for...

@pschichtel
Copy link
Owner

@JohnLussmyer
Copy link
Contributor Author

Not really, that's pretty much the same as a couple of the other pages I've found, where it mentions the options, but gives no description of what they do, or what values are valid.
The patch link I gave has descriptions and values.

@pschichtel
Copy link
Owner

What are you looking for? Except for SO_J1939_ERRQUEUE the J1939 specific options seem to all be mentioned and explained. I think parts of the options you found in that patch never materialized upstream. From scrolling through it I guess some of the options are covered by existing generic CAN options

@JohnLussmyer
Copy link
Contributor Author

I'm used to full docs on an API.
The link you provided mentions SO_J1939_PROMISC and SO_J1939_SEND_PRIO.
No specifics on what the values should be, almost no description of what they do.
Admittedly, this seems to be the normal case for Unix/Linux API's, where it's expected that you go look at the source code to figure out the details.
It's not what I consider a good API doc.

@pschichtel
Copy link
Owner

Their domain is actually documented, though not as explicitly as you'd hope. I think the "issue" is that they assume knowledge of the specs which are sadly not as accessible as I would prefer them to be. I learned that when I tried implementing ISO-TP in user space.

If you want to improve on this, the can subsystem mailing list is pretty responsive.

Given you are at the point that this is working for an initial set of tests, I'm thinking about merging this early and cleaning it up a bit on master. What do you think?

@JohnLussmyer
Copy link
Contributor Author

Sure , I take it need to do another pull request?

@pschichtel
Copy link
Owner

I would merge it and refine it on master to be a little more in line with the rest of the codebase. Is there any major functionality left you'd like to add?

@JohnLussmyer
Copy link
Contributor Author

Not that I can think of right now.
Once I start using it, maybe...

@pschichtel pschichtel merged commit f788145 into pschichtel:master Feb 12, 2024
@pschichtel
Copy link
Owner

I've locally reworked your commits to remove noisy whitespace changes, I'll also do some other cleanups

@JohnLussmyer
Copy link
Contributor Author

Not surprised at the whitespace changes. I'm so used to letting eclipse clean up the little things...

@JohnLussmyer
Copy link
Contributor Author

I've setup a tiny test app, and getting a weird error:

john@raspberrypi4:~/PiCanTest $ java -cp PiCanTest_lib -jar PiCanTest.jar
Exception in thread "main" java.lang.IllegalArgumentException: length must be either MTU or FD_MTU, but was 8!
at tel.schich.javacan.CanFrame.createUnsafe(CanFrame.java:458)
at tel.schich.javacan.CanFrame.create(CanFrame.java:431)
at tel.schich.javacan.J1939CanChannelImpl.read(J1939CanChannelImpl.java:101)
at tel.schich.javacan.J1939CanChannelImpl.read(J1939CanChannelImpl.java:95)
at com.casadelgato.cantest.PiCANTest.main(PiCANTest.java:32)

I've even tried forcing it to 16, like this:

public CanFrame read() throws IOException {
	int length = 16; //MTU;

Same error.

@pschichtel
Copy link
Owner

have you checked how many bytes have actually been read? Also: you shouldn't be reading CAN frames from the J1939 socket. the read() syscall reads the data bytes only. I've already deleted the CanFrame read() methods on master as they don't make sense. CanFrame is meant for raw frames (I know, the name doesn't really reflect that.)

@pschichtel
Copy link
Owner

I think we might have to add send* and recv* style APIs as J1939 doesn't seem to be useful without metadata in certain situations. ISO-TP is only point-to-point (with a tiny exception), so that got away with just read and write, but J1939 supports multicasts more similar to raw.

@JohnLussmyer
Copy link
Contributor Author

So, it looks like I need to add the Send and Recv calls?
Ok, will do that today.
(As you might have guessed, I don't get much free time during the work week...)

@pschichtel
Copy link
Owner

I was actually about to start on it

@JohnLussmyer
Copy link
Contributor Author

If you want to, that would be great! You are likely far more experienced with these updates, and won't mess up the formatting like I do.

@JohnLussmyer
Copy link
Contributor Author

Oh, and yes it's my fault for not properly reading the javadocs.
My only excuse is that my dev environment right now is spread of 3 computers, and I didn't HAVE the javadocs on the one that needed them.
As far as I can tell, JavaCAN is working just fine.

@pschichtel
Copy link
Owner

pschichtel commented Mar 9, 2024

ok, now I realize that the send method is returning the buffer size, not a linux return code.
Not used to that style.

I think it's pretty common for read/write operations to return the bytes read/written. Java's various stream and channel APIs do the same, also all the posix IO functions.

Oh, and yes it's my fault for not properly reading the javadocs.

Don't worry

As far as I can tell, JavaCAN is working just fine.

great.

@pschichtel
Copy link
Owner

I considered adding some function to parse the PGN fields, would that be of interest to you?

@JohnLussmyer
Copy link
Contributor Author

I don't think so. When I send, the code is simple. The PGN is a constant 0x00xxxx. (I have it in an enum)
J1939Address addr2 = new ImmutableJ1939Address(networkDevice, 0L, EPGN.DNC_Setpoint.getPGN(), bcvAddr);
When I receive, I just se the object methods, and don't need any finer grain parsing.
log.debug( "handleMessage: {}",
String.format( "name=%016X, PGN=%08X, from=%02X, data=%s",
hdr.getSourceAddress().getName(),
hdr.getSourceAddress().getParameterGroupNumber(),
hdr.getSourceAddress().getAddress(),
CanUtils.hexDump(buf)));

What it's looking like I DO need, is some way to write a log file in the Canoe .asc file format.

@pschichtel
Copy link
Owner

I've just split out the tools package into a separate module: https://github.com/pschichtel/JavaCAN/tree/master/tools/src/main/java/tel/schich/javacan/tools

feel free to add additional tools there.

@JohnLussmyer
Copy link
Contributor Author

I am running into one issue - that I'm not sure WHO is causing it.
After working with my device for a while - all messages just stop.
When I send a message, i don't receive it. (normally I do)

Resetting the equipment doesn't help.
The only thing that fixes it is shutting down my applications, and re-starting it.

Hmm, if I do a JavaCan.allocateOrdered(), do I need to release it?
Same with the new J1939ReceiveMessageHeaderBuffer().

I just found a place in my code that assumes they get garbage collected like any other Java object.

@pschichtel
Copy link
Owner

ByteBuffers will be GCed, yes, even direct buffers. The APIs auf JavaCAN are also all designed to be able to re-use buffers across operations. I'd recommend to re-use buffers as much as possible instead of allocating for each operation.

@pschichtel
Copy link
Owner

@JohnLussmyer I just remembered that I have the TROUBLESHOOTING.md that documents some pitfalls and how to detect/resolve them. It doesn't currently have anything on J1939, maybe some of your experiences are worth documenting there?

@JohnLussmyer
Copy link
Contributor Author

So far the only real problem I had was using ByteBuffer.get(byte[], off, len)
I had my offset 1 too large, and that would lockup the thread it was on.
As far as I can tell, it was overwriting something in memory.

I'm currently running into an issue where after running for 10 minutes or so, it just stops receiving any messages.
The send thread doesn't seem to be locked up, as the logging is showing that my sends are returning, just my receive thread stops receiving anything at all. The receive thread hasn't exited, as I have logging that would indicate that.
If I shutdown my app, and restart it - things start working again.

@pschichtel
Copy link
Owner

can you attach strace to the program when it is stuck ? also a thread dump might be interesting.

@JohnLussmyer
Copy link
Contributor Author

Note that I am primarily (nowadays) a Java programmer. It's been many years since I had to debug native code.
If you can provide instructions on what I can do to help....
Note that this is being run on a Raspberry Pi 4, using java 17.

@pschichtel
Copy link
Owner

Simply install strace and run strace -p <pid of java process> when it's stuck. it should log any syscalls being invoked.

For the thread dump you can just use the jstack tool from the jdk: jstack -F -l -m <pid of java process>

The outputs might be very long, so paste them into github gist or pastebin or so.

@JohnLussmyer
Copy link
Contributor Author

Well, it's obvious that I need to know more about those tools.
strace gave very little output - and redirecting the console to a file resulted in an empty file.
jstack failed, and suggested I run jhsdb in front of it. I then had to play around to get a command line it would accept.
The only thing it reported was this:

Attaching to process ID 2389, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 17.0.10+7-Debian-1deb12u1
Deadlock Detection:

@pschichtel
Copy link
Owner

strace gave very little output

can you provide what ever little output it produced?

@JohnLussmyer
Copy link
Contributor Author

I'll try again, and copy/paste the console since redirection failed. (and I didn't notice until I got back to the desk.)

@JohnLussmyer
Copy link
Contributor Author

probably tomorrow. Monitor out in the shop just failed, so I have to find another that will work with that unit.

@pschichtel
Copy link
Owner

get a pikvm if you have power and wifi or lan available: https://pikvm.org/

@JohnLussmyer
Copy link
Contributor Author

I still need a real monitor out in the shop. It's 200' from my desk, and I have to be at the equipment to work with it. (power switches, etc...)

@JohnLussmyer
Copy link
Contributor Author

I'll probably drag a spare laptop out there and SSh into the Pi.

@JohnLussmyer
Copy link
Contributor Author

Ok, found the lockup issue.
My code received a msg that I hadn't received before - and once again I had a 1 byte offset error in the byteBuffer.get(byte[]) call.
I'm REALLY not used to java code that does NOT check array indices for being valid.
(and it didn't help that the docs I have on this message are wrong.)

@pschichtel
Copy link
Owner

I'm REALLY not used to java code that does NOT check array indices for being valid.

what do you mean? bytebuffers do perform range checks (actually several).

@JohnLussmyer
Copy link
Contributor Author

JohnLussmyer commented Mar 18, 2024

int fldcnt = 0x0FF & buf.get(0);
int len = buf.remaining();
log.debug("parseSoftwareIdentification: fldcnt={}, len={}", fldcnt, len);

if (len > 0) {
	byte[] data = new byte[len];
	buf.get(data, 0, len);
}

If you make a mistake, and put a "1" for the middle param of the last line - you will hang the thread.
I've run into this twice now.

@pschichtel
Copy link
Owner

pschichtel commented Mar 18, 2024

It would really surprise me if that actually hangs the thread, that would be a JVM bug worth reporting upstream. Sure there is no IndexOutOfBoundsException that somehow triggers a loop in your program?

@JohnLussmyer
Copy link
Contributor Author

Hmm, possibly. My usual app framework has an unhandled exception handler.
Need to check if this one does.

@splatch
Copy link
Contributor

splatch commented Mar 18, 2024

If you make a mistake, and put a "1" for the middle param of the last line - you will hang the thread.
I've run into this twice now.

Your thread gets blocked until len bytes is available. And, given the specifics of JavaCAN, it never grows beyond 8 bytes or so, so your thread will never get released.

@JohnLussmyer
Copy link
Contributor Author

Note: That I am receiving 56 byte messages just fine.
I also found that the Unhandled exception handler hadn't been added, so that was why the thread "hung" - it had been killed by an exception.

@pschichtel
Copy link
Owner

@splatch Java's ByteBuffer doesn't wait, it's pretty "dumb" and doesn't have any synchronization. If the buffer doesn't have enough bytes remaining to satisfy the request, then it just throws.

Ignore that I'm using scala:

image

That matches what the javadocs say. I'm wondering though, why didn't you program log the exception if there was no other default exception handler configured?

@JohnLussmyer
Copy link
Contributor Author

Exceptions only get written to the log if you catch them.
Since this one was occurring in a helper thread, it didn't kill or stop the AWT thread.
With no Unhandled Exception Handler - the only result is the helper thread dies.

@olerem
Copy link

olerem commented Oct 13, 2024

HI, thank you for your work!

I'm working on the kernel documentation for the CAN_J1939 socket,
i do not know about implementation state of this socket in this project, but may be new documentation may help :)

https://lore.kernel.org/all/[email protected]/

@pschichtel
Copy link
Owner

@olerem thanks for the hint! The first iteration of the CAN_J1939 support landed in release 3.4.0, but the additional documentation is very likely useful nonetheless

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants