I've been using Corb2 successfully to update large numbers of documents. Usually, I declare external variables in the transform.xqy script, and define values for these variables in our options file, which I pass to Corb using the -DOPTIONS_FILE parameter.
I've now hit a snag: Corb loads the options file using the java.util.Properties.load(InputStream) method. To my surprise, this causes the contents of the properties file to be interpreted as ISO 8859-1, even though the system encoding (the LANG environment variable) is en_US.UTF-8. I'll now need to run a repair job to fix the encoding error.
I believe the best way to prevent this issue is to wrap the InputStream in a java.io.InputStreamReader object and pass that to Properties.load(Reader). Constructed without an explicit charset, the reader uses the system default charset, which is almost always what you want.
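To illustrate the difference, here is a minimal sketch that loads the same UTF-8 bytes both ways. It is not CoRB's actual code: the class name, the helper methods, and the GREETING property are made up for the example, and the reader is given an explicit UTF-8 charset only so the result does not depend on the machine it runs on (`new InputStreamReader(in)` without a charset would use the default charset instead).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; not part of CoRB.
final class PropertiesEncodingDemo {

    // Mirrors the reported behaviour: load(InputStream) decodes bytes as ISO 8859-1.
    static Properties loadFromStream(byte[] bytes) {
        Properties props = new Properties();
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Suggested approach: wrap the stream in an InputStreamReader so the charset is
    // under our control. UTF-8 is hard-coded here purely to keep the demo deterministic.
    static Properties loadFromReader(byte[] bytes) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // "café" stored as UTF-8: the e-acute becomes the two bytes 0xC3 0xA9.
        byte[] utf8 = "GREETING=caf\u00e9\n".getBytes(StandardCharsets.UTF_8);

        String viaStream = loadFromStream(utf8).getProperty("GREETING");
        String viaReader = loadFromReader(utf8).getProperty("GREETING");

        // Each UTF-8 byte was read as a separate ISO 8859-1 character.
        System.out.println("caf\u00c3\u00a9".equals(viaStream)); // true
        // The reader decoded the two bytes back into a single e-acute.
        System.out.println("caf\u00e9".equals(viaReader));       // true
    }
}
```

The garbled value from the first call is what ends up in the external variable, and from there in the updated documents.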
I use Java 8 in my production jobs, but Java 21 will (according to its JavaDoc) still use ISO 8859-1 for property files read from an InputStream.
Thank you for reporting the issue. Sorry to hear about the trouble it caused.
It seems that you are right: loading through an InputStreamReader would be a better choice than load(InputStream), as it at least provides a means of specifying the encoding.
Whether to use the system encoding or not for various files is always tricky. We might use the system encoding unless an option is specified to set something different (that way you can load UTF-8 options files on a Windows machine with cp1252 as the system encoding).
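A rough sketch of that fallback could look like the following. The option name OPTIONS_FILE_ENCODING, the class, and its methods are all hypothetical; nothing here reflects an existing CoRB option. It uses the default charset unless an encoding is configured.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper; not part of CoRB.
final class OptionsFileLoader {

    // Assumed option name, invented for this example.
    static final String ENCODING_OPTION = "OPTIONS_FILE_ENCODING";

    // Use the configured charset if one is given, otherwise the JVM default charset.
    static Charset resolveCharset(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            return Charset.defaultCharset();
        }
        // Throws an unchecked exception for illegal or unsupported charset names.
        return Charset.forName(configured.trim());
    }

    // Load properties through a Reader so the resolved charset is applied.
    static Properties load(InputStream in, String configuredEncoding) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, resolveCharset(configuredEncoding))) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // e.g. java -DOPTIONS_FILE_ENCODING=UTF-8 OptionsFileLoader
        String encoding = System.getProperty(ENCODING_OPTION, "UTF-8");
        byte[] bytes = "NAME=caf\u00e9\n".getBytes(resolveCharset(encoding));
        Properties props = load(new ByteArrayInputStream(bytes), encoding);
        System.out.println("caf\u00e9".equals(props.getProperty("NAME")));
    }
}
```

One design point to settle is what the fallback means across Java versions: on Java 8 the default charset follows the platform locale, while from Java 18 onwards it is UTF-8 unless overridden.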
CoRB version: marklogic-corb-2.5.4
OS: CentOS Linux release 7.6.1810 (Core)
JVM: OpenJDK Runtime Environment (build 1.8.0_191-b12)
Let me know if you'd like me to create a pull request.