-
-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CRediT roles to JATS #10153
base: main
Are you sure you want to change the base?
Add CRediT roles to JATS #10153
Conversation
You should add a "command test" to test the proposed feature, and to illustrate how it works. See |
Hi @jgm , I think I figured out making a test file. I have a few questions I bet you could answer very quickly but would take me a long time to figure out:
|
Usually we try to name them after the associated issue. So, if there's an open issue that this fixes you could use that number; otherwise you could use the PR's number, 10153.md. That makes it easy to figure out later why the test was added.
Instead of creating a separate module, I'd prefer if you just included this map in the JATS writer module itself. |
@jgm thanks for that feedback, I addressed both of your points. Now, the only thing left is how to make this dictionary (renamed to My first try was |
what happens when you try that? By the way, our usual convention for variable names would favor |
It seems the issue was the definition of
Switching the signature to use
|
I don't understand why you'd be getting this behavior. Nested Maps seem to work fine in ToContext: here I explore using
|
Btw, you can see what instances are defined for ToContext here: As you can see, it's only defined for Map with Text keys. ToContext typeclass is relevant because of the type of defField :: ToContext a b => Text -> b -> Context a -> Context a |
Are you saying that |
Yes! This is what I'm experiencing |
I'm a bit confused because originally you had a nested map and now it seems not to be nested. With a nested map, in which the value of |
I thought that the nested map was a bit over-complicated and unnecessary so I simplified it to a regular map. This means that the
ref: https://github.com/jgm/pandoc/actions/runs/11021333152/job/30608170051?pr=10153#step:9:1926 |
Ah, I see what you are trying to do. I don't think The way to do this is to resolve these names in the writer. Probably the simplest approach would be to use your Map to add a |
I believe that it may be useful to re-iterate this comment regarding this issue. |
Yes, overriding is actually important for people who don't want it to be in English! |
Thanks @jgm and @estedeahora for the feedback. I've opted to make the input YAML format a bit more explicit, so you have to say exactly what the CRediT name is. You also have the option to override, to support the multi-language things. I also improved the docs to give all of the different possibilities and added to the tests one for each. The tests (with narrative documentation) can be found in https://github.com/cthoyt/pandoc/blob/patch-1/test/command/10152.md. |
cc @sneakers-the-rat - your input is also greatly appreciated! |
It seems to me that this project is a significant and good modification for Pandoc. However, I don't seem to modify Pandoc without modifying the code in JATS.hs to generate Personally I think that if this change is not possible, it is better not to generate the change. If someone needs to implement it, they can read these discussions with alternative solutions with Lua filters and template modifications. |
Apologies @estedeahora, but I'm not quite sure what you're asking for. |
I see that in the last commits you opted to generate the changes by incorporating everything manually in the yaml header and then these in the template (leaving aside the modification of JATS.hs). This is a “correct” solution. However, as you pointed out in #10152 it is an inefficient solution because writing three times variations of the same information can generate errors. I think it would be a contribution to generate this in Haskell, but I don't know how to do it yet. On the other hand, I don't think it is advisable to add this functionality to Pandoc if this point is not solved, because it generates too many fields making it not very concise. As an alternative and provisional solution I think that users can use a lua filter and “inject” these fields through a dictionary. Then this can be placed in a custom template. I think our discussion provides elements for others to do this if they need to. |
It should be completely feasible to add the |
I am feeling like figuring out haskell and the specifics of this codebase might be too much of a lift for me at the moment. @jgm if you are willing to give a hand, then I would be very grateful. I had squashed the commits earlier, but just added back the name lookup dictionary in |
Here are some tips. In Text.Pandoc.Writers.Shared you'll find -- | Retrieve a field value from a template context.
getField :: FromContext a b => Text -> Context a -> Maybe b and -- | Reset a field of a template context. If the field already has a
-- value, the new value replaces it.
-- This is a utility function to be used in preparing template contexts.
resetField :: ToContext a b => Text -> b -> Context a -> Context a So the basic template for adding these fields to the addCreditNames :: Context a -> Context a
addCreditNames context =
case getField "roles" context of
Nothing -> context
Just roles -> resetField "roles" (map addCreditName roles) context
addCreditName :: M.Map Text Text -> M.Map Text Text
addCreditName rolemap =
case M.lookup "credit-name" rolemap of
Just _ -> rolemap
Nothing -> maybe id (M.insert "credit-name")
(M.lookup "credit-id" rolemap >>= flip M.lookup creditNames) Then you'd put |
I'm happy to give this a look, since I'm 100% in favour of getting more useful metadata into the journal ecosystem and getting CRediT roles into more JATS seems like a good thing. Probably won't have time to get properly into it until the weekend though! |
Do you mean this section? pandoc/src/Text/Pandoc/Writers/JATS.hs Lines 160 to 166 in 56fc7d5
I tried sticking it in the middle like let context = defField "body" main
$ defField "back" back
$ addCreditNames
$ resetField "title" title'
$ resetField "date" date
$ defField "mathml" (case writerHTMLMathMethod opts of
MathML -> True
_ -> False) metadata but this doesn't appear to work correctly - what's the right way to apply it? |
The thing you said doesn't appear to work should work. Each line in this pipeline has type |
src/Text/Pandoc/Writers/JATS.hs
Outdated
("writing-original-draft", "Writing – original draft"), | ||
("writing-review-editing", "Writing – review & editing")] | ||
|
||
addCreditNames :: Context a -> Context a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type might need to be changed to Context Text -> Context Text
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I think this helped. Now the error is inside the other functions:
src/Text/Pandoc/Writers/JATS.hs:79:8: error: [GHC-39999]
• No instance for ‘Text.DocTemplates.Internal.FromContext
Text (M.Map Text Text)’
arising from a use of ‘getField’
• In the expression: getField "roles" context
In the expression:
case getField "roles" context of
Nothing -> context
Just roles -> resetField "roles" (map addCreditName roles) context
In an equation for ‘addCreditNames’:
addCreditNames context
= case getField "roles" context of
Nothing -> context
Just roles -> resetField "roles" (map addCreditName roles) context
|
79 | case getField "roles" context of
| ^^^^^^^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at https://hackage.haskell.org/package/doctemplates-0.11.0.1/docs/Text-DocTemplates.html
You can see that there is a ToContext instance for ToContext b => Map Text b,
but there isn't a similar FromContext instance! Argh.
Probably I should just add one. Stand by.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well actually we may be able to work around this by operating directly with a Val type instead of a Map.
addCreditName :: Val Text -> Val Text
addCreditName val =
case val of
MapVal ctx ->
case getField "credit-name" ctx of
Just _ -> val
Nothing -> maybe val (MapVal . resetField "credit-name")
(getField "credit-id" ctx >>= flip M.lookup creditNames)
_ -> val
(UNTESTED! But I think something along these lines will work, without the need for changes in doctemplates.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a couple variations on this. The code in #10153 (comment) gave this error:
src/Text/Pandoc/Writers/JATS.hs:95:40: error: [GHC-83865]
• Couldn't match type: Context a0 -> Context a0
with: Context Text
Expected: Text -> Context Text
Actual: Text -> Context a0 -> Context a0
• Probable cause: ‘resetField’ is applied to too few arguments
In the second argument of ‘(.)’, namely ‘resetField "credit-name"’
In the second argument of ‘maybe’, namely
‘(MapVal . resetField "credit-name")’
In the expression:
maybe
val (MapVal . resetField "credit-name")
(getField "credit-id" ctx >>= flip M.lookup creditNames)
|
95 | Nothing -> maybe val (MapVal . resetField "credit-name")
|
Unfortunately, I don't think it's very effective working this way and I'm not sure how to push this forward. If we agree on the base ideas (extending the documented data model, how the templating for JATS changes, and how the tests look) then I would say we should split this PR into two parts - one that introduces those changes, then a second as a follow-up that uses some haskell magic to help when data is incomplete (i.e. the changes we're talking about in this thread)
@jgm do you think that would work? Otherwise I fear that I might not be able to get the job done and am sucking up too much of your open source time, and/or this thread becomes impossible to follow
another idea is we could schedule a pair programming session and try to work through this together, if you have some time in the upcoming weeks. I'm on German time but relatively flexible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright, I think we're making progress, but now it seems that the way getField is defined is causing the issue:
src/Text/Pandoc/Writers/JATS.hs:93:12: error: [GHC-39999]
• Ambiguous type variable ‘b0’ arising from a use of ‘getField’
prevents the constraint ‘(Text.DocTemplates.Internal.FromContext
Text b0)’ from being solved.
Probable fix: use a type annotation to specify what ‘b0’ should be.
Potentially matching instances:
instance Text.DocTemplates.Internal.TemplateTarget a =>
Text.DocTemplates.Internal.FromContext a a
-- Defined in ‘Text.DocTemplates.Internal’
instance Text.DocTemplates.Internal.TemplateTarget a =>
Text.DocTemplates.Internal.FromContext a (Doc a)
-- Defined in ‘Text.DocTemplates.Internal’
...plus two others
(use -fprint-potential-instances to see them all)
• In the expression: getField "credit-name" ctx
In the expression:
case getField "credit-name" ctx of
Just _ -> val
Nothing
-> case getField "credit-id" ctx of
Nothing -> val
Just creditId
-> case M.lookup creditId creditNames of
Nothing -> ...
Just creditName -> ...
In a case alternative:
MapVal ctx
-> case getField "credit-name" ctx of
Just _ -> val
Nothing
-> case getField "credit-id" ctx of
Nothing -> val
Just creditId -> ...
|
93 | case getField "credit-name" ctx of
| ^^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probable fix: use a type annotation to specify what ‘b0’ should be.
This is your hint. getField
is polymorphic, so we may need to explicitly specify the type of the intended result, if it's not pinned down by other uses.
in this case, getField "credit-name" ctx
is ambiguous because we don't do anythnig with the Just _
return value ,so it could be any type. Thus, we need to say what the type is that we expect:
Just (_ :: Text) -> val
if we want to leave it alone if it's a text value, or
Just (_ :: Val Text) -> val
if we want to leave it alone if it's any kind of value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo! It has a few warnings but managed to compile! I added a corresponding test to make sure that it works (i.e., if there's a credit-id
and no credit-name
, add the right credit-name
.) This isn't passing yet, so this needs some more invesigation
Warnings
src/Text/Pandoc/Writers/JATS.hs:91:3: warning: [GHC-62161] [-Wincomplete-patterns]
Pattern match(es) are non-exhaustive
In a case alternative:
Patterns of type ‘Val Text’ not matched:
SimpleVal _
ListVal _
BoolVal _
NullVal
|
91 | case val of
| ^^^^^^^^^^^...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At line 92 you match Just (MapVal _) but you don't match anything else.
You need a case for the catch-all "other" clause: _ -> val
.
See my original example at #10153 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @jgm sorry I had to step away from this because I felt like I had hit a wall, but I realized an issue with the way my filtering was working was it didn't match the data structure (I was looking for a roles list, but the only thing that was available was an authors list, and there was a roles list in each)
I have (I hope) a final issue with the compiler, again related to me not understanding the typing system enough to annotate the cases correctly. Could you please give me another hint?
src/Text/Pandoc/Writers/JATS.hs:79:8: error: [GHC-39999]
• No instance for ‘Text.DocTemplates.Internal.FromContext
Text (Context Text)’
arising from a use of ‘getField’
• In the expression: getField "authors" context
In the expression:
case getField "authors" context of
Nothing -> error "No Authors!"
Just authors
-> resetField
"authors" (map addCreditNamesToAuthor authors) context
In an equation for ‘addCreditNames’:
addCreditNames context
= case getField "authors" context of
Nothing -> error "No Authors!"
Just authors
-> resetField
"authors" (map addCreditNamesToAuthor authors) context
|
79 | case getField "authors" context of
| ^^
Closes #10152
This PR adds support for annotating author roles using the Contribution Role Taxonomy (CRediT).
I'm motivated to add this to Pandoc since I want the Journal of Open Source Software (JOSS), which is built on top of Pandoc, to be able to create compliant JATS. We're already adding support for encoding this information in article metadata in parallel in openjournals/inara#75.
On my first attempts to implement this, I tried to inject some clever logic inside the Haskell code to automatically look up the primary labels, but this had the two issues of being programatically complicated, and too rigid towards internationalization. Now, you can specify the roles in the metadata like this:
The tests (with narrative documentation) can be found in https://github.com/cthoyt/pandoc/blob/patch-1/test/command/10152.md. Run these tests specifically with
cabal test --test-options="-p 10152.md"