Update to b2mn parsing #23

Merged
merged 6 commits into from
Feb 13, 2024

eldond (Collaborator) commented Feb 2, 2024

eldond self-assigned this on Feb 2, 2024
eldond (Collaborator, Author) commented Feb 2, 2024

@sbdepascuale If you check out this branch within SOLPS2IMAS, you can test whether SOLPS2IMAS.read_b2mn_output() now works on your file. You can do this either by importing only SOLPS2IMAS and calling read_b2mn_output(), or by running the whole workflow test after updating.

If you have a Julia session open, you will have to restart Julia unless you imported Revise (import Revise) before loading SOLPS2IMAS.

sbdepascuale (Collaborator) commented

I can confirm that the demo.ipynb workflow correctly executes all calls with the KSTAR case on the b2mn_parse branch of SOLPS2IMAS. Furthermore, interferometer data extraction from "ids.interferometer" works as expected.
[Screenshots: 2024-02-02 at 6:04:12 PM and 6:04:20 PM]

anchal-physics (Collaborator) commented

A few more checks need to be added to handle every form this file can take. I'm adding Jeremy's email instructions here for documentation.

Empty lines are ignored.

The file should always start with a section like this. Each variable here is optional (maybe not the label?). You might want to grab the label, but the rest can be skipped for our purposes (jump to *endphy).

    label (lblmn: character60)
    'F57: D+Ne'
    *b2cmpa basic parameters
    *b2cmpb boundary conditions
    *b2cmpt transport coefficients
    *cfsig (0) (1) (2) (3) (4) (5) (6) (7)
    '-1' 4.0e-05 0.0e+00 0.0e+00 0.0e+00 0.0e+00 0.0e+00 0.0e+00 0.0e+00
    *cfalf (0) (1) (2) (3) (4) (5) (6) (7)
    '-1' 4.0e-05 0.0e+00 0.0e+00 0.0e+00 0.0e+00 0.0e+00 0.0e+00 0.0e+00
    *cflim (0) (1) (2) (3) (4) (5) (6) (7)
    '-1' 3.00E-01 6.00E-01 5.00E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
    *endphy

So this section could also be just:

    label (lblmn, character60; free format)
    'ITER_D+He+Ne_nodrift_FPO_100MW_(target=Be,gaspuff=top)_cNe=0.8%_Dpuff=1.85e23'
    *b2cmpa basic parameters
    *b2cmpb boundary conditions
    *b2cmpt transport coefficients
    *endphy
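A minimal Python sketch of skipping this header section, under two assumptions of ours: the label, when present, is the first non-empty line after a line starting with `label`, and the header always ends at a line starting with `*endphy`. `split_header` is a hypothetical helper name, not part of SOLPS2IMAS.

```python
def split_header(lines):
    """Return (label, remaining_lines) for a b2mn.dat file given as a list of lines.

    Hypothetical helper: the label is the first non-empty line following a line
    that starts with "label"; everything up to and including "*endphy" is skipped.
    """
    label = None
    previous = ""
    for i, raw in enumerate(lines):
        line = raw.strip()
        if not line:
            continue  # empty lines are ignored
        if line.lower().startswith("*endphy"):
            return label, lines[i + 1:]
        if label is None and previous.lower().startswith("label"):
            label = line.strip("'\"")  # drop the surrounding quotes
        previous = line
    raise ValueError("no *endphy line found; the header section is incomplete")


example = [
    "label (lblmn: character60)",
    "'F57: D+Ne'",
    "*b2cmpa basic parameters",
    "*endphy",
    "'b2mwti_jxa'   36",
]
label, body = split_header(example)
print(label)  # F57: D+Ne
```

Raising when `*endphy` is missing follows the "should always start with a section like this" statement; a more lenient parser could instead treat the whole file as name-value pairs.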

After this come the name-value pairs:

The characters #, *, and ! indicate comments. A comment can follow a name-value pair; if one of these characters is at the beginning of a line, the whole line is skipped.
Name-value pairs are matched case-insensitively but otherwise as exact strings. They have at least two valid representations:
    Value and name in single quotes
    'b2mwti_jxa'   '36' # optional comment
    Name in single quotes, value unquoted
    'b2mwti_jxa'   36 # optional comment
    I think double quotes work here as well. The comment section can also contain quotation marks.
Due to exact string matching, be aware of “commented out” variables that could look like this:
    '#b2mwti_jxa'   36 # Variable unused
    '!b2mwti_jxa'   36 # Variable unused
If a name-value pair is repeated, the last one is used.
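The name-value rules above can be illustrated with a small hand-written tokenizer. This is a Python sketch under our own assumptions, not the SOLPS2IMAS implementation: the value may be quoted (single or double quotes) or bare, an unquoted `#`, `*`, or `!` starts a comment, and only the name is lowercased. `split_name_value` is a hypothetical name.

```python
COMMENT_CHARS = "#*!"
QUOTE_CHARS = "'\""


def split_name_value(line: str):
    """Return (name, value) from one b2mn.dat line, or None if it holds no pair.

    Sketch only: reads at most two tokens, so anything after the value
    (including a comment that contains quotation marks) is ignored.
    """
    line = line.strip()
    tokens = []
    i = 0
    while i < len(line) and len(tokens) < 2:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch in COMMENT_CHARS:
            break  # an unquoted comment character ends the useful part of the line
        elif ch in QUOTE_CHARS:
            end = line.find(ch, i + 1)  # closing quote of the same kind
            if end == -1:
                raise ValueError(f"unterminated quote in line: {line!r}")
            tokens.append(line[i + 1:end])
            i = end + 1
        else:
            j = i
            while j < len(line) and not line[j].isspace() and line[j] not in COMMENT_CHARS:
                j += 1
            tokens.append(line[i:j])
            i = j
    if len(tokens) < 2:
        return None
    name, value = tokens
    return name.lower(), value  # names match case-insensitively


print(split_name_value("'B2MWTI_JXA'   '36' # optional comment"))  # ('b2mwti_jxa', '36')
```

A quoted name such as `'#b2mwti_jxa'` is returned verbatim, so under exact matching it never collides with the real `b2mwti_jxa`. Storing the results in a dict in file order gives the "last one wins" behaviour for repeated names.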

I'll edit some sample files to include all these test cases and test against them. I think we just need to add more checks for special characters and convert names to lowercase before adding them as keys.

anchal-physics (Collaborator) commented

Found another bug in the current implementation. It distinguishes between Float64 and Int values by checking whether the value field contains a '.', but many values are written in the '1e-10' format, so they get parsed as strings. I think we should not rely on hidden error catching; instead, we should have designated code for every case we can think of, so that the parser fails when it is not parsing correctly.
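To make the failure mode concrete: a check for '.' alone misses exponent-only literals such as 1e-10. Below is a Python sketch of the explicit alternative argued for here, where each accepted numeric syntax has its own regular expression and anything else raises. The patterns (our assumption) cover only plain integers and decimal/exponent reals; Fortran `d` exponents, booleans, and strings are deliberately not handled. This makes the syntax check explicit, but it still cannot know how a variable is declared in Fortran.

```python
import re

# One pattern per accepted literal form; anything else is rejected explicitly.
INT_RE = re.compile(r"[+-]?\d+")
REAL_RE = re.compile(r"[+-]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][+-]?\d+)?")


def looks_like_float_naive(value: str) -> bool:
    """The check described above: float only if the text contains '.'."""
    return "." in value


def classify_number(value: str) -> str:
    """Return "int" or "float" for a numeric literal; raise on anything else."""
    if INT_RE.fullmatch(value):
        return "int"
    if REAL_RE.fullmatch(value):
        return "float"
    raise ValueError(f"not a recognized numeric literal: {value!r}")


for text in ["36", "-1", "4.0e-05", "1e-10"]:
    print(text, looks_like_float_naive(text), classify_number(text))
```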

eldond (Collaborator, Author) commented Feb 5, 2024

Oh, oops. All the samples I saw were like 1.0e-10. But if this file is human-writable (and it is), then I guess 1e-10 could happen, too.

jlore (Collaborator) commented Feb 5, 2024

The type depends on the original Fortran declaration; you cannot necessarily tell it from the format of the value. The same goes for booleans and ints.

Removed:
	deleted:    samples/b2mn.dat.sample_50dn
	deleted:    samples/b2mn.dat.sample_50xd
	deleted:    samples/b2mn.dat.sample_si1
	deleted:    samples/b2mn.dat.sample_si2

Added:
    new file:   samples/b2mn.dat.json.dvc
    new file:   samples/test_b2mn.dat.dvc
    new file:   samples/test_b2mn.dat.json.dvc

Adding a JSON file with a correctly parsed version of the b2mn.dat file
that is already in the samples directory. Also added a new test_b2mn.dat
file that is manually edited to cover all edge cases of the parser, and a
JSON version of the same file that has been correctly parsed (checked
manually).
Updated the parser so that it reads b2mn.dat files with the following rules:
* Strip all leading and trailing whitespace (including tabs) from each line
* Ignore all lines until *endphy is found
* The characters #, *, and ! indicate comments. A comment can follow a
name-value pair; if one of these characters begins a line, the whole line is skipped
* Name-value pairs are matched case-insensitively but otherwise as exact strings.
These have at least two valid representations:
  * Value and name in single quotes
    * 'b2mwti_jxa'   '36' # optional comment
  * Name in single quotes, value unquoted
    * 'b2mwti_jxa'   36 # optional comment
  * Double quotes work here as well.
  * The comment section can also contain quotation marks
* Due to exact string matching, be aware of “commented out” variables
that could look like this:
  * '#b2mwti_jxa'   36 # Variable unused
  * '!b2mwti_jxa'   36 # Variable unused
* If a name-value pair is repeated, the last one is used
* Read a value containing '.' or 'e' as a float and the rest as an
integer.
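The key-handling and typing rules in this commit message can be sketched in Python as follows, starting from (name, value) strings that have already been split out of each line. `build_table` is a hypothetical name, and the sample pairs (including `example_real`) are illustrative. Like the rule it mirrors, it raises on a value such as a boolean that is neither float-like nor an integer.

```python
def build_table(pairs):
    """Build a dict from (name, value) strings using the rules in the commit message.

    - names are lowercased, so matching is case-insensitive but otherwise exact;
    - a value containing '.' or 'e' (any case) becomes a float, anything else an int;
    - a repeated name overwrites the earlier entry, so the last one wins.
    """
    table = {}
    for name, value in pairs:
        text = value.strip().lower()
        table[name.lower()] = float(text) if ("." in text or "e" in text) else int(text)
    return table


pairs = [
    ("B2MWTI_JXA", "36"),
    ("b2mwti_jxi", "12"),
    ("example_real", "1E-10"),  # hypothetical name, exponent-only literal
    ("#b2mwti_jxa", "99"),      # "commented out" name stays a separate key
    ("b2mwti_jxa", "37"),       # repeated name: this later value wins
]
print(build_table(pairs))
```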
    end
    # Get key and value in lowercase
    key = lowercase(name_value[1])
    value = lowercase(name_value[2])

eldond (Collaborator, Author) commented on this code:

Does this file have arrays in it? The Python version seems to expect multiple values sometimes, but it is also supposed to work for two of the SOLPS files. Maybe b2mn doesn't have multiple values; maybe that's the other file.

A collaborator replied:

I'm not sure. @jlore didn't mention any possibility of arrays in these fields, but we can expand it if that is a possibility.

anchal-physics (Collaborator) commented

@jlore Currently, the code reads anything with a decimal point or an 'e' in it as a float and everything else as an int. Is there a possibility that this might cause an issue later? Do we have a list of all the fields that Fortran keeps as floats and all that it keeps as ints (maybe indices)? Or is there a naming convention we can use to determine this? The Julia functions are all type-specific, and using floats and ints interchangeably will cause errors in the future. However, right now, solps2imas only uses b2mn.dat to read the inner and outer midplane indices defined in 'b2mwti_jxi' and 'b2mwti_jxa'.

jlore (Collaborator) commented Feb 6, 2024

@anchal-physics

There is no requirement in SOLPS that a REAL type be initialized using 'e' or a decimal point, so I don't think you should make that assumption here. Is Julia strictly typed in that you have to know the Fortran type? We can, of course, get the type from the Fortran source but I'm not sure why there is a need.

Also, I don't think it is required to read "all" of the variables from b2mn.dat for our project. Why not just look for the small number of variables we will actually use (for which I can tell you the Fortran type) and ignore the rest. There are probably at least 50 variables that are not present in any of the b2mn.dat examples you are using, and these often change with the code base. Also, SOLPS ignores variable names that it doesn't use, which is important so that an obsolete variable definition does not cause the code to stop.

Suggest that you just make a list of the variables that are actually needed for the project and we make a robust parser that works for these. I can then also provide you the default values (and/or logic for computing the defaults) for cases where they are not present.

eldond (Collaborator, Author) commented Feb 6, 2024

I just think it's awkward to parse the number of steps (for example) as a float when it's clearly an int. Maybe we can parse all numbers as Float64 by default and keep a list of things to force to Int?

The python implementation was intended to preserve type more carefully because it was used to read as well as write, and writing 3.5 steps seems like it wouldn't do anything good.

jlore (Collaborator) commented Feb 6, 2024

Agreed. Suggest specific treatment of a limited number of variables. If you have a list of the ones you use then I can confirm the original type.

anchal-physics (Collaborator) commented

From the discussion, we have these two possibilities:

Whitelisting solution

Our code (specifically, the SOLPS2IMAS.solps2imas() function) only uses b2mwti_jxa and b2mwti_jxi, which I know must be integers since they are indices. I'm not sure whether default values can be given for these indices.
But if we limit the parser to reading only these two, it will not be useful to someone who wants to pick up another field from b2mn while using our code for something different. That's why we were looking for a more general solution.
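A Python sketch of the whitelisting option, assuming the (name, value) strings are already tokenized: only the listed names are converted, using the type stated in the comment, and every other name is ignored, which matches how SOLPS itself treats names it does not use. `read_whitelisted`, the `WANTED` table, and `some_other_field` are hypothetical; raising on a missing name is a placeholder until default values are available.

```python
# Names and types taken from the comment; both are mesh indices, hence int.
WANTED = {"b2mwti_jxa": int, "b2mwti_jxi": int}


def read_whitelisted(pairs, wanted=WANTED):
    """Keep only whitelisted names, converted to their known type; ignore the rest."""
    result = {}
    for name, value in pairs:
        key = name.lower()
        if key in wanted:
            result[key] = wanted[key](value)  # e.g. int("36"); a non-integer string raises
    missing = sorted(set(wanted) - set(result))
    if missing:
        # Placeholder: real defaults would be filled in here once they are known.
        raise KeyError(f"required b2mn fields not found: {missing}")
    return result


pairs = [("b2mwti_jxa", "36"), ("B2MWTI_JXI", "12"), ("some_other_field", "1e-3")]
print(read_whitelisted(pairs))  # {'b2mwti_jxa': 36, 'b2mwti_jxi': 12}
```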

Handle types in code where values are used

Since Julia is type-specific, we should write Julia code that explicitly casts to an integer or a float wherever we use the read values. This is actually already done in solps2imas(): https://github.com/ProjectTorreyPines/SOLPS2IMAS.jl/blob/37b04d7ca0eaba0251f3029590152808dbee8d35/src/SOLPS2IMAS.jl#L99
If we use this solution, we just have to be careful when we write future code that uses values from b2mn.dat. In this case, we can merge the current parser branch as it is.
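With this option, a use-site conversion can still fail loudly instead of silently truncating. A Python sketch, with `as_index` as a hypothetical helper name: it accepts either the raw string or an already-parsed number and refuses anything that is not integral.

```python
def as_index(value) -> int:
    """Convert a b2mn.dat value (string or number) to an int, refusing non-integral values."""
    number = float(value)  # accepts "36", "36.0", "3.6e1", 36, 36.0
    if not number.is_integer():
        raise ValueError(f"expected an integer index, got {value!r}")
    return int(number)


print(as_index("36"), as_index(36.0), as_index("3.6e1"))  # 36 36 36
```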

Let me know what you all prefer.

jlore (Collaborator) commented Feb 6, 2024

I can certainly provide the logic for default values of jxa and jxi. It depends on the grid topology, but it is straightforward.

eldond (Collaborator, Author) commented Feb 7, 2024

@jlore Yes, please. It's valuable to have a way to set defaults for anything that's used as much as jxa and jxi. Beyond that, I think the logic should be:
Try to parse as float. Fail? Parse as string and record so value can be inspected. Okay? Check list of things that are numbers but not floats and change type. Otherwise leave as float. We don't have to list all the integers in the file, just the ones that index the mesh and stuff like that where using a float as an index would mess things up. And yes, the client could force our float into an int if they need to downstream from the file parsing, but why not get some of it right if we can?

List of things to force to non-float if possible:

b2mwti_jxa: Int
b2mwti_jxi: Int
b2mndr_ntim: Int

List of things to populate using default values if they're missing:

b2mwti_jxa: <formula needed>
b2mwti_jxi: <formula needed>
b2mndr_ntim: nothing (make it obvious that this is an important value but that we didn't get it)
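The proposed logic could look roughly like this Python sketch. The force-to-int set and the default table come from the lists above; the defaults for jxa and jxi are `None` only because their formulas were still pending at this point. `convert`, `example_real`, and `example_flag` are hypothetical names.

```python
# From the lists above: numbers that should not stay floats.
FORCE_INT = {"b2mwti_jxa", "b2mwti_jxi", "b2mndr_ntim"}
# Defaults used when a name is missing. None for jxa/jxi because their formulas
# were still pending; None for ntim to make the missing value obvious.
DEFAULTS = {"b2mwti_jxa": None, "b2mwti_jxi": None, "b2mndr_ntim": None}


def convert(raw):
    """Convert {name: text} to typed values; return (values, names_left_as_strings)."""
    values, unparsed = {}, []
    for key, text in raw.items():
        try:
            number = float(text)  # try float first
        except ValueError:
            values[key] = text  # keep as string and record it for inspection
            unparsed.append(key)
            continue
        if key in FORCE_INT and number.is_integer():
            number = int(number)  # known integer fields get an int when possible
        values[key] = number
    for key, default in DEFAULTS.items():
        values.setdefault(key, default)
    return values, unparsed


values, unparsed = convert({"b2mwti_jxa": "36", "b2mwti_jxi": "12.0",
                            "example_real": "1e-10", "example_flag": "T"})
print(values["b2mwti_jxa"], values["b2mndr_ntim"], unparsed)  # 36 None ['example_flag']
```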

Created a new file, src/b2mn_int_fields.txt, that lists the names of all
known int fields. This file is used to check whether a field value
should be converted to an integer. Also, if a row in this file lists a
second value, that field is considered required, and that value is used
as its default when the field is not found in the input file.

Updated sample JSON files with additional int fields that get added
by default.
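A Python sketch of reading a table like src/b2mn_int_fields.txt and applying it. The row format (whitespace-separated name, then an optional integer default) is our assumption from the commit message, not a description of the actual file; `load_int_fields`, `apply_int_fields`, `example_required_int`, and `example_real` are hypothetical.

```python
def load_int_fields(text):
    """Parse rows of 'name [default]' into (set of int names, {name: default})."""
    int_fields, defaults = set(), {}
    for row in text.splitlines():
        parts = row.split()
        if not parts:
            continue  # skip blank rows
        name = parts[0].lower()
        int_fields.add(name)
        if len(parts) > 1:  # a second value marks the field as required
            defaults[name] = int(parts[1])
    return int_fields, defaults


def apply_int_fields(table, int_fields, defaults):
    """Return a copy of table with int fields converted and required defaults filled in."""
    out = dict(table)
    for name in int_fields:
        if name in out:
            number = float(out[name])
            if not number.is_integer():
                raise ValueError(f"{name} should be an integer, got {out[name]!r}")
            out[name] = int(number)
    for name, default in defaults.items():
        out.setdefault(name, default)
    return out


sample = "b2mwti_jxi\nexample_required_int 5\n"  # hypothetical file contents
int_fields, defaults = load_int_fields(sample)
print(apply_int_fields({"b2mwti_jxi": "12", "example_real": 1e-10}, int_fields, defaults))
# {'b2mwti_jxi': 12, 'example_real': 1e-10, 'example_required_int': 5}
```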
anchal-physics (Collaborator) commented

I used the useful comments in the b2mn.dat file in ITER_Lore_2296_00000/run_time_dep_EIRENE_jdl_to_ss_cont_sine2_2d_output to create a list of all integer fields, with default values wherever the comments listed them. Please see b2mn_int_fields.txt to check the list.

jlore (Collaborator) commented Feb 8, 2024

Ah good, I'm glad my comments were useful!

anchal-physics (Collaborator) commented

I realized that since @eldond opened this PR originally, and I pushed commits after that, I can't set him as a reviewer for this PR. Thus, I've set @dautt-silva as a reviewer. Please complete the review and mark approve if it looks good.

eldond (Collaborator, Author) commented Feb 13, 2024

Good work.

eldond merged commit b976a3e into dev on Feb 13, 2024
1 check passed
eldond deleted the b2mn_parse branch on February 13, 2024, 01:16
jlore (Collaborator) commented Feb 15, 2024

Here is some logic for default values of jxa, jxi.

First you need to identify the topology. Use nncut and nnreg from b2fgmtry:

    if (Geo.nncut == 1) && (Geo.nnreg(1) == 4)                              % SN case
        Geo.geometry = 'SN';
    elseif (Geo.nncut == 2) && (Geo.nnreg(1) == 8) && (Geo.nnreg(2) == 13)  % DDN case
        Geo.geometry = 'DDN';
    elseif (Geo.nncut == 2) && (Geo.nnreg(1) == 8) && (Geo.nnreg(2) == 12)  % CDN case
        Geo.geometry = 'CDN';
    end

Then you can guess jxa and jxi. These guesses can be pretty far off depending on the poloidal spacing, but they are the code defaults. Use leftcut and rightcut from b2fgmtry:

    if strcmp(Geo.geometry, 'SN')
        % SN: a quarter of the leftcut-rightcut span inside each cut
        Geo.jxa_guess = ceil(Geo.rightcut(1) - (Geo.rightcut(1) - Geo.leftcut(1))/4);
        Geo.jxi_guess = floor(Geo.leftcut(1) + (Geo.rightcut(1) - Geo.leftcut(1))/4);
    elseif strcmp(Geo.geometry, 'DDN') || strcmp(Geo.geometry, 'CDN')
        % Double null: midway between the two right cuts (jxa) and the two left cuts (jxi)
        Geo.jxa_guess = ceil((Geo.rightcut(1) + Geo.rightcut(2))/2);
        Geo.jxi_guess = floor((Geo.leftcut(1) + Geo.leftcut(2))/2);
    end
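For reference, a Python translation of the MATLAB logic above. MATLAB's 1-based positions `nnreg(1)`, `leftcut(2)`, etc. become 0-based positions `nnreg[0]`, `leftcut[1]` in Python; the cut values themselves are used unchanged, so the guesses follow whatever index convention b2fgmtry uses. The function names, the sample cut values, and the error raised for an unrecognized topology are our additions.

```python
import math


def classify_topology(nncut, nnreg):
    """Return 'SN', 'DDN' or 'CDN' from nncut and nnreg (nnreg as a 0-based sequence)."""
    if nncut == 1 and nnreg[0] == 4:
        return "SN"
    if nncut == 2 and nnreg[0] == 8 and nnreg[1] == 13:
        return "DDN"
    if nncut == 2 and nnreg[0] == 8 and nnreg[1] == 12:
        return "CDN"
    raise ValueError(f"unrecognized topology: nncut={nncut}, nnreg={list(nnreg)}")


def guess_jxa_jxi(geometry, leftcut, rightcut):
    """Code-default guesses for (jxa, jxi); leftcut/rightcut are 0-based sequences."""
    if geometry == "SN":
        quarter = (rightcut[0] - leftcut[0]) / 4
        return math.ceil(rightcut[0] - quarter), math.floor(leftcut[0] + quarter)
    if geometry in ("DDN", "CDN"):
        return (math.ceil((rightcut[0] + rightcut[1]) / 2),
                math.floor((leftcut[0] + leftcut[1]) / 2))
    raise ValueError(f"no default rule for geometry {geometry!r}")


print(guess_jxa_jxi(classify_topology(1, [4]), leftcut=[25], rightcut=[72]))  # (61, 36)
```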

eldond (Collaborator, Author) commented Feb 15, 2024

Since this change to b2mn parsing was already merged, I opened #26 to keep track of the proposed method for guessing jxa and jxi.
