Problem running PH for a model built in PySP #376

abodh · 2024-03-31T00:28:16Z


Hello,

I am shifting to mpisppy after being an active PySP user. I understand the basic concepts of running a model with or without cylinders and have successfully run hydro_cylinders_pysp.py.

I followed examples/farmer/from_pysp/concrete_ampl.py to understand the problem-building mechanism for PH and examples/hydro/hydro_ef.py for the EF. My problem runs well in PySP alone, both in extensive form and with PH (I ran PH for a few iterations to check). However, I get the following error when running PH with mpisppy. I was using Pyomo 6.7 with mpisppy and downgraded to Pyomo 6.4, but the problem persists.

planningmodel_ph.py (almost identical to the farmer example):

```python
from mpisppy.utils.pysp_model import PySPModel
from mpisppy.opt.ph import PH

# scenario is created using scenario creator and networkx in PySP
distplanning = PySPModel(model="planningmodel.py", scenario_tree="planningmodel.py")

phoptions = {'defaultPHrho': 1.0,
             'solver_name': 'gurobi',
             'PHIterLimit': 50,
             'convthresh': 0.01,
             'verbose': True,
             'display_progress': True,
             'display_timing': True,
             'iter0_solver_options': None,
             'iterk_solver_options': None
             }

ph = PH(options=phoptions,
        all_scenario_names=distplanning.all_scenario_names,
        scenario_creator=distplanning.scenario_creator,
        scenario_denouement=distplanning.scenario_denouement
        )

ph.ph_main()

# ph.extobject only exists when an extension (XhatClosest in the farmer
# example) is passed to PH via extensions=...; without one there is no
# extension object to query here.
if ph.tree_solution_available:
    print(f"Final objective from XhatClosest: {ph.extobject._final_xhat_closest_obj}")
```

Run the model in the terminal:

```
mpiexec --np 5 python planningmodel_ph.py
```

Error:
```
 File "/home/abodh.poudyal/.conda/envs/mpisppy_new/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
ERROR: evaluating object as numeric value: Z[0]
        (object: <class 'pyomo.core.base.var._GeneralVarData'>)
    No value for uninitialized NumericValue object Z[0]

Traceback (most recent call last):
  File "/home/abodh.poudyal/mpi_install/mpi-sppy/mpisppy/spopt.py", line 77, in _check_staleness
    float(pyo.value(v))
  File "pyomo/core/expr/numvalue.pyx", line 156, in pyomo.core.expr.numvalue.value
  File "pyomo/core/expr/numvalue.pyx", line 143, in pyomo.core.expr.numvalue.value
ValueError: No value for uninitialized NumericValue object Z[0]

  File "/home/abodh.poudyal/mpi_install/mpi-sppy/mpisppy/spopt.py", line 79, in _check_staleness
    raise RuntimeError(
RuntimeError: Non-anticipative variable Z[0] on scenario Scenario37 reported as stale. This usually means this variable did not appear in any (active) components, and hence was not communicated to the subproblem solver.
```

I wanted to check whether the same problem persists with a single process, i.e., --np 1, but it is extremely slow and does not make progress.

Check for EF

For the EF, the problem solves with --np > 1, but the solution is far from the one I got from PySP. I assume we are not supposed to run with multiple processes unless we are using cylinders, which would explain why the results do not make sense. With --np 1 it is extremely slow, while PySP solves the EF in less than 5 minutes.

At this point, my main concern is that the PH version throws errors. Is there anything wrong with how I formed the model (or options) for PH?

DLWoodruff · 2024-03-31T20:31:52Z

Maintainer

Here is the key hint: RuntimeError: Non-anticipative variable Z[0] on scenario Scenario37 reported as stale. This usually means this variable did not appear in any (active) components, and hence was not communicated to the subproblem solver.
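Per the traceback, the check that fails lives in `_check_staleness` in `mpisppy/spopt.py`. As a rough, stdlib-only sketch of the idea in the error message (not mpi-sppy's implementation), a nonanticipative variable ends up stale when no active component references it: the solver is never told about the variable, so no value comes back. The tuple-based `Component` type and both helper functions are hypothetical stand-ins for a Pyomo model.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for a model component: (is_active, referenced variable names).
Component = Tuple[bool, Set[str]]


def variables_sent_to_solver(components: List[Component]) -> Set[str]:
    """Collect the variables referenced by at least one active component."""
    sent: Set[str] = set()
    for is_active, var_names in components:
        if is_active:
            sent |= var_names
    return sent


def find_stale_nonants(nonant_names: List[str],
                       components: List[Component]) -> List[str]:
    """Nonanticipative variables the solver never saw, so never got a value."""
    sent = variables_sent_to_solver(components)
    return [name for name in nonant_names if name not in sent]


# Hypothetical scenario: Z[0] appears only in a deactivated constraint.
components = [
    (True, {"x[0]", "y[0]"}),   # objective
    (True, {"x[0]", "y[0]"}),   # an active constraint
    (False, {"Z[0]", "x[0]"}),  # a deactivated constraint
]
nonants = ["x[0]", "y[0]", "Z[0]"]

for name in find_stale_nonants(nonants, components):
    print(f"Non-anticipative variable {name} reported as stale")
```

Adding any active component that references `Z[0]` removes it from the stale list, which matches the hint in the error message.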


abodh Mar 31, 2024
Author

Thanks, David, for your response. I still don't understand it. The model runs fine in standalone PySP.

Although the problem has N nodes, Z is only defined for a few candidate nodes. Could that be the issue?

Update (edited):
Z is a linearizing variable representing the product of two variables, x (binary) and y (continuous), i.e., Z = x*y, where both x and y are non-anticipative first-stage decisions. I kept Z, x, and y all as first-stage decisions. Looking at the closed issue #170, there seems to be a problem with derived variables. I tried keeping Z in the first stage and moving x and y to the second stage, but the problem persisted.
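For reference, the standard exact linearization of Z = x*y with x binary and 0 <= y <= U uses four linear constraints: Z <= U*x, Z <= y, Z >= y - U*(1 - x), and Z >= 0. The post does not say which form the model uses, and the upper bound U on y is an assumption here. The stdlib sketch below checks by enumeration that these constraints pin Z to exactly x*y, which is also why Z is fully determined once x and y are fixed.

```python
from typing import Tuple


def feasible_z_interval(x: int, y: float, U: float) -> Tuple[float, float]:
    """Interval of Z allowed by the linearization of Z = x*y.

    Assumes x in {0, 1} and 0 <= y <= U. Constraints:
        Z <= U*x,  Z <= y,  Z >= y - U*(1 - x),  Z >= 0
    """
    upper = min(U * x, y)
    lower = max(y - U * (1 - x), 0.0)
    return lower, upper


U = 10.0
for x in (0, 1):
    for y in (0.0, 2.5, 7.0, 10.0):
        lo, hi = feasible_z_interval(x, y, U)
        # The feasible interval collapses to the single point x*y.
        assert lo == hi == x * y, (x, y, lo, hi)
print("linearization is exact on all test points")
```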

Instead, I introduced constraints to set bounds on each of the first-stage variables, as suggested in that issue. I thought that defining bounds=(lb, ub) on each variable would do the job, but it did not; I had to explicitly write constraints such as lb <= Z[n] <= ub for each node n. With those constraints the issue is gone, whether or not I include Z in the list of first-stage variables.
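One plausible reading of why explicit constraints helped where `bounds=` did not: bounds are attributes of a variable, while a constraint is an active model component, and file-based solver interfaces (such as an LP writer) typically emit only variables that some active constraint or objective references. The stdlib sketch below mimics that behavior with a toy writer; the `Var` dataclass and `write_lp` function are hypothetical and are not Pyomo's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Var:
    name: str
    lb: float
    ub: float


def write_lp(variables: Dict[str, Var],
             constraints: List[Tuple[str, Set[str]]]) -> List[str]:
    """Toy writer: emit bounds only for variables some constraint references."""
    referenced: Set[str] = set()
    for _label, var_names in constraints:
        referenced |= var_names
    lines = []
    for name in sorted(referenced):
        v = variables[name]
        lines.append(f"{v.lb} <= {v.name} <= {v.ub}")
    return lines


variables = {"Z[0]": Var("Z[0]", 0.0, 5.0), "x[0]": Var("x[0]", 0.0, 1.0)}

# Bounds alone: Z[0] is referenced by nothing, so it is silently dropped.
print(write_lp(variables, [("c1", {"x[0]"})]))
# An explicit constraint lb <= Z[0] <= ub makes Z[0] part of what is written.
print(write_lp(variables, [("c1", {"x[0]"}), ("z_bounds_0", {"Z[0]"})]))
```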

However, I now get a new NumPy error:

```
  File "/home/abodh.poudyal/mpi_install/mpi-sppy/mpisppy/spopt.py", line 864, in _create_solvers
    (np.min(asit), np.mean(asit), np.max(asit)))
  File "/home/abodh.poudyal/.conda/envs/mpisppy_new/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 2953, in min
    return _wrapreduction(a, np.minimum, 'min', axis, None, out,
  File "/home/abodh.poudyal/.conda/envs/mpisppy_new/lib/python3.10/site-packages/numpy/core/fromnumeric.py", line 88, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
```
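The traceback points at `np.min(asit)` in `_create_solvers`: NumPy's min and max reductions raise `ValueError` on a zero-size array because they have no identity element. Why `asit` ends up empty for this model is not established in the thread. The stdlib sketch below only shows the general guard pattern for summarizing timing data that may be empty; `summarize_times` is a hypothetical helper, not the mpi-sppy code or a fix for it. Python's built-in `min()` fails on an empty sequence in the same way, so the guard comes first.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def summarize_times(times: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Return (min, mean, max) of timing samples, or None if there are none.

    Like np.min on a zero-size array, the built-in min() raises ValueError
    on an empty sequence, so check for emptiness before reducing.
    """
    if len(times) == 0:
        return None
    return min(times), mean(times), max(times)


for label, samples in [("non-empty sample", [0.12, 0.30, 0.18]),
                       ("empty sample", [])]:
    summary = summarize_times(samples)
    if summary is None:
        print(f"{label}: no solver creation times to report")
    else:
        print(f"{label}: min={summary[0]:.2f} mean={summary[1]:.2f} max={summary[2]:.2f}")
```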

abodh Apr 1, 2024
Author

After a few hours of work, I was able to make this work by defining the variables only for the candidate nodes. I no longer get the stale-variable error. However, the error raised while reporting timing still persists; it seems to be caused by an empty asit.

I was able to create the cylinders for the model. Based on `examples/hydro/hydro_cylinders_pysp.py`, I ran the following in the terminal:

```
mpiexec -np 3 python -m mpi4py planningmodel_cylinder.py --lagrangian --xhatshuffle --bundles_per_rank=0 --max-iterations=50 --default-rho=1 --solver-name="gurobi"
```

but this throws an error asking me to pass defaultPHrho and the solver name. Since I did not use an options dictionary and instead passed these arguments on the command line as shown above, what is triggering the error?

Here is the code snippet (as per hydro example):

```python
import mpisppy.utils.cfg_vanilla as vanilla
from mpisppy.spin_the_wheel import WheelSpinner
from mpisppy.utils.pysp_model import PySPModel

write_solution = False  # module-level flag read in main(), as in the hydro example


def main():
    cfg = _parse_args()  # defined elsewhere in this file, following the hydro example

    xhatshuffle = cfg.xhatshuffle
    lagrangian = cfg.lagrangian
    fwph = cfg.fwph

    distplanning = PySPModel("planningmodel.py", "planningmodel.py")
    rho_setter = None

    # Things needed for vanilla cylinders
    beans = (cfg, distplanning.scenario_creator, distplanning.scenario_denouement,
             distplanning.all_scenario_names)

    # Vanilla PH hub
    hub_dict = vanilla.ph_hub(*beans,
                              ph_extensions=None,
                              rho_setter=rho_setter,
                              all_nodenames=distplanning.all_nodenames)

    # Standard Lagrangian bound spoke
    if lagrangian:
        lagrangian_spoke = vanilla.lagrangian_spoke(*beans,
                                                    rho_setter=rho_setter,
                                                    all_nodenames=distplanning.all_nodenames)

    # xhatshuffle inner bound spoke
    if xhatshuffle:
        xhatshuffle_spoke = vanilla.xhatshuffle_spoke(*beans,
                                                      all_nodenames=distplanning.all_nodenames)

    list_of_spoke_dict = list()
    if lagrangian:
        list_of_spoke_dict.append(lagrangian_spoke)
    if xhatshuffle:
        list_of_spoke_dict.append(xhatshuffle_spoke)

    wheel = WheelSpinner(hub_dict, list_of_spoke_dict)
    wheel.spin()

    if wheel.global_rank == 0:  # we are the reporting hub rank
        print(f"BestInnerBound={wheel.BestInnerBound} and BestOuterBound={wheel.BestOuterBound}")

    if write_solution:
        wheel.write_tree_solution('hydro_full_solution')
        wheel.write_first_stage_solution('hydro_full_solution/hydro_first_stage.csv')

    distplanning.close()


if __name__ == "__main__":
    main()
```
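On the option errors above: in the single-process PH script the options dictionary sets keys such as `defaultPHrho`, `solver_name`, and `PHIterLimit` directly, while in the cylinder script those values have to be filled in from the parsed command line. The stdlib `argparse` sketch below imitates that translation and fails early with a clear message when a required value is missing. The `parse_args` and `build_ph_options` helpers are hypothetical; mpi-sppy's own `Config` and `cfg_vanilla` code performs this mapping in its own way.

```python
import argparse
from typing import Dict, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="hypothetical cylinder driver")
    # argparse stores "--default-rho" as args.default_rho, and so on.
    parser.add_argument("--default-rho", type=float, default=None)
    parser.add_argument("--solver-name", type=str, default=None)
    parser.add_argument("--max-iterations", type=int, default=50)
    parser.add_argument("--lagrangian", action="store_true")
    parser.add_argument("--xhatshuffle", action="store_true")
    return parser.parse_args(argv)


def build_ph_options(args: argparse.Namespace) -> Dict[str, object]:
    """Translate parsed flags into PH-style option keys, failing early."""
    missing = [flag for flag, value in (("--default-rho", args.default_rho),
                                        ("--solver-name", args.solver_name))
               if value is None]
    if missing:
        raise ValueError(f"missing required option(s): {', '.join(missing)}")
    return {
        "defaultPHrho": args.default_rho,
        "solver_name": args.solver_name,
        "PHIterLimit": args.max_iterations,
    }


if __name__ == "__main__":
    args = parse_args(["--lagrangian", "--xhatshuffle", "--max-iterations=50",
                       "--default-rho=1", "--solver-name=gurobi"])
    print(build_ph_options(args))
```

argparse derives attribute names by replacing dashes with underscores (`--default-rho` becomes `args.default_rho`), but it treats `--bundles_per_rank` and `--bundles-per-rank` as different option strings, so the spelling on the command line has to match whatever the parser registered.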

DLWoodruff · 2024-04-01T16:49:11Z

Maintainer

Can you show the full stack trace of the error?


abodh · 2024-04-05T07:50:12Z

Author

Hi David,

Apologies for the late response. Surprisingly, the error is gone and has not appeared again. I am not sure what was causing it. The error said that defaultPH and max-iterations were not provided, which would make sense if I had used an options dict(), but I created beans for the hub and passed the arguments on the command line, so I am not sure what triggered it.

I am running into another issue (I keep getting errors one after another). This one is not consistent and appears only occasionally (maybe 1 out of every 5 to 7 runs):

A system call failed during shared memory initialization that should not have.

Q.1. This seems more like an Open MPI (or general MPI) issue. Do you have any information on it? It might cause problems later when I launch multiple jobs with Slurm and some of them hit this error.

Q.2. Nevertheless, I am now able to run some experiments using MPI cylinders. Does mpisppy work across multiple nodes? The HPC at my university has at most 70 cores per node, and I want to use more cores, spread over multiple nodes, for larger numbers of scenarios. Is that possible?


bknueven Apr 5, 2024
Maintainer

This is an MPI error. Which MPI implementation are you using? The developers of mpi-sppy have had the most success with MPICH. Be sure mpi4py is compiled against the system MPI; you can usually achieve this with pip install mpi4py. Installing from Anaconda brings in Anaconda's MPI, which usually does not perform well on an HPC system.

To make sure you're set up to run across multiple nodes, please have a look at this documentation, starting at "Verifying your installation": https://github.com/NREL/HPC/tree/master/languages/python/pyomo/mpi-sppy#verifying-your-installation. Note that you'll need to adapt the Slurm script to your HPC's environment. Regardless, for MPICH you'll need to set the environment variable with export MPICH_ASYNC_PROGRESS=1.

We have successfully run mpi-sppy across hundreds of nodes on at least three different HPC platforms.
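A small preflight check along the lines of the advice above, to run before launching a job: it confirms that `MPICH_ASYNC_PROGRESS=1` is set and that the MPI library mpi4py reports looks like MPICH. The version string would come from `mpi4py.MPI.Get_library_version()`; it is passed in as an argument so the logic can be exercised without MPI. The substring test on "MPICH" is a heuristic assumption (MPICH-derived libraries may report a different name), and the helper names are hypothetical.

```python
from typing import List, Mapping, Optional


def preflight(env: Mapping[str, str], library_version: str) -> List[str]:
    """Return warnings about the MPI setup; an empty list means it looks OK."""
    warnings = []
    if "MPICH" not in library_version:
        first_line = library_version.splitlines()[0] if library_version else "unknown"
        warnings.append(f"MPI library is not MPICH: {first_line}")
    if env.get("MPICH_ASYNC_PROGRESS") != "1":
        warnings.append("MPICH_ASYNC_PROGRESS is not set to 1")
    return warnings


def current_library_version() -> Optional[str]:
    """Ask mpi4py which MPI library it is linked against, if mpi4py is importable."""
    try:
        from mpi4py import MPI
    except ImportError:
        return None
    return MPI.Get_library_version()
```

In a job script this could be called as `preflight(os.environ, current_library_version() or "")` after `import os`, printing any warnings before `mpiexec` is invoked.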

abodh Apr 5, 2024
Author

Looks like I am hitting all the red flags here.

I was switching between Open MPI (built with GCC) and Intel MPI (I don't know whether the latter threw an error too), and, as you correctly guessed, I ran conda install mpi4py instead of pip install mpi4py. So I definitely need to fix these. I found that we have an mpich/3.3 module available, so I will try it with the export command you gave.

Thank you for directing me to the slurm script for multi-node operation. I will fix the above error and try the multi-node example sometime next week and provide an update on how it went.
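For sizing a multi-node run, here is a back-of-the-envelope sketch. It assumes (an assumption about mpi-sppy, not stated in the thread) that the MPI ranks are divided evenly among the cylinders, so a PH hub plus Lagrangian and xhatshuffle spokes needs a rank count that is a multiple of 3, and that each cylinder then shares the scenarios among its own ranks. The contiguous-block split, the helper names, and the 300-scenario count are illustrative only, not mpi-sppy's actual assignment.

```python
from typing import List


def ranks_per_cylinder(total_ranks: int, n_cylinders: int) -> int:
    """Ranks per cylinder, assuming ranks must split evenly among cylinders."""
    if total_ranks % n_cylinders != 0:
        raise ValueError(f"{total_ranks} ranks cannot be split evenly "
                         f"over {n_cylinders} cylinders")
    return total_ranks // n_cylinders


def scenario_block(scenario_names: List[str], rank: int, n_ranks: int) -> List[str]:
    """Contiguous, nearly equal block of scenarios for one rank of a cylinder."""
    n = len(scenario_names)
    if n_ranks > n:
        raise ValueError("more ranks per cylinder than scenarios")
    start = rank * n // n_ranks
    stop = (rank + 1) * n // n_ranks
    return scenario_names[start:stop]


# Hypothetical sizing: two 70-core nodes running a PH hub plus two spokes.
n_cylinders = 3
total = 2 * 70 - (2 * 70) % n_cylinders  # 138 ranks, the largest multiple of 3
per_cylinder = ranks_per_cylinder(total, n_cylinders)
names = [f"Scenario{i}" for i in range(1, 301)]
blocks = [scenario_block(names, r, per_cylinder) for r in range(per_cylinder)]
print(f"{per_cylinder} ranks per cylinder, "
      f"{min(map(len, blocks))}-{max(map(len, blocks))} scenarios per rank")
```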
