Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a class to represent the answer file #337

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ehelms
Copy link
Member

@ehelms ehelms commented Jul 19, 2021

Refactors code into an AnswerFile class for handling the loading of
and details of the parts of the answer file. This abstraction will
allow easier introduction of newer versions of the answer file.

README.md Show resolved Hide resolved
lib/kafo/answer_file.rb Outdated Show resolved Hide resolved
lib/kafo/answer_file.rb Outdated Show resolved Hide resolved
lib/kafo/answer_file.rb Outdated Show resolved Hide resolved
@ehelms ehelms force-pushed the refactor-answer-file branch 2 times, most recently from 8aca687 to cbd0679 Compare July 21, 2021 01:39
String $puppet_class => Hash[
String $parameter => Enum[true, false, Hash[String, Variant[String, Boolean, Integer, Array, Hash]]]
]
]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this makes sense in this format. Or here. What I wanted to do was document the current schema to allow then to document a Version 2 schema. I considered doing this in a comment in the file itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts about using https://yardoc.org/ in the actual classes as well?

Also, it's not really valid. You can't use names and Enum only describes strings. really it's:

Hash[String, Variant[Boolean, Hash[String, Any]]]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you think to use yarddoc? Just for documenting the method calls or somehow for this schema?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to model the structure off of https://github.com/theforeman/puppet-pulpcore/blob/master/templates/apache-fragment.epp#L2

That's a struct and that works different. With that you describe what's essentially a hash, but with explicit keys. See https://puppet.com/docs/puppet/6/lang_data_abstract.html#lang_data_abstract_flexible-struct-data-type

You could certainly use a Struct to describe the scenario file (where we have a limited number of fields, each with their own rules, but not the answers file.

How do you think to use yarddoc? Just for documenting the method calls or somehow for this schema?

I'd consider both. In the class documentation you can give describe the structure and give examples while also documenting the individual methods. That said, a short bit in README about the answers file can make sense. I guess it's a bit of both.

end

def parameters_for_class(puppet_class)
if @version == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is the right way. We can use inheritance so perhaps it's better to introduce Kafo::AnswerFile::V1 and Kafo::AnswerFile::V2 as concrete implementations. Then on the module you would have a Kafo::AnswerFile.load_answers(filename, version)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, thats a fair idea. I was not a big fan of the if-else, but it was the crude way my brain got around the problem. I am bouncing back and forth between this and a yet unopened commit to try to balance the setup for V2 (ehelms@2b96d65).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before a v2 we also need to introduce a migration DSL because now that also operates directly on the answers. Some things I thought about:

# Ensure a module is present in the answers
def add_module(name, default=false)
end

# Ensure a module is absent in the answers
def remove_module(name)
end

# Ensure a module is enabled in the answers
def enable_module(name)
end

# Ensure a module is disabled in the answers
def disable_module(name)
end

# Set an answer for a module's parameter. If a module is disabled, it will be enabled
def set_answer(mod, parameter, value)
end

# Get an answer for a module's parameter. If a module is disabled or no answer for the parameter exists, it will return nil
def get_answer(mod, parameter)
end

# Clear an answer for a module's parameter. If a module is disabled, it does nothing
def unset_answer(mod, parameter)
end

# Migrate a module. The given block is only called if the module is enabled
# @example
#   migrate_module('puppet') do |mod|
#     current_autosign = mod['autosign']
#     unless current_autosign.is_a?(TrueClass) || current_autosign.is_a?(FalseClass)
#       mod.unset_answer('autosign')
#     end
#   end
#
# @example
#   migrate_module('foreman_proxy') do |mod|
#     mod.rename_parameter('puppetrun', 'puppet')
#     mod.rename_parameter('puppetrun_listen_on', 'puppet_listen_on')
#   end
def migrate_module(mod, &block)
end

I'm not entirely sure about the migrate_module name. I think the essence is that it operates on a module only if it's enabled and you get a module back which has some ways of expressing operations.

In theforeman/foreman-installer#700 I tried to see if I could express the migrations and it looks like it. It feels more natural to me. I skipped a few complexer ones and all katello/foreman-proxy-content ones so I may have missed some things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look this over. I wanted to make a note of something I wanted to discuss more as its something I find a bit odd and affects things. We draw some parallels to databases and migrations with the answer file yet we treat it like its both the end product and the baseline that migrations are layered on top of. I had been wondering if we ought to have this more formalized that really the answer file is the initial migration and migrations layer on top of that. I think this would mean we need a helper to calculate the answer file final state, but in a way we need that today as you can't see the affect of migrations either. This would also allow "squashing" migrations in the future similar to the Rails model.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think i'll need this change in place so that I have the interface to build the DSL into.

lib/kafo/answer_file.rb Outdated Show resolved Hide resolved
lib/kafo/configuration.rb Outdated Show resolved Hide resolved
lib/kafo/configuration.rb Outdated Show resolved Hide resolved
require 'test_helper'

describe 'Kafo::AnswerFile' do
let(:dummy_logger) { DummyLogger.new }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let(:dummy_logger) { DummyLogger.new }
let(:logger) { DummyLogger.new }

test/kafo/answer_file_test.rb Outdated Show resolved Hide resolved
@@ -0,0 +1,47 @@
require 'test_helper'

describe 'Kafo::AnswerFile' do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally you actually use the class reference.

Suggested change
describe 'Kafo::AnswerFile' do
describe Kafo::AnswerFile do

String $puppet_class => Hash[
String $parameter => Enum[true, false, Hash[String, Variant[String, Boolean, Integer, Array, Hash]]]
]
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts about using https://yardoc.org/ in the actual classes as well?

Also, it's not really valid. You can't use names and Enum only describes strings. really it's:

Hash[String, Variant[Boolean, Hash[String, Any]]]

@ehelms
Copy link
Member Author

ehelms commented Jul 21, 2021

I plan to open up this part of the discussion more broadly, but will put a note here as it helps build context. I want to set up a good way to encapsulate V1 vs V2 differences and provide a nice clean interface between them. An aspect worth considering is then how a project built on top of Kafo then declares and uses one format or the other. The options I thought of in my currently preferred order:

  1. Projects define migrations that convert a V1 answer file into a V2
    • This lets Kafo support either format and have to do no conversion itself or provide auto-conversion which could be error prone. Further, this keeps migrations working as they can apply to a V1 answer file and then do the conversion. This would require treating the answer file as the first migration and not a living document. That is, truly like a database.
    • The downside is the user has to get the migration right versus Kafo providing a utility for conversion
  2. Kafo performs manual conversion of V1 to V2
    - A utility is provided that knows how to convert V1 to V2. Users still have to consider their migrations and what to do with them.
  3. Automatic conversion of V1 to V2 by new Kafo major version.
    - Then Kafo only supports one version, code is straight forward, Kafo handles conversion.
    - New migration folder/format would be required and old migrations would be ignored

@ekohl
Copy link
Member

ekohl commented Jul 21, 2021

In #337 (comment) I described a migration DSL which could be much easier to support both v1 and v2. If we require users of Kafo to use that DSL, an automated migration can be performed. Trying to support option 1 and 2 sounds hard (though not impossible).

Refactors code into an AnswerFile class for handling the loading of
and details of the parts of the answer file. This abstraction will
allow easier introduction of newer versions of the answer file.
Copy link
Contributor

@wbclark wbclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you briefly describe here which problems the V2 of the answers file schema is intended to solve?

Looking over the schema in ehelms@2b96d65 and DSL described in comments here + examples in theforeman/foreman-installer#700 , I can see that migrations become much easier to read and write without having to think about module: true vs. module: {} vs. module: {param: value}.

Do we still have the ability to differentiate new installs from upgrades, and how will that look going forward?

And what is the purpose of including and excluding params ? In particular, what does it mean to have module: { parameters: { myparam: myvalue, ... }, include: { myparam, ... } } (and how does the meaning change if that included param is moved to excluded, or neither included nor excluded?)

Does excluding a parameter provide the ability to revert to module default value without forgetting the prior custom value (a downside would be that it becomes more difficult for a human to read and understand configuration from an answers file, for example one collected in an sosreport), or would the module default value for the parameter then get recorded (which brings me back to the question, what does exclusion mean)?

@ehelms
Copy link
Member Author

ehelms commented Jul 28, 2021

The reference design conversation for V2 format can be seen over here -- https://community.theforeman.org/t/defaulting-puppet-to-off-in-the-katello-scenario/14553

And what is the purpose of including and excluding params? In particular, what does it mean to havemodule: { parameters: { myparam: myvalue, ... }, include: { myparam, ... } }` (and how does the meaning change if that included param is moved to excluded, or neither included nor excluded?)

Think of inclusion and exclusion as a way of deciding what parameters you want to present to a user and which you do not. I think the most obvious use case is the exclude case where you do not want users changing a parameter that a puppet module happens to expose. This pairs down the size of help and reduces potential to break a scenario accidentally.

@wbclark
Copy link
Contributor

wbclark commented Jul 28, 2021

Think of inclusion and exclusion as a way of deciding what parameters you want to present to a user and which you do not. I think the most obvious use case is the exclude case where you do not want users changing a parameter that a puppet module happens to expose. This pairs down the size of help and reduces potential to break a scenario accidentally.

Thank you.

In that case, I like the feature but disagree with the landing spot. The included/excluded parameters for a module should be in the scenario configuration rather than the answers file. Since hiding a parameter / forcing module default is more a feature of the scenario itself, that all installations of that scenario would have in common regardless of any choices the user makes in installation. The answers file should contain only that which might be reasonably configured by the end user / that which might vary between two installations of the same scenario.

@ehelms
Copy link
Member Author

ehelms commented Jul 28, 2021

The answers file should contain only that which might be reasonably configured by the end user / that which might vary between two installations of the same scenario.

I see your point. Once thing to recognize about the move to V2 is that it aims to move away from users editing the file and it being treated like a system of record instead. In todays world, we present the answers file as something a user could hand edit, we also present the configuration file in the same way. In reality, we do not intend users to edit these files by hand because it can break the installation, it can skip over hooks and other validations.

If you look at the enabled field its a similar argument. In V2, we aim to support the notion of "you cannot disable this module". So while a user can say enable or disable a module, they should not be able to change whether a module can be disabled if it's been marked as such.

We should probably move this discussion off of this PR and either to the RFC or a new PR with the format.

@wbclark
Copy link
Contributor

wbclark commented Jul 28, 2021

Once thing to recognize about the move to V2 is that it aims to move away from users editing the file and it being treated like a system of record instead.

Right. While there are cases - such as cloning to DR, test, or reproducer environments - where hand editing may still be necessary, I wouldn't insist on the change for that reason.

My argument is based more on logical separation of what is configuration for the scenario vs. what is configuration for the particular installation, which is still relevant in the paradigm of answers file as a system of record.

In particular, when an answers file is read by a human to understand and analyze the install configuration, it is nicer to keep these concepts totally separate. This might occur when reading an answers file provided by an end user (e.g. via an sosreport) for troubleshooting purposes.

Even in the case of an admin checking their own configuration ("what did I or my colleagues set as that param value again?") there are still advantages to reading them from the answers file. In the case of foreman-installer, the --help option does show you current values but it also shows a lot of other information, isn't formatted for that specific purpose, and can have a non-negligible load time.

We should probably move this discussion off of this PR and either to the RFC or a new PR with the format.

OK, agreed. I've commented on the RFC and will share my follow up thoughts there as well.

@ehelms
Copy link
Member Author

ehelms commented Jul 28, 2021

In particular, when an answers file is read by a human to understand and analyze the install configuration, it is nicer to keep these concepts totally separate. This might occur when reading an answers file provided by an end user (e.g. via an sosreport) for troubleshooting purposes.

Even in the case of an admin checking their own configuration ("what did I or my colleagues set as that param value again?") there are still advantages to reading them from the answers file. In the case of foreman-installer, the --help option does show you current values but it also shows a lot of other information, isn't formatted for that specific purpose, and can have a non-negligible load time.

We could (and should) provide a command that prints the answers out in a user friendly way. That would both provide an interface for backup and restore, and help keep users from the concerns you raise.

Right. While there are cases - such as cloning to DR, test, or reproducer environments - where hand editing may still be necessary, I wouldn't insist on the change for that reason.

If we have parameters for everything, I would argue this should never be needed. And if it is, then we should be re-assessing why as that is a likely indicator of something being missed. Similar to the "export" or "print" command mentioned above, we can add the ability to import a set or subset of answers via a file. I feel this is a better way to handle customization. And would match something like the Ansible model.

TLDR; We should do a better job aiming at the separation of the user interface and the data storage.

@wbclark
Copy link
Contributor

wbclark commented Jul 28, 2021

If we have parameters for everything, I would argue this should never be needed. And if it is, then we should be re-assessing why as that is a likely indicator of something being missed.

Speaking from experience building a few reproducers from end-user data, the awful workflow goes something like this:

  1. obtain complete backup from end user.
  2. setup reproducer target and take snapshot after installing RPMs.
  3. extract directory, grab the answers file, and copy that to the target machine.
  4. try to install on the target machine, get some error like name resolution (for obvious reasons). either edit /etc/hosts to workaround this OR
  5. edit answers file and try again. get some other stupid error, such as end user was configured to use external DB, or some issue with some plugin that I know is unrelated to the thing that I care about reproducing.
  6. make more hacks to answers file and try again. repeat as needed (for enterprise users with highly customized installs, probably several times).
  7. finally when the installer will run without errors, reset my target machine to pre-installer snapshot.
  8. run the installer with the hacked answers to confirm it works the first time. after confirmation, reset target machine to pre-installer snapshot once again.
  9. take the working answers file and re-tar it with all the other contents of the original backup.
  10. run the clone script with the hacked backup tar.

I probably got some details wrong as it's been around 2 years since I've done this (I don't recall whether the clone script runs the installer first with default answers, and then again with answers from the backup. I think that may be the case, and that introduces additional complexity, but I'm not positive).

I'm pretty sure Devendra spends a LOT of time day to day on this general process, so he'd likely be a great person to talk to for further feedback on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants