Allow to clean-up files as they leave the profile #14

Open
piojo-zz opened this issue Feb 2, 2014 · 28 comments
@piojo-zz
Member

piojo-zz commented Feb 2, 2014

With the help of quattor/CAF#10, we should be able to track which files are owned by Quattor components, and enable a clean-up infrastructure.

The use case is that components like metaconfig or filecopy create files but don't remove them when these files leave the profile. This can cause problems in, for instance, Apache, where old configurations are left behind.

@ghost ghost assigned piojo-zz Feb 2, 2014
@piojo-zz
Member Author

piojo-zz commented Feb 2, 2014

More requirements:

  • When a component is removed from the system, ncm-ncd must be able to remove the files it owns.
  • Components must tell whether their files should be removed or not (do not remove /etc/fstab if we drop ncm-fstab!)
    • By default we do not clean up old files
  • We need some database of managed files. A simple schema would be like:
{
    "accounts" : {
       "files": [ "/etc/passwd", "/etc/shadow", "/etc/group" ],
       "cleanup-allowed" : false
    }
}
  • We should expose to users whether the component's files should be cleaned up. Say by adding to the components' schema a cleanup-allowed field. A component may set its own defaults in its config template.
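As a rough illustration of how a database in the shape above might be consulted, here is a minimal Python sketch. The helper name `files_to_remove`, the second `metaconfig` entry and its path are hypothetical; the opt-in default (no cleanup unless `cleanup-allowed` is true) follows the requirement listed above. This is not an existing ncm-ncd API.

```python
import json

# Example database in the proposed shape. The "metaconfig" entry and its
# path are made up for illustration only.
MANAGED = json.loads("""
{
    "accounts": {
        "files": ["/etc/passwd", "/etc/shadow", "/etc/group"],
        "cleanup-allowed": false
    },
    "metaconfig": {
        "files": ["/etc/httpd/conf.d/old.conf"],
        "cleanup-allowed": true
    }
}
""")


def files_to_remove(db, component):
    """Return the files that may be deleted when `component` is dropped.

    Hypothetical helper: cleanup is opt-in, so a missing or false
    "cleanup-allowed" flag yields an empty list.
    """
    entry = db.get(component, {})
    if not entry.get("cleanup-allowed", False):
        return []
    return list(entry.get("files", []))
```

With this data, dropping `accounts` would leave `/etc/passwd` and friends alone, while dropping `metaconfig` would offer its one file for removal.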

@jrha
Member

jrha commented Feb 2, 2014

When it comes to services configured by metaconfig (or filecopy) there may be times when you want some files cleaned up and not others, I wonder if there is a clean way of achieving this...

@piojo-zz
Member Author

piojo-zz commented Feb 2, 2014

Sure. Split them into two different component paths with ncm-module.

For instance:

prefix "/software/components";
"cleanable_metaconfig/cleanup-allowed" = true;
"cleanable_metaconfig/ncm-module" = "metaconfig";
"uncleanable_metaconfig/cleanup-allowed" = false;
"uncleanable_metaconfig/ncm-module" = "metaconfig";

@jrha
Member

jrha commented Feb 2, 2014

Cool, my vote is for storing this in whatever backend CCM is using and extending something like ccm-query to expose this.

@msmark

msmark commented Feb 3, 2014

Out of interest, how is a component "removed from the system"? Using the Unconfigure method? When I ccm-fetch a new profile, and a component has disappeared that was there the last time?

@piojo-zz
Member Author

piojo-zz commented Feb 3, 2014

A component simply vanishes from the system. Its subtree in the profile disappears and its package is removed. So there is no configuration left for it to know what it should be cleaning up.

Also, the Unconfigure method is never implemented, partly because it's terribly difficult to assess what to do. What does it mean to "unconfigure sudo?" Revert to some random, arbitrary defaults? Leave the system unusable?

With the proposal, we'll be able to track when a component is no longer present and, if it is allowed to remove files, know which ones.
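A minimal sketch of that tracking step, in Python for brevity: compare the components recorded in the managed-files database with those still present in the new profile. The function names and the example data are assumptions made for illustration, not part of ncm-ncd.

```python
def vanished_components(db, profile_components):
    """Components recorded in the database but absent from the new profile."""
    return sorted(set(db) - set(profile_components))


def removal_plan(db, profile_components):
    """Map each vanished component to the files it is allowed to lose.

    Components whose "cleanup-allowed" flag is false (or missing) still
    appear in the plan, but with an empty file list.
    """
    plan = {}
    for name in vanished_components(db, profile_components):
        entry = db[name]
        allowed = entry.get("cleanup-allowed", False)
        plan[name] = list(entry.get("files", [])) if allowed else []
    return plan


# Illustrative data only; the metaconfig path is hypothetical.
EXAMPLE_DB = {
    "accounts": {"files": ["/etc/passwd"], "cleanup-allowed": False},
    "metaconfig": {"files": ["/etc/httpd/conf.d/old.conf"],
                   "cleanup-allowed": True},
}
```

Because the database survives the removal of the component's package and subtree, the plan can be computed even though the component itself is gone.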

@msmark

msmark commented Feb 3, 2014

So with this proposal, you'll be able to tell if a component has vanished from an updated profile, and remove some files?

@piojo-zz
Member Author

piojo-zz commented Feb 3, 2014

Yes. That's half of the benefits. 😄

@msmark

msmark commented Feb 3, 2014

Sounds like a good idea to me. Could we also have some kind of hook? A hook that is called that says a component has vanished from the profile. So if there is some other nuance we'd like to introduce as a result of a particular component being removed, we could easily code that in?

@piojo-zz
Member Author

piojo-zz commented Feb 3, 2014

I hadn't thought about it. It definitely makes sense. So we'd extend our schema to:

{
    "accounts": {
        "files": [ "/etc/shadow", "/etc/group", "/etc/passwd" ],
        "cleanup-allowed": true,
        "remove-hook": "/bin/true"
    }
}

@msmark

msmark commented Feb 3, 2014

Yes, and send the name of the component that has been removed as an argument to the hook script, just in case we want one generic hook script that handles all components.

@piojo-zz
Member Author

piojo-zz commented Feb 3, 2014

I can pass the above JSON object via stdin, as I do with the ncm-ncd hooks. The script should know that, if cleanup is allowed, the files are already gone.
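To make the stdin idea concrete, here is a hedged Python sketch of how a caller might feed the JSON object to a hook. The function name `run_remove_hook` is hypothetical, and the sketch assumes the `remove-hook` value is a single executable path with no arguments; how ncm-ncd actually invokes its hooks is not shown here.

```python
import json
import subprocess


def run_remove_hook(component, entry):
    """Run the component's remove hook, passing its record as JSON on stdin.

    Returns None when no hook is configured, otherwise the
    subprocess.CompletedProcess so the caller can inspect the exit status.
    """
    hook = entry.get("remove-hook")
    if not hook:
        return None
    # Same shape as the schema above: {"<component>": {...}}, so a generic
    # hook can read the component name from the top-level key.
    payload = json.dumps({component: entry})
    return subprocess.run(
        [hook],
        input=payload,
        text=True,
        capture_output=True,
        check=False,  # a failing hook is reported, not raised
    )
```

Since the component name is the top-level key of the payload, one generic script can serve several components without needing an extra command-line argument.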

@msmark

msmark commented Feb 3, 2014

That would be ok I think. If we really wanted the files to be removed last, we would set cleanup-allowed to false and remove them in the hook script.

@ned21
Contributor

ned21 commented Feb 3, 2014

How do you deliver the hook script? In theory that's exactly what unconfigure() is supposed to do, but I would guess that spma would remove the component rpm before it had chance to run. But then similarly if /clean_my_component is delivered by rpm, how does spma know it should install the script briefly and then remove it again? In general this is a great idea, just pointing out that there may be some chicken-and-egg problems. :)

@piojo-zz
Member Author

piojo-zz commented Feb 4, 2014

> How do you deliver the hook script?

I guess, in a generic RPM that bundles all your unconfigure hooks. That's how we shipped all the Icinga checks at UGent.

> In theory that's exactly what unconfigure() is supposed to do, but I would guess that spma would remove the component rpm before it had chance to run.

That's correct. Also, because there is no subtree dedicated to the component it's not possible for the component to do the right thing. Having hooks with minimal status may be a better solution.

> But then similarly if /clean_my_component is delivered by rpm, how does spma know it should install the script briefly and then remove it again?

Since it's in a generic bundle, it will stay there even if the component is not installed, or has never been relevant for that host.

If you put it as part of your service/feature definition, when you unbind it, it will get uninstalled before running, and surprises will arise.

(In truth, you could find your way with Yum's post-transaction-actions plugin, which I'd like to support eventually, but that's not portable and will be fragile)

> In general this is a great idea, just pointing out that there may be some chicken-and-egg problems. :)

The more feedback the cooler it will be. 😄

@msmark

msmark commented Feb 4, 2014

I see no reason why the hook script can't be part of the component package. Ok that package will get removed if a) the SPMA component runs and b) the SPMA component is configured with userpkgs = no. But if we ensure the clean-up (and running the hook script) occurs before any of the other components are run, that should be ok.

Now rather than having a hook script, why not just call Unconfigure? Up until now Unconfigure has been sporadically implemented, and is not actually called by any automatic workflow as far as I am aware (I have to manually go ncm-ncd -unconfigure <component>). So actually this would become a good use-case for Unconfigure to be called automatically and a good argument for it to be implemented more robustly. I'd like to go further and suggest that Unconfigure should be called first, prior to clean-up. However we would lose one advantage that we get with hook scripts, and that's the ability to have a generic hook script that has the opportunity to be called on behalf of multiple components.

Or perhaps we can have both. Hook scripts might be a more general concept we could develop, because they are above components and can operate on multiple components, perhaps we want to introduce more hooks this way. For example, we could have a hook script that runs if any component fails (given a list of failed components, I can see a use for that). There may be other times we could do with a hook, can't think of any more examples right now.

Re Yum's post-transaction-actions, that's very Yum-specific, let's not invent things that aren't portable if we can avoid it.

@njwilliams

Confusingly, I thought that ncm-cdispd would detect a removed tree and invoke unconfigure on the component. However, looking at the source, it doesn't seem to have ever (at least in the last 5 years) done that. So, I made up that memory! However, it's possible for it to do that task.

Cheers,
Nick

@piojo-zz
Member Author

piojo-zz commented Feb 5, 2014

This is a busy issue!

> I see no reason why the hook script can't be part of the component package. Ok that package will get removed if a) the SPMA component runs and b) the SPMA component is configured with userpkgs = no.

Which is what will happen in most sites.

> But if we ensure the clean-up (and running the hook script) occurs before any of the other components are run, that should be ok.

As @njwilliams states, this should be done inside ncm-cdispd. But nobody has touched it in the last six years (well, a minor change by him four years ago). I don't think we have any volunteers to redo ncm-cdispd just for this change.

> Now rather than having a hook script, why not just call Unconfigure?

Three reasons:

  • Authors will expect some sort of profile to be available (see the few components that do implement it). By definition that's not the case.
  • Calling Unconfigure on a component that doesn't implement it will raise an exception. Sure, we could change the base class and this is a trivial concern.
  • We still need some organisation-specific behaviour for Unconfigure, even in the absence of a profile. You probably need different behaviours depending on domain/personality/moon phase.

None are terrible technical challenges, but it seems we lack the manpower to accomplish them all.

> So actually this would become a good use-case for Unconfigure to be called automatically and a good argument for it to be implemented more robustly. I'd like to go further and suggest that Unconfigure should be called first, prior to clean-up.

Indeed, making ncm-cdispd detect what has to be unconfigured, unconfiguring it and then running the configure phase makes sense. But nobody in the community knows that code anymore.

> However we would lose one advantage that we get with hook scripts, and that's the ability to have a generic hook script that has the opportunity to be called on behalf of multiple components.

Also, it seems Perl is not the favourite language of this community anymore. The only ones who would contribute Unconfigure methods are the usual suspects. Allowing Python/Haskell/brainfuck would probably be appreciated.

> Or perhaps we can have both.

Hooks can be ready before the next workshop (are you going to Amsterdam?). I doubt anyone will dare to touch ncm-cdispd this year.

> Hook scripts might be a more general concept we could develop, because they are above components and can operate on multiple components, perhaps we want to introduce more hooks this way. For example, we could have a hook script that runs if any component fails (given a list of failed components, I can see a use for that).

We already have that. Those are the global post-configure hooks, which receive the list of components that failed or raised warnings in their input. 😄

> There may be other times we could do with a hook, can't think of any more examples right now.

There are two hooks here:

  • Configuration hooks: We already have them, check with @ned21 as he participated in that discussion back in June. We can use them to:
    • Block deployments based on policies/rules.
    • Report deployment results immediately.
  • Removal/unconfiguration hooks. We don't have anything for that. When the Unconfigure part was implemented there was no clear use case, so it's not useable anymore.

@ned21
Contributor

ned21 commented Feb 5, 2014

I don't think that implementing a new mechanism just because we don't know the code to ncm-cdispd very well is valid. We still have to support it so we should know it, and it makes sense that the logic to remove something is in the same place as creating something--the component.

I think the logic for ncm-cdispd is fairly straightforward. When it receives a new profile:

  1. if it sees any component disappear from a tree, execute ncm-ncd --unconfigure $comp
  2. if any component should be dispatched, dispatch it.

If SPMA is set to dispatch when /software/packages changes, and ncm-$comp is removed from the tree, then the sequence of events will be that ncm-ncd --unconfigure $comp is run and then later when ncm-ncd --configure spma is run, the component package will also be removed.
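The ordering described above can be sketched as a small planning function. This is Python rather than ncm-cdispd's Perl, the function name `plan_actions` is hypothetical, and the sketch only builds the command lines (using the `--unconfigure` and `--configure` options quoted in this comment) without executing anything.

```python
def plan_actions(old_components, new_components, to_dispatch):
    """Return the ordered ncm-ncd command lines for a profile change.

    old_components / new_components: component names in the previous and
    the new profile. to_dispatch: components whose configuration changed.
    """
    actions = []
    # Step 1: unconfigure everything that disappeared from the profile.
    for comp in sorted(set(old_components) - set(new_components)):
        actions.append(["ncm-ncd", "--unconfigure", comp])
    # Step 2: dispatch the components that still exist and need a run.
    for comp in to_dispatch:
        if comp in new_components:
            actions.append(["ncm-ncd", "--configure", comp])
    return actions
```

Putting the unconfigure commands first is what lets a removed component clean up before spma gets the chance to uninstall its package.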

But that only covers the case where a component owns a config file outright, e.g. if you had an ncm-motd. If the file is created by a generic component, such as filecopy or metaconfig, then you still need a way to clean that up when it disappears from the component's list of files.
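For the generic-component case, the core operation is a set difference between the files a component managed on its previous run and the ones it manages now. The Python sketch below is only an illustration under that assumption; `stale_files` and `remove_stale` are hypothetical names, and the dry-run default mirrors the "do not clean up by default" requirement from earlier in the thread.

```python
import os


def stale_files(previous, current):
    """Files managed on the previous run that are no longer in the list."""
    return sorted(set(previous) - set(current))


def remove_stale(previous, current, dry_run=True):
    """Report stale files and, unless dry_run is set, delete them.

    Files that are already missing are skipped silently.
    """
    stale = stale_files(previous, current)
    if not dry_run:
        for path in stale:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
    return stale
```

The previous file list has to come from somewhere outside the profile, which is the role of the managed-files database discussed above.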

@msmark

msmark commented Feb 6, 2014

I agree with @ned21, decisions can't be made solely on the basis that no one will dare touch ncm-cdispd because of lack of understanding. How hard can it be? It's only Perl.

@Piojo raises a valid point however, that Unconfigure does need some sort of profile to be available. Perhaps we figure out a way of giving the method a reference to the old profile before the component was removed, which it can then use if it needs to determine the state it is supposed to be unconfiguring?

Perhaps when a file is created by filecopy or metaconfig, one also needs to register the name of the component that requires the file, so if that component is removed, the file can be too?

@jrha
Member

jrha commented Feb 7, 2014

@ned21 this discussion did indeed start as a way to clear up files no longer configured by metaconfig or filecopy, which in our experience has been what we've found ourselves cleaning up the most.

I also agree that we shouldn't shy away from ncm-cdispd.

@jouvin
Contributor

jouvin commented Apr 7, 2014

I'm joining this discussion very late. Can we reach some sort of (temporary) conclusion? I agree with @jrha's last comment: the thing we originally wanted to address was the removal of files no longer managed by a component (typically filecopy/metaconfig), rather than implementing the never-implemented Unconfigure(). This is pretty different from cleaning up files previously managed by a removed component. I don't see clearly from the original discussion how this proposal helps to handle this case. Who will be responsible for managing the proposed file database? Is it filecopy that will have the responsibility to register files in this database? In that case it will probably require an extension of the filecopy/metaconfig schema to define the cleanup action.

@piojo-zz
Member Author

piojo-zz commented Apr 9, 2014

So, the conclusion seems to be that cleaning up obsoleted files is good. And that querying which files are "owned" by Quattor components is also a good addition (one that people have been requesting for quite some time).

I'll continue work on this during this week.

@jrha jrha added this to the 14.8 milestone May 21, 2014
@jrha jrha modified the milestones: 14.8, 14.10 Aug 14, 2014
@jrha
Member

jrha commented Aug 14, 2014

Bumped to next release.

@jrha
Member

jrha commented Oct 1, 2014

Agreed at the workshop that this is clearly not urgent, also agreed to use sqlite as a datastore.
Discussed a phased introduction across two releases, first passive, then active.
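A possible shape for such a datastore, sketched with Python's standard `sqlite3` module. The table layout, the function names and the reading of "passive" as "record and report only, never delete" are all assumptions; the workshop notes above do not specify any of them.

```python
import sqlite3

# Hypothetical schema: one row per (component, managed file).
SCHEMA = """
CREATE TABLE IF NOT EXISTS managed_files (
    component TEXT NOT NULL,
    path      TEXT NOT NULL,
    PRIMARY KEY (component, path)
);
"""


def open_store(location=":memory:"):
    """Open (and if needed initialise) the managed-files store."""
    conn = sqlite3.connect(location)
    conn.executescript(SCHEMA)
    return conn


def record_files(conn, component, paths):
    """Replace the recorded file list for `component`.

    Returns the paths that were recorded before but are absent now.
    In a passive phase the caller would only log them; in an active
    phase it could go on to delete them.
    """
    old = {row[0] for row in conn.execute(
        "SELECT path FROM managed_files WHERE component = ?", (component,))}
    new = set(paths)
    with conn:  # commit the delete and inserts as one transaction
        conn.execute(
            "DELETE FROM managed_files WHERE component = ?", (component,))
        conn.executemany(
            "INSERT INTO managed_files (component, path) VALUES (?, ?)",
            [(component, p) for p in sorted(new)])
    return sorted(old - new)
```

Keeping deletion out of `record_files` means the same store can serve both phases; only the caller's handling of the returned list changes.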

@jrha jrha modified the milestones: 14.12, 14.10 Oct 30, 2014
@jrha jrha modified the milestones: 15.4, 14.12 Jan 7, 2015
@jrha
Member

jrha commented Jan 7, 2015

Bump.

@jrha
Member

jrha commented Oct 7, 2015

Lack of developer effort, postponed.

@stdweird
Member

add_files is too limited imho, i added quattor/CAF#117

@jrha jrha removed this from the 16.4 milestone Dec 12, 2015
ttyS4 pushed a commit to ttyS4/ncm-ncd that referenced this issue Dec 15, 2017
ccm-purge was still assuming that the profile data was being stored in
GDBM. This updates the call to use the correct EDG::WP4::CCM::DB
indirection to handle all backends.

This fixes quattor#14 on github.