Allow running in mlock-less environment #13804
Comments
I had a look at
If above is correct, then I'd say use of
|
Thanks for the issues around secret store. I want to sync with Sven and our security team next week. I think your improvements will certainly make for a better overall experience, but want to get us all on the same page. |
@redbaron first of all, thank you for the comments and your interest in the topic! Let me try to address your questions/comments. Please feel free to ask follow-up questions or ask about points I did not address!
That is not the complete picture. The constant-string secret can result from a string in the TOML config option (as you mention), but it can also result from an environment variable or a secret-store, so it is not necessarily readable from the config file. Furthermore, your comment (at least to me) implies that we want to protect against a malicious part within the Telegraf process. This is neither the threat model nor the goal, nor do we claim that anywhere. The goal is to protect sensitive data as well as possible against other processes or users apart from the telegraf user and root. One part is to not store secrets on-disk in an unprotected fashion. This is what memguard does, it encrypts data marked as sensitive (i.e. having type Another part is to be able to access external authentication systems to allow central management of credentials (e.g. oauth2 tokens etc). This is covered by the secret-store plugins which, as a goodie, can now also provide "dynamic" secrets that change over time (as necessary for e.g. tokens). This does not directly have anything to do with As you can see by the formulation, this is a best-effort approach, i.e. we try to protect secrets until they enter plugins. Now the plugin has the possibility to minimize the leak-surface (e.g. look at the
Despite those points, I still think offering additional means to protect sensitive data (i.e. kicking the can further down the street) is worth the effort.
I think you miss the fact that memguard also protects the sensitive data in memory. The plain secret is only accessible in a small time-span between Furthermore, Telegraf is not only installed in the cloud environments you mention! Asking users to disable swap for every machine hosting a Telegraf instance is probably not a good idea, wouldn't you agree?
As I wrote before, I don't think this is an argument against offering the possibility for plugins to do better! The radius input plugin for example uses a byte-slice and thus minimizes the potential leakage (there can still be copy operations inside the underlying lib). So I think the "hey everybody is insecure, so why bother" line of argument is not something we should follow. However, for me a valid point would be the use of Telegraf in environments that either do not offer memory locking (I don't think there is any non-exotic one) or the use in environments where you cannot control the locked-page limit. As you outlined, this is the case for at least Kubernetes (btw, how much is your limit and how many "secrets" do you typically have?) and thus I do see your point. This being said, we should think of some way to allow the user to bypass memguard explicitly (e.g. with a Anyway, I want to discuss this within the team as it would potentially increase the maintenance effort... |
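A minimal sketch of the byte-slice approach mentioned above for plugins: the plaintext lives in a `[]byte` that is zeroed as soon as the consumer is done with it. The helper names `wipe` and `withSecret` are hypothetical and not part of Telegraf or the radius plugin; copies made inside a library that receives the slice are not covered.

```go
package main

import "fmt"

// wipe overwrites every byte of buf with zero.
func wipe(buf []byte) {
	for i := range buf {
		buf[i] = 0
	}
}

// withSecret hands the plaintext to use and zeroes it afterwards, even if
// use returns an error or panics. Hypothetical helper, not a Telegraf API.
func withSecret(plaintext []byte, use func([]byte) error) error {
	defer wipe(plaintext)
	return use(plaintext)
}

func main() {
	secret := []byte("s3cr3t")
	_ = withSecret(secret, func(b []byte) error {
		// A real plugin would pass b to its client library here,
		// ideally without converting it to a string (which copies).
		fmt.Printf("authenticating with a %d-byte secret\n", len(b))
		return nil
	})
	fmt.Println(secret) // the buffer has been zeroed at this point
}
```

Keeping the secret as a byte slice matters because Go strings are immutable and cannot be wiped in place.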
@srebhan, thanks for the detailed response. Let me state that my main goal is to run the upstream version of Telegraf in our environment, and whatever approach takes us there is fine by me. Having said that, if the threat model is to protect in-memory data from external non-root access, then Linux already solves it as-is:
As a consequence of the above, I still maintain the view that scrambling adds no value for the threats you are protecting from. The way I understand |
As for "plaintext secrets" (#13807), all process environment variables are stored in a single contiguous memory location which is swappable, |
Interesting, it seems to be possible to cast https://cs.opensource.google/go/go/+/refs/tags/go1.21.0:src/strings/builder.go;l=49 so if new |
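As far as I can tell, the linked line in `strings.Builder` turns its internal byte buffer into a string without copying. A minimal sketch of the same technique, assuming Go 1.20 or newer (for `unsafe.String` and `unsafe.SliceData`); `bytesAsString` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesAsString returns a string that shares memory with b instead of
// copying it. The string is only valid while b is alive and unmodified.
func bytesAsString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	buf := []byte("token")
	s := bytesAsString(buf)
	fmt.Println(len(s), s == "token")

	// Wiping the buffer also wipes what the string view points at,
	// so s must not be used as a secret after this point.
	for i := range buf {
		buf[i] = 0
	}
}
```

The trade-off is that the usual immutability guarantee for strings no longer holds, so the view must never outlive the buffer it points into.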
This was a nice conversation, thanks! As mentioned elsewhere, the use of mlock is really about hardening rather than meant as a full, security access control. I tend to think of it mostly in terms of a way to minimize swapping out the decrypted credential since the user went through all the trouble of setting up a secret store to avoid storing it on disk in the clear. The least we can do is honor that and do best effort to prevent it hitting the disk in the clear during runtime (not all swap is encrypted). It's understood that the current implementation is imperfect (which is why I think of it as "hardening for" as opposed to "protection against") and that some plugins are more effectively hardened than others. We've talked about how to improve the situation for plugins where the hardening is less effective, but in some cases that requires somewhat deep architectural changes in how plugins are invoked and other times, vendoring libraries and modifying them to handle the secrets better. Both have (sometimes significant) tradeoffs. In general, I've advocated for being transparent about the limitations and perhaps part of this issue is that we weren't transparent enough about the hardening.... All that said, we do want reasonable defaults in telegraf and defaulting to additional hardening makes sense for the general case, but your use case of a constrained environment without swap is a clear use case where the default may not be desirable. We could try to address this in several ways: We've discussed 'a' before and decided that we wanted upstream comment before applying. We've discussed 'b' before and found it undesirable since the hardening measure might be removed without the user realizing it. 'c' is interesting to think about and would solve this particular case, but it feels a bit like we're being too opinionated. I think 'd' is the right choice here so long as we make it clear what the option is doing. I suggest the following:
|
@jdstrand, I agree with adding a switch to disable mlock. There is one thing you didn't cover in your response though. You said:
Do you agree that if a config option doesn't contain references to a secret store, then the whole value can be treated as "not a secret" and therefore bypass any hardening? This is what #13812 implements: it detects values with no references and stores them as simple go |
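The idea of detecting values without secret-store references can be sketched as below. This is not the code from #13812; it assumes references use the `@{store:key}` form, and both `refPattern` and `hasSecretRef` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// refPattern matches references of the assumed form @{store:key}.
var refPattern = regexp.MustCompile(`@\{\w+:\w+\}`)

// hasSecretRef reports whether value contains at least one
// secret-store reference and would therefore need hardening.
func hasSecretRef(value string) bool {
	return refPattern.MatchString(value)
}

func main() {
	values := []string{
		"plain-password",
		"@{vault:db_password}",
		"Bearer @{vault:token}",
	}
	for _, v := range values {
		fmt.Printf("%q references a secret store: %v\n", v, hasSecretRef(v))
	}
}
```

Under this approach only values for which `hasSecretRef` returns true would go through the locked-memory path.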
I'll answer more generally: conceptually, sure, if it isn't a secret then we don't need to mlock it. However @srebhan said this: "That is not the complete picture. The constant-string secret can result from a string in the TOML config option (as you mention), but it can also result from an environment variable as well as a secret-store, so it is not necessarily readable from the config file". ISTR it was an active decision to harden environment variable handling too. I defer to @srebhan on the historic details. |
Right, and all environment variables of any Linux process are already in swappable memory, so protecting a copy of them on the Go heap doesn't prevent leakage through swap.
I don't understand. The constant string I am talking about is a value specified in TOML without any references to secret stores. |
@redbaron my suggestion is that we look into how much effort it is to support both a locked-page setup (default) as well as a non-locked page setup (enabled via explicit option). Let me be clear: the switch will be explicit, e.g. via command-line option, and not dependent on if the "secret" links to a secret-store or not. However, I would need your help to test the setup thoroughly. Are you willing to help me there? |
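A minimal sketch of what supporting both setups behind one explicit switch could look like on Linux and other Unix systems. It is not how memguard or Telegraf implement secrets: the `secret` type, `newSecret`, and the `lockMemory` flag are hypothetical, and only standard `syscall` calls are used.

```go
package main

import (
	"fmt"
	"syscall"
)

// secret holds plaintext in an anonymous mapping outside the Go heap.
type secret struct {
	mem    []byte
	locked bool
}

// newSecret copies data into a fresh mapping and, if lock is true,
// pins it with mlock so the kernel will not swap it out.
func newSecret(data []byte, lock bool) (*secret, error) {
	if len(data) == 0 {
		return nil, fmt.Errorf("empty secret")
	}
	mem, err := syscall.Mmap(-1, 0, len(data),
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return nil, fmt.Errorf("mmap: %w", err)
	}
	if lock {
		if err := syscall.Mlock(mem); err != nil {
			_ = syscall.Munmap(mem)
			return nil, fmt.Errorf("mlock (check RLIMIT_MEMLOCK): %w", err)
		}
	}
	copy(mem, data)
	return &secret{mem: mem, locked: lock}, nil
}

// Destroy wipes the plaintext, then unlocks and unmaps the memory.
func (s *secret) Destroy() error {
	for i := range s.mem {
		s.mem[i] = 0
	}
	if s.locked {
		_ = syscall.Munlock(s.mem)
	}
	err := syscall.Munmap(s.mem)
	s.mem = nil
	return err
}

func main() {
	lockMemory := false // hypothetical stand-in for an explicit command-line switch
	s, err := newSecret([]byte("s3cr3t"), lockMemory)
	if err != nil {
		fmt.Println("cannot create secret:", err)
		return
	}
	defer s.Destroy()
	fmt.Printf("stored %d bytes, locked=%v\n", len(s.mem), s.locked)
}
```

With the switch off, the only difference is the skipped `mlock` call, so wiping on `Destroy` stays in place either way.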
@redbaron any comment to my suggestion above? |
@srebhan, yes, I can help with testing. I do think that if locked (non-swappable) pages are all that you want for secrets, then using |
Use Case
memguard as a backend for the secret store requires lockable memory. Kubernetes has no native way to configure RLIMIT_MEMLOCK, and for that reason it is hard to run Telegraf with many "secret-capable" config options, even if they don't contain references to the secret store.
I didn't dig deep into what value memguard provides, but I assume locked memory is used so that the plaintext value is not leaked to swap. Kubernetes nodes run in swap-less mode, and if a "secret-capable" config option doesn't contain any secret references then using locked memory has no benefit.
Consider adding a switch to disable use of locked memory so that it can be turned off in environments where it is safe to do so.
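To see why the limit matters, the check below reads the soft RLIMIT_MEMLOCK of the current process and estimates how many secrets fit. It is a rough sketch: the constant value 8 for RLIMIT_MEMLOCK is an assumption that holds on Linux x86 and ARM, the one-page-per-secret cost is an assumption about the allocator, and `memlockLimit` and `pagesFit` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// rlimitMemlock is RLIMIT_MEMLOCK on Linux x86 and ARM (assumption;
// the standard syscall package does not export this constant).
const rlimitMemlock = 8

// memlockLimit returns the soft locked-memory limit in bytes.
func memlockLimit() (uint64, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(rlimitMemlock, &lim); err != nil {
		return 0, err
	}
	return uint64(lim.Cur), nil
}

// pagesFit reports whether the given number of secrets fits into limit,
// assuming every secret pins at least one page of locked memory.
func pagesFit(secrets int, limit uint64, pageSize int) bool {
	return uint64(secrets)*uint64(pageSize) <= limit
}

func main() {
	limit, err := memlockLimit()
	if err != nil {
		fmt.Println("cannot read RLIMIT_MEMLOCK:", err)
		return
	}
	page := os.Getpagesize()
	fmt.Printf("soft limit: %d bytes, page size: %d bytes\n", limit, page)
	fmt.Println("room for 100 secrets:", pagesFit(100, limit, page))
}
```

With a 64 KiB limit and 4 KiB pages, this estimate allows at most 16 secret-capable options.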
Expected behavior
Add an agent-level option secretstore_memlock with default value true. When set to false, disable use of locked memory by memguard.
Actual behavior
The locked memory requirement is enforced on Linux, even though on other OSes memguard can run without it.
Additional info
No response