Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise Smtpd552Exception without accepting whole message #55

Open
iuri-gg opened this issue Nov 20, 2023 · 14 comments
Open

Raise Smtpd552Exception without accepting whole message #55

iuri-gg opened this issue Nov 20, 2023 · 14 comments
Assignees
Labels

Comments

@iuri-gg
Copy link
Contributor

iuri-gg commented Nov 20, 2023

Describe the question
I am looking for a way to prevent malicious users from sending 100s of MB of data to the server and making it run out of memory. Currently, I am checking message size in on_message_data_receiving_event and raising Smtpd552Exception if it is too big. However, this does not terminate the connection to the client and the client keeps sending more data.

The only way to terminate the connection is to raise Smtpd421Exception, however, that will cause the legitimate client to retry delivery of the message since it is not a 5xx error.

Expected behavior
I am not sure what is the proper way to handle this and looking for guidance. Ideally, it would be great if the SMTP client could receive SMTP error 552 and get disconnected right away. Is that possible?

@TomFreudenberg
Copy link
Member

TomFreudenberg commented Nov 27, 2023

He @iuri-gg

interesting request - I would prefer to answer in a manner of "Teergrube" to those malicious clients and also lock them by ip or address or ...

At first, I guess you stay with some "small" amount of buffer sizes like defaults:

  DEFAULT_IO_BUFFER_CHUNK_SIZE = 4 * 1024
  DEFAULT_IO_BUFFER_MAX_SIZE = 1 * 1024 * 1024

From that there is no "storm" per incoming line possible anymore.

When detecting the case for a Smtpd552Exception inside on_message_data_receiving_event you should probably add a flag to the context vars of this session, so that you do not accept any incoming data from that session anymore until the session is closed by client.

In addition like from "Teergrube" you should slow down the connection by an additional "sleep" in your on_message_data_receiving_event. That means:

if flag malicious sender is set, you just clear session[:ctx][:message][:data] and add an additional sleep (30 seconds e.g.) and do nothing else here. That will be boring from clients perspective if you lock their clients in your "Teergrube".

When at some end you come the on_message_data_event then you you just can reply by Smtpd552Exception or maybe also just answer with OK but ignore the message.

What that be an idea?

@TomFreudenberg
Copy link
Member

TomFreudenberg commented Nov 27, 2023

In a meta program like:

  def on_message_data_receiving_event
    if ctx[:status] && ctx[:status][:ignore_message]
      ctx[:message][:data] = ''
      sleep 30
    else
      ctx[:status] || ctx[:status] = { ignore_message: true } if ctx[:message][:data].bytesize > 10MB
    end
  end

  def on_message_data_event
    # option 1 :: if you want to signal the client
    raise Smtpd552Exception if ctx[:status] && ctx[:status][:ignore_message]

    # option 2 :: do not let the malicious sender know that you ignore him
    return true if ctx[:status] && ctx[:status][:ignore_message]

    # otherwise your default code to handle your incoming mails
  end

@TomFreudenberg
Copy link
Member

@TomFreudenberg
Copy link
Member

TomFreudenberg commented Nov 27, 2023

In addition I am willing to extend the initialization of the context by an additional event

https://github.com/4commerce-technologies-AG/midi-smtp-server/blob/master/lib/midi-smtp-server.rb#L1373

so that private flags and states could be initialized and stated there in preparation

@iuri-gg
Copy link
Contributor Author

iuri-gg commented Nov 27, 2023

Thanks Tom! I like the Teergrube approach - it will waste bandwidth (there is no way to get around it), but at least it will not cause any OOM issues. I appreciate your thoughtful response.

@iuri-gg
Copy link
Contributor Author

iuri-gg commented Nov 27, 2023

Is it possible to set a timeout for the clients, after which we will close the connection? I found @io_cmd_timeout but I don't think that is enough.

What if the malicious user tries to tie up all the threads and causes a denial of service by keep sending message data (without end)? We will not run out memory since we will be discarding it (Teergrube), but the thread will not be available to serve anyone.

@TomFreudenberg
Copy link
Member

TomFreudenberg commented Nov 28, 2023

Is it possible to set a timeout for the clients, after which we will close the connection? I found @io_cmd_timeout but I don't think that is enough.

The @io_cmd_timeout will drop the connection by 421 when the client does not send in a speedy way (to prevent that your session is halted by slow sending).


a (general) timeout for the clients

No, not yet - let me think about it


Even with @io_cmd_timeout someone can eat all you connections while just sending 'NOOP' commands for example. Let's say you have 10 threads allowed, 10 connections just sending 'NOOP' per 20 seconds each, then your service is denied.

But you can built a solution by just using the existing events. Let me also think about that.


What if the malicious user tries to tie up all the threads and causes a denial of service by keep sending message data (without end)?

That will happen. There are two options to prevent that:

a. allow to raise the number of threads dynamically if you put some senders into nonsense mode (just ignore their traffic)
b. drop the connection - but - then they will visit you again.


Could you state out that it would be possible to lock them from a new re-connect? Like checking SPF records etc.?

I like to build a cookbook around a functional teergrube and spammer preventing server.

Let's check out what we want to implement for that.


@gencer - Hey Gencer, we invite you to come up with your thoughts for that.

Cheers
Tom

@gencer
Copy link
Contributor

gencer commented Nov 28, 2023

Hey @TomFreudenberg,

Thanks for the heads-up!

We already hit this situation long time ago. Similar to this issue -as stated above let's say NOOP command send every few seconds- we seen some clients intentionally keep the connection up and send dummy commands which resulted in exhausted resources and service downtime.

I did not immediately raise an issue here at that time because I believed this is our job to handle not the Gem's job.

Of course, raising the Smtpd552Exception without accepting whole message in a manner time is reasonably acceptable but we just did not want to give a timeout to our users due to some clients can have a very limited and low-quality internet connection.

Therefore, we implemented a client control mechanism and rate limiting. I believe Office365 and Gmail have similar patterns. Instead of giving a timeout (of course timeouts are still useful in any case whatsoever) we record the connections in dedicated & separate database and redis-backed cache and handle the connections. If a client started to bully our system, we are just block it until a specific amount of time or permanently BUT there are some checks to be done before we finally do this.

When we want to block a client, we use 3 different checking schemes. First one is detection of client's location (Both geologically and ISP Kind.). We have an extensive IP Location database and ISP Service database for that. Thanks to this, we exactly know where this client came from. (From his garage, a rented dedicated/vps server from a known datacenter or perhaps even from Azure or Google networks.) Why we do this? Because we can't risk blocking the whole SMTP Service/IP Address/Subnet just because one of their clients is bullying our system.

Second one is detecting how often this client do the attack. If we found similar/matching attack on the previous records, we may manually block the whole subnet too. That is absolutely depends on the attack's size.

Third one is conjunction with the first 2 scheme: IP Reputation services.

All these 3 schemes give us the final results of deciding. Should we block the client IP or perhaps Client IP Subnet, how long or permanently?

According to this data, blocking a user can be 1 hour, 1 day or perhaps permanent. We just can't block or temporarily block clients based on few variables or can't give fixed blocking time or timeouts. We need to know where they came from, how often do this and also partitioned by "Country, ISP and timing". We are profiling every single connection and remember it.

From your comment:

a. allow to raise the number of threads dynamically if you put some senders into nonsense mode (just ignore their traffic)

That is the exhausted resource part in my answer. I'd never ever recommend using this way. Of course, If you set an upper limit that would be OK. But, if they are going to be used already OR hang out in memory, what's the point?

b. drop the connection - but - then they will visit you again.

That's what our Rate Limiting, Blocking and Caching mechanism do. They can't.

P.S.: Addition to this we also send Abuse reports to the IP Owner. Mostly automatically by the detection of our system or manually due to some ISPs want you to fill a form.


That can be an overkill for many users or usages but eventually it is up to you how would you want to detect and block the user.

TL; DR: I am absolutely open to this addition/change but ultimately people may need a much more resilient solutions like ours in some point. We may just give them a basic prevention mechanism.

*Batteries included

@TomFreudenberg
Copy link
Member

Hey @gencer

that's why I had invited you :-) thanks for your comment.

I am absolutely open to this addition/change but ultimately people may need a much more resilient solutions like ours in some point. We may just give them a basic prevention mechanism.

To make it clear, I do not want to include all your above documented functionality in the gem. That is not the purpose of the gem.

I just want to make a cookbook example, how in general something like a spam preventing service could be built or better could be started to build.

But maybe there are some painpoints currently which could be added in reasonable additional events.

For example, an event to initialize session parameters could very senseful for the gem.

That's what I am collecting here.

Cheers
Tom

@gencer
Copy link
Contributor

gencer commented Nov 28, 2023

Of course! That is why I said I did not raise an issue here because its our job not the gems job.

Sessions are the best way. Initializing session parameters could be a starting point.

Every application has its own decising point.

P.s. I just explained our solution in detailed maybe it can give people an idea. (Similar to cookbook 😀)


@TomFreudenberg;

To make it clear, I do not want to include all your above documented functionality in the gem. That is not the purpose of the gem.

No, no. I was referring the cookbook and initializers part of your comment. Not adding my solution to the gem :) Sorry for the misunderstanding.

@TomFreudenberg
Copy link
Member

Sorry for the misunderstanding.

I did not missunderstood you - just want to make it clear for everyone.

I really appriciate your long comment and what you have implemented and invested!

@gencer
Copy link
Contributor

gencer commented Nov 28, 2023

I did not missunderstood you - just want to make it clear for everyone.

Gotcha! Fair enough. 😄

@iuri-gg
Copy link
Contributor Author

iuri-gg commented Dec 1, 2023

Thank you both for such in-depth answers. I understand having rate-limiting and IP blacklist/whitelist works well once the offender is identified. I was looking for a way to disconnect malicious clients after they already established a connection (the first time they were identified to be malicious). For example: if there is a client who is just trying to hold connecting (NOOP every 10 seconds), I can add their IP to the blacklist so they don't connect anymore in the future, but how do I terminate their existing connection? Is there on_cmd_event or something similar to hook into and raise an error? I was thinking of upper bound timeout as a way to break out from those kinds of connections, but agree offloading it to the implementer would be a better solution.

So to rephrase my original question: what hooks can be used to have the implementer break out of the connection (close it with some error)?

@TomFreudenberg
Copy link
Member

Hey @iuri-gg

So to rephrase my original question: what hooks can be used to have the implementer break out of the connection (close it with some error)?

I will come up with an idea for that as well :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants