
Question: How to Prevent High Connection Counts Causing High Memory Usage (Possible DDoS?)

Webmore

New Pleskian
Hello everyone,

I know this might be a simple question and may have been asked multiple times, but I couldn’t find a clear answer.

I'm frequently receiving the following email alert from Plesk:

We have detected a critical status for one of the server parameters.
Please log in to Plesk and check the server status.
The message from Monitoring:
The memory usage status is critical!
The current value is 3.4 GiB.
When this happens, all websites on the server start running extremely slow.

After connecting via SSH and running the following command:


# Count established connections to ports 80/443, grouped by remote (peer) IP:
ss -tan state established | grep ":80\|:443" | awk '{print $4}' | cut -d':' -f1 | sort -n | uniq -c | sort -nr

I notice that one IP always has a significantly high number of connections (e.g., 184 connections from xx.xx.xx.xx).

This looks like a DDoS attack, and my current solution is to manually ban the IP using Fail2Ban, which immediately restores normal performance. However, I want to automate or prevent this before it affects my server performance.
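For reference, the manual ban is just a one-liner with fail2ban-client (the jail name here is a placeholder for whichever jail you use):
Code:
fail2ban-client set plesk-apache banip xx.xx.xx.xx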

My questions:

  1. Is there a way to automatically block an IP that exceeds a certain number of connections in Plesk or Fail2Ban?
  2. Any other best practices to prevent these types of issues?
I’d appreciate any advice or guidance from the community!

Thanks in advance!
 
This is still relevant, although in the meantime @Kaspar suggested a valuable improvement to one of the key regex lines
Improved key regex line:
Code:
failregex = ^<HOST> -[^"]*"(GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d)" \d+ \d+ "[^"]*" "[^"]*(%(badbots)s)[^"]*"$

If you want to annoy the bad guys more, then use this badbot list and regex section instead of the one I provided in the article:
Code:
badbots = VIZIO|meta-externalagent/1\.1|facebookexternalhit/1\.1 \(|python-httpx/|Pinterestbot/|aiohttp/|Cookiebot/|\(Amazonbot/|ClaudeBot|Optimizer|seobility|Timpibot|Go-http-client/1\.1|colly|GPTBot|AmazonBot|Bytespider|Bytedance|thesis-research-bot|fidget-spinner-bot|EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider|seocompany|LieBaoFast|SEOkicks|Uptimebot|Cliqzbot|ssearch_bot|domaincrawler|AhrefsBot|DigExt|Sogou|MegaIndex\.ru|majestic12|80legs|SISTRIX|HTTrack|Semrush|MJ12|MJ12bot|MJ12Bot|Ezooms|CCBot|TalkTalk|Ahrefs|BLEXBot|Atomic_Email_Hunter/4\.0|atSpider/1\.0|autoemailspider|bwh3_user_agent|China Local Browse 2\.6|ContactBot/0\.2|ContentSmartz|DataCha0s/2\.0|DBrowse 1\.4b|DBrowse 1\.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1\.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1\.00|ESurf15a 15|ExtractorPro|Franklin Locator 1\.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|Guestbook Auto Submitter|Industry Program 1\.0\.x|ISC Systems iRc Search 2\.1|IUPUI Research Bot v 1\.9a|LARBIN-EXPERIMENTAL \(efp@gmx\.net\)|LetsCrawl\.com/1\.0 \+http\://letscrawl\.com/|Lincoln State Web Browser|LMQueueBot/0\.2|LWP\:\:Simple/5\.803|Mac Finder 1\.0\.xx|MFC Foundation Class Library 4\.0|Microsoft URL Control - 6\.00\.8xxx|Missauga Locate 1\.0\.0|Missigua Locator 1\.9|Missouri College Browse|Mizzu Labs 2\.2|Mo College 1\.9|MVAClient|Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)|Mozilla/3\.0 \(compatible; Indy Library\)|Mozilla/3\.0 \(compatible; scan4mail \(advanced version\) http\://www\.peterspages\.net/?scan4mail\)|Mozilla/4\.0 \(compatible; Advanced Email Extractor v2\.xx\)|Mozilla/4\.0 \(compatible; Iplexx Spider/1\.0 http\://www\.iplexx\.at\)|Mozilla/4\.0 \(compatible; MSIE 5\.0; Windows NT; DigExt; DTS Agent|Mozilla/4\.0 efp@gmx\.net|Mozilla/5\.0 \(Version\: xxxx Type\:xx\)|NameOfAgent \(CMS Spider\)|NASA Search 1\.0|Nsauditor/1\.x|PBrowse 1\.4b|PEval 1\.4b|Poirot|Port Huron Labs|Production Bot 0116B|Production Bot 2016B|Production Bot DOT 3016B|Program Shareware 1\.0\.2|PSurf15a 11|PSurf15a 51|PSurf15a VA|psycheclone|RSurf15a 41|RSurf15a 51|RSurf15a 81|searchbot admin@google\.com|ShablastBot 1\.0|snap\.com beta crawler v0|Snapbot/1\.0|Snapbot/1\.0 \(Snap Shots, \+http\://www\.snap\.com\)|sogou develop spider|Sogou Orion spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sogou spider|Sogou web spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sohu agent|SSurf15a 11 |TSurf15a 11|Under the Rainbow 2\.2|User-Agent\: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1\)|VadixBot|WebVulnCrawl\.unknown/1\.0 libwww-perl/5\.803|Wells Search II|WEP Search 00

failregex = ^<HOST> -[^"]*"(GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d)" \d+ \d+ "[^"]*" "[^"]*(%(badbots)s)[^"]*"$
            ^<HOST> .*GET .*aws(/|_|-)(credentials|secrets|keys).*
            ^<HOST> .*GET .*(credentials/aws|secretes/(aws|keys)|oauth/config|config/oauth).*
            ^<HOST> .*"GET .*(freshio|woocommerce).*frontend.*" (301|404).*
            ^<HOST> .*"GET .*contact-form-7/includes.*" (301|404).*
            ^<HOST> .*"(GET|POST) .*author=.*" 404.*
            ^<HOST> .*"(GET|POST) /.*wp-json/tdw/save_css.*" (301|404).*
            ^<HOST> .*"GET /.*/.git/config.*" 404.*
            ^<HOST> .*"GET /error-404.*" (301|302|404).*
            ^<HOST> .*"(GET|POST) .*xmlrpc\.php.*" (403|404|301).*
            ^<HOST> .*"(GET|POST) //?xmlrpc\.php.*" 200.*
            ^<HOST> .*"(HEAD|GET) /(bc|bk|home|backup|old|new|wp|blog|wordpress|app/.*) .*" 404.*
            ^<HOST> .*"(HEAD|GET) .*js/core\.js .*" 404.*
            ^<HOST> .*"(HEAD|GET) .*\?(back|SubmitCurrency|order)=.*2525252525.*252525252.*" 200.*
            ^<HOST> .*"(HEAD|GET) .*((login|admin|config|lock|simple|radio|alfa|txt|autoload_classmap|wp-includes/autoload_classmap|makeasmtp|yanz|filefuns|gel4y|\.tmb/admin|access|wp-admin/includes/xmrlpc|\.well-known/pki-validation/cloud|inicio-sesion|admin-post|sidwso|pl/payu/pay)\.php|\.env|package\.json|angular\.json|config\.py|base\.py|config/env\.json|config/dev\.json|config/settings\.js|config/config\.go|config/prod\.json|appsettings\.json|config/dev_settings\.py|config/prod_settings\.py|config/application\.yml|wp-includes/Requests/(Auth|Cookie|Exception|Proxy|Response|Transport|Utility)/|wp-includes/Requests/Exception/(HTTP|Transport)) .*" 404.*
            ^<HOST> .*"GET /((aa|ss|rr|ig|in|be|go)/|/?wp-admin/(install|setup-config)\.php|/?(blog|web|wordpress)?/wp-includes/wlwmanifest.xml|/?wp-json/wp/v2/users/|/?wp-json/oembed/1.0/embed.*|.*\+41|(blocks|a11y|media-utils|api-fetch|commands|components|patterns|core-data|editor|rich-text|preferences|block-editor|keycodes)\.js|wp-content/plugins/member-access|wp-content/plugins/xml-sitemaps|wp-content/plugins/wp-hide-dashboard|post_login|wp-content/plugins/google-sitemap-generator|(inetpub|admin|tmp|temp|old)\.war|.*db\.rar|.*sql\.tar\.gz|.*db\.tar\.gz|.*backup\.zip|.*db\.tgz|.*backup\.tgz|notip\.html|images/pt_logo\.svg|images/process\.jpg|san_filez/img/alert\.svg|files/img/blank\.gif|merchantbank/pageBank/bank).*" 404.*
            ^<HOST> .*"POST (/v[1-3]/graphql|/graphql(/v[1-3])?|/graph/api).*" (404).*
            ^<HOST> .*"GET / HTTP/.*" (200|301|503) .* "http://.*:(80|443)/" ".*

It'll ban all typical attacks used these days, but be careful with WordPress, as this leaves little room for mistakes with WordPress login-related files. You might lock yourself out if you request missing admin or login files or hit xmlrpc.php.

Further, I recommend setting jail.conf to ban immediately after the first violation of the rules and to ban long-term, at least 14d.
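For orientation, a minimal jail section implementing that could look like this (the jail and filter names, log path and action are placeholders to adapt; adding your own IP to ignoreip also protects you from locking yourself out):
Code:
[custom-badbots]
enabled  = true
filter   = custom-badbots
logpath  = /var/www/vhosts/system/*/logs/*access*log
ignoreip = 127.0.0.1/8 203.0.113.10
maxretry = 1
bantime  = 14d
action   = iptables-multiport[name=custom-badbots, port="http,https"]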
 

Update: Improving Fail2Ban Regex for Better Protection

After extensive testing and analysis, I found that using the following regex in Fail2Ban significantly improved security and prevented about 60% of attacks:


failregex = ^<HOST> -[^"]*"(GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d)" \d+ \d+ "[^"]*" "[^"]*(%(badbots)s)[^"]*"$

However, attackers started using different techniques to bypass this rule and continue their malicious activities.

Enhancing the Regex for More Security

To increase protection and block even more malicious requests, I modified the regex to:


[Definition]
failregex = ^<HOST> - - \[.*\] "(GET|POST|HEAD) \/.* HTTP\/[0-9.]+" \d+ \d+ "-" ".*(bot|crawl|spider|scraper|scanner).*"
ignoreregex =

This stopped around 90% of attacks by catching a wider range of suspicious bots, crawlers, and scanners.

Unexpected Issue: Webmail Users Got Banned

However, I discovered a side effect—this new regex was blocking legitimate users who accessed webmail using URLs like:

  • /?_task=mail&_mbox=INBOX
  • /?_task=mail&_action=compose&_id=...
Since these webmail URLs matched some patterns used by bots, Fail2Ban mistakenly flagged and banned real users trying to access their emails.

Solution: Excluding Webmail from Fail2Ban

To fix this, I added an ignoreregex rule to ensure that webmail requests are not caught by Fail2Ban:


ignoreregex = .*webmail\..*\..*

After applying this rule, webmail users were no longer getting banned, and the security improvements remained effective.
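Putting the two pieces together, the complete filter file (saved e.g. as /etc/fail2ban/filter.d/custom-badbots.conf; that name is just an example) now looks like this:
Code:
[Definition]
# ban requests whose user agent contains typical bot/crawler keywords
failregex = ^<HOST> - - \[.*\] "(GET|POST|HEAD) \/.* HTTP\/[0-9.]+" \d+ \d+ "-" ".*(bot|crawl|spider|scraper|scanner).*"
# ...but never count log lines coming from the webmail subdomains
ignoreregex = .*webmail\..*\..*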


If you're running a Plesk server and using Fail2Ban, I highly recommend applying these updated regex rules to better protect against automated attacks while allowing legitimate users to access webmail without issues.
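If you want to try it, the rough sequence is: drop the filter file in place, reload Fail2Ban so the jails pick it up, and confirm the jail is active (commands assume the standard fail2ban-client CLI):
Code:
fail2ban-client reload
fail2ban-client status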
 
Good to read that your customized failregex helped you fend off most of the attacks you were dealing with. However, I would not recommend anyone else use this regex, as it has some serious flaws. It might be a perfect fit for your use case, but I doubt it is for others.

The original fail2ban badbot filter is meant to filter specific bad bots, which it does by comparing the user agent strings of incoming requests against a filter list containing the names of known bad bots. That lets you ban requests/connections from specific bad bots while allowing legitimate bots and scrapers to do their thing. It also makes it easy to extend the list with new bot names, as sketched below.
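For illustration, here is a trimmed sketch of that structure (the three list entries are just examples): fail2ban substitutes the badbots definition into failregex via its %(badbots)s interpolation, so appending |NewBotName to the list is all it takes to cover another bot.
Code:
[Definition]
# names of known bad bots, one alternative per bot
badbots = AhrefsBot|Semrush|MJ12bot
# the list is interpolated into the user-agent field of the log line
failregex = ^<HOST> -[^"]*"(GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d)" \d+ \d+ "[^"]*" "[^"]*(%(badbots)s)[^"]*"$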

Your failregex, however, only bans requests if the user agent string contains the words bot, crawl, spider, scraper or scanner. That leaves out a huge number of bad bots which do not have any of those words in their names, for example VIZIO, meta-externalagent, facebookexternalhit, Optimizer, seobility, Go-http-client, colly, Bytedance, EmailCollector, WebEMailExtrac, TrackBack and many, many others. Conversely, it blocks bots which are generally considered legit, most notably Googlebot. Your failregex also fails to take requests with a referer into account.
 
Thank you for taking the time to review my custom Fail2Ban failregex and for your detailed feedback. I really appreciate your insights, as it's clear that you have much more experience in this area than I do.

I now see the flaws in my original approach, especially the issue with only filtering based on certain keywords like "bot, crawl, spider, scraper, scanner." As you pointed out, this left out many bad bots that don’t include these words in their User-Agent strings while also mistakenly blocking legitimate bots like Googlebot.

Based on your recommendations, I am modifying my failregex to incorporate a more robust approach that targets a broader list of known bad bots while avoiding unintended bans on legitimate scrapers. I will also look into properly handling referer-based filtering to refine the accuracy further.

Again, thanks for your guidance—I genuinely appreciate it!

Best regards,
 
Based on your recommendations, I am modifying my failregex to incorporate a more robust approach that targets a broader list of known bad bots while avoiding unintended bans on legitimate scrapers. I will also look into properly handling referer-based filtering to refine the accuracy further.
If you're searching for inspiration, have a look at the failregex @Bitpalast posted in this thread. It's by far the most comprehensive failregex for fail2ban I've come across.

Again, thanks for your guidance—I genuinely appreciate it!
Sure, you're welcome.
 
Man, Amazon is getting on my nerves. In some cases their bot now appears as "Amazonbot" without the introductory parenthesis, and it is increasingly annoying, because it crawls dozens of websites on the same host (shared IP) at exactly the same time. And they are doing it from dozens of source IP addresses at that very same time, so obviously Amazon is aware that many block them for causing high CPU load, else this behavior would not make sense. If they simply did it sequentially, the world would be a better place.

The bot line needs to be changed from
Code:
...|\(Amazonbot/|...
to now
Code:
...|\(?Amazonbot/|...
to cater for the now optional parenthesis.
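After a tweak like this, it's worth verifying that the filter still matches as intended with fail2ban-regex before reloading (the log and filter paths here are just examples):
Code:
fail2ban-regex /var/www/vhosts/system/example.com/logs/access_ssl_log /etc/fail2ban/filter.d/apache-badbot.conf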
 