
Question: How to Prevent High Connection Counts Causing High Memory Usage (Possible DDoS?)

Webmore

New Pleskian
Hello everyone,

I know this might be a simple question and may have been asked multiple times, but I couldn’t find a clear answer.

I'm frequently receiving the following email alert from Plesk:

We have detected a critical status for one of the server parameters.
Please log in to Plesk and check the server status.
The message from Monitoring:
The memory usage status is critical!
The current value is 3.4 GiB.
When this happens, all websites on the server start running extremely slow.

After connecting via SSH and running the following command:


# Count established connections to ports 80/443, grouped by remote (peer) IP:
ss -tan state established | grep ":80\|:443" | awk '{print $4}' | cut -d':' -f1 | sort -n | uniq -c | sort -nr

I notice that one IP always has a significantly high number of connections (e.g., 184 connections from xx.xx.xx.xx).

This looks like a DDoS attack, and my current solution is to manually ban the IP using Fail2Ban, which immediately restores normal performance. However, I want to automate or prevent this before it affects my server performance.
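For reference, the manual ban is just a one-liner with fail2ban-client (the jail name here is a placeholder for whichever jail you use):
Code:
fail2ban-client set plesk-apache banip xx.xx.xx.xx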

My questions:

  1. Is there a way to automatically block an IP that exceeds a certain number of connections in Plesk or Fail2Ban?
  2. Any other best practices to prevent these types of issues?
I’d appreciate any advice or guidance from the community!

Thanks in advance!
 
This is still relevant, although in the meantime @Kaspar suggested a valuable improvement to one of the key regex lines
Improved key regex line:
Code:
failregex = ^<HOST> -[^"]*"(GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d)" \d+ \d+ "[^"]*" "[^"]*(%(badbots)s)[^"]*"$

If you want to annoy the bad guys more, then use this badbot list and regex section instead of the one I provided in the article:
Code:
badbots = VIZIO|meta-externalagent/1\.1|facebookexternalhit/1\.1 \(|python-httpx/|Pinterestbot/|aiohttp/|Cookiebot/|\(Amazonbot/|ClaudeBot|Optimizer|seobility|Timpibot|Go-http-client/1\.1|colly|GPTBot|AmazonBot|Bytespider|Bytedance|thesis-research-bot|fidget-spinner-bot|EmailCollector|WebEMailExtrac|TrackBack/1\.02|sogou music spider|seocompany|LieBaoFast|SEOkicks|Uptimebot|Cliqzbot|ssearch_bot|domaincrawler|AhrefsBot|DigExt|Sogou|MegaIndex\.ru|majestic12|80legs|SISTRIX|HTTrack|Semrush|MJ12|MJ12bot|MJ12Bot|Ezooms|CCBot|TalkTalk|Ahrefs|BLEXBot|Atomic_Email_Hunter/4\.0|atSpider/1\.0|autoemailspider|bwh3_user_agent|China Local Browse 2\.6|ContactBot/0\.2|ContentSmartz|DataCha0s/2\.0|DBrowse 1\.4b|DBrowse 1\.4d|Demo Bot DOT 16b|Demo Bot Z 16b|DSurf15a 01|DSurf15a 71|DSurf15a 81|DSurf15a VA|EBrowse 1\.4b|Educate Search VxB|EmailSiphon|EmailSpider|EmailWolf 1\.00|ESurf15a 15|ExtractorPro|Franklin Locator 1\.8|FSurf15a 01|Full Web Bot 0416B|Full Web Bot 0516B|Full Web Bot 2816B|Guestbook Auto Submitter|Industry Program 1\.0\.x|ISC Systems iRc Search 2\.1|IUPUI Research Bot v 1\.9a|LARBIN-EXPERIMENTAL \(efp@gmx\.net\)|LetsCrawl\.com/1\.0 \+http\://letscrawl\.com/|Lincoln State Web Browser|LMQueueBot/0\.2|LWP\:\:Simple/5\.803|Mac Finder 1\.0\.xx|MFC Foundation Class Library 4\.0|Microsoft URL Control - 6\.00\.8xxx|Missauga Locate 1\.0\.0|Missigua Locator 1\.9|Missouri College Browse|Mizzu Labs 2\.2|Mo College 1\.9|MVAClient|Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)|Mozilla/3\.0 \(compatible; Indy Library\)|Mozilla/3\.0 \(compatible; scan4mail \(advanced version\) http\://www\.peterspages\.net/?scan4mail\)|Mozilla/4\.0 \(compatible; Advanced Email Extractor v2\.xx\)|Mozilla/4\.0 \(compatible; Iplexx Spider/1\.0 http\://www\.iplexx\.at\)|Mozilla/4\.0 \(compatible; MSIE 5\.0; Windows NT; DigExt; DTS Agent|Mozilla/4\.0 efp@gmx\.net|Mozilla/5\.0 \(Version\: xxxx Type\:xx\)|NameOfAgent \(CMS Spider\)|NASA Search 1\.0|Nsauditor/1\.x|PBrowse 1\.4b|PEval 1\.4b|Poirot|Port Huron Labs|Production Bot 0116B|Production Bot 2016B|Production Bot DOT 3016B|Program Shareware 1\.0\.2|PSurf15a 11|PSurf15a 51|PSurf15a VA|psycheclone|RSurf15a 41|RSurf15a 51|RSurf15a 81|searchbot admin@google\.com|ShablastBot 1\.0|snap\.com beta crawler v0|Snapbot/1\.0|Snapbot/1\.0 \(Snap Shots, \+http\://www\.snap\.com\)|sogou develop spider|Sogou Orion spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sogou spider|Sogou web spider/3\.0\(\+http\://www\.sogou\.com/docs/help/webmasters\.htm#07\)|sohu agent|SSurf15a 11 |TSurf15a 11|Under the Rainbow 2\.2|User-Agent\: Mozilla/4\.0 \(compatible; MSIE 6\.0; Windows NT 5\.1\)|VadixBot|WebVulnCrawl\.unknown/1\.0 libwww-perl/5\.803|Wells Search II|WEP Search 00

failregex = ^<HOST> -[^"]*"(GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d)" \d+ \d+ "[^"]*" "[^"]*(%(badbots)s)[^"]*"$
            ^<HOST> .*GET .*aws(/|_|-)(credentials|secrets|keys).*
            ^<HOST> .*GET .*(credentials/aws|secretes/(aws|keys)|oauth/config|config/oauth).*
            ^<HOST> .*"GET .*(freshio|woocommerce).*frontend.*" (301|404).*
            ^<HOST> .*"GET .*contact-form-7/includes.*" (301|404).*
            ^<HOST> .*"(GET|POST) .*author=.*" 404.*
            ^<HOST> .*"(GET|POST) /.*wp-json/tdw/save_css.*" (301|404).*
            ^<HOST> .*"GET /.*/.git/config.*" 404.*
            ^<HOST> .*"GET /error-404.*" (301|302|404).*
            ^<HOST> .*"(GET|POST) .*xmlrpc\.php.*" (403|404|301).*
            ^<HOST> .*"(GET|POST) //?xmlrpc\.php.*" 200.*
            ^<HOST> .*"(HEAD|GET) /(bc|bk|home|backup|old|new|wp|blog|wordpress|app/.*) .*" 404.*
            ^<HOST> .*"(HEAD|GET) .*js/core\.js .*" 404.*
            ^<HOST> .*"(HEAD|GET) .*\?(back|SubmitCurrency|order)=.*2525252525.*252525252.*" 200.*
            ^<HOST> .*"(HEAD|GET) .*((login|admin|config|lock|simple|radio|alfa|txt|autoload_classmap|wp-includes/autoload_classmap|makeasmtp|yanz|filefuns|gel4y|\.tmb/admin|access|wp-admin/includes/xmrlpc|\.well-known/pki-validation/cloud|inicio-sesion|admin-post|sidwso|pl/payu/pay)\.php|\.env|package\.json|angular\.json|config\.py|base\.py|config/env\.json|config/dev\.json|config/settings\.js|config/config\.go|config/prod\.json|appsettings\.json|config/dev_settings\.py|config/prod_settings\.py|config/application\.yml|wp-includes/Requests/(Auth|Cookie|Exception|Proxy|Response|Transport|Utility)/|wp-includes/Requests/Exception/(HTTP|Transport)) .*" 404.*
            ^<HOST> .*"GET /((aa|ss|rr|ig|in|be|go)/|/?wp-admin/(install|setup-config)\.php|/?(blog|web|wordpress)?/wp-includes/wlwmanifest.xml|/?wp-json/wp/v2/users/|/?wp-json/oembed/1.0/embed.*|.*\+41|(blocks|a11y|media-utils|api-fetch|commands|components|patterns|core-data|editor|rich-text|preferences|block-editor|keycodes)\.js|wp-content/plugins/member-access|wp-content/plugins/xml-sitemaps|wp-content/plugins/wp-hide-dashboard|post_login|wp-content/plugins/google-sitemap-generator|(inetpub|admin|tmp|temp|old)\.war|.*db\.rar|.*sql\.tar\.gz|.*db\.tar\.gz|.*backup\.zip|.*db\.tgz|.*backup\.tgz|notip\.html|images/pt_logo\.svg|images/process\.jpg|san_filez/img/alert\.svg|files/img/blank\.gif|merchantbank/pageBank/bank).*" 404.*
            ^<HOST> .*"POST (/v[1-3]/graphql|/graphql(/v[1-3])?|/graph/api).*" (404).*
            ^<HOST> .*"GET / HTTP/.*" (200|301|503) .* "http://.*:(80|443)/" ".*

It'll ban all typical attacks used these days, but be careful with WordPress, as this leaves little room for mistakes with WordPress login-related files. You might lock yourself out if you request missing admin or login files or hit xmlrpc.php.

Further, I recommend setting jail.conf to ban immediately after the first violation of the rules and to ban long-term, at least 14d.
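For orientation, a minimal jail section implementing that could look like this (the jail and filter names, log path and action are placeholders to adapt; adding your own IP to ignoreip also protects you from locking yourself out):
Code:
[custom-badbots]
enabled  = true
filter   = custom-badbots
logpath  = /var/www/vhosts/system/*/logs/*access*log
ignoreip = 127.0.0.1/8 203.0.113.10
maxretry = 1
bantime  = 14d
action   = iptables-multiport[name=custom-badbots, port="http,https"]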
 

Update: Improving Fail2Ban Regex for Better Protection

After extensive testing and analysis, I found that using the following regex in Fail2Ban significantly improved security and prevented about 60% of attacks:


failregex = ^<HOST> -[^"]*"(GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d)" \d+ \d+ "[^"]*" "[^"]*(%(badbots)s)[^"]*"$

However, attackers started using different techniques to bypass this rule and continue their malicious activities.

Enhancing the Regex for More Security

To increase protection and block even more malicious requests, I modified the regex to:


[Definition]
failregex = ^<HOST> - - \[.*\] "(GET|POST|HEAD) \/.* HTTP\/[0-9.]+" \d+ \d+ "-" ".*(bot|crawl|spider|scraper|scanner).*"
ignoreregex =

This stopped around 90% of attacks by catching a wider range of suspicious bots, crawlers, and scanners.

Unexpected Issue: Webmail Users Got Banned

However, I discovered a side effect—this new regex was blocking legitimate users who accessed webmail using URLs like:

  • /?_task=mail&_mbox=INBOX
  • /?_task=mail&_action=compose&_id=...
Since these webmail URLs matched some patterns used by bots, Fail2Ban mistakenly flagged and banned real users trying to access their emails.

Solution: Excluding Webmail from Fail2Ban

To fix this, I added an ignoreregex rule to ensure that webmail requests are not caught by Fail2Ban:


ignoreregex = .*webmail\..*\..*

After applying this rule, webmail users were no longer getting banned, and the security improvements remained effective.
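Putting the two pieces together, the complete filter file (saved e.g. as /etc/fail2ban/filter.d/custom-badbots.conf; that name is just an example) now looks like this:
Code:
[Definition]
# ban requests whose user agent contains typical bot/crawler keywords
failregex = ^<HOST> - - \[.*\] "(GET|POST|HEAD) \/.* HTTP\/[0-9.]+" \d+ \d+ "-" ".*(bot|crawl|spider|scraper|scanner).*"
# ...but never count log lines coming from the webmail subdomains
ignoreregex = .*webmail\..*\..*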


If you're running a Plesk server and using Fail2Ban, I highly recommend applying these updated regex rules to better protect against automated attacks while allowing legitimate users to access webmail without issues.
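If you want to try it, the rough sequence is: drop the filter file in place, reload Fail2Ban so the jails pick it up, and confirm the jail is active (commands assume the standard fail2ban-client CLI):
Code:
fail2ban-client reload
fail2ban-client status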
 
Good to read that your customized failregex helped you fend off most of the attacks you were dealing with. However, I would not recommend anyone else use this regex, as it has some serious flaws. It might be a perfect fit for your use case, but I doubt it is for others.

The original fail2ban badbot filter is meant to filter specific bad bots, which it does by comparing the user agent strings of incoming requests against a filter list containing the names of known bad bots. That lets you ban requests/connections from specific bad bots while allowing legitimate bots and scrapers to do their thing. It also makes it easy to extend the list with new bot names, as sketched below.
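For illustration, here is a trimmed sketch of that structure (the three list entries are just examples): fail2ban substitutes the badbots definition into failregex via its %(badbots)s interpolation, so appending |NewBotName to the list is all it takes to cover another bot.
Code:
[Definition]
# names of known bad bots, one alternative per bot
badbots = AhrefsBot|Semrush|MJ12bot
# the list is interpolated into the user-agent field of the log line
failregex = ^<HOST> -[^"]*"(GET|POST|HEAD) \/.* HTTP\/\d(?:\.\d)" \d+ \d+ "[^"]*" "[^"]*(%(badbots)s)[^"]*"$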

Your failregex, however, only bans requests if the user agent string contains the words bot, crawl, spider, scraper or scanner. That leaves out a huge number of bad bots which do not have any of those words in their names, for example VIZIO, meta-externalagent, facebookexternalhit, Optimizer, seobility, Go-http-client, colly, Bytedance, EmailCollector, WebEMailExtrac, TrackBack and many, many others. Conversely, it blocks bots which are generally considered legit, most notably Googlebot. Your failregex also fails to take requests with a referer into account.
 
Thank you for taking the time to review my custom Fail2Ban failregex and for your detailed feedback. I really appreciate your insights, as it's clear that you have much more experience in this area than I do.

I now see the flaws in my original approach, especially the issue with only filtering based on certain keywords like "bot, crawl, spider, scraper, scanner." As you pointed out, this left out many bad bots that don’t include these words in their User-Agent strings while also mistakenly blocking legitimate bots like Googlebot.

Based on your recommendations, I am modifying my failregex to incorporate a more robust approach that targets a broader list of known bad bots while avoiding unintended bans on legitimate scrapers. I will also look into properly handling referer-based filtering to refine the accuracy further.

Again, thanks for your guidance—I genuinely appreciate it!

Best regards,
 
Based on your recommendations, I am modifying my failregex to incorporate a more robust approach that targets a broader list of known bad bots while avoiding unintended bans on legitimate scrapers. I will also look into properly handling referer-based filtering to refine the accuracy further.
If you're searching for inspiration, have a look at the failregex @Bitpalast posted in this thread. It's by far the most comprehensive failregex for fail2ban I've come across.

Again, thanks for your guidance—I genuinely appreciate it!
Sure, you're welcome.
 
Man, Amazon is getting on my nerves. In some cases their bot now appears as "Amazonbot" without the introductory parenthesis, and it is increasingly annoying, because it crawls dozens of websites on the same host (shared IP) at exactly the same time. And they are doing it from dozens of source IP addresses at that very same time, so obviously Amazon is aware that many block them for causing high CPU load, else this behavior would not make sense. If they simply did it sequentially, the world would be a better place.

The bot line needs to be changed from
Code:
...|\(Amazonbot/|...
to now
Code:
...|\(?Amazonbot/|...
to cater for the now optional parenthesis.
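After a tweak like this, it's worth verifying that the filter still matches as intended with fail2ban-regex before reloading (the log and filter paths here are just examples):
Code:
fail2ban-regex /var/www/vhosts/system/example.com/logs/access_ssl_log /etc/fail2ban/filter.d/apache-badbot.conf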
 