AdamJacobMuller 4 months ago
301 response to a selection of very large files hosted by companies you don't like.
When their AWS instances start downloading 70000 Windows ISOs in parallel, they might notice.
Hard to do with Cloudflare, but you can also tarpit them. Accept the request and send a response one character at a time (make sure you uncork and flush buffers, etc.), with a 30-second delay between characters.
700 requests/second with, say, 10 KB of headers per response. Sure is a shame your server is so slow.
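A minimal sketch of the tarpit idea above, using Python's asyncio. The handler name, response text, port, and default delay are illustrative assumptions, not anything from the commenter's actual setup. It drips a small response one byte at a time and sleeps between bytes.

```python
import asyncio

# Hypothetical response; any small payload works since it is sent very slowly.
RESPONSE = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/plain\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"not found\n"
)

async def handle(reader, writer, delay=30.0):
    """Send RESPONSE one byte at a time, sleeping `delay` seconds after each byte."""
    try:
        await reader.readline()  # read the request line; the rest of the request is ignored
        for i in range(len(RESPONSE)):
            writer.write(RESPONSE[i:i + 1])
            await writer.drain()  # wait until the transport has accepted the byte
            await asyncio.sleep(delay)
    except ConnectionError:
        pass  # the client gave up, which is the point
    finally:
        writer.close()

async def main(host="0.0.0.0", port=8080):
    """Serve the tarpit until interrupted (host and port are assumptions)."""
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()

# To run it: asyncio.run(main())
```

Each stalled client costs the server one idle coroutine and one socket, so holding thousands of them open is cheap on this side. As noted elsewhere in the thread, check your hosting provider's terms before running something like this.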
swiftcoder 4 months ago
A gzip bomb is good if the bot happens to be vulnerable, but even just slowing down their connection rate is often sufficient. Waiting just 10 seconds before responding with your 404 is going to consume ~7,000 ports on their box, which should be enough to crash most Linux processes (nginx + mod-http-echo is a really easy way to set this up).
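The ~7,000 figure follows from the request rate mentioned in the thread (700 requests/second) multiplied by the delay. A small sketch of that arithmetic, assuming the crawler opens a new connection per request and does not cap its own concurrency (the function name is made up for illustration):

```python
def held_connections(requests_per_second, delay_seconds):
    """Steady-state open connections: arrival rate times time held (Little's law)."""
    return requests_per_second * delay_seconds

print(held_connections(700, 10))  # 7000
```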
neya 4 months ago
I was so pissed off that I set up a redirect rule for it to send them over to random porn sites. That actually stopped it.
xena 4 months ago
yabones 4 months ago
bigfatkitten 4 months ago
scrps 4 months ago
rkagerer 4 months ago
I wrote a quick-and-dirty program that reads the authoritative list of all AWS IP ranges from https://ip-ranges.amazonaws.com/ip-ranges.json (more about that URL at the blog post https://aws.amazon.com/blogs/aws/aws-ip-ranges-json/), and creates rules in Windows Firewall to simply block all of them. Granted, it was a sledgehammer, but it worked well enough.
Here's the README.md I wrote for the program, though I never got around to releasing the code: https://markdownpastebin.com/?id=22eadf6c608448a98b6643606d1...
It ran for some years as a scheduled task on a small handful of servers, but I'm not sure if it's still in use today or even works anymore. If there's enough interest I might consider publishing the code (or sharing it with someone who wants to pick up the mantle). Alternatively it wouldn't be hard for someone to recreate that effort.
G'luck!
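For anyone recreating that effort, here is a rough sketch in Python. It is not the unreleased program described above. It assumes the published ip-ranges.json layout (a "prefixes" list whose entries carry an "ip_prefix" field) and emits `netsh advfirewall` commands as strings rather than running them. The function names, the rule name, and the chunk size of 200 addresses per rule are all arbitrary choices for illustration.

```python
import ipaddress
import json
import urllib.request

AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

def fetch_ranges(url=AWS_RANGES_URL):
    """Download and parse the AWS IP range list."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def ipv4_networks(ranges):
    """Return de-duplicated IPv4 networks, merging adjacent/overlapping ones."""
    nets = {ipaddress.ip_network(p["ip_prefix"]) for p in ranges.get("prefixes", [])}
    return list(ipaddress.collapse_addresses(nets))

def netsh_commands(networks, rule_name="Block AWS", chunk=200):
    """Build one inbound block rule per `chunk` networks (chunk size is a guess)."""
    commands = []
    for n, start in enumerate(range(0, len(networks), chunk), 1):
        remote = ",".join(str(net) for net in networks[start:start + chunk])
        commands.append(
            f'netsh advfirewall firewall add rule name="{rule_name} {n}" '
            f"dir=in action=block remoteip={remote}"
        )
    return commands

# Usage (prints the commands; review them before running anything):
# for cmd in netsh_commands(ipv4_networks(fetch_ranges())):
#     print(cmd)
```

Merging the prefixes first keeps the rule count down, since the same range appears under several service names in the file.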
bcwhite 4 months ago
MrThoughtful 4 months ago
jedberg 4 months ago
geraldcombs 4 months ago
stevoski 4 months ago
This is from your own post, and is almost the best answer I know of.
I recommend you configure a Cloudflare WAF rule to block the bot - and then move on with your life.
Simply block the bot and move on with your life.
locusm 4 months ago
_pdp_ 4 months ago
Depending on how the crawler is designed, this may or may not work. If they are using SQS with Lambda then it will obviously not tie up their ports, but it will still backfire on them because the serverless functions will be running for longer (5 - 15 minutes).
Another technique that comes to mind is to try to force the client to upgrade the connection (i.e. websocket). See what will happen. Mostly it will fail but even if it gets stalled for 30 seconds that is a win.
Retric 4 months ago
Through discovery you can get the name of the parties involved from Amazon, but Amazon is very likely to drop them as a client solving the issue.
pickle-wizard 4 months ago
molszanski 4 months ago
Scotrix 4 months ago
n_u 4 months ago
Sometimes these crawlers are just poorly written, not malicious. Sometimes it’s both.
I would try a zip bomb next. I know there’s one that is 10 MB over the network and unzips to ~200TB.
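To get a feel for the ratios involved, here is a small sketch that only measures how well a run of zero bytes compresses with gzip. The function name and the 10 MiB size are arbitrary. A single gzip stream of zeros compresses by roughly three orders of magnitude, so a figure like 10 MB to ~200TB would need something beyond one plain stream (nested or specially constructed archives).

```python
import gzip

def gzip_ratio(uncompressed_size):
    """Compress `uncompressed_size` zero bytes and return original/compressed size."""
    payload = gzip.compress(bytes(uncompressed_size), compresslevel=9)
    return uncompressed_size / len(payload)

# Print the measured ratio for 10 MiB of zeros.
print(round(gzip_ratio(10 * 1024 * 1024)))
```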
jimrandomh 4 months ago
tushar-r 4 months ago
Jean-Papoulos 4 months ago
Rothnargoth 4 months ago
It sounds like the bot operator is spending enough on AWS to withstand the current level of abuse reports.
If you really wanted to retaliate, you could try getting a warrant to force AWS to disclose the owners of that AWS instance.
bcwhite 4 months ago
shishcat 4 months ago
1a527dd5 4 months ago
theginger 4 months ago
g-mork 4 months ago
Similarly, you can also try delivering one byte every 10 seconds or 30 seconds or whatever keeps the client on the other end hanging around without hitting an internal timeout.
  for char in itertools.repeat(b"FUCKOFF"):
      await resp.send(char)
      await resp.flush()
      await asyncio.sleep(10)
  # etc
In the SMTP years we called this tarpitting IIRC
lucastech 4 months ago
I wish AWS would curtail abuse from their networks. My hope is to build some tools to automate detection and reporting of this sort of abuse, so we can force it into AWS's court.
giardini 4 months ago
kachapopopow 4 months ago
Bender 4 months ago
Assuming one trusts the user-agent in this case one could reduce the traffic reply to them and avoid touching the disk or any applications in Nginx with something like:
if ($http_user_agent ~ (crawler|some-other-bot) ) { return 200 '\n\n\n\nBot quota exceeded, check back in 2150 years.\n\n\n\n'; }
There are other variables to look for to see if something is a bot but such things should be very well tested. $http_accept_language, $http_sec_fetch_mode, etc...
I don't use CF but maybe they have a way to block the entire ASN for AWS on your account assuming one does not need inbound connections from them. I just blackhole their CIDR blocks [1] but that won't help someone using a CDN.
___timor___ 3 months ago
Gzip, deflate bomb could be another alternative.
Or maybe redirect them to a box that responds with a tarpit, tying up their connections.
4 months ago
Comment deleted
znpy 4 months ago
Make it follow redirects to some kind of illegal website. Be creative, I guess.
The reasoning being that if you can get AWS to trigger security measures on their side, maybe AWS will shut down their whole account.
2OEH8eoCRo0 4 months ago
> The traffic is hitting numbers that require me to re-negotiate my contract with CloudFlare and is otherwise a nuisance when reviewing analytics/logs.
So you're able to show financial hardship
lloydatkinson 4 months ago
nijave 4 months ago
Another idea is replying with large cookies and seeing if the bot saves them and replies with them (once again, to eat traffic)
The idea is to increase their egress to the point someone notices (the bill)
janis1234 4 months ago
nurettin 4 months ago
lgats 4 months ago
I decided to do some testing with redirecting to a small VPS that just keeps the connections open and sends a byte every 10-30 seconds. This worked and the traffic substantially dropped off. After doing some more digging though, I got concerned this may be in itself an abuse of my VPS provider's ToS. The risk did not outweigh the benefit. Gzip bombs fell under a similar category of concern.
hamburgererror 4 months ago
hyperknot 4 months ago
4 months ago
Comment deleted
sp1982 4 months ago
reconnecting 4 months ago
I'd suggest taking a look into patterns and IP rotation (if any) and perhaps blocking IP CIDR at the web server level, if the range is short.
Why is a simple deny from 12.123.0.0/16 (Apache) not working for you?
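Before deploying a rule like that, it can help to confirm the range really covers the addresses seen in the logs. A quick sketch with Python's ipaddress module, reusing the example range from the comment above (the function name and the test addresses are made up):

```python
import ipaddress

# Example range taken from the comment above; substitute the ranges you observe.
BLOCKED = [ipaddress.ip_network("12.123.0.0/16")]

def is_blocked(client_ip, blocked=BLOCKED):
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in blocked)

print(is_blocked("12.123.45.6"))  # True
print(is_blocked("12.124.0.1"))   # False
```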
jeroenhd 4 months ago
As for trying to get them to stop, maybe redirect the bot to random IP:port combinations in a network that's less friendly to being scanned? I believe certain parts of DoD IP space tend to not look kindly upon attempts to scan them.
Depending on your setup, you could try to poison the bot's DNS for your domain. Send them the IP address of their local police force maybe.
My guess is that this is yet another AI scraper. There are others complaining about this bot online but all they seem to come up with is blocking the ASN in Cloudflare.
If there's no technical solution, I'd consider consulting with a legal professional to see if you can get Amazon to take action. Lawyers are expensive, but so is a Cloudflare bill when they decide you need to be on the "enterprise" tier.
TZubiri 4 months ago
ahazred8ta 4 months ago
sim7c00 4 months ago
otherwise, maybe redirect to the AWS customer portal or something -_- maybe they will stop it if it hits them directly...
sph 4 months ago
iptables -A INPUT -s $bot_ip -j DROP
pknerd 4 months ago
brunkerhart 4 months ago
ipaddr 4 months ago
snvzz 4 months ago
jiggawatts 4 months ago
Instant results, I guarantee it.
Look up key AWS staff names in Singapore (blogs, talks, etc…) and mention them as plaintiffs.
Nobody cares about these things until they are directly impacted themselves.
Nothing has to actually happen! A letter is cheap.
But it’s the implication that matters. Just discovery can cost them more than the profit from some scummy web scraper.
cactusplant7374 4 months ago
4 months ago
Comment deleted
pingoo101010 4 months ago
It's a reverse-proxy / load balancer with built-in firewall and automatic HTTPS. You will be able to easily block the annoying bots with rules (https://pingoo.io/docs/rules)
2000swebgeek 4 months ago
throwaway127482 4 months ago
realaaa 4 months ago
JCM9 4 months ago
AWS has become rather large and bloated and does stupid things sometimes, but they do still respond when you get their lawyers involved.
reisse 4 months ago