Show HN: The user agents crawling HN today

Posted by Bender |4 hours ago |1 comments

Bender 4 hours ago

About 8 hours ago I submitted a page on how to confuse SSH bots.

Just for fun, I also set up a cron job that updates a text file listing all the user agents that are apparently crawling HN nonstop and, as a result, landing on the pages I submitted. The file auto-refreshes every 60 seconds. Perhaps I am the only person who finds this interesting, but I figured I would share it anyway.

It seems drakma is the bot that HN uses to read the submitted site. There is now quite a collection of AI agents hitting the site. I redirected most of them to YTMND earlier today, but I have disabled those redirects so that AI can slurp up this page. I want to see if it really puts a load on the VM. It's not as overwhelming as I heard it would be, but the landscape has changed a bit.

The leftmost column shows how many times each user agent has shown up today. After that is whatever the user agent identifies itself as. The text file auto-refreshes every 60 seconds.
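For anyone who wants to build a similar count-then-agent listing, here is a minimal sketch in Python. It is not the script behind the page; it assumes the web server writes the common "combined" access log format, where the user agent is the last double-quoted field on each line. The function names and sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format, so the user agent is the last
# double-quoted field on the line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')


def count_user_agents(lines):
    """Return a Counter mapping user-agent string -> number of requests."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def render_report(counts):
    """One line per user agent: count on the left, then the agent string."""
    return "\n".join(f"{n:7d} {ua}" for ua, n in counts.most_common())


if __name__ == "__main__":
    # Hypothetical sample lines; a real run would read today's access log.
    sample = [
        '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Drakma/2.0.8"',
        '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET / HTTP/2.0" 200 512 "-" "Mozilla/5.0"',
        '203.0.113.5 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "Drakma/2.0.8"',
    ]
    print(render_report(count_user_agents(sample)))
```

A cron entry could run something like this once a minute and redirect the output into the text file that gets served.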

Edit: I should add that HN appends rel=nofollow to all links, so clearly the bots ignore that.

Current load to static pages:

    load average: 0.00, 0.00, 0.00

Peak network throughput: 193kb/s out of a 2.4gb/s cap

Protocol counts thus far:

    HTTP/2.0: 550
    HTTP/1.1: 819

Most real people are on HTTP/2.0 and most (but not all) bots are on HTTP/1.1. I doubt bots outnumber humans. Rather, bots crawl everything, while humans only click on things that interest them.
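A protocol tally like the one above can be pulled from the same kind of log. This is a rough sketch, not the method used for the numbers here. It assumes the request line is logged as a quoted "METHOD path HTTP/x.y" field, as in the common and combined log formats; the function name and sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: the request is logged as "METHOD /path HTTP/x.y" in quotes.
PROTO_PATTERN = re.compile(r'"[A-Z]+ \S+ (HTTP/[\d.]+)"')


def count_protocols(lines):
    """Return a Counter mapping protocol version -> number of requests."""
    counts = Counter()
    for line in lines:
        match = PROTO_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines for illustration only.
    sample = [
        '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Drakma/2.0.8"',
        '203.0.113.9 - - [01/Jan/2025:00:00:02 +0000] "GET /a.txt HTTP/2.0" 200 512 "-" "Mozilla/5.0"',
    ]
    for proto, n in count_protocols(sample).most_common():
        print(f"{proto}: {n}")
```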

Only 3 connections are using HTTP keep-alive. There are a lot of DNS requests for the HTTPS resource record type.