Blocking invasive web crawlers from my site

Today, I decided to take a peek at my web server access logs, and saw that two bots were attempting to index literally anything they could find on my sites: following links, adding query parameters to the URLs, etc. Take a look at the example below:

git.snowcake.me:443 85.208.96.210 - - [18/Sep/2023:19:44:04 -0400] "GET /mirrors/Bibata_Cursor/issues?assignee=1&labels=0&milestone=0&poster=0&project=0&q&sort=mostcomment&state=open&type=all HTTP/1.1" 200 9511 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
git.snowcake.me:443 85.208.96.207 - - [18/Sep/2023:19:45:01 -0400] "GET /mirrors/Atmosphere/commit/fca213460bcd8cd826dc507769ee5100695d496e HTTP/1.1" 200 21244 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
git.snowcake.me:443 85.208.96.209 - - [18/Sep/2023:19:45:21 -0400] "GET /primrose/switch-sigpatches/pulls?assignee=0&labels&milestone=-1&poster=0&project=0&q&sort=farduedate&state=closed&type=all HTTP/1.1" 200 9509 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
git.snowcake.me:443 185.191.171.9 - - [18/Sep/2023:19:45:27 -0400] "GET /mirrors/hbc-archive/issues?assignee=1&labels=0&milestone=-1&poster=0&project=-1&q&sort=farduedate&state=open&type=all HTTP/1.1" 200 9551 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
git.snowcake.me:443 85.208.96.207 - - [18/Sep/2023:19:46:11 -0400] "GET /primrose/emsite/pulls?assignee=1&labels&milestone=0&project=-1&q&sort=farduedate&state=closed&type=all HTTP/1.1" 200 9406 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
git.snowcake.me:443 185.191.171.6 - - [18/Sep/2023:19:47:31 -0400] "GET /mirrors/apple_cursor/issues?labels&milestone=0&poster=0&project=0&q&sort=farduedate&state=open&type=all HTTP/1.1" 200 9463 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
git.snowcake.me:443 185.191.171.14 - - [18/Sep/2023:19:47:45 -0400] "GET /mirrors/wii/commit/2a16bf72f527e66eef7689469d5bdc95228b74de?show-outdated&style=unified&whitespace=ignore-eol HTTP/1.1" 200 16062 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
git.snowcake.me:443 185.191.171.8 - - [18/Sep/2023:19:47:53 -0400] "GET /mirrors/wii/issues?assignee=-1&labels=0&poster=0&q&sort=oldest&state=closed&type=all HTTP/1.1" 200 9527 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"
git.snowcake.me:443 51.222.253.1 - - [18/Sep/2023:19:49:40 -0400] "GET /mirrors/speedie-page/commit/17c140a78744619953b9ad0406842942e980088d HTTP/1.1" 200 15480 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"

This nonsense of attempting to index all the git repositories on my Forgejo instance goes on and on and on in the log for tons of lines at a time. If you look closer, some of the GET requests are fetching full commits, and of course, this will eventually add up and waste my bandwidth.
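For anyone who wants to see which crawlers dominate their own logs, here is a minimal Python sketch that tallies requests per user agent. It assumes one entry per line in the same layout as the entries quoted in this post (virtual host, client IP, timestamp, request, status, size, referer, user agent). The names `LOG_LINE` and `tally_user_agents` are made up for illustration, and the regular expression is only a guess at that layout, not anything taken from the server config.

```python
import re
from collections import Counter

# Assumed layout, inferred from the entries quoted in this post:
# vhost:port, client IP, two placeholder fields, [timestamp],
# "request", status, size, "referer", "user agent".
LOG_LINE = re.compile(
    r'^(?P<vhost>\S+) (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def tally_user_agents(lines):
    """Count requests per user agent, skipping lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match:
            counts[match.group("agent")] += 1
    return counts


# Two entries copied from the log excerpt in this post.
sample = [
    'git.snowcake.me:443 85.208.96.207 - - [18/Sep/2023:19:45:01 -0400] "GET /mirrors/Atmosphere/commit/fca213460bcd8cd826dc507769ee5100695d496e HTTP/1.1" 200 21244 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)"',
    'git.snowcake.me:443 51.222.253.1 - - [18/Sep/2023:19:49:40 -0400] "GET /mirrors/speedie-page/commit/17c140a78744619953b9ad0406842942e980088d HTTP/1.1" 200 15480 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"',
]

# Print the busiest user agents first.
for agent, hits in tally_user_agents(sample).most_common():
    print(hits, agent)
```

For a real log, pass an open file object instead of the sample list. Where that file lives depends on how your server is set up.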

There is also another notable bot that I found while perusing my access logs:

snowcake.me:443 162.216.149.224 - - [18/Sep/2023:19:12:03 -0400] "GET / HTTP/1.1" 200 2453 "http://98.116.68.200:80/" "Expanse, a Palo Alto Networks company, searches across the global IPv4 space multiple times per day to identify customers' presences on the Internet. If you would like to be excluded from our scans, please send IP addresses/domains to: scaninfo@paloaltonetworks.com"

I am not even a customer of Palo Alto Networks, so this is just yet another annoyance for me.

To solve these issues, I simply put the following into each virtual host config for the nginx reverse proxy:

if ($http_user_agent ~* semrushbot|dotbot|expanse|palo|alto|gptbot) { return 403; }

This works really well, and because this catches the annoying bots at the reverse proxy level, the other virtual machines that run my Apache web servers don't need to waste their resources processing these bogus requests.
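As a quick sanity check, here is a small Python sketch that runs the same alternation against the user agents quoted in this post. It is only an approximation: Python's `re` module is not the regex engine nginx uses, but for a plain case-insensitive alternation like this one the two should behave the same. `BLOCKED` and `would_block` are names invented for the example.

```python
import re

# Same alternation as the nginx rule; re.IGNORECASE stands in for nginx's ~*.
BLOCKED = re.compile(r"semrushbot|dotbot|expanse|palo|alto|gptbot", re.IGNORECASE)


def would_block(user_agent):
    """Return True if the pattern matches anywhere in the user agent."""
    return BLOCKED.search(user_agent) is not None


# User agents copied from the log excerpts in this post.
agents = [
    "Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Expanse, a Palo Alto Networks company, searches across the global IPv4 space",
]

for agent in agents:
    verdict = "403" if would_block(agent) else "allowed"
    print(verdict, agent)
```

Going by this check, the AhrefsBot entry from the log excerpt would not match the pattern as written, so adding `ahrefsbot` to the alternation is probably worth considering. Also note that `palo` and `alto` are short enough to match unrelated user agents that happen to contain those letters.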

I also put the GPTBot into the blacklist, because fuck OpenAI and fuck artificial intelligence in general. There are probably a ton of other crawlers used to gather training data for AI models, but I am not sure where to find a big list of them. If you happen to know of some I can add to my blacklist, please send me an email at primrose@snowcake.me. Thanks!

Signed,

Primrose


Stray has got to be the best PC game I've ever played

Yesterday, I finished the game Stray. For those who don't know, Stray is a game that centers on a stray ginger cat exploring a hidden underground cyberpunk city and looking for a way to get back outside after getting separated from its group of cat friends. In my opinion, the storyline is one of the best I've ever seen in a game, and the graphics and soundtrack are amazing. You get to do so much along the way, and the ending left me in tears. I can't go into too many details without spoiling it for those who haven't played yet though.

What's even more surprising is the fact that this is BlueTwelve Studio's first game. Absolutely astonishing considering the quality of this game. It took me about 6 hours to get through everything, but it was worth it. Maybe I'm biased because I love cats, but even if you take the cats away from the game, the quality of everything else is remarkable. I highly recommend playing Stray if you don't know what else to play and are looking for a game with a good storyline.

You can find more information on the game at its official website, linked here.

Signed,

Primrose


Pokemon Scarlet/Violet runs like trash on the Switch

This is a bit off topic from what I usually write about, and is probably old news too, but I wanted to write about this anyway.

Pokemon Scarlet/Violet runs like absolute garbage on Switch hardware. It is nothing but constant frame rate drops and dropped frames. Personally, something like that bothers me A LOT.

To me, it's annoying enough that it makes the game unplayable. It has left such a sour taste in my mouth that I think I'll just delete my save data for Pokemon Scarlet/Violet and play through Sword/Shield again, because at least that actually runs well on Switch hardware.

I can't believe Nintendo and Game Freak gave this the go-ahead as the final product. Absolutely disgusting.

I may revisit Scarlet/Violet in an emulator, or once, if ever, Game Freak irons out the performance issues.

This game has been out for almost a year now, and these performance issues still exist. Unacceptable.

Signed,

Primrose


Why do so many Apple shills buy poorly made accessories for their devices?

Apple products are all around us. Normies love Apple products! But I'm sure all of you already knew that.

What I find funny about this, though, is how the people who buy Apple products end up buying crappy, poorly made accessories for their devices, made in Chinese sweatshops other than the Foxconn ones.

The most common cheap accessories I see Apple users carrying around are non-OEM chargers.

I mean, seriously, if you spent 100 USD on that product, I would understand, but if you're spending upwards of 700-1000 USD, or however much that Apple product costs, wouldn't you want to protect your expensive (yet overpriced) purchase?

It defies common sense in my opinion, and I will never understand the rationale behind it.

I guess these people are so broke after buying a product for the logo on it that they just don't have enough money left over in their budget for OEM chargers (especially now that Apple isn't including power supplies with their cell phones! I love greenwashing!).

The last thing anyone should do, though, is cheap out on chargers for their devices (especially overpriced Apple ones).

It's not going to be fun when their cheap knockoff chargers or power supplies send more voltage to the charging ICs in their device than they are rated for, and their motherboards go boom.

Then the "geniuses" over at the "Genius" Bar in the Apple Store are just going to tell you to "buy a new one".

Also, cheap power supplies tend to be horribly made and often melt down, explode, and cause fires, so uh, unless you want your place of residence to go up in flames while you're sleeping, I would stay as far away from those as possible.

I think I'll end this ramble here though, so thanks for reading, and goodbye.

Signed,

Primrose


I found a laptop with a boatload of personal data still on it

Last Sunday, I was downstairs in the basement of my apartment building, where I stumbled across two laptops. Both looked fairly old, and upon closer inspection later, were from 2012-2013.

I decided to take them with me just to tinker around with them and see if anything was wrong with them.

The HP machine was in pretty good condition, so I decided to take a look at that one first. I went ahead and inserted a USB drive with Linux Mint on it, and booted the machine up.

I then went ahead and mounted the internal drive, and to my surprise, it was not wiped.

I was able to go through the Documents and Downloads folders of the previous owner, and oh boy, was there a LOT of personal data on it.

Documents such as passport applications, scans of driver's licenses, bank statements, and more could be found RIGHT THERE on the drive. I was in awe at how someone could just toss a machine out like this.

The previous owner didn't even bother taking out the hard drive, even though it could have been taken out by removing two screws on the bottom of the machine.

I was planning on contacting the previous owner about their very stupid mistake, but by the time I tried to access the data on the drive again, the drive started failing on me and giving me a bunch of input/output errors, so I wasn't able to.

The Dell machine, on the other hand, was in poor condition, and looked like it was used by a child, judging by the stickers on the machine.

When I tried to look through its internal drive, it was already dead, so there's that.

To wrap this all up though, I would like to close with this:

WIPE THE DATA ON YOUR MACHINES BEFORE YOU THROW THEM OUT, THANK YOU

Signed,

Primrose