In parts one through three, we set up a web server, connected it to a domain name, and instituted some basic security. Now comes the fun part! The web server is humming along doing its thing, and we can watch and admire, making little improvements here and there. It’s not required, but for some, this is part of the payoff.
A web server needs weeding and watering
It can be counterintuitive that a thing made out of code should need ongoing attention and care. On the surface it seems like it should be self-sufficient, like a wall made of stones. We put the pieces in place where we want them, and when we're happy with it, we stop. It's all silicon and bits, after all; why should it need watching?
The bigger picture, though, is that the web server operates in a world that’s always changing. Software updates cause tools to behave differently. Edits happen to the HTML and other content we host. There are dramatic changes in who is trying to reach the content and for what purposes. There can be outages, policy changes, and any number of second-order effects in the wider world that can make our web server stop operating the way we want. So on that scale a web server starts to more closely resemble a vegetable garden—something growing, decaying, and very much a product of the environment that it's in.
This page collects my notes on care and feeding practices I've found helpful and enjoyable. I'm not an expert on this, and this is not authoritative by any means. But I put it here in case you find it helpful. If I missed anything, or got it egregiously wrong, please let me know.
Setting
In these examples I'm working with a DigitalOcean droplet at 138.197.69.146 hosting the content for the domain brandonrohrer.com. It's running an nginx server on Ubuntu 24.04. My local machine is a MacBook Pro, where I'm working from the Terminal. My go-to text editor is vim, but you can use nano instead. Adjust the snippets below for your situation.
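Everything below happens over ssh from the Mac Terminal. The user name here is a placeholder; swap in whichever user you set up on your droplet.
ssh sammy@138.197.69.146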
Browse the logs
I've been curious about self-hosting for a while, but the thing that pushed me over the edge into migrating off Netlify was a desire to see which IP addresses were visiting which pages when. This information is all in the logs.
Because I set up a server block specifically for my domain, the log files live in /var/log/nginx/brandonrohrer.com/. access.log has the main logs, and error.log keeps a log of when things go poorly.
To browse the logs
cat /var/log/nginx/brandonrohrer.com/access.log
which gives something like
49.51.195.195 - - [14/Sep/2025:08:20:50 -0400] "GET /hosting2 HTTP/1.1" 200 5957 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1"
183.98.90.239 - - [14/Sep/2025:08:20:50 -0400] "GET /images/ml_logo.png HTTP/1.1" 301 178 "https://www.brandonrohrer.com/blog.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36"
24.126.100.175 - - [14/Sep/2025:08:21:02 -0400] "GET /feed.xml HTTP/1.1" 304 0 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:136.0) Gecko/20100101 Firefox/136.0"
52.187.246.128 - - [14/Sep/2025:08:21:16 -0400] "GET /transformers.html HTTP/1.1" 200 36194 "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot"
or
cat /var/log/nginx/brandonrohrer.com/error.log
which gives errors like
2025/09/14 00:37:00 [error] 187956#187956: *28044 access forbidden by rule, client: 78.153.140.50, server: brandonrohrer.com, request: "GET /.env.bak HTTP/1.1", host: "138.197.69.146"
2025/09/14 00:37:00 [error] 187956#187956: *28045 access forbidden by rule, client: 78.153.140.50, server: brandonrohrer.com, request: "GET /mail/.env.db HTTP/1.1", host: "138.197.69.146"
2025/09/14 00:37:01 [error] 187956#187956: *28046 access forbidden by rule, client: 78.153.140.50, server: brandonrohrer.com, request: "GET /dev/.env.old HTTP/1.1", host: "138.197.69.146"
2025/09/14 00:37:01 [error] 187956#187956: *28047 access forbidden by rule, client: 78.153.140.50, server: brandonrohrer.com, request: "GET /crm/.env.bak HTTP/1.1", host: "138.197.69.146"
Scanning through these for a few minutes can reveal some fascinating patterns, a few of which will pop up later in the post.
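To watch requests arrive in real time rather than reading the whole file, tail can follow the log as it grows (same path as above; Ctrl-C stops it)
tail -f /var/log/nginx/brandonrohrer.com/access.log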
Check for missed pages
With a slight tweak, this command can pull out only the log lines containing a "404", the HTTP status code for Not Found. (It actually pulls out every line that has a 404 anywhere in it, but the majority of these are page-not-found entries.)
cat /var/log/nginx/brandonrohrer.com/access.log | grep 404
This slice of the logs reveals files I don't have but that browsers and bots normally look for, like /favicon.ico and /.well-known/traffic-advice. It also shows my own mistakes, like a misspelled filename or a missing image.
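The grep approach matches a 404 anywhere in the line, including in byte counts and user agent strings. To match only the status code field, awk can do the filtering. This is a sketch that assumes the default log format shown above, where the status code is the ninth whitespace-separated field
awk '$9 == 404' /var/log/nginx/brandonrohrer.com/access.log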
Catch pages without the .html
One of the first surprises I got from looking at 404s in the logs was that visitors looking for a page called transformers.html would be turned away if they only put in transformers. There were a lot of 404 log entries for misses of this sort. This was a bummer. These were people trying to visit my website who were being denied on a technicality.
Luckily nginx makes it straightforward to fix this. I edited the file containing the server block
sudo vi /etc/nginx/sites-available/brandonrohrer.com
and changed the line that read
try_files $uri $uri/ =404;
so that it read
try_files $uri $uri.html $uri/ =404;
When nginx parses the page being requested, $uri is the variable containing the page name. The modified line instructs the server to first try the exact page name requested, then to try it with a .html tagged onto the end, then to try it with a slash on the end, and if none of those turn anything up, return a 404 page not found error.
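For context, that line sits inside a location block within the server block. Mine looks roughly like this, trimmed to the relevant part
location / {
    try_files $uri $uri.html $uri/ =404;
}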
After making this change (or any of the other changes described below), it's important to first test that we didn't goober anything up
sudo nginx -t
and if our change passes the test and nginx is happy with it, then restart nginx so that the change takes effect
sudo systemctl restart nginx
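I usually chain the two so that the restart only happens if the test passes
sudo nginx -t && sudo systemctl restart nginx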
Redirects
In addition to accounting for missing .html extensions, I've found it helpful to automatically expand short names into longer ones. For example, numba automatically redirects to numba_tips.html. It's also good for fixing typos in published pages. I've redirected statistics_resources.html to stats_resources.html because I shared the wrong URL in a publication.
I also use it as a link shortener, so that fnc actually points to a PDF in a Codeberg repository and bp redirects to a video about backpropagation on YouTube. For the record, brandonrohrer.com is a terrible domain name for a link shortener. I also picked up tyr.fyi as a shorter domain for when I want really short links.
DigitalOcean's tutorial on nginx rewrite rules is a helpful reference for the details: https://www.digitalocean.com/community/tutorials/nginx-rewrite-url-rules
Redirects are implemented in the domain's server block.
sudo vi /etc/nginx/sites-available/brandonrohrer.com
Add a line like this within the server block.
location = /numba { return 301 $scheme://brandonrohrer.com/numba_tips.html; }
This instruction tells the server to look for a request like brandonrohrer.com/numba and forward it to brandonrohrer.com/numba_tips.html. $scheme preserves the http or https, whichever was used in the original request.
Similarly, this line takes any request like brandonrohrer.com/bp and redirects it toward a specific video URL.
location = /bp { return 301 https://www.youtube.com/watch?v=6BMwisTZFr4; }
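Once nginx has been tested and restarted, a quick way to confirm a redirect is working is to ask for just the response headers. A 301 status with a Location header pointing at the target means it's doing its job
curl -I https://brandonrohrer.com/numba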
Location blocks
Redirects are an example of what can be done with location blocks. They can also be used to rewrite requests or block access.
In addition to exact matching on page names, location blocks can match on partial names, directories, and regex-specified patterns. They give fine-grained control over how individual pages are accessed and even which IP addresses are allowed to access them, but they can quickly become complicated. Individual requests can match multiple location blocks, and the winner is not determined on a first-come, first-served basis, but rather by a set of precedence rules. Use with due caution. The DigitalOcean docs on how location blocks get matched are a great resource if you want to dig into this.
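As a taste of the syntax, here are a few illustrative blocks showing the main flavors of matching. These are made-up examples rather than anything from my actual config, and the IP address comes from the documentation range.
location /images/ { }                  # prefix match: anything under /images/
location = /about.html { }             # exact match on a single page
location ~* \.(png|jpe?g)$ { }         # case-insensitive regex match on the extension
location /admin/ {                     # prefix match plus an IP allowlist
    allow 203.0.113.7;                 # let this one address in
    deny all;                          # and turn everyone else away
}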
Set up log rotation
By default, access logs are stored in access.log and error logs are stored in error.log, and those files just keep getting longer and longer. I find it really helpful to have automatic log rotation in place, where each file contains just one day's access logs or errors. Today's are called access.log and error.log, yesterday's are access.log.1 and error.log.1, the day before that access.log.2, et cetera, covering the last couple of weeks. (Starting at day 2 they are also gzipped.) There is a great guide to setting this up in the DO docs. The meat of the setup involves modifying the file /etc/logrotate.d/nginx.
After modifying it, this is what my log rotation config looks like
/var/log/nginx/brandonrohrer.com/*.log {
        daily
        missingok
        rotate 14
        compress
        delaycompress
        notifempty
        create 0644 www-data adm
        sharedscripts
        prerotate
                if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
                        run-parts /etc/logrotate.d/httpd-prerotate; \
                fi \
        endscript
        postrotate
                invoke-rc.d nginx rotate >/dev/null 2>&1
        endscript
}
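To sanity-check the config without waiting a day for the next rotation, logrotate has a debug mode that prints what it would do without touching any files
sudo logrotate -d /etc/logrotate.d/nginx
Once rotation has been running for a few days, the older gzipped files can be read with zcat instead of cat, for example
zcat /var/log/nginx/brandonrohrer.com/access.log.2.gz | grep 404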
Find a content provider for larger files
If you start offering files larger than a couple of megabytes and you start getting more than a handful of views per day, your bandwidth requirements can climb quickly. One way around this is to keep your large files, like video, audio, big images, and other beefy assets, somewhere other than your web server.
There are several services that offer image, audio, and video hosting, and some of them have free tiers. YouTube is a popular option for hosting videos, but I find ads and irrelevant recommendations so annoying that I've paid a few dollars for a bottom-tier Vimeo account. I don't do a lot of audio-only content, so no recommendations there. My biggest bandwidth hog is my image catalog. I explored using a content hosting service for them but wasn't interested in all the bells and whistles. I eventually settled on abusing GitHub as a hosting service. There seem to be no limitations on repository size, no throttling on bandwidth (at least at the scales I'm using it for), and I'm OK with the trade-off of free hosting in exchange for giving Microsoft unfettered access to my images.
As a result, the bandwidth from my server tends to settle around 10 kB/s on average while serving a few thousand page views per day. It helps me keep my hosting costs low.
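If you go the GitHub route, the raw URL pattern is what your pages reference. This is a hypothetical path; swap in your own user, repository, branch, and filename. A quick check of the headers confirms the file is being served
curl -I https://raw.githubusercontent.com/your-user/your-repo/main/images/example.png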