With a new domain, a new host and maybe even a new framework, launching a new website is a mixture of excitement and frustration.
At some point the question “Am I handling status codes and redirects correctly?” comes up.

The first thing to do is to write down some specifications for your URL scheme:

  • www or not www?
  • .html suffix?
  • trailing slashes?

You mentally consolidate the specifications and after some juggling with Nginx/Apache you end up with your scheme.
But… is it consistent? Have you covered every case? Are you doing more than one 301 redirect from one URL to another?

There are already tools to assess this structure, from tiny command-line utilities to full-blown headless browsers.
But since we only need to test the HTTP status code and the redirect URL, this is a good opportunity to write a simple shell script.

Tool specifications

Our tool shouldn’t rely on external tools that aren’t included in a basic Linux installation.

The features that we need are simple:

  • check the HTTP status code
  • follow the redirect and check the destination URL

URL specification

We have a good idea of the URL scheme, so let’s write down some URL examples. We will start with the 2xx and 4xx status codes:

URL:https://gianlucafabrizi.dev/blog/posts/1brc-php/
STATUS CODE:200

URL:https://gianlucafabrizi.dev/blog/posts/not-found
STATUS CODE:404

URL:https://www.gianlucafabrizi.dev/blog/posts/also-not-found
STATUS CODE:404

then we move on to the 301 redirects:

URL:http://gianlucafabrizi.dev/
STATUS CODE:301
REDIRECTS TO:https://gianlucafabrizi.dev/

URL:http://gianlucafabrizi.dev
STATUS CODE:301
REDIRECTS TO:https://gianlucafabrizi.dev/

URL:http://www.gianlucafabrizi.dev/
STATUS CODE:301
REDIRECTS TO:https://gianlucafabrizi.dev/

URL:https://www.gianlucafabrizi.dev/blog/posts/1brc-php
STATUS CODE:301
REDIRECTS TO:https://gianlucafabrizi.dev/blog/posts/1brc-php/

We should avoid chains of multiple 301s, so we have to test that http://www.gianlucafabrizi.dev/ redirects to https://gianlucafabrizi.dev/ directly, without intermediate hops.
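One possible way to check the "no intermediate hops" requirement is to let curl follow the whole chain (with its --location option) and count how many 3xx responses show up in the headers. The sketch below is only an illustration: count_redirects is a hypothetical helper that is not part of the tool built in this post, and the sample headers are made up so the snippet runs without touching the network.

```shell
#!/usr/bin/env bash

# count_redirects (hypothetical helper): reads raw response headers on stdin,
# as printed by `curl --silent --head --location URL`, and prints how many
# 3xx responses the chain contains.
count_redirects () {
    # strip carriage returns first, so the status code is a clean column
    tr -d '\r' | awk '/^HTTP\// && $2 ~ /^3[0-9][0-9]$/ { n++ } END { print n + 0 }'
}

# Made-up example of a two-hop chain (http -> https www -> https non-www).
sample_headers=$(printf '%s\r\n' \
    'HTTP/1.1 301 Moved Permanently' \
    'Location: https://www.example.com/' \
    '' \
    'HTTP/2 301' \
    'location: https://example.com/' \
    '' \
    'HTTP/2 200' \
    'content-type: text/html')

hops=$(printf '%s\n' "$sample_headers" | count_redirects)
echo "redirect hops: $hops"
```

Against a real site the same helper could be fed with something like curl --silent --head --location "http://www.gianlucafabrizi.dev/" | count_redirects; any result greater than 1 would mean there is an intermediate hop.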

Check HTTP Status

Let’s start with the code to check HTTP status:

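check_status_code () {
    # $1 = URL to test, $2 = expected HTTP status code
    curl_output=$(curl -k --silent --head "$1")
    # the status line looks like "HTTP/2 200": the code is the second column
    status_code=$(echo "$curl_output" | grep "^HTTP/" | awk '{print $2}')
    # string comparison: no error if curl fails and status_code is empty
    if [ "$status_code" = "$2" ]; then
        echo -e " STATUS CODE: $2"
    else
        echo -e " EXPECTED STATUS CODE: $2 GOT $status_code"
    fi
}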

We use grep to find this line:

HTTP/2 200

then we use awk to split the line into columns (space is the separator), and assign the second column (200 in this example) to the status_code variable.
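To see the grep/awk pipeline in isolation, it can be run against some canned headers instead of a live curl call. This is just a sketch: the sample_headers value below is made up for illustration and only mimics the shape of what curl --silent --head prints.

```shell
#!/usr/bin/env bash

# Made-up response headers, similar in shape to what `curl --silent --head` prints.
sample_headers='HTTP/2 200
content-type: text/html; charset=utf-8
server: nginx'

# Same pipeline as in check_status_code: keep the status line, print column 2.
status_code=$(echo "$sample_headers" | grep "HTTP/" | awk '{print $2}')

echo "status code: $status_code"
```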

Check redirect location

For a redirected URL we first need to check the status code, then the value of the Location: header in the response:

check_redirect () {
    # $1 = URL to test, $2 = expected status code, $3 = expected Location
    curl_output=$(curl -k --silent --head "$1")
    status_code=$(echo "$curl_output" | grep "^HTTP/" | awk '{print $2}')
    if [ "$status_code" = "$2" ]; then
        echo -e "STATUS CODE: $2"
    else
        echo -e "EXPECTED STATUS CODE: $2 GOT $status_code"
    fi

    # anchor the match so headers like Content-Location: are not picked up
    redirect_url=$(echo "$curl_output" | grep -i "^location:" | awk '{print $2}' | tr -d '\r\n')
    if [ "$redirect_url" = "$3" ]; then
        echo -e "LOCATION: $3"
    else
        echo -e "EXPECTED LOCATION: $3 GOT $redirect_url"
    fi
}

The first part is the same as check_status_code ().
In the second part we grep again (this time with the -i flag for case-insensitive search) to get the Location: line, print the second column with awk and use tr to remove trailing carriage returns and newlines.
Why is grep case-insensitive this time? Because HTTP header names are case-insensitive: the Location header is sometimes title case, sometimes lowercase (HTTP/2 always sends lowercase header names), depending on the configuration of the web server, the cache server, the load balancer, etc…
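The effect of the -i flag is easy to try on canned headers. In this sketch extract_location is a hypothetical helper wrapping the same grep/awk/tr pipeline, and both sample responses are made up; the only difference between them is the letter case of the header name.

```shell
#!/usr/bin/env bash

# extract_location (hypothetical helper): reads response headers on stdin and
# prints the value of the Location header, whatever its letter case.
extract_location () {
    grep -i "^location:" | awk '{print $2}' | tr -d '\r\n'
}

# Made-up responses: an HTTP/1.1 style header and an HTTP/2 style one.
title_case=$(printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.com/\r\n' | extract_location)
lower_case=$(printf 'HTTP/2 301\r\nlocation: https://example.com/\r\n' | extract_location)

echo "title case: $title_case"
echo "lower case: $lower_case"
```

Without -i the second response would produce an empty value, and the test would fail even though the redirect itself is correct.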

UI improvements

The code is OK(ish…): sure there are some duplications (the status check code), but this makes the flow easier to understand and maintain.
The output is boring and not usable: there is no clear indication of whether the tests have passed or failed.
Let’s add some colour, a nicer look, a meaningful exit code and some reports about failed and passed tests:

#!/usr/bin/env bash

BLACK='\033[0;30m'
RED='\033[0;31m'
GREEN='\033[0;32m'

BOLD_WHITE='\033[1;37m'
BOLD_ULTRA_WHITE='\033[1;97m'

NC='\033[0m' # Reset Color

BG_RED='\033[41m'
BG_GREEN='\033[42m'
BG_WHITE='\033[47m'

COUNT_TEST=0
FAILED=0
SUCCESS=0


run_tests () {
    # WRITE HERE YOUR TESTS
    check_status_code "https://www.google.com" 200
    check_status_code "https://www.google.com/not-found" 404
    check_redirect "https://google.com" 301 "https://www.google.com/"
}

check_status_code () {
    echo "┌──"
    echo -e "│ ${BOLD_WHITE}Testing $1${NC}"
    echo "└──"

    COUNT_TEST=$((COUNT_TEST+1))
    curl_output=$(curl -k --silent --head "$1")
    status_code=$(echo "$curl_output" | grep "^HTTP/" | awk '{print $2}')
    if [ "$status_code" = "$2" ]; then
        echo -e " ${BOLD_ULTRA_WHITE}${BG_GREEN} [PASS] ${NC} STATUS CODE: $2"
        SUCCESS=$((SUCCESS+1))
    else
        echo -e " ${BOLD_ULTRA_WHITE}${BG_RED} [FAIL] ${NC} EXPECTED STATUS CODE: $2 GOT $status_code"
        FAILED=$((FAILED+1))
    fi

    echo ""
}

check_redirect () {
    echo "┌──"
    echo -e "│ ${BOLD_WHITE}Testing $1${NC}"
    echo "└──"

    COUNT_TEST=$((COUNT_TEST+1))
    curl_output=$(curl -k --silent --head "$1")
    status_code=$(echo "$curl_output" | grep "^HTTP/" | awk '{print $2}')
    test_status="ok"
    if [ "$status_code" = "$2" ]; then
        echo -e " ${BOLD_ULTRA_WHITE}${BG_GREEN} [PASS] ${NC} STATUS CODE: $2"
    else
        echo -e " ${BOLD_ULTRA_WHITE}${BG_RED} [FAIL] ${NC} EXPECTED STATUS CODE: $2 GOT $status_code"
        test_status="ko"
    fi

    # using tr to remove newlines and carriage returns
    redirect_url=$(echo "$curl_output" | grep -i "^location:" | awk '{print $2}' | tr -d '\r\n')
    if [ "$redirect_url" = "$3" ]; then
        echo -e " ${BOLD_ULTRA_WHITE}${BG_GREEN} [PASS] ${NC} LOCATION: $3"
    else
        echo -e " ${BOLD_ULTRA_WHITE}${BG_RED} [FAIL] ${NC} EXPECTED LOCATION: $3 GOT $redirect_url"
        test_status="ko"
    fi

    if [ $test_status = "ok" ]; then
        SUCCESS=$((SUCCESS+1))
    else
        FAILED=$((FAILED+1))
    fi

    echo ""
}


run_tests

echo -e "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓"
echo -e "┃                     ${BOLD_WHITE}Test results${NC}                     ┃"
echo -e "┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛"
echo -e " ${BLACK}${BG_WHITE} $COUNT_TEST ${NC} test(s) run, ${BOLD_ULTRA_WHITE}${BG_GREEN} $SUCCESS ${NC} passed, ${BOLD_ULTRA_WHITE}${BG_RED} $FAILED ${NC} failed\n"

if [ $FAILED -ne 0 ]; then
    exit 1
fi

To better separate the test URLs from the command itself, all the URLs to test are wrapped in a function called run_tests ().

Limitations

This is our final code. It does its job, but there are some obvious limitations:

  • the test URLs are coupled with the check logic: if you want to add, remove or modify tests, you have to edit the tool itself;
  • complex scenarios can’t be handled: this tool can’t check specific response headers, use other HTTP methods, or send a payload to the URL;
  • some URLs may not respond to the HEAD method, but only to the GET method.
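The first limitation could be eased by moving the test cases into a separate file. What follows is only a rough sketch under stated assumptions: the "URL STATUS [LOCATION]" file format and the run_tests_from_file function are hypothetical, and the two check functions are replaced by stand-ins that just print what they would test, so the snippet runs on its own.

```shell
#!/usr/bin/env bash

# Stand-ins for the real functions from the script above, so this sketch
# runs on its own without touching the network.
check_status_code () { echo "check_status_code $1 $2"; }
check_redirect () { echo "check_redirect $1 $2 $3"; }

# run_tests_from_file (hypothetical): reads "URL STATUS [LOCATION]" lines from
# the file given as $1, skipping blank lines and lines starting with '#'.
run_tests_from_file () {
    local url status location
    while read -r url status location; do
        case "$url" in
            ''|'#'*) continue ;;
        esac
        if [ -n "$location" ]; then
            check_redirect "$url" "$status" "$location"
        else
            check_status_code "$url" "$status"
        fi
    done < "$1"
}

# Example test file (hypothetical format), written to a temporary location.
tests_file=$(mktemp)
cat > "$tests_file" <<'EOF'
# URL STATUS [LOCATION]
https://gianlucafabrizi.dev/blog/posts/not-found 404

http://gianlucafabrizi.dev/ 301 https://gianlucafabrizi.dev/
EOF

planned=$(run_tests_from_file "$tests_file")
rm -f "$tests_file"
echo "$planned"
```

In the real script, run_tests could then shrink to a single call such as run_tests_from_file "$1", with the file path passed on the command line.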

Conclusion

This tool wasn’t intended to be used in a working/production/professional environment.
It’s a quick implementation of an HTTP status checker/tester. Why is that? The answer is always: because we can and because it’s fun!

The full code is also available on GitHub:
https://github.com/gfabrizi/http-status-codes-tester