With a new domain, a new host and maybe even a new framework, launching a new website is a mixture of excitement and frustration.
At some point the question “Am I handling status codes and redirects correctly?” comes up.
The first thing to do is to write down some specifications for your URL scheme:
- www or not www?
- .html suffix?
- trailing slashes?
You mentally consolidate the specifications and after some juggling with Nginx/Apache you end up with your scheme.
But… is it consistent? Have you covered every case? Are you doing more than one 301 redirect from one URL to another?
There are already tools to assess this structure, from tiny command-line tools to full-blown headless browsers.
But since we only need to test the HTTP status code and the redirect URL, we can take the opportunity to write a simple shell script.
Tool specifications
Our tool shouldn’t rely on external tools that aren’t included in a basic Linux installation.
The features that we need are simple:
- check the HTTP status code
- follow the redirect and check the destination URL
URL specification
We have a good idea of the URL scheme, so let’s write down some URL examples. We will start with the 2xx and 4xx status codes:
| URL: | https://gianlucafabrizi.dev/blog/posts/1brc-php/ |
| STATUS CODE: | 200 |
| URL: | https://gianlucafabrizi.dev/blog/posts/not-found |
| STATUS CODE: | 404 |
| URL: | https://www.gianlucafabrizi.dev/blog/posts/also-not-found |
| STATUS CODE: | 404 |
then we move on to the 301 redirects:
| URL: | http://gianlucafabrizi.dev/ |
| STATUS CODE: | 301 |
| REDIRECTS TO: | https://gianlucafabrizi.dev/ |
| URL: | http://gianlucafabrizi.dev |
| STATUS CODE: | 301 |
| REDIRECTS TO: | https://gianlucafabrizi.dev/ |
| URL: | http://www.gianlucafabrizi.dev/ |
| STATUS CODE: | 301 |
| REDIRECTS TO: | https://gianlucafabrizi.dev/ |
| URL: | https://www.gianlucafabrizi.dev/blog/posts/1brc-php |
| STATUS CODE: | 301 |
| REDIRECTS TO: | https://gianlucafabrizi.dev/blog/posts/1brc-php/ |
We should avoid chains of 301 redirects, so we have to test that http://www.gianlucafabrizi.dev/ redirects to https://gianlucafabrizi.dev/ in a single hop, without intermediate redirects.
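One way to count the hops without parsing headers by hand is to let curl follow the whole chain and report how many redirects it took. The sketch below is only an illustration: the helper name count_redirect_hops is hypothetical and is not part of the script built in this post. It relies on curl's --location flag and its %{num_redirects} write-out variable.

```shell
# Hypothetical helper (not part of the final script): prints how many
# redirects curl had to follow before reaching the final URL.
count_redirect_hops () {
    # --location follows redirects, --output /dev/null discards the headers,
    # --write-out prints only the redirect counter.
    curl -k --silent --head --location --output /dev/null \
        --write-out '%{num_redirects}' "$1"
}

# Usage (requires network access):
#   hops=$(count_redirect_hops "http://www.gianlucafabrizi.dev/")
#   if [ "$hops" -eq 1 ]; then echo "single hop"; else echo "got $hops hops"; fi
```

A result of 1 means the URL reaches its destination with a single redirect; anything higher points to a chain worth fixing in the server configuration.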
Check HTTP Status
Let’s start with the code to check HTTP status:
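check_status_code () {
    # $1: URL to test, $2: expected status code
    curl_output=$(curl -k --silent --head "$1")
    # anchor the match so that only the status line is selected
    status_code=$(echo "$curl_output" | grep "^HTTP/" | awk '{print $2}')
    # string comparison: no error if curl returned nothing
    if [ "$status_code" = "$2" ]; then
        echo -e " STATUS CODE: $2"
    else
        echo -e " EXPECTED STATUS CODE: $2 GOT $status_code"
    fi
}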
We use grep to find this line:
HTTP/2 200
then we use awk to split the line into columns (space is the separator), and assign the second column (200 in this example) to the status_code variable.
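To see the pipeline in isolation, it can be run against a canned response instead of a live server. The headers below are made up for illustration (they mimic the shape of what curl --head prints, including the carriage returns), so this is a sketch rather than real output from any site.

```shell
# Sample response headers, invented for this example.
sample_headers=$(printf 'HTTP/2 200 \r\ncontent-type: text/html\r\nserver: nginx\r\n')

# grep keeps the status line, awk prints its second column.
status_code=$(echo "$sample_headers" | grep "^HTTP/" | awk '{print $2}')

echo "$status_code"
```

With an HTTP/1.1 status line such as `HTTP/1.1 200 OK` the second column is still the numeric code, so the same pipeline works for both protocol versions.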
Check redirect location
For a redirected URL we first need to check the status code, then the Location: value from the response headers:
check_redirect () {
    # $1: URL to test, $2: expected status code, $3: expected Location value
    curl_output=$(curl -k --silent --head "$1")
    status_code=$(echo "$curl_output" | grep "^HTTP/" | awk '{print $2}')
    if [ "$status_code" = "$2" ]; then
        echo -e "STATUS CODE: $2"
    else
        echo -e "EXPECTED STATUS CODE: $2 GOT $status_code"
    fi
    # anchor the match so that headers like Content-Location are ignored
    redirect_url=$(echo "$curl_output" | grep -i "^location:" | awk '{print $2}' | tr -d '\r\n')
    if [ "$redirect_url" = "$3" ]; then
        echo -e "LOCATION: $3"
    else
        echo -e "EXPECTED LOCATION: $3 GOT $redirect_url"
    fi
}
The first part is the same as check_status_code ().
In the second part we grep again (this time with the -i flag for case-insensitive search) to get the Location: line, print the second column with awk and use tr to remove trailing carriage returns and newlines.
Why is grep case-insensitive this time? Because HTTP header names are case-insensitive: the Location header is sometimes title case, sometimes lowercase, depending on the protocol version and on the configuration of the web server, the cache server, the load balancer, etc…
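The effect of the -i flag and of tr is easier to see on canned input. In this sketch the helper name extract_location and both sample responses are invented for illustration; the pipeline inside the helper is the one described above, with the match anchored to the start of the line.

```shell
# Hypothetical helper: reads response headers on stdin and prints the
# Location value without the trailing carriage return and newline.
extract_location () {
    grep -i "^location:" | awk '{print $2}' | tr -d '\r\n'
}

# Same redirect, once with a lowercase header name and once in title case.
lower=$(printf 'HTTP/2 301 \r\nlocation: https://example.com/\r\n\r\n' | extract_location)
title=$(printf 'HTTP/1.1 301 Moved Permanently\r\nLocation: https://example.com/\r\n\r\n' | extract_location)

echo "$lower"
echo "$title"
```

Both calls yield the same URL. Without tr the value would keep its trailing carriage return, and the string comparison against the expected location would fail even when the URL looks identical on screen.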
UI improvements
The code is OK(ish…): sure there are some duplications (the status check code), but this makes the flow easier to understand and maintain.
The output is boring and not usable: there is no clear indication of whether the tests have passed or failed.
Let’s add some colour, a nicer look, a meaningful exit code and some reports about failed and passed tests:
#!/usr/bin/env bash
BLACK='\033[0;30m'
RED='\033[0;31m'
GREEN='\033[0;32m'
BOLD_WHITE='\033[1;37m'
BOLD_ULTRA_WHITE='\033[1;97m'
NC='\033[0m' # Reset Color
BG_RED='\033[41m'
BG_GREEN='\033[42m'
BG_WHITE='\033[47m'
COUNT_TEST=0
FAILED=0
SUCCESS=0
run_tests () {
    # WRITE HERE YOUR TESTS
    check_status_code "https://www.google.com" 200
    check_status_code "https://www.google.com/not-found" 404
    # the expected location must match the Location header exactly, trailing slash included
    check_redirect "https://google.com" 301 "https://www.google.com/"
}
check_status_code () {
    echo "┌──"
    echo -e "│ ${BOLD_WHITE}Testing $1${NC}"
    echo "└──"
    COUNT_TEST=$((COUNT_TEST+1))
    curl_output=$(curl -k --silent --head "$1")
    status_code=$(echo "$curl_output" | grep "^HTTP/" | awk '{print $2}')
    if [ "$status_code" = "$2" ]; then
        echo -e " ${BOLD_ULTRA_WHITE}${BG_GREEN} [PASS] ${NC} STATUS CODE: $2"
        SUCCESS=$((SUCCESS+1))
    else
        echo -e " ${BOLD_ULTRA_WHITE}${BG_RED} [FAIL] ${NC} EXPECTED STATUS CODE: $2 GOT $status_code"
        FAILED=$((FAILED+1))
    fi
    echo ""
}
check_redirect () {
    echo "┌──"
    echo -e "│ ${BOLD_WHITE}Testing $1${NC}"
    echo "└──"
    COUNT_TEST=$((COUNT_TEST+1))
    curl_output=$(curl -k --silent --head "$1")
    status_code=$(echo "$curl_output" | grep "^HTTP/" | awk '{print $2}')
    test_status="ok"
    if [ "$status_code" = "$2" ]; then
        echo -e " ${BOLD_ULTRA_WHITE}${BG_GREEN} [PASS] ${NC} STATUS CODE: $2"
    else
        echo -e " ${BOLD_ULTRA_WHITE}${BG_RED} [FAIL] ${NC} EXPECTED STATUS CODE: $2 GOT $status_code"
        test_status="ko"
    fi
    # using tr to remove newlines and carriage returns
    redirect_url=$(echo "$curl_output" | grep -i "^location:" | awk '{print $2}' | tr -d '\r\n')
    if [ "$redirect_url" = "$3" ]; then
        echo -e " ${BOLD_ULTRA_WHITE}${BG_GREEN} [PASS] ${NC} LOCATION: $3"
    else
        echo -e " ${BOLD_ULTRA_WHITE}${BG_RED} [FAIL] ${NC} EXPECTED LOCATION: $3 GOT $redirect_url"
        test_status="ko"
    fi
    # a redirect test passes only if both checks passed
    if [ "$test_status" = "ok" ]; then
        SUCCESS=$((SUCCESS+1))
    else
        FAILED=$((FAILED+1))
    fi
    echo ""
}
run_tests
echo -e "┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓"
echo -e "┃ ${BOLD_WHITE}Test results${NC} ┃"
echo -e "┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛"
echo -e " ${BLACK}${BG_WHITE} $COUNT_TEST ${NC} test(s) run, ${BOLD_ULTRA_WHITE}${BG_GREEN} $SUCCESS ${NC} passed, ${BOLD_ULTRA_WHITE}${BG_RED} $FAILED ${NC} failed\n"
if [ $FAILED -ne 0 ]; then
exit 1
fi
To better separate the test URLs from the command itself, all the URLs to test are wrapped in a function called run_tests ().
Limitations
This is our final code. It does its job, but there are some obvious limitations:
- the test URLs are coupled with the check logic. If you want to add/remove/modify tests, you have to edit the tool itself;
- complex scenarios can’t be handled: this tool can’t check specific response headers, use a different HTTP method, or pass a payload to the URL;
- some URLs may not respond to the HEAD method, but only to the GET method.
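For endpoints that reject HEAD, one possible workaround is to send a regular GET and ask curl to print only the status code. This is a minimal sketch under that assumption: the helper name get_status_code is hypothetical and does not appear in the script above. Without --head curl sends a GET by default, and the %{http_code} write-out variable holds the numeric status of the response.

```shell
# Hypothetical helper (not part of the final script): fetches the URL
# with GET, discards the body and prints only the HTTP status code.
get_status_code () {
    curl -k --silent --output /dev/null --write-out '%{http_code}' "$1"
}

# Usage (requires network access):
#   status_code=$(get_status_code "https://www.google.com/not-found")
#   [ "$status_code" = "404" ] && echo "PASS" || echo "FAIL: got $status_code"
```

The trade-off is that the server has to produce the whole response body for every test, so a run against many URLs will be slower than with HEAD.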
Conclusion
This tool wasn’t intended to be used in a working/production/professional environment.
It’s a quick implementation of an HTTP status checker/tester. Why is that? The answer is always: because we can and because it’s fun!
The full code is also available on GitHub:
https://github.com/gfabrizi/http-status-codes-tester