

Writing an HTTP server in Rust (Part I)

Now available in Chinese (yes, you read that right)

So… You want to write an HTTP server. Well, you’re in luck: the Internet Engineering Task Force (IETF) is here to help us. Whenever they’re not fighting with fiber cables (at least that’s what I imagine they do), they are writing useful specifications for us!

Okay, the IETF is cool and all, but how do you write an HTTP server?

Well, first you prepare yourself a big pot of coffee. Once you’re appropriately caffeinated, you read the 57,897-word RFC 2616 specification, written by none other than the IETF. Exciting.

Notice that it says HTTP/1.1 and, if you were really attentive, that it was written in June 1999. For our purposes that’s fine, as this is not a writeup on how to do a proper HTTP server implementation of the newest version (the HTTP/3 spec was recently introduced, on 26 September 2019). This post serves as a gentle introduction to how HTTP servers work and the basic principles behind them. This guide is not meant for writing a server for use in production; if you wish to do so, use a reliable server like Nginx or Apache.

If you are interested in the differences between versions and the history of HTTP, this is a great article.

What is HTTP?

HTTP stands for Hypertext Transfer Protocol. It’s the vehicle that delivers essentially all resources (files and other data) on the World Wide Web. Most of the time, HTTP takes place over TCP/IP sockets; TCP is the other protocol that we will be using (though not implementing).

This does not preclude HTTP from being implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used, and the mapping of the HTTP/1.1 request and response structures onto the transport data units of the protocol in question is outside the scope of this specification.

The communication between clients and servers happens over HTTP (there is also the Gopher protocol if you’re a hipster, or MQTT if you’re in the IoT space). A client might be a web browser or any other software that understands HTTP. HTTP is a request-response protocol: the client initiates a request to the server, which listens for requests and sends back an appropriate response containing some resource. TCP underneath it is not request-response; it simply provides a reliable, ordered stream of bytes in both directions.

HTTP transmits resources, which are simply chunks of data that are identified by a Uniform Resource Locator (URL). A resource might be a file or some generated query result.

You might ask:

What and how do these servers send these responses back?

Well, that’s what the RFC is for: it defines the formats that requests and responses take. TCP sits a level below HTTP and describes how information gets from one location to another, but never specifies what that information is. This is where HTTP comes in.
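
To make that split concrete, here is a minimal sketch of the text a client would hand to TCP for a simple request. The helper name build_get_request is made up for this illustration and is not part of Linda; the Host value is just the address we will bind to later.

```rust
/// Builds the text of a minimal HTTP/1.1 GET request.
/// `build_get_request` is a hypothetical helper, not part of Linda.
fn build_get_request(host: &str, path: &str) -> String {
    // Request line, one Host header, then an empty line that ends the headers.
    format!("GET {} HTTP/1.1\r\nHost: {}\r\n\r\n", path, host)
}

fn main() {
    let request = build_get_request("127.0.0.1:8594", "/");
    // TCP would carry exactly these bytes; it neither knows nor cares what they mean.
    print!("{}", request);
    println!("{} bytes on the wire", request.len());
}
```

HTTP gives these bytes their meaning (method, path, version, headers); TCP only makes sure they arrive in order.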

First contact

The code can be found on Github (link of repo at the time of writing)

Our first order of business is to listen for and handle incoming TCP connections on some specific port. I will avoid using any libraries that make this process trivial (e.g. an http crate), as the point of this is to focus on how a server works.

Progress spacecraft in process of docking to the International Space Station

So, let’s start a new project, let’s call her Linda:

$ cargo new linda
$ cd linda

We will then accept and handle each connection. I will also be using the log facade crate with simple_logger for the implementation, as it will be useful to know what’s going on when our server is running.

[dependencies]
simple_logger = "1.3.0"
log = "0.4.8"

First, we need to open a socket that the client can connect to. For that we will use a TcpListener, which we bind to an address. If we look at the documentation, we see that bind() returns an io::Result<TcpListener> holding a listener bound to that address. The Result<> return type means that binding can fail, and that failure has to be handled. TcpListener also provides incoming(), an iterator over the received connections, which we will handle one by one.

 1  use log::{error, info};
 2  use std::net::TcpListener;
 3
 4  fn main() {
 5      simple_logger::init().unwrap();
 6      info!("Starting server...");
 7
 8      let ip = "127.0.0.1:8594";
 9
10      let listener = TcpListener::bind(ip).expect("Unable to create listener.");
11      info!("Server started on: {}{}", "http://", ip);
12
13      for stream in listener.incoming() {
14          match stream {
15              Ok(stream) => match handle_connection(stream) {
16                  Ok(_) => (),
17                  Err(e) => error!("Error handling connection: {}", e),
18              },
19              Err(e) => error!("Connection failed: {}", e),
20          }
21      }
22  }

  • Line 8: the IP (localhost) and port to bind to
  • Line 10: the listener that binds to the IP:Port; expect() panics with the given message if binding fails
  • Line 13: loop through the incoming connections
  • Lines 14-20: match the returned Result<> possibilities, as the connection might fail
  • Lines 15-18: match the result of handle_connection(stream), which also returns a Result<> and which we have yet to implement

Rust doesn’t have exceptions. Instead, it has the type Result for recoverable errors and the panic! macro that stops execution when the program encounters an unrecoverable error. (if you are not familiar, do look at the Result<> documentation)
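
As a small sketch of that difference, assuming a made-up helper called parse_port (it is not part of Linda): the function hands the failure back to the caller as a value, and it is the caller who decides whether to recover or to panic.

```rust
use std::num::ParseIntError;

/// Parses a port number; bad input is a recoverable error, so we return it.
/// `parse_port` is a hypothetical helper for illustration.
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    text.trim().parse::<u16>()
}

fn main() {
    match parse_port("8594") {
        Ok(port) => println!("port {}", port),
        Err(e) => println!("could not parse port: {}", e),
    }

    // The caller decides what to do with the failure...
    if let Err(e) = parse_port("not-a-port") {
        println!("recoverable: {}", e);
    }
    // ...whereas calling unwrap() or expect() on the Err would turn it into a panic.
}
```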

If you try to go to http://127.0.0.1:8594 in your browser, you will see a “Connection was reset” message; that’s because the server isn’t sending any data back.
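
If you want to see a listener and a client talk to each other without a browser, here is a self-contained sketch. It is not Linda’s code: round_trip is a hypothetical helper, and binding to port 0 (letting the OS pick a free port) is my assumption to keep the example from clashing with anything already running.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Binds to an OS-assigned port on localhost, sends `message` from a client
/// thread, and returns what the listening side received.
fn round_trip(message: &'static str) -> std::io::Result<String> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;

    // The client connects, writes its message and closes the connection
    // when `stream` goes out of scope.
    let client = thread::spawn(move || -> std::io::Result<()> {
        let mut stream = TcpStream::connect(addr)?;
        stream.write_all(message.as_bytes())?;
        Ok(())
    });

    // Accept exactly one connection and read until the client closes it.
    let (mut stream, _peer) = listener.accept()?;
    let mut received = String::new();
    stream.read_to_string(&mut received)?;

    client.join().expect("client thread panicked")?;
    Ok(received)
}

fn main() -> std::io::Result<()> {
    println!("{}", round_trip("GET / HTTP/1.1\r\n\r\n")?.trim_end());
    Ok(())
}
```

Every fallible call here hands its error to the caller with ?, instead of unwrapping.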

Handling the client

Request: HTTP GET Omlette. Response: 404 NOT FOUND

We have accepted a connection on the TCP socket; now we need to handle its stream. This is done by calling handle_connection(stream) on line 15 of the previous code block, which we have yet to implement.

At the moment we are only parsing the request line, specified in the RFC as Request-Line = Method SP Request-URI SP HTTP-Version CRLF, not the whole request.

A full request looks like this (taken from the RFC spec):

Request  = Request-Line              ; Section 5.1
           *(( general-header        ; Section 4.5
            | request-header         ; Section 5.3
            | entity-header ) CRLF)  ; Section 7.1
           CRLF
           [ message-body ]          ; Section 4.3

fn handle_connection(mut stream: TcpStream) -> Result<(), Box<dyn Error>> {
    // 512 bytes is enough for a toy HTTP server
    let mut buffer = [0; 512];

    // reads bytes from the stream into the buffer
    stream.read(&mut buffer).unwrap();

    // interpret the bytes as text and take the first line (the Request-Line)
    let request = String::from_utf8_lossy(&buffer[..]);
    let request_line = request.lines().next().unwrap();

    match parse_request_line(&request_line) {
        Ok(request) => {
            info!("\n{}", request);
        }
        Err(e) => error!("Bad request: {}", e),
    }

    Ok(())
}

This is quite a bit of new code, so let’s go through it in chunks. Note that we return a Result<(), Box<dyn Error>> that’s matched in main.rs.

Reading the stream into the buffer

First, we want to take a mutable TcpStream and read its data into a buffer; here I simply read it into a [u8; 512], a fixed array of 512 bytes. If we were doing many small writes, we would buffer them up with a BufWriter and flush the stream once all of them are done, which is a lot more efficient when sending large files in chunks. However, we’re serving files which are already in memory anyway, so we have no need for it.

let mut buffer = [0; 512];

stream.read(&mut buffer).unwrap();

let request = String::from_utf8_lossy(&buffer[..]);
let request_line = request.lines().next().unwrap();

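We pass the buffer to read() as a mutable reference so that it can be filled, then we convert the bytes into text with String::from_utf8_lossy() (which swaps any invalid UTF-8 for the replacement character), so that we can later parse it. lines() breaks the text up into lines and returns an iterator; next() returns the first element of that iterator.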
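
Here is a minimal sketch of that step in isolation. The helper first_line is hypothetical, and unlike the snippet above it only looks at the bytes that were actually read, which is an assumption of mine about how one might tidy this up later.

```rust
/// Returns the first line of the bytes actually read, if there is one.
/// `first_line` is a hypothetical helper, not part of Linda.
fn first_line(buffer: &[u8], bytes_read: usize) -> Option<String> {
    // Only look at the bytes that were read, not the zero padding after them.
    let text = String::from_utf8_lossy(&buffer[..bytes_read]);
    // lines() splits on "\n" and also strips the "\r" in front of it.
    text.lines().next().map(|line| line.to_string())
}

fn main() {
    let mut buffer = [0u8; 512];
    let raw = b"GET / HTTP/1.1\r\nHost: 127.0.0.1:8594\r\n\r\n";
    buffer[..raw.len()].copy_from_slice(raw);

    println!("{:?}", first_line(&buffer, raw.len()));
    // An empty read has no first line, so there is nothing to unwrap.
    println!("{:?}", first_line(&buffer, 0));
}
```

Returning an Option instead of calling unwrap() lets the caller decide what an empty request should mean.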

In Rust a String is different from a &str: a String owns its text, which is stored on the heap and can be grown, whereas a &str is only a borrowed view of text that is owned elsewhere and cannot be grown.

Note from harvey_bird_person from /r/rust:

It’s true that &str cannot be grown, but that’s because it’s a non-mutable reference. Any piece of data that has a non-mutable reference cannot be changed. The actual text that the &str refers to could be anywhere - the text could be allocated on the heap, or it could be a const string, or anything. We don’t know, and we don’t need to know.
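
A short sketch of that point: the same &str parameter happily accepts text from the program binary and text from a heap-allocated String. The function describe is made up for the illustration.

```rust
/// Accepts any string slice, wherever the text actually lives.
/// `describe` is a hypothetical helper.
fn describe(text: &str) -> String {
    format!("{} ({} bytes)", text, text.len())
}

fn main() {
    let literal: &str = "GET"; // points into the program binary
    let mut owned = String::from("HTTP"); // owns a heap buffer
    owned.push_str("/1.1"); // a String can grow

    let borrowed: &str = &owned; // points into the heap buffer of `owned`
    let slice: &str = &owned[..4]; // a view of just "HTTP"

    println!("{}", describe(literal));
    println!("{}", describe(borrowed));
    println!("{}", describe(slice));
}
```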

Parsing the request line

match parse_request_line(&request_line) {
    Ok(request) => {
        info!("Request: {}", &request);
    }
    Err(e) => error!("Bad request: {}", e),
}

Ok(())

Here we pass the Request-Line (as per the RFC) by reference to a currently unimplemented function, parse_request_line(). If the parser decides that the request is Ok, we log it; if not, we log the error. Now to the parsing itself…

 1  fn parse_request_line(request: &str) -> Result<Request, Box<dyn Error>> {
 2      let mut parts = request.split_whitespace();
 3
 4      let method = parts.next().ok_or("Method not specified")?;
 5      // We only accept GET requests
 6      if method != "GET" {
 7          Err("Unsupported method")?;
 8      }
 9
10      let uri = Path::new(parts.next().ok_or("URI not specified")?);
11      let norm_uri = uri.to_str().expect("Invalid unicode!");
12
13      const ROOT: &str = "/path/to/your/static/files";
14
15      if !Path::new(&format!("{}{}", ROOT, norm_uri)).exists() {
16          Err("Requested resource does not exist")?;
17      }
18
19      let http_version = parts.next().ok_or("HTTP version not specified")?;
20      if http_version != "HTTP/1.1" {
21          Err("Unsupported HTTP version, use HTTP/1.1")?;
22      }
23
24      Ok(Request {
25          method,
26          uri,
27          http_version,
28      })
29  }

On line 2 we split the request line at whitespace, which returns an Iterator that we can step through. This is precisely what we do on lines 4, 10 and 19: next() returns the next part of the string as an Option<>, and ok_or() transforms that Option<> into a Result<> carrying a descriptive error message for the case where the part is missing (if you are not familiar with Rust’s Result<>, do look at the documentation).

ok_or() maps Some(v) to Ok(v) and None to Err(err) and finally we propagate the error up with ?.
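
Here is that pattern on its own, as a minimal sketch with a made-up helper called second_word. The error messages are the same ones used in the parser, everything else is just for illustration.

```rust
use std::error::Error;

/// Returns the second whitespace-separated word of `line`.
/// `second_word` is a hypothetical helper, not part of Linda.
fn second_word(line: &str) -> Result<&str, Box<dyn Error>> {
    let mut parts = line.split_whitespace();
    // None becomes Err("..."), and `?` converts that &str into a Box<dyn Error>.
    let _first = parts.next().ok_or("Method not specified")?;
    let second = parts.next().ok_or("URI not specified")?;
    Ok(second)
}

fn main() {
    println!("{:?}", second_word("GET /index.html HTTP/1.1"));
    match second_word("GET") {
        Ok(word) => println!("{}", word),
        Err(e) => println!("error: {}", e),
    }
}
```

The ? operator returns early on the first missing part, so the happy path reads straight down.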

We also specify the document root on line 13; this is where the server will look for files. On line 15 we concatenate the root directory with the URI and check whether the result exists on the filesystem. If it does not, we return an error. Observe the return type of the function, Result<Request, Box<dyn Error>>: dyn stands for dynamic, and Box<dyn Error> is a boxed trait object that can hold any type implementing the Error trait. This gives us the opportunity to return better formatted error messages in the future.
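
Plain concatenation is fine for a toy server, but a URI such as /../something would walk out of the root. As a hedged sketch of one way to guard against that (resolve is a hypothetical helper and not what Linda does), we can build the path with Path::join and refuse any ".." component:

```rust
use std::path::{Component, Path, PathBuf};

/// Maps a request URI onto a path below `root`, or None if it tries to escape.
/// `resolve` is a hypothetical helper; Linda itself uses plain concatenation.
fn resolve(root: &str, uri: &str) -> Option<PathBuf> {
    // Joining an absolute path would replace `root`, so drop the leading '/'.
    let relative = Path::new(uri.trim_start_matches('/'));
    // Refuse ".." so the result cannot point outside of `root`.
    if relative.components().any(|c| c == Component::ParentDir) {
        return None;
    }
    Some(Path::new(root).join(relative))
}

fn main() {
    println!("{:?}", resolve("/path/to/your/static/files", "/index.html"));
    println!("{:?}", resolve("/path/to/your/static/files", "/../secret.txt"));
}
```

Calling exists() on the returned PathBuf would then give the same check as line 15 above.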

To recap the checks: first we make sure the method is GET (a compliant HTTP/1.1 implementation must support HEAD as well), then that the URI exists in the filesystem, and lastly that the HTTP version is HTTP/1.1. If any of these checks fails, we propagate the error up.

If all is good, we return a Request wrapped in an Ok().

The Request struct

One thing that I did not mention yet is the Request struct. In it we store the request line, as specified in the RFC spec:

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

SP is a space character and CRLF stands for carriage return and line feed (which originates from the days of typewriters). We write the CRLF sequence as \r\n, where \r is a carriage return and \n is a line feed.

This is how it would be formatted:

format!("{} {} {}\r\n", self.method, self.uri.display(), self.http_version)

This is the list of methods defined in the spec (RFC 2616, section 5.1.1):

  • OPTIONS
  • GET
  • HEAD
  • POST
  • PUT
  • DELETE
  • TRACE
  • CONNECT

We will only be implementing the GET method for now. Then we have the request URI as per the specification:

The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI.

So if we GET /index.html and the server’s root location contains this file, we return it in the response body.

It builds on the discipline of reference provided by the Uniform Resource Identifier (URI) [3], as a location (URL) [4] or name (URN) [20], for indicating the resource on which a method is to be applied.

We will be storing the URI as a std::path::Path.

And lastly the HTTP version we are going to be using is HTTP/1.1, which we store as a &str.

struct Request<'a> {
    method: &'a str,
    uri: &'a Path,
    http_version: &'a str,
}

Note that we are storing references (&str and &Path) rather than owned Strings, therefore we have to annotate them with a lifetime, 'a.

However, when we try to compile, the compiler gives us an error:

error[E0277]: `Request<'_>` doesn't implement `std::fmt::Display`
  --> src/main.rs:57:27
   |
57 |             info!("\n{}", request);
   |                           ^^^^^^^ `Request<'_>` cannot be formatted with the default formatter
   |
   = help: the trait `std::fmt::Display` is not implemented for `Request<'_>`
   = note: in format strings you may be able to use `{:?}` (or {:#?} for pretty-print) instead
   = note: required by `std::fmt::Display::fmt`

This means that we have to manually implement the fmt::Display trait, as Rust does not understand how to properly format the Request struct for printing.

Here’s the fmt::Display implementation:

impl<'a> fmt::Display for Request<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} {} {}\r\n",
            self.method,
            self.uri.display(),
            self.http_version
        )
    }
}

We also have to manually specify the lifetimes for Request in the implementation of Display.
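
To see what implementing Display buys us, here is a self-contained sketch. RequestLine is a cut-down stand-in for Request that I made up for this example (it stores the URI as a &str to keep things short).

```rust
use std::fmt;

/// A cut-down stand-in for Request, used only for this illustration.
struct RequestLine<'a> {
    method: &'a str,
    uri: &'a str,
    http_version: &'a str,
}

impl<'a> fmt::Display for RequestLine<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} {} {}\r\n", self.method, self.uri, self.http_version)
    }
}

fn main() {
    let line = RequestLine {
        method: "GET",
        uri: "/",
        http_version: "HTTP/1.1",
    };
    // Display gives us `{}` in format strings and to_string() for free.
    print!("{}", line);
    let as_text: String = line.to_string();
    println!("{} bytes", as_text.len());
}
```

Anything that implements Display can be passed to info!, format! and friends with a plain {}.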

A hacky response

Currently our server doesn’t actually serve anything… So let’s write a temporary solution: we will create an index.html file that we will be sending as part of the response.

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>This is a title</title>
  </head>
  <body>
    <h1>Hello from Linda!</h1>
  </body>
</html>

You can make it anything you like, but consider that we currently do not have the capability to send other media, such as images (we’d need to implement MIME types, which will be covered next time). Let’s import the filesystem library:

use std::fs;

match parse_request_line(&request_line) {
    Ok(request) => {
        info!("Request: {}", &request);

        // read the page and prepend the hardcoded status line
        let contents = fs::read_to_string("index.html").unwrap();
        let response = format!("{}{}", "HTTP/1.1 200 OK\r\n\r\n", contents);

        info!("Response: {}", &response);
        stream.write(response.as_bytes()).unwrap();
        stream.flush().unwrap();
    }
    Err(e) => error!("Bad request: {}", e),
}

We first read the file as a string from the filesystem. Then we craft a response, as per the RFC spec (currently we’re only returning a Status-Line and Entity-Body):

Full-Response   = Status-Line               ; Section 6.1
                  *( General-Header         ; Section 4.3
                  | Response-Header        ; Section 6.2
                  | Entity-Header )        ; Section 7.1
                  CRLF
                  [ Entity-Body ]           ; Section 7.2

The status line is defined as: Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF. The line is temporarily hardcoded; in part 2 of this guide we will do this “properly”.

let contents = fs::read_to_string("index.html").unwrap();
let response = format!("{}{}", "HTTP/1.1 200 OK\r\n\r\n", contents);

The first digit of the Status-Code defines the class of response. The last two digits do not have any categorization role. There are 5 values for the first digit:

  - 1xx: Informational - Request received, continuing process

  - 2xx: Success - The action was successfully received,
    understood, and accepted

  - 3xx: Redirection - Further action must be taken in order to
    complete the request

  - 4xx: Client Error - The request contains bad syntax or cannot
    be fulfilled

  - 5xx: Server Error - The server failed to fulfill an apparently
    valid request
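
As a rough sketch of how the class and the status line could be derived instead of hardcoded (both function names are hypothetical and this is not necessarily how part 2 does it):

```rust
/// Names the response class from the first digit of the status code.
/// `status_class` is a hypothetical helper.
fn status_class(code: u16) -> Option<&'static str> {
    match code / 100 {
        1 => Some("Informational"),
        2 => Some("Success"),
        3 => Some("Redirection"),
        4 => Some("Client Error"),
        5 => Some("Server Error"),
        _ => None,
    }
}

/// Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
/// `status_line` is a hypothetical helper.
fn status_line(code: u16, reason: &str) -> String {
    format!("HTTP/1.1 {} {}\r\n", code, reason)
}

fn main() {
    print!("{}", status_line(200, "OK"));
    println!("{:?}", status_class(200));
    print!("{}", status_line(404, "Not Found"));
    println!("{:?}", status_class(404));
}
```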

Then we call as_bytes() on our response string, which gives us its bytes as a &[u8]. This data is passed to the stream through the write method, which sends it over the connection, and flush() makes sure nothing is left sitting in a buffer. Note that the write and flush operations can fail, and that write may send only part of the data (write_all is the safer choice). Calling unwrap() on them is not proper error handling; we will tackle this in the next post.

stream.write(response.as_bytes()).unwrap();
stream.flush().unwrap();
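
For comparison, here is a sketch of the same step with the errors passed back to the caller instead of unwrapped. The helper send_response is my own invention; making it generic over Write is an assumption that lets us try it against an in-memory Vec<u8> as well as a TcpStream.

```rust
use std::io::{self, Write};

/// Writes a status line, the blank line that ends the headers, and the body.
/// `send_response` is a hypothetical helper, generic over `Write` so it works
/// with a TcpStream as well as an in-memory Vec<u8>.
fn send_response<W: Write>(out: &mut W, body: &str) -> io::Result<()> {
    // write_all keeps writing until every byte is sent or an error occurs.
    out.write_all(b"HTTP/1.1 200 OK\r\n\r\n")?;
    out.write_all(body.as_bytes())?;
    out.flush()
}

fn main() -> io::Result<()> {
    let mut wire: Vec<u8> = Vec::new();
    send_response(&mut wire, "<h1>Hello from Linda!</h1>")?;
    print!("{}", String::from_utf8_lossy(&wire));
    Ok(())
}
```

Inside handle_connection the call would simply be send_response(&mut stream, &contents)?, since the function already returns a Result.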


Here’s the full code

use log::{error, info};
use std::error::Error;
use std::fmt;
use std::fs;
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::path::Path;

/// Request-Line = Method SP Request-URI SP HTTP-Version CRLF
struct Request<'a> {
    method: &'a str,
    uri: &'a Path,
    http_version: &'a str,
}

impl<'a> fmt::Display for Request<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} {} {}\r\n",
            self.method,
            self.uri.display(),
            self.http_version
        )
    }
}

fn parse_request_line(request: &str) -> Result<Request, Box<dyn Error>> {
    let mut parts = request.split_whitespace();

    let method = parts.next().ok_or("Method not specified")?;
    // We only accept GET requests
    if method != "GET" {
        Err("Unsupported method")?;
    }

    let uri = Path::new(parts.next().ok_or("URI not specified")?);
    let norm_uri = uri.to_str().expect("Invalid unicode!");

    const ROOT: &str = "/home/ongo/Programming/linda";

    if !Path::new(&format!("{}{}", ROOT, norm_uri)).exists() {
        Err("Requested resource does not exist")?;
    }

    let http_version = parts.next().ok_or("HTTP version not specified")?;
    if http_version != "HTTP/1.1" {
        Err("Unsupported HTTP version, use HTTP/1.1")?;
    }

    Ok(Request {
        method,
        uri,
        http_version,
    })
}

fn handle_connection(mut stream: TcpStream) -> Result<(), Box<dyn Error>> {
    // 512 bytes is enough for a toy HTTP server
    let mut buffer = [0; 512];

    // reads bytes from the stream into the buffer
    stream.read(&mut buffer).unwrap();

    let request = String::from_utf8_lossy(&buffer[..]);
    let request_line = request.lines().next().unwrap();

    match parse_request_line(&request_line) {
        Ok(request) => {
            info!("Request: {}", &request);

            let contents = fs::read_to_string("index.html").unwrap();
            let response = format!("{}{}", "HTTP/1.1 200 OK\r\n\r\n", contents);

            info!("Response: {}", &response);
            stream.write(response.as_bytes()).unwrap();
            stream.flush().unwrap();
        }
        Err(e) => error!("Bad request: {}", e),
    }

    Ok(())
}

fn main() {
    simple_logger::init().unwrap();
    info!("Starting server...");

    let ip = "127.0.0.1:8594";

    let listener = TcpListener::bind(ip).expect("Unable to create listener.");
    info!("Server started on: {}{}", "http://", ip);

    for stream in listener.incoming() {
        match stream {
            Ok(stream) => match handle_connection(stream) {
                Ok(_) => (),
                Err(e) => error!("Error handling connection: {}", e),
            },
            Err(e) => error!("Connection failed: {}", e),
        }
    }
}

In the real implementation, I have separated everything into lib.rs and only exposed handle_connection() to main(). In the next post I will also do some refactoring to accommodate responses.

Running it!

Finally, the moment of truth: if we cargo run and open up http://127.0.0.1:8594 in the browser, we will observe the following output if all is well:

INFO  [linda] Request: GET / HTTP/1.1

And from the browser we can see the html file being rendered correctly and presented to us!

Note that index.html is sent back for any request whose path exists under ROOT. In our case ROOT/ does exist and, as we have hardcoded the file name when reading contents, we see the rendered index.html. Later we will serve the file that was actually requested, if it exists.

https://d33wubrfki0l68.cloudfront.net/d40dfd96abbf99ffa72b6d173c0b1cc24428ed7a/b0b6a/posts/linda/hello.png

Note that we are only logging the line itself, not the whole request! The whole request would look something like this:

GET / HTTP/1.1
Host: 127.0.0.1:8594
User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:68.0) Gecko/20100101 Firefox/68.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Cookie: csrftoken=VbaHdSoP0mPmMqaeaEiaCOywh4ZKKy68MnHRNIZDVTqBgqGDFyFQspCguESsTbDy; sessionid=2xumbk29qxyhd8rsqltadllshxeftzaa
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0

You can also GET the URL from the command line with http (from the httpie package), or just curl it.

And if you provide some other, unsupported method like POST, you will receive an error:

http: error: ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')) while doing POST request to URL: http://127.0.0.1:8594/

With Linda’s log looking like this (we simply print the error):

ERROR [linda] Bad request: Unsupported method

We have a few problems, however. For example, if one request takes a long time, every other client has to wait until it is finished, as the server only runs on one thread.

But we will deal with these problems and the rest of the specification next time! Next time we will implement:

  • Multithreading
  • Headers (content types)
  • Success/error responses
  • Body (serving static files from a root folder)
  • Status codes (200 OK, 404 NOT FOUND)




2019-10-03 13:54
