Types of web pages

Hannes Karppila

We are planning to build a website and need to pick a technology. Hopefully we already know what functionality the site should have. We would, of course, like to select the simplest suitable technology. Today it's trendy to use tools built for complex web apps, like React/Redux, for every project.

If you only have a hammer, all problems look like nails. However, if we have enough experience with other tools to use any of them effectively, it might make sense to use a simpler tool.

Evolution of website complexity

Next I describe different kinds of projects we might be building. These are loosely sorted by increasing complexity, although exceptions occur sometimes.

Document

The simplest website is a plaintext document, i.e. a .txt file with no images, hyperlinks or any other fancy stuff. At some point, formatting is required, so you'll switch to HTML so you can use tables and bullet-point lists. Now it looks like this. You think it looks ugly, so you sprinkle some CSS on it, and get this. Then you need some images, so you add those and a bit more CSS.

Now the website has reached the limits of what a static document can be. It'll still work when printed (although the hyperlinks and some other things lose functionality), and it will look the same on paper and on screen. Such documents are almost equivalent to .pdf files. Maybe a document could still have video or audio as a part of it, but not much else.

These can be created using any text editor, or just standard office tooling like Microsoft Word. Hosting is super simple: a cloud drive or GitHub is enough.

Plain old static website

Anyway, if we have any more functionality, we are building something other than a document. Maybe we are building a website? Simple websites (like blogs or news sites) typically have multiple documents and some navigation to find them. Usually they'll also have search functionality and links to other relevant documents. Site generators like Jekyll can be used to build sites like this quickly.

These sites are still almost static in structure. No per-client state is needed. Relevant links and search indices can be generated ahead of time. They don't need much or any JavaScript to run, and there is next to no benefit of having an SPA.
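As a rough illustration of generating a search index ahead of time, the sketch below builds a small inverted index over a list of pages. The Page shape and the function names are made up for this example; a real site generator would read the pages from files and probably use a ready-made search library.

```typescript
// Hypothetical page shape for this sketch.
interface Page {
  url: string;
  title: string;
  body: string;
}

// Inverted index: lowercase word -> URLs of the pages containing it.
type SearchIndex = Map<string, string[]>;

// Runs once at build time, next to the step that renders the HTML.
function buildSearchIndex(pages: Page[]): SearchIndex {
  const index: SearchIndex = new Map();
  for (const page of pages) {
    const words = (page.title + " " + page.body).toLowerCase().match(/[a-z0-9]+/g) ?? [];
    for (const word of words) {
      const urls = index.get(word) ?? [];
      // Record each page at most once per word.
      if (urls.indexOf(page.url) === -1) urls.push(page.url);
      index.set(word, urls);
    }
  }
  return index;
}

// Looks up a single word in the prebuilt index; no server state involved.
function searchPages(index: SearchIndex, query: string): string[] {
  return index.get(query.toLowerCase()) ?? [];
}

const examplePages: Page[] = [
  { url: "/static-sites", title: "Static sites", body: "No per-client state is needed." },
  { url: "/web-apps", title: "Web apps", body: "State lives on the client." },
];
const exampleIndex = buildSearchIndex(examplePages);
const stateHits = searchPages(exampleIndex, "State"); // both pages mention "state"
```

Since the index depends only on the content, it can be shipped as a static file alongside the pages.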

Nowadays, such sites are typically built and managed using a CMS such as WordPress or Squarespace. These allow the owner of the website to edit the content using the website itself. I feel like this almost pushes them to the next category. However, from the perspective of the end user, the site is still just a plain old website.

Living website

The next step up in complexity is online shops and online forums. These need to hold some state for each client: for shops, the contents of the shopping cart. Forums typically need at least some login information. The backend also has to do more than just serve static files from the disk: it must write to a database, and do some kind of access control and validation logic as well.

This can still be done completely without JavaScript, using e.g. Django or phpBB. They use server-generated HTML, and user actions are performed via <form> tags and <input> fields. Each form submission reloads the page, making the server update its state.
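The form-based round trip can be sketched in a few lines. This is not how Django or phpBB implement it, just a minimal illustration in TypeScript. The /cart path, the product field and all function names are assumptions made for the example, and the cart is an in-memory array instead of a database row.

```typescript
// Escapes text before placing it into server-generated HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Parses an application/x-www-form-urlencoded request body.
function parseFormBody(body: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const pair of body.split("&")) {
    if (pair === "") continue;
    const [key = "", value = ""] = pair.split("=");
    fields.set(
      decodeURIComponent(key.replace(/\+/g, " ")),
      decodeURIComponent(value.replace(/\+/g, " "))
    );
  }
  return fields;
}

// The whole page, including the form, is generated on the server.
function renderCartPage(cart: string[]): string {
  const items = cart.map((item) => "<li>" + escapeHtml(item) + "</li>").join("");
  return (
    "<ul>" + items + "</ul>" +
    '<form method="post" action="/cart">' +
    '<input name="product"><button>Add to cart</button>' +
    "</form>"
  );
}

// Handles one form submission: validate, update state, re-render the page.
function handleCartPost(cart: string[], body: string): { cart: string[]; html: string } {
  const product = (parseFormBody(body).get("product") ?? "").trim();
  // Validation happens on the server; empty submissions leave the cart as-is.
  const nextCart = product === "" ? cart : cart.concat([product]);
  return { cart: nextCart, html: renderCartPage(nextCart) };
}

const afterSubmit = handleCartPost([], "product=Red+hammer");
```

The browser needs no script at all here: it submits the form, receives the new HTML, and displays it.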

However, for these sites it sometimes makes sense to have an SPA, as not having the page reload after adding a product to the shopping cart might make the customers happy. For a fast-paced forum, it might be nicer if new comments appeared without reloading. Some forums, like Reddit, are transitioning to the SPA model. Unfortunately, the new site is unusable due to bad UX design.

Web app, separate state per device

After this there are web apps. These have more complex functionality: things like office apps, video editors, and single-player games. For these, the program is interactive, and cannot be described using declarative HTML. Instead, JavaScript is used to generate the page content on the fly. When the user interacts with the site, the client doesn't immediately send the action or changed state to the server. Instead, the new version of the content is first built and displayed in the browser, and the required parts (if any) are sent to the server. Typically the server stores very little state, although with office apps and editors the server is used as cloud storage for documents.
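A minimal sketch of this flow, assuming a toy text editor: each action first updates the local state, the view is rebuilt from that state, and the actions are queued so that they can be sent to the server later. All type and function names here are invented for illustration.

```typescript
// The actions a user can perform in this toy editor.
type EditorAction =
  | { kind: "insert"; at: number; text: string }
  | { kind: "delete"; at: number; count: number };

interface EditorState {
  text: string;
  // Actions applied locally but not yet sent to the server.
  unsynced: EditorAction[];
}

// Applies an action locally; nothing is sent over the network here.
function applyAction(state: EditorState, action: EditorAction): EditorState {
  const text =
    action.kind === "insert"
      ? state.text.slice(0, action.at) + action.text + state.text.slice(action.at)
      : state.text.slice(0, action.at) + state.text.slice(action.at + action.count);
  return { text, unsynced: state.unsynced.concat([action]) };
}

// Builds the view from the current state (a string stands in for the DOM).
function renderEditor(state: EditorState): string {
  const label = state.unsynced.length === 0 ? "saved" : state.unsynced.length + " unsaved changes";
  return state.text + " (" + label + ")";
}

// Hands the queued actions to whatever code uploads them to the server.
function takeUnsynced(state: EditorState): { state: EditorState; batch: EditorAction[] } {
  return { state: { text: state.text, unsynced: [] }, batch: state.unsynced };
}

let editor: EditorState = { text: "", unsynced: [] };
editor = applyAction(editor, { kind: "insert", at: 0, text: "hello" });
editor = applyAction(editor, { kind: "insert", at: 5, text: " world" });
const editorView = renderEditor(editor);
```

The important part is the order: the user sees the result of the action before the server has heard about it.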

At this point the project doesn't need to be a website anymore. A native application using the same API server would behave similarly. I'm focusing mostly on websites here, but these technologies are usable without the browser as well.

Web app, stream of shared events

The previous category doesn't contain apps that require data exchange between users. Apps in this category don't just have a single state; instead, changes are distributed to multiple clients in real time. However, each client works separately, publishing only to its own, separate data set. Most chat and video call applications as well as social media sites fall into this category. As a clean abstraction, in this category each client publishes its own separate stream of events. The server distributes the events to other clients, possibly compacting them first.
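The abstraction can be sketched as a small in-memory hub. The event shape and the compaction rule (only the latest "typing" event per sender is kept) are assumptions chosen for the example; a real server would deliver events over WebSockets or a similar transport instead of calling callbacks.

```typescript
// One entry in a client's own event stream.
interface StreamEvent {
  sender: string;
  seq: number; // position in the sender's own stream
  kind: "message" | "typing";
  payload: string;
}

// Compaction rule for this sketch: only the latest "typing" event per
// sender matters, while every "message" event is kept.
function compactEvents(events: StreamEvent[]): StreamEvent[] {
  const lastTyping = new Map<string, number>();
  events.forEach((event, i) => {
    if (event.kind === "typing") lastTyping.set(event.sender, i);
  });
  return events.filter((event, i) => event.kind !== "typing" || lastTyping.get(event.sender) === i);
}

class EventHub {
  private subscribers = new Map<string, (events: StreamEvent[]) => void>();
  private pending: StreamEvent[] = [];

  subscribe(clientId: string, deliver: (events: StreamEvent[]) => void): void {
    this.subscribers.set(clientId, deliver);
  }

  // Clients only ever append to their own stream.
  publish(event: StreamEvent): void {
    this.pending.push(event);
  }

  // Distributes the compacted batch to every client except the sender.
  flush(): void {
    const batch = compactEvents(this.pending);
    this.pending = [];
    this.subscribers.forEach((deliver, clientId) => {
      deliver(batch.filter((event) => event.sender !== clientId));
    });
  }
}
```

Because no client ever writes to another client's stream, the server never has to resolve conflicts; it only forwards events.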

For video calls and some fast-paced games the clients bypass the server and use peer-to-peer connections. They talk to each other directly through WebRTC, while the server simply assists with connection setup via ICE.

Web app with realtime collaboration

Collaborative editing and multiplayer games require shared mutable state between clients. For these, the server acts as the coordinator. Sometimes it's the authoritative source for data, and does things like merging together edits from multiple clients.

These apps don't just have a single state. Instead, the server and each of the clients have their own state and these are synchronized in real time. For document-based software this typically means either complicated conflict resolution logic or using only CRDTs. Realtime games can use methods like rollback netcode.
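To give a feel for CRDTs, here is a sketch of one of the simplest ones, a grow-only counter (G-Counter). Each replica increments only its own slot, and merging takes the per-replica maximum, so merges can happen in any order and any number of times. The function names are made up for this example, and a collaborative editor would need far more elaborate structures.

```typescript
// Grow-only counter: replica id -> number of increments made by that replica.
type GCounter = Map<string, number>;

// A replica only ever increases its own slot.
function increment(counter: GCounter, replicaId: string, by: number = 1): GCounter {
  const next = new Map(counter);
  next.set(replicaId, (next.get(replicaId) ?? 0) + by);
  return next;
}

// Merge takes the per-replica maximum. The operation is commutative,
// associative and idempotent, so the sync order does not matter.
function mergeCounters(a: GCounter, b: GCounter): GCounter {
  const merged = new Map(a);
  b.forEach((count, replicaId) => {
    merged.set(replicaId, Math.max(merged.get(replicaId) ?? 0, count));
  });
  return merged;
}

// The visible value is the sum over all replicas.
function counterValue(counter: GCounter): number {
  let total = 0;
  counter.forEach((count) => {
    total += count;
  });
  return total;
}

const emptyCounter: GCounter = new Map();
const onLaptop = increment(increment(emptyCounter, "laptop"), "laptop");
const onPhone = increment(emptyCounter, "phone", 3);
const synced = mergeCounters(onLaptop, onPhone);
```

No conflict resolution logic is needed because the data structure is designed so that conflicts cannot occur.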

Web app with offline state

Some web apps work offline as well. At first this sounds weird, because how are we loading the page in the first place if we are offline? The answer here is a PWA (progressive web app), which is, in simple terms, a website pretending to be an app. The other solution is having a native application that accesses the same web API, as mentioned before.

Offline apps are similar to realtime apps, but the conflict resolution logic tends to be even more complicated. With realtime collaboration, it can be assumed that the branching point for changes is usually only a few seconds in the past. The issues arising from conflict resolution logic are small and corrected almost immediately. Offline apps can make no such assumptions. When a computer goes offline, it's like starting a new branch in git, and when it comes back online, the branches have to be merged. In nontrivial cases, manual conflict resolution by the user is the only sane option.
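The git analogy can be made concrete with a field-level three-way merge: the base is the last state both sides agreed on, and a field conflicts only if both sides changed it in different ways. This is a simplified sketch with hypothetical names, not the algorithm git itself uses for text files. Conflicting fields are reported so that the user can resolve them manually.

```typescript
// A record is a flat set of string fields; a missing key means "deleted".
type Fields = Record<string, string>;

interface MergeResult {
  merged: Fields;
  // Keys changed differently on both sides; the user has to decide these.
  conflicts: string[];
}

// Reads an own property only, ignoring anything inherited.
function field(fields: Fields, key: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(fields, key) ? fields[key] : undefined;
}

// base: last state both sides agreed on (the branching point)
// local: state on the device that was offline
// remote: state on the server
function threeWayMerge(base: Fields, local: Fields, remote: Fields): MergeResult {
  const merged: Fields = {};
  const conflicts: string[] = [];
  const seen = new Set<string>();
  const keys = Object.keys(base).concat(Object.keys(local), Object.keys(remote));

  for (const key of keys) {
    if (seen.has(key)) continue;
    seen.add(key);

    const b = field(base, key);
    const l = field(local, key);
    const r = field(remote, key);

    let winner: string | undefined;
    if (l === r) winner = l; // same on both sides, or untouched
    else if (l === b) winner = r; // only the remote side changed it
    else if (r === b) winner = l; // only the local side changed it
    else {
      // Both sides changed it differently: keep the base value for now.
      conflicts.push(key);
      winner = b;
    }
    if (winner !== undefined) merged[key] = winner;
  }
  return { merged, conflicts };
}
```

The longer a device stays offline, the further the base lies in the past, and the more fields end up in the conflicts list.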

It's important to notice that offline features cause issues even without collaborative features: a single person may perform actions on multiple devices, each of which can be offline. Naturally, having both offline and collaborative features combines the challenges of both requirements.

State management

Somewhere around here we notice that this is not a collection of documents, but a full-blown distributed system. The first categories are about content and presentation, and those can be handled without any code or state. There the server is the single source of truth, and the client simply displays this state. The only useful state management technique for the client is caching the state for some time. However, the more client state we add, the more we have to think about synchronization, caches and conflicts.

When transferring state between the server and the client, they must have a common data format for communication. With JavaScript, we almost always use JSON for this. Some smaller apps get by with just some ad-hoc JSON blobs, but larger apps usually must have some kind of schema. Typically the API server publishes the schema in some format like OpenAPI (formerly Swagger). The client rarely needs to verify anything against the schema, since it can trust that the server only produces valid data. When using TypeScript, however, we must specify type information about the JSON messages. When the frontend and backend are written in different languages (e.g. Python+FastAPI on the backend, and TypeScript on the frontend), this causes lots of duplicated code. Typically we must specify the messages both as Python classes and TypeScript types. Even worse, when using an ORM, that usually requires typing out the types once more. Not all of this redundancy is useless, but it's tedious and error-prone. This can be somewhat alleviated if we use the same language throughout the stack. TypeScript is the usual choice here, although Rust can be used as well through WebAssembly.
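The TypeScript half of that duplication might look like the sketch below. The CartItem message and its fields are hypothetical; the point is that the same shape would have to be written again as a Python class on the backend. The type guard is optional if the server is trusted, but it shows where a mismatch between the two definitions would be caught at runtime.

```typescript
// Hypothetical message shape; the backend would declare the same fields again.
interface CartItem {
  productId: number;
  quantity: number;
}

// Runtime check that a decoded JSON value matches the CartItem type.
function isCartItem(value: unknown): value is CartItem {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["productId"] === "number" && typeof record["quantity"] === "number";
}

// JSON.parse returns untyped data, so the type has to be asserted somewhere.
function parseCartItem(json: string): CartItem {
  const value: unknown = JSON.parse(json);
  if (!isCartItem(value)) {
    throw new Error("Response does not match the CartItem schema");
  }
  return value;
}

const parsedItem = parseCartItem('{"productId": 7, "quantity": 2}');
```

Tools that generate such types from a published schema exist, which removes some of the manual duplication.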

With distributed systems the ability to define schemas and data structures becomes much more important, as it's hard to make sure that all clients are running the same version of the code. Even worse, offline apps might require the ability to update old data to new models before they can be synchronized. Both forwards and backwards compatibility have to be taken into account.
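One common way to handle this is to store a version number with every record and upgrade old records when they are loaded. The sketch below assumes a hypothetical note format that was split from a single text field (version 1) into title and body (version 2). Records from a newer, unknown version are rejected instead of guessed at.

```typescript
// Old model: a note was a single block of text.
interface NoteV1 {
  version: 1;
  text: string;
}

// New model: the first line became a separate title.
interface NoteV2 {
  version: 2;
  title: string;
  body: string;
}

type StoredNote = NoteV1 | NoteV2;

// Backwards compatibility: old data is upgraded before it is used or synced.
function upgradeNote(note: StoredNote): NoteV2 {
  if (note.version === 2) return note;
  const lines = note.text.split("\n");
  return { version: 2, title: lines[0] ?? "", body: lines.slice(1).join("\n") };
}

// Forwards compatibility: data written by a newer client is refused.
function loadNote(json: string): NoteV2 {
  const raw: unknown = JSON.parse(json);
  const version =
    typeof raw === "object" && raw !== null ? (raw as { version?: unknown }).version : undefined;
  if (version !== 1 && version !== 2) {
    throw new Error("Unsupported note version: " + String(version));
  }
  // Only the version is checked here; the other fields are trusted.
  return upgradeNote(raw as StoredNote);
}

const upgradedNote = loadNote(JSON.stringify({ version: 1, text: "Groceries\nmilk\neggs" }));
```

An offline client that comes back after a long time would run upgrades like this on its local data before trying to synchronize.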