Re-architecting CS Blogs

Where are we now?

As I mentioned in my previous post the current CS Blogs system grew out of a prototype. This meant that the requirements of the system were discovered in parallel with designing and implementing the system, resulting in the slightly weird architecture shown below.

I say its weird because the `web-app` component isn’t really just a web application — it’s also an API server for the android application (and in theory any other app) and includes all the business logic of the system.

The decision to use MongoDB was born partly out of the desire to be “JavaScript all the way down” and partly out of the desire to be using what was cool at the time. Unfortunately at the time of building the system MongoDB wasn’t supported as a SaaS offering on Microsoft Azure — where CS Blogs is currently hosted — so the database was hosted on MLab, making database calls more expensive in terms of networking time than necessary.

The `feed-aggregator` is a small node.js application ran as an Azure WebJob. It was hacked together in a few days and really only supports certain RSS and ATOM feeds. For example it works great for ATOM feeds using <description> tags, but not ones which use <content> tags. These oversights were made due to the software not being developed on much real data, essentially only my own feed, and the homogeneous nature of our users blogs — they’re mainly all Blogger or WordPress.com.

Despite the obvious and numerous flaws of the system it has worked well for the past year or so. However, when I wanted to add the concepts of organisation to the system — a way of seeing blogs only written by people at a certain company or university — I found the system to be a hodge-podge of technical debt, to the point where adding new features was going to take longer than developing a good, modular, expandable system. It was time to pay down the technical debt.

Requirements

The first thing to do was to determine what parts of the old system were good — and try to ensure that these poistive things didn’t regress in the new system –, which things were in need of improvement and what new features we should add in at the same time.

Fortunately CS Blogs does do a number of things well:

Short lead time — New posts appear in the system within 15 minutes
Good Web App — The front end works well both on desktop and on mobile and is very performant due to its lack of scripts. The work Rob did on the styling makes it a joy to use
Good Authentication — Users enjoy being able to use Github, Stack Exchange or WordPress to sign in and I enjoy not having to look after their passwords

A few things it could improve on are:

Support for a larger range of RSS and ATOM feeds — ATOM support in particular isnt great in the current system
A lot of functionality only works in the web app — Any method which requires authentication, such as signing up to the system, isn’t avaliable through the API
Feed aggregation downloads every authors feed every 15 minutes, this is a lot of data and wouldn’t be economic to scale to 100s of users
Code maintainability is poor due to a complete lack of automated testing and linting

The additional user-facing features I want to implement are:

Notifications of new blog posts for CS Blogs applications on Android/iOS
Support for the aforementioned organisations feature

Designing a Distributed System

The system you can see in the diagram below was designed with the intention of fulfilling out of the requirements which I outlined above. You’ll notice the use of Amazon Web Services icons, as I have recently switched hosting from Azure to AWS. There are a enough reasons for this decision to warrent its own blog post, so I wont go into detail here.

newcsb-3 — The new CS Blogs Architecture

In the new system all applications are treated as first class citizens, meaning there is nothing that the web application can do that any other application can’t. This is achieved by having all of the business logic, authentication and database interaction handled by the `api-server` — which is accessable by anthing that can make HTTPS:// requests and handle JSON responses.

This means that the mobile applications will be able to perform actions such as registering a user and editing their details, which they cannot under the current system. Another benefit to the mobile applications that isn’t shown on this diagram is that the `feed-downloader` calls Amazon SNS with information about how many new blog posts it has found every time it runs, this in turn is relayed to the mobile applications in the form of notifications.

Whereas in the old system we used MongoDB, I’ve opted to use PostgreSQL — via the Sequelize Node.js ORM — this time around. Some of the features I want to implement in the future, such as organisations, make more sense as relations rather than as document in my mind and the ecosystem of applicatons for interacting with SQL databases, and in partciular PostgreSQL, is much more mature than MongoDB.

The `feed-downloader` is portable, but contains an entry point so that it can be used as a infrastructureless AWS Lambda function (and I suppose this entry point would also work for the newly released Azure Function system). It’s a bit more clever than the old `feed-aggregator` in that it uses If-Modified-Since HTTP requests to only download and parse RSS or ATOM feeds that purport to have changed since the last time an aggregation was ran.

Implementation

The implementation of the `feed-downloader`, `api-server` and `web-app` components follows my guide to writing better quality Node.js applications. Node.js was chosen due to its abundance of good quality libraries, ease of interaction with JSON objects and the authors familarity with it in production scenarios.

ES2015 JavaScript features including the module system, string interpolation and destructuring are used throughout to aid readability of the system — therefore Babel is required for transpilation.

In order to meet the requirement of good maintainability the `feed downloader` was built using the test driven development methodology and currently has 99% test coverage. These tests use real data, feeds from actual CS Blogs authors, including feeds from Blogger, WordPress.com, WordPress.org, Ghost and Jekyll.

Theres still a lot to be done before before the new CS Blogs can be released, so why not hit up the contribution guide and get involved?

Danny

Where are we now?

Requirements

Designing a Distributed System

Implementation

Leave a comment Cancel reply