Categories
Software Engineering

Re-architecting CS Blogs

Where are we now?

As I mentioned in my previous post the current CS Blogs system grew out of a prototype. This meant that the requirements of the system were discovered in parallel with designing and implementing the system, resulting in the slightly weird architecture shown below.

Old CSBlogs Architecture
Old CSBlogs Architecture

I say its weird because the `web-app` component isn’t really just a web application — it’s also an API server for the android application (and in theory any other app) and includes all the business logic of the system.

The decision to use MongoDB was born partly out of the desire to be “JavaScript all the way down” and partly out of the desire to be using what was cool at the time. Unfortunately at the time of building the system MongoDB wasn’t supported as a SaaS offering on Microsoft Azure — where CS Blogs is currently hosted — so the database was hosted on MLab, making database calls more expensive in terms of networking time than necessary.

The `feed-aggregator` is a small node.js application ran as an Azure WebJob. It was hacked together in a few days and really only supports certain RSS and ATOM feeds. For example it works great for ATOM feeds using <description> tags, but not ones which use <content> tags. These oversights were made due to the software not being developed on much real data, essentially only my own feed, and the homogeneous nature of our users blogs — they’re mainly all Blogger or WordPress.com.

Despite the obvious and numerous flaws of the system it has worked well for the past year or so. However, when I wanted to add the concepts of organisation to the system — a way of seeing blogs only written by people at a certain company or university — I found the system to be a hodge-podge of technical debt, to the point where adding new features was going to take longer than developing a good, modular, expandable system. It was time to pay down the technical debt.

Requirements

The first thing to do was to determine what parts of the old system were good — and try to ensure that these poistive things didn’t regress in the new system –, which things were in need of improvement and what new features we should add in at the same time.

Fortunately CS Blogs does do a number of things well:

  • Short lead time — New posts appear in the system within 15 minutes
  • Good Web App — The front end works well both on desktop and on mobile and is very performant due to its lack of scripts. The work Rob did on the styling makes it a joy to use
  • Good Authentication — Users enjoy being able to use Github, Stack Exchange or WordPress to sign in and I enjoy not having to look after their passwords

A few things it could improve on are:

  • Support for a larger range of RSS and ATOM feeds  — ATOM support in particular isnt great in the current system
  • A lot of functionality only works in the web app — Any method which requires authentication, such as signing up to the system, isn’t avaliable through the API
  • Feed aggregation downloads every authors feed every 15 minutes, this is a lot of data and wouldn’t be economic to scale to 100s of users
  • Code maintainability is poor due to a complete lack of automated testing and linting

The additional user-facing features I want to implement are:

  • Notifications of new blog posts for CS Blogs applications on Android/iOS
  • Support for the aforementioned organisations feature

Designing a Distributed System

The system you can see in the diagram below was designed with the intention of fulfilling out of the requirements which I outlined above. You’ll notice the use of Amazon Web Services icons, as I have recently switched hosting from Azure to AWS. There are a enough reasons for this decision to warrent its own blog post, so I wont go into detail here.

newcsb-3
The new CS Blogs Architecture

In the new system all applications are treated as first class citizens, meaning there is nothing that the web application can do that any other application can’t. This is achieved by having all of the business logic, authentication and database interaction handled by the `api-server` — which is accessable by anthing that can make HTTPS:// requests and handle JSON responses.

This means that the mobile applications will be able to perform actions such as registering a user and editing their details, which they cannot under the current system. Another benefit to the mobile applications that isn’t shown on this diagram is that the `feed-downloader` calls Amazon SNS with information about how many new blog posts it has found every time it runs, this in turn is relayed to the mobile applications in the form of notifications.

Whereas in the old system we used MongoDB, I’ve opted to use PostgreSQL — via the Sequelize Node.js ORM — this time around. Some of the features I want to implement in the future, such as organisations, make more sense as relations rather than as document in my mind and the ecosystem of applicatons for interacting with SQL databases, and in partciular PostgreSQL, is much more mature than MongoDB.

The `feed-downloader` is portable, but contains an entry point so that it can be used as a infrastructureless AWS Lambda function (and I suppose this entry point would also work for the newly released Azure Function system). It’s a bit more clever than the old `feed-aggregator` in that it uses If-Modified-Since HTTP requests to only download and parse RSS or ATOM feeds that purport to have changed since the last time an aggregation was ran.

Implementation

The implementation of the `feed-downloader`, `api-server` and `web-app` components follows my guide to writing better quality Node.js applications. Node.js was chosen due to its abundance of good quality libraries, ease of interaction with JSON objects and the authors familarity with it in production scenarios.

ES2015 JavaScript features including the module system, string interpolation and destructuring are used throughout to aid readability of the system — therefore Babel is required for transpilation.

Just some of the feed-downloader tests
Just some of the feed-downloader tests

In order to meet the requirement of good maintainability the `feed downloader` was built using the test driven development methodology and currently has 99% test coverage. These tests use real data, feeds from actual CS Blogs authors, including feeds from Blogger, WordPress.com, WordPress.org, Ghost and Jekyll.

Theres still a lot to be done before before the new CS Blogs can be released, so why not hit up the contribution guide and get involved?

Danny

Categories
Blogging

Computer Science Blogs Beta

Rob and I have both been doing a lot of work on CS Blogs since the last time I blogged about it. Its now in a usable state, and the public is now welcome to sign up and use the service, as long as they are aware there may be some bugs and changes to public interfaces at any time.

The service has been split up into 4 main areas, which will be discussed below:

csblogs.com – The CS Blogs Web App

CSBlogs.com provides the HTML5 website interface to Computer Science Blogs. The website itself is HTML5 and CSS 3 compliant, supports all screen sizes through responsive web design and supports high and low DPI devices through its use of scalable vector graphics for iconography.

Through the web app a user can read all blog posts on the homepage, select a blogger from a list and view their profile — including links to their social media, github and cv — or sign up for the service themselves.

One of the major flaws with the hullcompsciblogs system was that to sign up a user had to email the administrator and be added to a database manually. Updating a profile happened in the same way. CSBlogs.com times to entirely remove that pain point by providing a secure, easy way to get involved. Users are prompted to sign in with a service — either GitHub, WordPress or StackExchange — and then register. This use of OAuth services means that we never know a users password (meaning we can’t lose it) and that we can auto-fill some of their information upon sign in, such as email address and name, saving them precious time.

As with every part of the site a user can sign up, register manage and update their profile entirely from a mobile device.

api.csblogs.com – The CS Blogs Application Programming Interface

Everything that can be viewed and edited on the web application can be viewed and edited from any application which can interact with a RESTful JSON API. The web application itself is actually built onto of the same API functions.

We think making our data and functions available for use outside of our system will allow people to come up with some interesting applications for a multitude of platforms that we couldn’t support on our own. Alex Pringle has already started writing an Android App.

docs.csblogs.com – The CS Blogs Documentation Website

docs.csblogs.com is the source of information for all users, from application developers consuming the API to potential web app and feed aggregator developers. Alongside pages of documentation on functions and developer workflows there are live API docs and support forums.

In the screenshot below you can see a screenshot of a docs.csblogs.com page which shows a developer the expected outcome of an API call and actually allows them to test it, in a similar way to the Facebook graph explorer, live on the documentation page.

CS Blogs API Documentation
CS Blogs API Documentation

Thanks to readme.io for providing our documentation website for free due to us being an open source project they are interested in!

The CS Blogs Feed Aggregator

The feed aggregator is a node.js application which, every five minutes, requests the RSS/ATOM feed of each blogger and adds any new blogs to the CSBlogs database.

The job is triggered using a Microsoft Azure WebJob, however it is written so that it could also be triggered by a standard UNIX chronjob.

Whilst much of the actual RSS/ATOM parsing is provided by libraries it has been interesting to see inconsistencies between different platforms handling of syndication feeds. Some give you links to images used in blog posts, some don’t, some give you “Read more here” links, some don’t. A reasonable amount of code was written to ensure that all blog posts appear the same to end-users, no matter their original source.

Try it!

I welcome anyone who wants to to try to service now at http://csblogs.com. We would also love any help, whether that be submitting bugs via GitHub issues or writing code over at our public repository.

Danny