Archive | Software Engineering

You should be using NSP

A few months ago I integrated the Node Security Platform (NSP) into the continuous integration system we use at pHQ. This week it picked up a vulnerability for the first time (don’t worry, it’s since been patched), alerting me to the issue and providing a link to read about ways to mitigate the risk until a patch was available. Had we paid for a subscription to NSP it would have submitted a pull request to update the affected package(s) as soon as a fix was available.

In the case shown in the screenshot above you can see that the pHQ platform didn’t directly rely upon the vulnerable package, but had 5 dependencies which included it one way or another. If you’re not automatically checking for vulnerabilities you may well never find them, because you probably don’t know how many packages you indirectly depend upon.
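For anyone wanting to do the same, the integration is just a case of adding NSP as a dev dependency and running its check command as part of your test script. A minimal sketch of the relevant part of a package.json (the script names and version number here are illustrative):

{
  "scripts": {
    "lint": "eslint .",
    "test": "npm run lint && mocha && nsp check"
  },
  "devDependencies": {
    "nsp": "^2.6.2"
  }
}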

If you’re not using Node, something like Snyk may support your language.

As Software Engineers our job may be seen as producing features for users, but we have a duty to ensure that what we develop is secure and won’t put people’s money or personal information at risk. A dependency vulnerability checker is one great tool to have in the box.

Danny


A Simple Transaction Log

A little while ago I was curious about how software that uses Transaction Logs works, so I built a toy to-do command line application to get some hands-on experience.

Transaction Logs are used when you want metadata detailing each action that has taken place on a system, and then to use those logs as the data for the system itself. To build up the state of the system at any time you can simply “replay” each action that took place before that time. This enables you to time travel, as well as to know exactly what happened for user 54 to go into their overdraft.


Above you can see every feature of todo-log (I did say it was a toy application!) being used from the macOS terminal.

Using the add command you can store a todo item with both a title and some additional body text. This todo item is then assigned a short code — shown in red above — which can be used as the unique identifier to manipulate it in other commands. The transaction detailing the addition of the todo item is saved to a JSON Lines (.jsonl) file.
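Appending a transaction is just a one-object-per-line write. Here is a minimal sketch of what the add command might do (the field names are illustrative rather than the exact ones used by todo-log):

const fs = require('fs');

// Append an ADD transaction to the log as a single JSON Lines entry
function addTodo(logPath, shortCode, title, body) {
  const transaction = {
    type: 'ADD',
    shortCode,                            // the identifier shown in red in the terminal
    title,
    body,
    createdAt: new Date().toISOString(),
  };
  fs.appendFileSync(logPath, `${JSON.stringify(transaction)}\n`);
}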

The ls command, without parameters, displays a list of all the todo items which are outstanding at the current time. Some basic formatting is applied including showing a friendly date (e.g. “two hours ago”) for the creation time of a todo-item.

Deleting a todo item is simple: just pass the shortcode of the item to the rm command. The command doesn’t then delete the add transaction from the .jsonl file; instead it appends its own remove transaction detailing the time of deletion and the todo item involved.

Invoking the log command just prints out the raw .jsonl to the terminal for inspection. In the above screenshot you can see the format of both the ADD and REMOVE transactions.

Finally, passing a positive integer parameter to the ls command goes back in time by that number of transactions. In this example you can see what the state of the todo list was before the previous rm command was issued. Every time ls is called it simply builds up the current state of the system by rerunning each of the actions detailed in the .jsonl file — to go back in time we just ignore the last n lines, where n is the number provided to the command.
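A sketch of that replay, with the optional time travel parameter (again, the field names are illustrative):

const fs = require('fs');

// Rebuild the current todo list by replaying every transaction in order,
// optionally ignoring the last `back` transactions to go back in time
function buildState(logPath, back = 0) {
  const lines = fs.readFileSync(logPath, 'utf8')
    .split('\n')
    .filter(line => line.trim().length > 0);

  const replayed = lines
    .slice(0, lines.length - back)
    .map(line => JSON.parse(line));

  const todos = new Map();
  for (const transaction of replayed) {
    if (transaction.type === 'ADD') {
      todos.set(transaction.shortCode, transaction);
    } else if (transaction.type === 'REMOVE') {
      todos.delete(transaction.shortCode);
    }
  }
  return [...todos.values()];
}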

You can have a look at this fun little project over on my GitHub.

Though I only spent a couple of hours on it, and it doesn’t do much, I feel like working on this gave me a greater appreciation of what transaction logs can be used for and the challenges around them. Systems like RDBMS transaction logs and Apache Kafka are much more interesting to me as a result.

Sometimes the easiest way to learn about something and become more interested is just to jump in and build something.

Danny

Braces

There are a few things that any seasoned Software Engineer will have had arguments (sorry, discussions) about: Windows vs Linux, Merge vs Rebase and, inevitably, code indentation style.

Just today Rob and I discussed whether we should diverge from the “One True Brace Style” (1TBS) decreed by the Airbnb JavaScript Style Guide toward the Stroustrup style of indentation. The only difference? Stroustrup does not use a “cuddled else”; instead, else keywords go on their own line.
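To illustrate, here is the same (made-up) snippet written in each style:

// 1TBS: the "cuddled" else sits on the same line as the closing brace
if (user.isAdmin) {
  showAdminTools();
} else {
  showStandardTools();
}

// Stroustrup: the else keyword gets a line of its own
if (user.isAdmin) {
  showAdminTools();
}
else {
  showStandardTools();
}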

Does such a minor difference matter? I would argue it does. If being able to read code in a certain style increases a programmer’s productivity then that is no bad thing. However, this increase in productivity can easily be offset by having to change style when working in different codebases. Consistency is important.

To maintain consistency in the CS Blogs codebase every component would have to be updated. This would mean hundreds of lines changing for style alone, reducing the effectiveness of git blame and muddying the commit history. Even if we were to do this, eslint-config-airbnb was downloaded 399,657 times in the last month and I would wager most of the projects using it are sticking with the suggested 1TBS style. The advantage of having code that looks like the “standard” for an open source project is that it enables potential contributors to get involved that bit more easily.

My theory about code style guidelines is that in a team of n people, n-1 people will be unhappy with at least part of the guideline. The only person who will be completely happy with them is the person who decided upon the rules. Programming is merely transcribing processes and thoughts into a language a computer can understand; in that sense it is very personal, and everyone is therefore likely to have strong feelings about how those thoughts look on screen.

As with so many things in Software Engineering, in many ways the style you choose doesn’t matter, but sticking to it and enforcing consistency does. This is why I am against changing the CS Blogs codebase even though I agree with Rob that the Stroustrup style is nicer on the eye.

So, what can Rob do in this situation? The first option would be to just keep writing in the 1TBS style until it seems natural (this took me a few days of writing). However, he could also use an automated code formatter to change how his local code looks and then automatically have it changed back to the prescribed style before any commits to version control. Any mistakes by the automated code formatter would be caught by the ESLint commit hook.

Danny

Re-architecting CS Blogs

Where are we now?

As I mentioned in my previous post, the current CS Blogs system grew out of a prototype. This meant that the requirements of the system were discovered in parallel with designing and implementing it, resulting in the slightly weird architecture shown below.

Old CSBlogs Architecture

I say it’s weird because the `web-app` component isn’t really just a web application — it’s also an API server for the Android application (and in theory any other app) and includes all the business logic of the system.

The decision to use MongoDB was born partly out of the desire to be “JavaScript all the way down” and partly out of the desire to use what was cool at the time. Unfortunately, at the time of building the system MongoDB wasn’t supported as a SaaS offering on Microsoft Azure — where CS Blogs is currently hosted — so the database was hosted on mLab, making database calls more expensive in terms of networking time than necessary.

The `feed-aggregator` is a small Node.js application run as an Azure WebJob. It was hacked together in a few days and really only supports certain RSS and ATOM feeds. For example, it works great for ATOM feeds using <description> tags, but not ones which use <content> tags. These oversights were made because the software wasn’t developed against much real data, essentially only my own feed, and because of the homogeneous nature of our users’ blogs — they’re mainly all Blogger or WordPress.com.

Despite the obvious and numerous flaws of the system it has worked well for the past year or so. However, when I wanted to add the concept of organisations to the system — a way of seeing blogs written only by people at a certain company or university — I found the system to be a hodge-podge of technical debt, to the point where adding new features was going to take longer than developing a good, modular, expandable system. It was time to pay down the technical debt.

Requirements

The first thing to do was to determine which parts of the old system were good — and try to ensure that these positive things didn’t regress in the new system — which things were in need of improvement, and what new features we should add at the same time.

Fortunately CS Blogs does do a number of things well:

  • Short lead time — New posts appear in the system within 15 minutes
  • Good Web App — The front end works well both on desktop and on mobile and is very performant due to its lack of scripts. The work Rob did on the styling makes it a joy to use
  • Good Authentication — Users enjoy being able to use GitHub, Stack Exchange or WordPress to sign in and I enjoy not having to look after their passwords

A few things it could improve on are:

  • Support for a larger range of RSS and ATOM feeds — ATOM support in particular isn’t great in the current system
  • A lot of functionality only works in the web app — any method which requires authentication, such as signing up to the system, isn’t available through the API
  • Feed aggregation downloads every author’s feed every 15 minutes; this is a lot of data and wouldn’t be economical to scale to hundreds of users
  • Code maintainability is poor due to a complete lack of automated testing and linting

The additional user-facing features I want to implement are:

  • Notifications of new blog posts for CS Blogs applications on Android/iOS
  • Support for the aforementioned organisations feature

Designing a Distributed System

The system you can see in the diagram below was designed with the intention of fulfilling the requirements which I outlined above. You’ll notice the use of Amazon Web Services icons, as I have recently switched hosting from Azure to AWS. There are enough reasons for this decision to warrant its own blog post, so I won’t go into detail here.


The new CS Blogs Architecture

In the new system all applications are treated as first-class citizens, meaning there is nothing that the web application can do that any other application can’t. This is achieved by having all of the business logic, authentication and database interaction handled by the `api-server` — which is accessible by anything that can make HTTPS requests and handle JSON responses.

This means that the mobile applications will be able to perform actions such as registering a user and editing their details, which they cannot do under the current system. Another benefit to the mobile applications that isn’t shown on this diagram is that the `feed-downloader` calls Amazon SNS with information about how many new blog posts it has found every time it runs; this in turn is relayed to the mobile applications in the form of notifications.
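The SNS call itself is tiny. As a rough sketch of what the `feed-downloader` might do at the end of a run (the topic ARN environment variable and message shape here are placeholders of my own, not necessarily what the real code uses):

const AWS = require('aws-sdk');

const sns = new AWS.SNS({ region: 'eu-west-1' });

// Publish a summary of the aggregation run so that subscribed mobile
// applications can be sent a push notification
function publishNewPostCount(newPostCount) {
  return sns.publish({
    TopicArn: process.env.NEW_POSTS_TOPIC_ARN, // placeholder variable name
    Message: JSON.stringify({ newPostCount }),
  }).promise();
}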

Whereas in the old system we used MongoDB, I’ve opted to use PostgreSQL — via the Sequelize Node.js ORM — this time around. Some of the features I want to implement in the future, such as organisations, make more sense as relations than as documents in my mind, and the ecosystem of applications for interacting with SQL databases, and in particular PostgreSQL, is much more mature than MongoDB’s.
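To give a flavour of why organisations feel relational, here is an illustrative Sequelize sketch; the model and column names are mine, not necessarily those used in the real schema:

const Sequelize = require('sequelize');

const sequelize = new Sequelize(process.env.DATABASE_URL);

const Author = sequelize.define('author', {
  displayName: Sequelize.STRING,
  feedUrl: Sequelize.STRING,
});

const Organisation = sequelize.define('organisation', {
  name: Sequelize.STRING,
});

// An author can belong to many organisations and vice versa,
// which maps naturally onto a join table rather than nested documents
Author.belongsToMany(Organisation, { through: 'author_organisations' });
Organisation.belongsToMany(Author, { through: 'author_organisations' });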

The `feed-downloader` is portable, but contains an entry point so that it can be used as an infrastructureless AWS Lambda function (and I suppose this entry point would also work for the newly released Azure Functions system). It’s a bit more clever than the old `feed-aggregator` in that it uses If-Modified-Since HTTP requests to only download and parse RSS or ATOM feeds that purport to have changed since the last time an aggregation was run.
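The idea is simple: remember the Last-Modified value each feed returned on the previous run, send it back as an If-Modified-Since header, and skip any feed that responds with 304 Not Modified. A sketch using Node’s built-in https module (how the lastModified values are persisted between runs is left out here):

const https = require('https');
const url = require('url');

// Resolve with the feed body, or null if the feed is unchanged (HTTP 304)
function fetchFeedIfModified(feedUrl, lastModified) {
  return new Promise((resolve, reject) => {
    const options = url.parse(feedUrl);
    if (lastModified) {
      options.headers = { 'If-Modified-Since': lastModified };
    }

    https.get(options, (response) => {
      if (response.statusCode === 304) {
        response.resume(); // nothing new to download or parse
        resolve(null);
        return;
      }

      let body = '';
      response.on('data', (chunk) => { body += chunk; });
      response.on('end', () => resolve(body));
    }).on('error', reject);
  });
}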

Implementation

The implementation of the `feed-downloader`, `api-server` and `web-app` components follows my guide to writing better quality Node.js applications. Node.js was chosen due to its abundance of good quality libraries, its ease of interaction with JSON objects and the author’s familiarity with it in production scenarios.

ES2015 JavaScript features, including the module system, string interpolation and destructuring, are used throughout to aid the readability of the system — therefore Babel is required for transpilation.

Just some of the feed-downloader tests

In order to meet the requirement of good maintainability the `feed-downloader` was built using test-driven development and currently has 99% test coverage. These tests use real data, feeds from actual CS Blogs authors, including feeds from Blogger, WordPress.com, WordPress.org, Ghost and Jekyll.

There’s still a lot to be done before the new CS Blogs can be released, so why not hit up the contribution guide and get involved?

Danny

Writing better quality Node.js applications

In February last year I started writing my first Node.js app, csblogs.com, alongside Rob Crocombe. The application has run without too many issues since then, serving around 1,100 unique visitors per month. However, because it started out as a prototype we didn’t follow many of the best practices we should have done, and it’s starting to show now that we want to extend the application.

Since writing a more complicated application at Trainline — one which provides an API to clients and itself consumes many Windows services, RESTful APIs and Redis caches — I’ve realised how important it is to be using good software engineering techniques and tools from the very beginning of development.

Whilst most of the concepts in this post are language-independent, the example tools I explain are all geared towards Node.

The Basics

These first few things are obvious, and are things you should be doing in all your projects.

Source Control

Source Control everything, even prototypes. The minuscule amount of disk space a git repository requires, and the few seconds every so often to write a commit message, are nothing compared to the amount of time you will save by being able to revert a change or check when changes occurred.

I’ve taken to using GitHub’s variant of the git flow pattern, in which branches are deployed to production and only merged into master once they have been tested “in the wild”. This means that whatever code is in master is always certified as working in production and can be relied upon to be rolled back to in the event a new branch doesn’t work as intended. I like the WordPress Calypso branch naming scheme as it makes it easy to understand what is being developed in each branch.

Branches use the following naming conventions:

  • add/{something} — When you are adding a completely new feature
  • update/{something} — When you are iterating on an existing feature
  • fix/{something} — When you are fixing something broken in a feature
  • try/{something} — When you are trying out an idea and want feedback

If you don’t like that naming convention or it doesn’t suit your needs, that’s fine. Choose a naming convention and stick with it. As with so many stylistic choices in Software Engineering it isn’t the style that is important but the uniformity and consistency it brings.

Documentation

There are few things as annoying when developing software as opening a repository to find no information on how to build the project, how to run its tests, or what data and functions the code exposes to its consumers. When developing your code you should try to keep your documentation up to date with at least this information:

  • How to build
  • How to run tests
  • How to deploy
  • Data and functions exposed

Things like how to run your build, test and deployment scripts shouldn’t be changing so often as to be a pain to keep up to date. The data and functions exposed, however, may change reasonably often, especially if you are iterating whilst developing an API, so in order to make that task easier I suggest you use something like Swagger.io.

Testing

Testing code is one of the things that, 5 years into learning about developing software, I’m still learning about and still eager to improve at. Good quality testing can be the difference between a bug rearing its head in production and costing you thousands of pounds, and it being caught, thought about, and fixed earlier in the development cycle. Automated testing also means that you can be confident that any changes you make won’t cause regression bugs.

When writing a new class, I first sketch out the interface — e.g. the public constructors, functions and data that the class will expose.

class Train {
  constructor(name, britishRailCode) {

  }

  getTopSpeed() {

  }

  determineLocation() {

  }
}

Then I spend some time thinking about all the potential edge cases, as well as the ‘happy path’ through each of the functions. For example, what should happen when an invalid BR code is provided to the constructor? What happens if GPS coordinates cannot be determined due to faulty hardware — or, in the case of HTML5 geolocation, a lack of user permission — in the determineLocation() call? What data should I get back from each of these functions? Is there a timeout after which the function should return an error if it hasn’t completed?

Once I have an idea of the expected behaviour of the class I start writing tests, in a Behaviour Driven Development way, using the Chai Assertion Library and Mocha Test Framework. There are many advantages to using BDD; one of my favourites is that you don’t write terse names for your unit tests, you state the expected behaviour in a full sentence — this makes it much easier to realise the intention of the test and means that test code can, in some senses, document the application code. Another great attribute of BDD is that it allows you to think in terms of how you want your code to behave, rather than implementation details.
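As an example, tests for the Train sketch above might read something like this (the expected behaviour, BR code and file path are invented for illustration):

const { expect } = require('chai');
const Train = require('../src/train'); // illustrative path

describe('Train', () => {
  describe('constructor', () => {
    it('throws an error when given an invalid British Rail code', () => {
      expect(() => new Train('Flying Scotsman', 'not-a-br-code')).to.throw(Error);
    });
  });

  describe('getTopSpeed()', () => {
    it('returns the top speed as a number of miles per hour', () => {
      const train = new Train('Flying Scotsman', 'FS');
      expect(train.getTopSpeed()).to.be.a('number');
    });
  });
});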

Test Coverage

Test coverage is a highly discredited method of determining the quality of a set of unit tests. Covering every line of code doesn’t mean you have thought of every conceivable edge case and therefore doesn’t guarantee your code is free of defects — indeed no testing can, as testing only shows the existence of bugs, not their absence. However, at minimum you should be covering every line of code — and every branch. (Yes, you can have a branch of code that doesn’t include any lines — I leave working out how as an exercise for the reader)

Linting

JavaScript allows you to make decisions on many areas of syntax. To use semi-colons, or not to use semi-colons — that is just one of the questions. When writing code in a team it’s easier if all of the code is formatted in the same way, so you don’t waste time reformatting it to the way you prefer code to be written. One way of doing this is to have everyone memorise your project’s coding conventions and hope they stick to them; a much better way is to use a linter which will warn the programmer if they break any of the project’s rules — this ensures that everybody writes in the same style.

An additional benefit of linting is that, depending on which linter you use, you get static analysis of your code provided to you too. This means the linter can point out any variables you have defined but never used, for example.

I personally prefer to use ESLint as it allows different rules to be configured through the use of plugins. This means that you can have a set of rules for React JSX code and a different, more suitable, set of rules for your Mocha tests. For the bulk of my application code I use the official Airbnb style guide ESLint plugin — I like Airbnb’s focus on using modern JavaScript constructs and having code be as explicit as possible. They also provide lint rules for React code.
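Getting going is usually just a case of extending the Airbnb config at the root of the project and layering extra environments or overrides on top. A minimal sketch of a .eslintrc.js:

// .eslintrc.js at the project root
module.exports = {
  extends: 'airbnb',   // the Airbnb rules, including those for React
  env: {
    node: true,
    mocha: true,       // so describe/it globals in tests don't trigger no-undef
  },
  rules: {
    // project-specific overrides go here
  },
};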

Commit Hooks

Linting and testing is all well and good, but you need the members of your team to buy in to using them. And, even on a one-man team, I often forget to run tests and linting before I commit code, resulting in broken builds and ugly code ending up in the master branch of my repository.

 

Pre-commit hooks to the rescue!

Commit hooks can be used to ensure that your unit tests and linter are always run before a developer can commit their code. This means they can’t forget to lint or unit test, and it actually saves time in the long run. I use the pre-commit package to provide this functionality. In combination with good unit tests and linting, pre-commit hooks can help ensure that the code in your repository is always working and readable. (Note: a developer can decide to skip hooks if they’re in a rush to develop a hotfix, but this should be avoided under normal working conditions.)
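With the pre-commit package this is only a few lines in package.json: list the npm scripts you want to run and the hook is installed for you on npm install. A sketch (the script names and version are illustrative):

{
  "scripts": {
    "lint": "eslint .",
    "test": "mocha"
  },
  "pre-commit": ["lint", "test"],
  "devDependencies": {
    "pre-commit": "^1.1.3"
  }
}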

Configuration

Configuration is an important part of any application. It can be as simple as wanting to be able to change which database you connect to on your local machine versus which one you connect to in production; either way, you don’t want this information to get into the public domain!

I used to use JSON files for configuration. However, these could easily be accidentally committed to git and make the secrets they contain public knowledge. Recently I’ve opted to use environment variables for all the reasons outlined by Twelve Factor. The dotenv module for Node.js makes environment variables easy to change in development. In the CSBlogs applications I’ve been writing I provide a sample.env file which includes the names of all the environment variables developers should set to get the app working in their local environment.
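Using dotenv is a one-liner at the top of the application’s entry point; in development it loads whatever is in a local .env file into process.env, and in production the real environment variables are used instead. A sketch (the variable name is just an example):

// Load variables from a local .env file (if one exists) into process.env
require('dotenv').config();

// The rest of the application reads configuration from process.env as normal
const databaseUrl = process.env.DATABASE_URL;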

So, there it is. A quick run-down of some very basic steps to give yourself a nice place to work in JavaScript world. Now get developing!

Danny

A Reading List

One of the things I’ve always tried to do is learn stuff outside of my university studies. It’s nice to have a multitude of views about any given subject, from different people with different experience. Besides, university isn’t there to teach you everything you need to know — just to give you a very good foundation from which to progress.

I thought it might be a nice idea to note down what I’ve been reading recently, mainly so that I can refer back to it later, but also because it might be of use to someone else.

  1. Coding Horror – Lots of opinions and strategies for dealing with User Experience
  2. The Joel Test – A list of things your software projects should be aiming for, from a former Excel developer
  3. Slashdot – A website which aggregates general tech news
  4. Daring Fireball – Tech opinions from the guy who *invented* Markdown
  5. dev.ghost.org – Notes and information about the development of the Ghost blogging platform, especially interesting if you’re using that system, but also interesting for insight into how modern open-source development works.

As you can see it’s not a huge list, but I think it’s full of interesting opinions and news. If you have any publications you think your fellow computer scientists should be reading, feel free to comment below.

One of the things I noticed whilst writing this list is how much I am missing reading HullCompSciBlogs. Don’t worry, CSBlogs.com is still on track to be released soon. You can aid in the development of the system here: https://github.com/csblogs/

Danny

Introducing Dollar IDE – An Update on my Final Year Project

Dollar IDE - The Modern IDE for PHP

In the past few weeks, since the semester 1 exams in January, we third years have been blessed with some calm time without any exams to revise for or coursework to work on, just two days a week of lectures. During this free time I have been working on my final year project.

User Interface, Icons and a Name

As recently as two weeks ago I had very little to show anyone who asked to see my project, as it was all technical back-end work without user interaction. This was a shame, as I’ve actually done a reasonable amount of work on it. Now, however, I have developed some of the UI and finally have something I can ‘show off’.

My previously unnamed PHP IDE is now known as Dollar IDE, so called because all variables in PHP are denoted by a $ sign. At the top of this blog post you can see the fantastic logo which my friend Harry Galuszka, who studies Graphic Design at Lincoln University, designed for me.

The colour scheme I have chosen for the user interface and logos consists of various shades of purple, an unusual choice for me because I don’t usually like the colour. However, it is the ‘official’ colour of PHP, as you can see from the PHP.net website, and it also fits in quite well with the design language I am using, which is based on Microsoft’s Modern (formerly Metro) ideas.

The program doesn’t contain any bitmap images; instead I use XAML vectors. It is built using Windows Presentation Foundation, meaning that the program works well on all screen resolutions and pixel densities, including Apple’s famous “Retina” displays.

The main code editing screen isn’t ready to be shown off just yet, but below you can see some images of the “Start Screen”, which is the first window shown to the user when they launch the program. From here the user can opt to create a new Dollar IDE project, or open an existing Dollar IDE project or stand-alone PHP file. The 5 most recent projects are displayed in order to allow the user to rapidly get back to work on their website or web app.

Dollar IDE – New Project Information Validation

An important aspect of the start screen is its ability to inform the user of any issues in creating a new project before they continue. As you can see in the above screenshot, issues are denoted by a red circle with a cross in it, and fields which are OK are denoted by a green circle with a tick in it.

This as-you-type validation means that mistakes can be spotted earlier, thus saving the user time, and the exact cause of any issues can be determined and displayed. It’s always annoying when you enter some data into a program only to be told it’s wrong when you try to save… and then not be told what is wrong with it, being forced to make an educated guess. Dollar IDE gives you no such hassle.

Git Integration

Dollar IDE – Git and GitHub Integration

One of the coolest aspects of my new IDE is its tight integration with the git revision control system. The program can interface with any git repository, hosted either locally or on a remote server, and has additional support for repositories hosted on the popular GitHub service.

In the main IDE window, which I will show off soon, the user will be able to

  • See which files have uncommitted changes
  • Commit files and projects
  • Merge files with conflicts
  • Pull changes from a remote server

Other Progress So Far

Other than what I’ve shown you in this very quick preview I have the following features working:

  • Correct tokenization of the most common PHP tokens
  • Complete coverage of the tokenizer with unit tests to ensure program correctness
  • A source code editing window with colour highlighted text and line numbers
  • Project loading and saving
  • File associations (to allow project files and PHP files to be loaded from Windows Explorer)

What’s Left to Do

There’s still rather a lot left to do on this project, but fortunately there’s still around another 2 months to do it in. I’m now feeling rather confident that I will finish my product on time with all of my primary functionality and a lot of my secondary functionality implemented.

I will of course keep the blog up to date on my progress.

Danny