17 Jun 2017, 19:28

Deploying a Static Blog with Continuous Integration

In recent years, static sites have seen a resurgence. In this post we’ll explore how to use Hugo, a static site generator, in conjunction with a remote web server and a continuous integration provider. Although the site generator used here is Hugo, it could easily be another such as Jekyll. The web server will be hosted on a Digital Ocean droplet running Nginx, and CircleCI will be used for the continuous integration (often abbreviated to simply “CI”). Although it is difficult to speak in specifics due to the multitude of alternatives out there, it should hopefully be fairly straightforward to adapt a similar process for other providers.

Firstly, for those of you unfamiliar with what Continuous Integration is, here is a definition:

“Continuous Integration is a DevOps software development practice where developers regularly merge their code changes into a central repository, after which automated builds and tests are run” (Amazon Web Services, 2017)

Generally CI is leveraged when there are many developers checking code into a distributed version control system, to ensure the integrity of software before its release to some environment. However, we can leverage another one of its key tenets: the building, deployment and testing are done by a remote service, saving us time and energy. This in turn allows you to focus on writing the blog posts and not on how or where the content needs to be deployed.

Some of you may be deploying your static Hugo site (or other static blog) via a bash script or some sort of manual process. This guide will explain how to deploy your blog every time you commit code to GitHub or another cloud-based Git provider.

Why Bother?

There are a few benefits to setting up CI with your static blog:

  • Avoid repetitive local build and deployment steps by offloading the work to another machine
  • Those build steps will always run, so you can’t forget to do them locally (e.g. theme CSS/JS modifications)
  • Because committing is what triggers a deployment, it encourages you to commit your code regularly
  • Clone code to any machine and edit posts from there without worrying about build/deployment dependencies or processes
  • Allows you to edit/create posts using GitHub’s user interface directly
  • Helps keep secrets and sensitive information out of your source code

Setting Up Circle CI and SSH Keys

You will need to register for an account with CircleCI and associate it with your GitHub account. From here you can begin to build GitHub projects from CircleCI. Check out this guide from CircleCI for more specifics regarding that process.

Before we dig into the details, it is important to explain that in order to get CircleCI talking to our remote server (in this case a Digital Ocean droplet), we must set up SSH keys. SSH keys allow us to be authenticated by the server whilst avoiding the use of passwords, via public-key cryptography.

It is a little outside the scope of this guide to delve into setting up the keys; however, a great guide to generating SSH keys is available from GitHub here, and an explanation of how to use them in conjunction with CircleCI can be found in their docs here. Once you have generated the keys and registered them with CircleCI you can move on to the next section.

Environment Variables

It is desirable to avoid storing any secrets or sensitive information inside the blog’s source code itself. One way to do this is to use environment variables on the CI server. CircleCI provides a UI for setting environment variables in the project settings, in the left-hand panel.

For example, in this case the user login password is set as DIGITALOCEAN and the IP of the server as DIGITALOCEAN_IP; these are then referenced as $DIGITALOCEAN and $DIGITALOCEAN_IP in the configuration file below. This also makes them reusable should you need to use them more than once in your configuration scripts.

The Configuration File

Once our target server and CircleCI are set up with the right keys we can begin to look into how to deploy from CircleCI. With CircleCI we provide a configuration file in the form of ‘circle.yml’. The config is written in YAML, a minimal, human-readable format often used for configuration. Here is what my specific circle.yml file looks like:


dependencies:
  pre:
    - wget https://github.com/gohugoio/hugo/releases/download/v0.23/hugo_0.23_Linux-64bit.deb
    - sudo dpkg -i hugo*.deb
    - sudo apt-get install sshpass rsync
    - cd ./themes/impurehugo/ && npm install 

deployment:
  prod:
    branch: master
    commands:
      - cd ./themes/impurehugo/ && npm run build
      - hugo -v
      - sudo sshpass -p "$DIGITALOCEAN" rsync -avz ./public $DIGITALOCEAN_IP:/var/www/html/
      
test:
  override:
    - "true"

You can see that in the dependencies section we download Hugo at the current latest release (v0.23) and install the package with dpkg. We also install sshpass and rsync, which we will need to log in to the target deployment server and copy files across.

In my case I do a little extra work with npm and gulp for some preprocessing (JavaScript/CSS minification, image compression), so we change into the target theme folder and run npm install there to get the Node dependencies, including gulp. This step can be skipped if you aren’t interested in these preprocessing steps.

In the deployment section, we deploy from the master branch: first we do the frontend pre-processing previously mentioned with gulp, then we run Hugo with the verbose flag (useful for debugging purposes if necessary). Next, because Hugo only produces static assets, we can simply move them over to our target server. Here we use rsync to copy the files to the remote server; rsync has the benefit over scp that it only transfers files that have changed since the last upload. However, you could use something like scp if you are so inclined.

Lastly, we override the test section to simply pass, as our only real concern is whether there are errors in the actual build and deployment steps.

Conclusion

This should hopefully give an overview of how to set up continuous integration with Hugo, and enough inspiration to adapt it to work with other static site generators if necessary. I really welcome any improvements, requests for clarity or other feedback you might have for me. Feel free to reach out to me on Twitter, or drop me an email.

15 Feb 2017, 20:53

Effective Cartograms

Earlier today I tweeted something that I had been contemplating for a little while about cartograms. For those of you unfamiliar, a cartogram is a type of map that uses an attribute (population, income etc.) of a geographic feature to influence its area in some capacity. The cartogram most familiar to most people is the ‘continuous irregular cartogram’, which can be seen as figure C in this image:

A and B are a square cartogram and a continuous regular cartogram respectively. Cartograms have a long history, dating back to at least 1973, and there are over 25 noted algorithms for producing them, in addition to many different types and variations. Today they are popular with (but not limited to) academic journals, media outlets and special-interest websites.

What are the potential pitfalls of cartograms?

I knew that the tweet might be mildly controversial in the mapping/data visualisation community, but I was interested to see others’ opinions on the matter:

Although in hindsight I wish I had worded it slightly differently, my underlying premise remains the same as I write this post. My reasoning is as follows:

  • Cartograms explicitly distort the shape of a geographic area.
  • In many cases distortions away from the socially normalised Mercator are so extreme it makes it difficult for the viewer to interpret which geographic region is which.
  • There is no obvious way of determining what the attribute scale is, as such there can be no legend to aid the viewer.
  • As such it is very difficult to determine mid-range values. Implicitly, the middle value is the geographic region that is least distorted, which may be hard for the user to perceive.

The primary reason for using a map to visualise geographic data is to convey meaning to its viewer, to tell a story about the underlying geographic trends. If the user can’t decipher which geographic regions they are actually seeing, the medium begins to undermine itself. Cartograms force the interpreter to constantly compare the distorted image they are seeing with their mental image of what that area should look like. Let me give an example from one of the top results on Google Images:

As you can see, the world is highly distorted, on top of the distortion provided by the Mercator projection. In some ways the map fulfils its purpose: we can see the US has low distortion and Australia is relatively large. However, we can’t determine what the hectares value is for any nation. It becomes a lot harder to determine anything of substance about the location or value of low-ranking countries other than ‘it must be very low and it’s somewhere in this array of squashed lines’. Through a critical lens, it is a logical conclusion that a cartogram’s primary purpose is to leave the user with an abstract idea of the relative values of some factor between geographic regions.

At a higher level, legibility and simplicity are two of the Ordnance Survey’s key cartographic design principles, both of which I would contend extreme cartograms tend not to conform to. Indeed, my main reason for seeing cartograms as unfavourable is not that I think they are inherently bad, but rather that there are often more suitable ways to represent the data. I would put forward that a simple choropleth map would often be more suitable than most cartogram representations.

How can cartograms be used effectively?

Having explored why cartograms can be easily misapplied, there are some potential ways we can make them more effective for users. My goal for this post was to find some constructive ways to help make better cartograms. The main starting point is to consider why we make geographic representations at all. It is easy to fall into the trap of asking ‘does this look cool?’ without also asking ‘does this help users understand the point we’re trying to make?’. To put it plainly, simple and clear tends to beat cool and technical in most use cases.

The use case for cartograms is often to explore relative values of some data set. We forfeit geographic fidelity for (arguably) a more interesting visualisation. So, if we are to utilize this method, how can we use them effectively? Here are some ideas around that:

  • Consider, does the data suit a cartogram? Highly skewed or anomalous data may not work exceptionally well in conjunction with a cartogram.
  • Avoiding algorithms that produce extreme results that may remove any meaningful geographic shape from the visualisation will make the map more readable and digestible.
  • Not mixing the method with other methods like choropleths (i.e. using colours to visualise another variable) can improve readability.
  • Would it work well as an animation? An animation from a normal map to a cartogram might help the user deduce which region is which. This reduces the mental mapping of undistorted regions to the cartogram.
  • Using labels might be another way to help the user determine what each distorted shape represents, saving them from having to perform mental gymnastics to figure it out. This may be difficult if distortions are extreme.
  • Would the cartogram work well if it was interactive? This would allow users to interrogate the data using mouse/touch, revealing values and region names.
  • Finally, consider using Dorling cartograms, which use uniform shapes to represent regions. These might provide a more palatable, user-friendly visualisation, although they may bring their own host of problems. A rough sketch of the idea follows this list.
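Below is a minimal, hypothetical sketch of a Dorling-style layout using d3-force. It assumes regions is an array of objects with lon, lat and value properties and that projection is a D3 geographic projection; the collision force nudges the circles apart while the x/y forces keep each one near its true location.

// Size each region's circle by its value.
const radius = d3.scaleSqrt()
  .domain([0, d3.max(regions, d => d.value)])
  .range([0, 40]);

// Project each region's centroid and remember it as the target position.
const nodes = regions.map(d => {
  const [px, py] = projection([d.lon, d.lat]);
  return { ...d, x: px, y: py, px, py, r: radius(d.value) };
});

const simulation = d3.forceSimulation(nodes)
  .force('x', d3.forceX(d => d.px).strength(0.3))   // pull back towards true x
  .force('y', d3.forceY(d => d.py).strength(0.3))   // pull back towards true y
  .force('collide', d3.forceCollide(d => d.r + 1))  // stop circles overlapping
  .stop();

for (let i = 0; i < 200; i++) simulation.tick();    // settle the layout, then draw the circles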

Feedback & Credits

Do you have an alternative opinion? Something to add? Feel free to comment here or reach out to me on Twitter!


22 Jan 2017, 18:33

Dense Spatial Data and User Experience

Across many disciplines we are seeing datasets inflate in size, from megabytes to gigabytes and beyond, and the geospatial industry is certainly no exception. This in turn poses an ongoing challenge for geospatial developers, web mappers, cartographers and various other spatial-data-wrangling professionals.

As with web pages, increased density of data at our focus point (the map) can lead to a poor user experience. Visual clutter and poor cartography detract from the ability to extract meaning, and also make exploring data more difficult. As a hyperbolic example, see the following image:

This is a problem I have faced multiple times in my work helping startups at the Geovation Hub, alongside my own personal projects, so I thought I would share my thoughts on the subject. The main purpose of this article is to visit a selection of ways to help tackle the problem of visualising dense datasets, improving the usability of maps along the way.

Disclaimers: the article focuses predominantly on point data from a web mapping perspective, and I make no claims to being an expert cartographer! The Carto team previously wrote a great blog post covering a fair number of these approaches, so kudos for that! Lastly, this list is not exhaustive and I make no claim that any of these methods are particularly novel. With all that out of the way, let’s take a look!

Clustering

Clustering is perhaps the most classic method of reducing point-data overcrowding. The purpose of clustering is to pull together (‘cluster’) data points that are close to one another at different zoom levels, to avoid overcrowding the map. Some libraries have nice declustering effects as you zoom in and out, and many web mapping providers offer built-in or plugin-based marker clustering. Here’s an example using Mapbox:
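For reference, here is a minimal sketch of what that can look like with Mapbox GL JS, where the GeoJSON source does the clustering for you (the data URL, colours and radii are placeholders):

map.addSource('points', {
  type: 'geojson',
  data: 'https://example.com/points.geojson', // placeholder data URL
  cluster: true,          // let the source cluster nearby points
  clusterMaxZoom: 14,     // stop clustering beyond this zoom level
  clusterRadius: 50       // cluster radius in pixels
});

map.addLayer({
  id: 'clusters',
  type: 'circle',
  source: 'points',
  filter: ['has', 'point_count'],   // only render cluster features
  paint: { 'circle-radius': 18, 'circle-color': '#51bbd6' }
});

map.addLayer({
  id: 'unclustered-points',
  type: 'circle',
  source: 'points',
  filter: ['!has', 'point_count'],  // individual, unclustered points
  paint: { 'circle-radius': 5, 'circle-color': '#f28cb1' }
});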

Opacity Based Rendering

When I was building the GitHub map, I faced a noticeable problem when zoomed out. The density of the cities at these zoom levels made the whole map very cluttered, and it was hard to pick out the cities with a high (normalised!) number of GitHub users. Thanks to the power of functions and the expressiveness they provide, it was possible to create an effect whereby all city points are shown at full opacity when zoomed in, while the opacity of less significant points is reduced at lower zoom levels.
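To give a rough idea of the approach, here is a minimal sketch using a Mapbox GL JS zoom-and-property function. It assumes a cities source whose features carry a normalised significance property between 0 and 1; the property name and the exact stops are illustrative, not what the GitHub map actually uses.

map.addLayer({
  id: 'cities',
  type: 'circle',
  source: 'cities',                     // assumed GeoJSON source of city points
  paint: {
    'circle-opacity': {
      property: 'significance',         // 0–1 value, e.g. normalised user count
      type: 'exponential',
      stops: [
        [{ zoom: 2, value: 0 }, 0.1],   // low zoom, low significance: nearly invisible
        [{ zoom: 2, value: 1 }, 1.0],   // low zoom, high significance: fully visible
        [{ zoom: 8, value: 0 }, 1.0],   // high zoom: everything fully visible
        [{ zoom: 8, value: 1 }, 1.0]
      ]
    }
  }
});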

Binning

The perhaps unfortunately named binning approach lets you pull together points in a geographic region and ‘bin’ them, taking some average value of an attribute (or even just point density) to give a value to the bin shape. The most classic bin is probably the square, but variations include hexagons and triangles. Here’s an example from Mike Bostock (the author of D3.js) using bivariate-coloured hexagon bins for data on the median age of shoppers at a popular supermarket chain:
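As a rough sketch of the idea (rather than a recreation of the example above), here is how hexagonal binning might look with the d3-hexbin plugin, assuming points have lon/lat properties, projection is a D3 geographic projection and svg is a D3 selection:

const hexbin = d3.hexbin()
  .x(d => projection([d.lon, d.lat])[0])
  .y(d => projection([d.lon, d.lat])[1])
  .radius(12);                                  // hexagon radius in pixels

const bins = hexbin(points);                    // each bin is an array of points with a centre (bin.x, bin.y)
const colour = d3.scaleLinear()
  .domain([0, d3.max(bins, b => b.length)])
  .range(['#eef3ff', '#08306b']);               // light to dark by point count

svg.append('g').selectAll('path')
  .data(bins)
  .enter().append('path')
  .attr('d', hexbin.hexagon())                  // hexagon outline path
  .attr('transform', b => `translate(${b.x},${b.y})`)
  .attr('fill', b => colour(b.length));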

Heatmaps

The classic heatmap. Heatmaps allow you to create a continuous surface from some geometry attribute (or, again, just point density). Heatmaps can often be misused and certainly have their downfalls (this blog post by Kenneth Field gives a good explanation of how and why), but for certain data, expressed correctly, I believe they can be a useful approach. I think this is especially true when the blur distance is a function of some real-world distance relating to the data points themselves. Here is an example of a heatmap depicting earthquakes using Esri’s ArcGIS JavaScript API:
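For a flavour of how little code this can take, here is a minimal sketch using the Leaflet.heat plugin rather than the Esri API from the demo above, assuming quakes is an array of objects with lat, lon and magnitude:

const heatPoints = quakes.map(q => [q.lat, q.lon, q.magnitude / 10]); // third value is intensity
L.heatLayer(heatPoints, {
  radius: 25,    // pixel radius of each point's influence
  blur: 15,      // how much neighbouring points are smoothed together
  maxZoom: 11    // zoom level at which points reach maximum intensity
}).addTo(map);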

Collision Detection

For your users, finding a useful data point amongst hundreds can sometimes be like trying to find Wally (Waldo) in a crowd, only an order of magnitude less entertaining. One technique we have been looking into at my current work is collision detection. Collision detection allows you to determine when markers or labels collide with each other and hide the ones you are less interested in (this can be implemented via some sort of weighting system). For our specific example we wanted to prevent label overcrowding. Thanks to Vladimir Agafonkin’s rbush we were able to detect collisions and weight labels appropriately for our use case, allowing us to show only the most pertinent labels at each zoom level. It turns out that MazeMap had already worked on a similar idea as an open-source project for Leaflet, so shout out to them too! Although in our case we tackled map labels, the same idea could be used with markers or other map items.
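Here is a minimal sketch of that weighting idea with rbush, assuming each label has already been given screen-space x/y coordinates, a pixel width/height and a numeric weight (this illustrates the general approach, not the exact code we used):

const RBush = require('rbush');
const tree = new RBush();
const visible = [];

// Sort by descending weight so the most important labels claim space first.
labels.sort((a, b) => b.weight - a.weight).forEach(label => {
  const box = {
    minX: label.x,
    minY: label.y,
    maxX: label.x + label.width,
    maxY: label.y + label.height
  };
  if (tree.search(box).length === 0) { // no overlap with an already placed label
    tree.insert(box);
    visible.push(label);               // show this label
  }                                    // otherwise hide it at this zoom level
});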

At the Geovation Hub, along with our friends at Podaris (a super cool startup building a real-time infrastructure planning platform), we released Labelgun to combat the label overcrowding problem. Labelgun works across mapping libraries as it depends on label coordinates rather than their implementing objects. Here’s a demonstration of how that works in Leaflet:

Chips

Chips (think poker, as opposed to what Americans call crisps) are arguably one of the more recent approaches on this list. They offer a unique way to visualise multiple data points that share the same location, stacking markers vertically as small chips so that they are more clearly exposed to end users. The Carto team gave some great coverage to chips as a visualisation technique in the aforementioned blog post. From my perspective they lose their advantage on highly dense visualisations, as the stacks begin to overlap and extend very far north. If you want to try the approach out, Ivan Sanchez has implemented it in Leaflet alongside the explanation given in the Carto blog post.

Spidering

Spidering (arguably another unbefitting piece of nomenclature) is another way of decluttering overlapping points. The approach fans markers out from their shared central point. This works well when lots of markers share exactly the same location but those locations are sparsely placed; it might be appropriate for, say, highly populated global cities, but falls down for more continuous datasets. There are examples of this in Leaflet and also in Google Maps. Here it is in Leaflet:
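As a minimal sketch, Leaflet.markercluster’s built-in spiderfier produces this effect when a cluster of co-located markers is clicked; the options shown are illustrative, and cityMarkers is an assumed array of L.marker instances.

const group = L.markerClusterGroup({
  spiderfyOnMaxZoom: true,        // fan out markers that still overlap at max zoom
  spiderfyDistanceMultiplier: 1.5 // spread the fan a little wider than the default
});
cityMarkers.forEach(m => group.addLayer(m));
map.addLayer(group);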

Pie Chart Markers

Pie and chips: both classic English cuisine, but also arguably two sensible ways to express multiple points with different attributes at the same location. Chips and pie charts have a similar set of properties, but pie chart markers have the advantage of showing the ratios of attributes to one another slightly better. A good example use case for pie chart markers might be showing the percentage of voters for various political parties in a city/ward.

Pie charts also produce a drastically different visual effect, which might be more fitting to your purposes. Furthermore, they can be combined with clustering to inherit its benefits of better visual aggregation and simplification at lower zoom levels. I would posit that their sweet spot is at around 2-5 displayed categories, and they probably begin to lose focus beyond that. For a full breakdown of the approach, take a look at this superb blog post from Ordnance Survey’s cartographic design team regarding their implementation of this method.
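A hand-rolled version of this is fairly straightforward in Leaflet with a divIcon. The sketch below is a hypothetical two-slice helper (two categories only, purely illustrative) that draws a small SVG pie and uses it as a marker icon:

// fractionA is the share of the first category, between 0 and 1 (exclusive).
function pieIcon(fractionA, size = 30) {
  const r = size / 2;
  const angle = fractionA * 2 * Math.PI;          // sweep clockwise from the top
  const x = r + r * Math.sin(angle);
  const y = r - r * Math.cos(angle);
  const largeArc = fractionA > 0.5 ? 1 : 0;
  const svg = `<svg width="${size}" height="${size}">
    <circle cx="${r}" cy="${r}" r="${r}" fill="#3388ff"/>
    <path d="M ${r} ${r} L ${r} 0 A ${r} ${r} 0 ${largeArc} 1 ${x} ${y} Z" fill="#ff7800"/>
  </svg>`;
  return L.divIcon({ html: svg, className: '', iconSize: [size, size] });
}

// e.g. a ward where party A took 65% of the vote
L.marker([51.5, -0.1], { icon: pieIcon(0.65) }).addTo(map);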

Layer Switching

One thing I always ask folks who want to visualise lots of data at once is: why? Most of the time it isn’t actually necessary to visualise that much data at one time, and doing so actively harms the end user experience. Assuming you are visualising data of different types/categories (say bars, cafes, restaurants etc.), you could add a layer switcher which allows users to pick the specific type they’re interested in. This reduces the amount of data you actually show at any given time and lets users find what they want faster, whilst also boosting performance. Here’s an example from Carto:
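The example here uses Carto, but as a minimal sketch the same idea in Leaflet is just the built-in layer control; bars, cafes and restaurants are assumed to be L.layerGroup()s of markers.

const overlays = {
  'Bars': bars,
  'Cafes': cafes,
  'Restaurants': restaurants
};
// No base layers to switch between here, just the overlays.
L.control.layers(null, overlays, { collapsed: false }).addTo(map);
// Only the layers the user ticks are added to the map, so far fewer markers
// are rendered at any one time.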

Bonus: Stateful Markers

Although not explicitly about reducing density, one feature I’ve seen recently that I think adds a nice effect is to add some change of state to a marker once it has been viewed. This is similar to how hyperlinks might turn purple after you’ve viewed them on a webpage. I first noticed this approach when using Airbnb for the first time. Here, the way in which the markers changed provided utility as it allowed me to see which of the available properties I’d already explored.
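Here is a minimal sketch of the idea in Leaflet, assuming properties is an array of objects with lat, lon and name, and defaultIcon/visitedIcon are pre-built L.icon() instances:

properties.forEach(p => {
  const marker = L.marker([p.lat, p.lon], { icon: defaultIcon })
    .bindPopup(p.name)
    .addTo(map);
  // Once the popup has been opened, swap the icon so the user can see
  // which properties they have already explored.
  marker.on('popupopen', () => marker.setIcon(visitedIcon));
});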

Bonus: Data Capping

For performance reasons you may want to set a hard limit on the number of markers that actually get rendered to the screen (I’d put 500-1000 as a decent limit for most maps, potentially less). In a similar manner to collision detection, you may want to refresh this at higher zoom levels so that you can still see detail as you zoom into the map. Ideally you would do this in a way that gives a good dispersion of markers over the geographic area, for a better visual effect.
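A minimal sketch of the idea in plain JavaScript with Leaflet markers, assuming features is an array of objects with lat/lon (the cap value and the quick-and-dirty shuffle are both illustrative):

const MAX_MARKERS = 750;                                            // illustrative cap
const shuffled = features.slice().sort(() => Math.random() - 0.5);  // rough shuffle for rough dispersion
shuffled.slice(0, MAX_MARKERS).forEach(f => L.marker([f.lat, f.lon]).addTo(map));
// Re-run this against the features inside the current map bounds on 'moveend'
// so more detail appears as the user zooms in.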

Conclusion

Hopefully that’s been a useful breakdown of potential approaches to reducing data clutter on your maps and subsequently improving user experience. I really welcome any feedback and comments, or even potential additions! Lastly, on a more general note, I would advise examining the Ordnance Survey’s cartographic design principles for a more in-depth look at how to effectively show data on a map.