23 Dec 2016, 16:56

Visualising Facebook's Population Data on the Web

Over the past month or so (on and off!) I’ve had the opportunity to explore two really interesting ideas. The first related to a talk I saw at FOSS4G in Bonn this year by Fabian Schindler. Fabian’s talk was on how to use GeoTIFF data on the web using a pair of libraries called geotiff.js and plotty.js. You can see a post about his talk here. The talk got me thinking about, and interested in, what sort of GeoTIFF data could be pulled into the web.

Around the same time, a news article caught my eye about Facebook having teamed up with a research institute to create a global dataset of population densities. I later managed to find the data on the Center for International Earth Science Information Network (CIESIN) website here. From there I explored ways of making use of the datasets on the web, as per the talk I’d seen at FOSS4G.

The Problems

Upon examining the raw data I noticed some immediate problems with my plan:

  • The data is relatively large for the web - a single country could be upward of 100 MB
  • The data has multiple bands that may not all be useful to end users
  • WebGL isn’t great with large textures

To take on all these problems it was necessary to preprocess the data into a more manageable and usable format.

Preprocessing the Data

Solving the listed problems was a relatively tricky endeavour that required a multistep process:

  • Extract a single band
  • Downsample the raster (reduce its resolution)
  • Split the raster into manageable chunks
  • Allow the browser to know where these chunks are somehow

For all of these processes I made use of Python and GDAL. The band extraction was a fairly straightforward process, but the downsampling and splitting were somewhat more complicated.
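To give a feel for what downsampling does to a raster, here is a minimal pure-Python sketch that averages each block of cells into one output cell. This is only an illustration: the real preprocessing used GDAL, and both the `downsample` function name and the choice of block averaging as the resampling method are assumptions rather than what the repo does.

```python
def downsample(grid, factor):
    """Reduce a 2D grid's resolution by averaging factor x factor blocks.

    Hypothetical helper for illustration only. Rows and columns that do
    not fill a complete block are dropped.
    """
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect every cell that falls inside this output cell's block
            block = [
                grid[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            out_row.append(sum(block) / float(len(block)))
        result.append(out_row)
    return result


# A tiny 4 x 4 "raster" reduced to 2 x 2
raster = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [2, 2, 8, 8],
]
small = downsample(raster, 2)
```

A factor of 2 quarters the number of cells, which is why fairly large factors are needed before a country-sized raster becomes web friendly.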

You can see the full solutions to the downsampling and band extraction problems on the GitHub repo. Splitting the data was probably the hardest problem to solve as I struggled to find any examples of this being done across the web that weren’t making call outs to the shell from within Python (something I wanted to avoid).

In order to correctly split data it was necessary to subdivide the raster into a given size grid. For this to work correctly we needed to get the top left and bottom right coordinates of all the grid cells. After some thought on solving this mental puzzle, I deduced that you can create an arbitrary (n by n) sized grid of such coordinates using the following function:

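def create_tiles(minx, miny, maxx, maxy, n):
    """Split a bounding box into an n-by-n grid of tiles.

    Returns a list of [[ulx, uly], [lrx, lry]] pairs (upper-left and
    lower-right corners), ordered row by row starting at the top-left tile.
    """
    # float() guards against integer division under Python 2
    tile_width = (maxx - minx) / float(n)
    tile_height = (maxy - miny) / float(n)

    matrix = []

    # j walks the rows from the top edge down; i walks the columns left to right
    for j in range(n, 0, -1):
        for i in range(0, n):

            ulx = minx + tile_width * i
            uly = miny + tile_height * j

            lrx = minx + tile_width * (i + 1)
            lry = miny + tile_height * (j - 1)
            matrix.append([[ulx, uly], [lrx, lry]])

    return matrix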

Splitting the tiles allows us to send the raster in chunks whilst avoiding using a tile server or any kind of dynamic backend. I created a JSON file that contained metadata for all the necessary resulting files, allowing us to determine their centroid and file location prior to requesting all of them.
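As a rough illustration of the metadata file described above, the sketch below builds a JSON index from a list of tile extents. The field names (`file`, `extent`, `centroid`), the `build_tile_index` helper and the `tile_{index}.tif` naming pattern are all assumptions made for this example and are not necessarily what the repo uses.

```python
import json


def build_tile_index(extents, name_pattern="tile_{index}.tif"):
    """Describe each tile by file name, extent and centroid.

    `extents` is a list of [[ulx, uly], [lrx, lry]] pairs, the same shape
    that create_tiles returns. Field names and the file naming pattern
    are hypothetical.
    """
    index = []
    for i, ((ulx, uly), (lrx, lry)) in enumerate(extents):
        index.append({
            "file": name_pattern.format(index=i),
            "extent": [[ulx, uly], [lrx, lry]],
            # The centroid is the midpoint of the two corners
            "centroid": [(ulx + lrx) / 2.0, (uly + lry) / 2.0],
        })
    return index


# Hand-written extents for a 2 x 2 grid over the box (0, 0) to (10, 10)
extents = [
    [[0.0, 10.0], [5.0, 5.0]],
    [[5.0, 10.0], [10.0, 5.0]],
    [[0.0, 5.0], [5.0, 0.0]],
    [[5.0, 5.0], [10.0, 0.0]],
]

tile_index = build_tile_index(extents)
tile_index_json = json.dumps(tile_index, indent=2)
```

Writing `tile_index_json` out as a static file alongside the tiles would let the browser fetch one small document first and then request each GeoTIFF it lists.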

Displaying the Data

Displaying the data on the frontend took a little bit of trial and error. I used a combination of OpenLayers 3, plotty.js and geotiff.js to accomplish the end result. geotiff.js allows us to read the GeoTIFF data, and plotty.js allows us to create a canvas element that can be used by OpenLayers 3 to correctly place the elements.

To request and handle the asynchronous loading of the data I used the Fetch API and Promises (I’ve included polyfills for both in the demo). Once all the promises have resolved we now have all the tiffs loaded into memory. From here we can use a select dropdown that allows us to change the colors used for presenting the data.

The end result looks a little something like this:

Pros and Cons of this Method

Pros

  • We got to a point where we can display the data on the web
  • The data can be restyled dynamically clientside
  • No dynamic backend or file server required, just static files after preprocessing

Cons

  • Tiles are actually less size efficient than one big file, but are necessary to get the data to display
  • The downsampling factors have to be quite large to get it to be a reasonable file size
  • Even tiled, file sizes are quite large (e.g. 9 tiles at 2 MB each comes to 18 MB, which is a lot for the web)

One interesting consequence of having the actual data on the client, as opposed to a pre-rendered image, is that you could go as far as doing basic visual analytics in the browser. For example, you could find a way to display only the highest or lowest values for a given GeoTIFF.
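The filtering idea can be sketched in a few lines. The demo itself runs in the browser, so this Python version only shows the logic, and the `keep_highest` helper and the flat list standing in for a raster band are hypothetical.

```python
def keep_highest(values, count, fill=None):
    """Keep the `count` highest values and replace the rest with `fill`.

    Hypothetical helper for illustration. Values tied with the cutoff
    are all kept, so slightly more than `count` cells may survive.
    """
    if not 1 <= count <= len(values):
        raise ValueError("count must be between 1 and len(values)")
    # The count-th largest value becomes the cutoff
    cutoff = sorted(values, reverse=True)[count - 1]
    return [v if v >= cutoff else fill for v in values]


# A flat list standing in for one band of a small raster
band = [0, 12, 5, 40, 7, 33, 2, 18, 1, 25]
top_three = keep_highest(band, 3)
```

Cells replaced with the fill value could then be rendered as transparent, leaving only the densest areas visible.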

Have a Go

The repo is located at: https://www.github.com/JamesMilnerUK/facebook-data-viz

There are instructions in the README for usage.

04 Sep 2016, 19:28

10kB Web Pages

Over recent times there has been a lot of stir around the growth of website assets and total transfer over the wire. It has been pointed out that the average size of a website is now larger than Doom! (Credit: mobiForge)

In response to these accusations of page bloat, we have seen an emerging trend: top websites are now decreasing their page weights. For example, the Financial Times has stated they are moving from a “culture of addition” to a “culture of subtraction” in order to reduce page load times. Customer satisfaction is the obvious benefit here; faster pages mean people get to the content they want quicker. But there is also another motive at play: research has shown load speed costs money, sometimes in a big way. Not only are there costs to the provider for a heavy website, but it can also cost users, as mobile data plans are often expensive, especially in developing countries.

With all this talk of reducing website page bloat, I thought I’d share with you something I came across over the last week or so called 10k Apart. 10k Apart is a challenge to “Build a compelling web experience that can be delivered in 10kB and works without JavaScript”. I found this an interesting proposition: with the modern emphasis on sometimes complex and generally heavyweight JavaScript frameworks, it takes a step back to the first-principle technologies of the web.

I came up with a couple of entries for the competition, both simple experiences rather than traditional websites per se. My goal was to produce something small, experiential, simple and of course, sans JavaScript. The first entry was 10k-tiles and the second was 10k-quadtree. I won’t say too much about them and will rather let you have a play. Overall 10k-tiles came to 4.5kB ungzipped and 10k-quadtree came in at 8.7kB. The quadtree was slightly heavier because of all the necessary divs. Overall it was a fun experience which has made me reflect on keeping the web lean and no more complex than it needs to be. In addition I learned more about one of my kryptonites: CSS!

As a final thought I’d like to leave you with two pieces I found interesting regarding page weight. The first is Chris Zacharias’s eye-opening post about reducing page weight at YouTube and the positive (if unexpected) effects that had. The second is Maciej Cegłowski’s comical post/talk ‘The Website Obesity Crisis’, which is certainly worth a delve.

31 Aug 2016, 22:33

The Geospatial Continuum: Is Geo Breaking Out Of Its Niche?

A curious thing struck me during the recent hype surrounding Pokemon Go. I was talking with a friend at my current place of work, the Geovation Hub, about the intersection of GIS and society, and how Pokemon Go might help bring geo and GIS more into the public eye. My gut reaction was “I doubt that”. Why, you may ponder? My reasoning goes that the majority of geospatial innovations don’t really have much to do with the traditional GIS sector. As such they don’t really expose those companies or their technologies. Consider the geo products which have infiltrated or reformed our day to day lives in recent years: Google Maps, Uber, Zoopla, CityMapper, Tinder, Deliveroo, Laundrapp and now Pokemon Go. They are more closely related to commercial monetization efforts and run-of-the-mill software development teams than to the geospatial consortium.

Indeed, under the hood the Pokemon Go story is slightly more complex, in that Niantic (the company behind the game) was devised by the team behind Keyhole, which was later bought out by Google and rebranded as Google Earth. With this glaring exception identified, what I am trying to show here is that, fundamentally, the ‘geospatial revolution’ that we hear so much about from the geospatial sector has been predominantly driven by groups of individuals who have limited connections to the traditional geospatial sector itself. For example, over dinner a couple of years ago, Chris Sheldrick from What3Words described how he used to work in booking and producing live music events before he founded his company, which at the time was a surprise.

You just have to look at a host of location based startups like Deliveroo, Laundrapp, Zoopla and Tinder, almost all of which must be collecting and leveraging geo data in some capacity, and you’ll see a very slim percentage are hiring any GIS analysts or even developers. The idea doesn’t really feature on their radar. It doesn’t really have to most of the time, and that’s not necessarily a bad thing. They cross those bridges when they come to them, and they tend to create their own solutions rather than delve into the sometimes abstruse landscape of geospatial software. Even Uber, which has an appreciation for the GIS profession, sees many of its geo problems as standard engineering tasks; see for example its geofencing efforts, which are all custom written in Go (and an interesting take from an ex Bing Maps engineer). In the paraphrased words of an old colleague: “you don’t need GIS, you need to solve a problem”.

Clearly, although distinctly related, GIS and commercial geo have tended to exist within two very different worlds. With that being said, I’d like to argue that perhaps that distinction is showing signs of ebbing. I see both the dichotomy and the blurring of those worlds occurring with the geospatial startups we work alongside at the Geovation Hub. I think it wouldn’t be an unfair statement to say the vast majority of the teams we work with don’t have a background in GIS, but they do have an appreciation for its use cases and solutions. There is an understanding of how it can help solve their business problems. For example, Fatmap, Terrabotics and LandInsight are all currently looking for developers (ideally) with some level of GIS in their skillsets.

Putting the emphasis back on the industry side of things, you might ask: if traditional GIS has all these fancy toys and insights that we all love and hold dearly, why do innovations in the traditional GIS space appear so lacklustre? Why hasn’t it systematically been at the heart of changing everyday life like Niantic did with its AR gaming application? Upon thinking about it, the conclusion that I’ve come to is that perhaps it has. The catch is that it’s, on the whole, not overly fantastic at flaunting itself in the way that consumer facing tech companies tend to be (think Twitter, Instagram, Google etc). That is not to say it doesn’t do cool things (I’d like to think it does); I believe it’s more a testament to the spaces it operates in. GIS doesn’t tend to disrupt industries like food delivery, dating or finding the perfect bar, but it has incrementally shifted behaviours and approaches in sectors like asset management and transport planning. Indeed, GIS is an endeavour that consistently slips under the radar of modern tech news.

Recently, I think there have certainly been movements that are gradually bridging the gap between the traditional geospatial arena and the world of startups and innovation. One example is Esri, which has its Startup Program for accelerating startups wanting to integrate GIS functionality into their products. At the Geovation Hub in London we are working to help startups with geospatial data. We can also see companies like Carto and MapBox that are filling their own niches and doing a great job of pulling in non-traditional clientele like data journalists and startup-y types. On top of all this, all the biggest players (Esri, Here, Google, MapBox, Carto etc) are offering great APIs that developers can pull right into their apps.

There is an ever blurring exchange between the worlds of industry and academic GIS and the tech and startup worlds. This is in contrast to an understanding of the more squared off set of ‘commercial geo’ and ‘industrial GIS’. With the continued growth of location based apps and ever increasing hype around drones, autonomous vehicles, big data, VR/AR and IoT, I think the edges of the two will continue to dissipate. It’s a movement away from a geospatial dichotomy to a geospatial continuum. Having said all this, I think in order for geo to truly flourish it must not be kept as an esoteric members’ club. It must be allowed to cross-pollinate with a whole raft of different ideas and fields, in weird, wonderful and unexpected ways.