Geographic Data for D3 - from GeoJSON to TopoJSON

January 08, 2018

category: VISUALIZATION
d3 javascript json

In this post, I will go from the basics of GeoJSON and TopoJSON, to a comparison of the two formats and the improvements TopoJSON brings, and finally use simple examples to illustrate how to reduce the size of a TopoJSON file by quantizing and simplifying without losing the quality of the data visualization.

1. What is GeoJSON?

According to RFC 7946, published by the Internet Engineering Task Force (IETF) in 2016, GeoJSON is a JSON-based format for encoding data about geographic features. A GeoJSON object may represent a region of space (a Geometry), a spatially bounded entity (a Feature), or a list of Features (a FeatureCollection). GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. A Feature contains a Geometry object and additional properties, and a FeatureCollection contains a list of Features.

1.1 Geometry

A Geometry object consists of a type and a collection of coordinates that define the position of the geometry. The building blocks are three simple units: Point for zero-dimensional positions, LineString for one-dimensional curves, and Polygon for two-dimensional surfaces. The more complex GeoJSON geometries are all built from these three types.

Point

A Point is a single position defined by its coordinates, which by convention are given in longitude, latitude order.

{ "type": "Point", "coordinates": [0, 0] }  

LineString

A LineString is a line defined by two or more positions, running from a starting point to an ending point.

{ "type": "LineString", "coordinates": [[0, 0], [10, 10]] }

Polygon

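A Polygon is more complicated than a Point or a LineString because it encloses an area. Its coordinates are an array of linear rings, where each ring is a closed line whose first and last positions are identical. There are two kinds of Polygons. One comes without holes.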

{
  "type": "Polygon",
  "coordinates": [
    [
      [0, 0], [10, 10], [10, 0], [0, 0]
    ]
  ]
}

And the other comes with holes.

{
   "type": "Polygon",
   "coordinates": [
       [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ], // exterior boundary
       [ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2] ]  // interior boundary
   ]
}
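The ring structure of a Polygon can be checked with a few lines of JavaScript. The following is only a minimal sketch: `isClosedRing` and `isValidPolygon` are hypothetical helper names, not part of D3 or any GeoJSON library, and they only test the basic rule that every ring has at least four positions and ends where it starts.

```javascript
// Hypothetical helpers (not from any library): basic structural checks
// for the linear rings of a GeoJSON Polygon.

// A linear ring needs at least four positions, and the first and last
// positions must be identical.
function isClosedRing(ring) {
  if (!Array.isArray(ring) || ring.length < 4) return false;
  const first = ring[0];
  const last = ring[ring.length - 1];
  return first.length === last.length && first.every((v, i) => v === last[i]);
}

// The first ring is the exterior boundary, any further rings are holes.
function isValidPolygon(geometry) {
  return (
    geometry.type === "Polygon" &&
    Array.isArray(geometry.coordinates) &&
    geometry.coordinates.every(isClosedRing)
  );
}

const withHole = {
  type: "Polygon",
  coordinates: [
    [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]], // exterior boundary
    [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]], // interior boundary
  ],
};

console.log(isValidPolygon(withHole)); // true
```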

On top of these three basic units, each type has a multi-part extension that holds several geometries of that type.

MultiPoint

An array of Point objects.

{
   "type": "MultiPoint",
   "coordinates": [
       [100.0, 0.0], [101.0, 1.0]
   ]
}

MultiLineString

An array of LineString objects.

{
   "type": "MultiLineString",
   "coordinates": [
       [ [100.0, 0.0], [101.0, 1.0] ],
       [ [102.0, 2.0], [103.0, 3.0] ]
   ]
}

MultiPolygon

An array of Polygon objects.

{
   "type": "MultiPolygon",
   "coordinates": [
       [
           [ [102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0] ]
       ],
       [
           [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ],
           [ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2] ]
       ]
   ]
}

GeometryCollection
The six geometry types above can be combined into a GeometryCollection, which lists them in a "geometries" member.

{ "type": "GeometryCollection",
    "geometries": [
      { "type": "Point",
        "coordinates": [100.0, 0.0]
        },
      { "type": "LineString",
        "coordinates": [ [101.0, 0.0], [102.0, 1.0] ]
        }
    ]
}

The names of all seven geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection) are case-sensitive. By convention, coordinates follow the longitude, latitude, elevation order, with elevation being optional.
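Because the coordinate order is easy to get wrong, a quick sanity check can help. The sketch below is an assumption-laden illustration rather than a validator: `positions`, `walk` and `hasPlausibleLonLat` are hypothetical names, and the check only confirms that the first value fits a longitude range and the second a latitude range.

```javascript
// Hypothetical helpers (not from any library).

// Yields every position of a GeoJSON geometry, including the members of a
// GeometryCollection.
function* positions(geometry) {
  if (geometry.type === "GeometryCollection") {
    for (const g of geometry.geometries) yield* positions(g);
    return;
  }
  yield* walk(geometry.coordinates);
}

// A position is an array that starts with a number; anything else is a
// nested array of positions.
function* walk(coords) {
  if (coords.length === 0) return;
  if (typeof coords[0] === "number") {
    yield coords;
  } else {
    for (const c of coords) yield* walk(c);
  }
}

// True when every position fits the longitude, latitude order.
function hasPlausibleLonLat(geometry) {
  for (const [lon, lat] of positions(geometry)) {
    if (Math.abs(lon) > 180 || Math.abs(lat) > 90) return false;
  }
  return true;
}

const mixed = {
  type: "GeometryCollection",
  geometries: [
    { type: "Point", coordinates: [100.0, 0.0] },
    { type: "LineString", coordinates: [[101.0, 0.0], [102.0, 1.0]] },
  ],
};

console.log(hasPlausibleLonLat(mixed)); // true
console.log(hasPlausibleLonLat({ type: "Point", coordinates: [0.0, 100.0] })); // false
```

A swapped pair such as [0.0, 100.0] is caught here only because 100 is not a possible latitude; swapped values that both fall within -90 to 90 would pass unnoticed.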

1.2 Feature

A Feature is an object that combines a geometry with additional properties; both members are required, although either may be null. Specifically, a Feature has a type member with the value Feature, a geometry member, and a properties member.

{
   "type": "Feature",
   "geometry": {
       "type": "LineString",
       "coordinates": [
           [100.0, 0.0], [101.0, 1.0]
       ]
   },
   "properties": {
       "prop0": "value0",
       "prop1": "value1"
   }
}

1.3 FeatureCollection

Not surprisingly, a FeatureCollection is an object with a type member of value FeatureCollection and a features member, which is an array of Feature objects.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [0, 0]
      },
      "properties": {
        "name": "null island"
      }
    }
  ]
}  
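How Geometry, Feature and FeatureCollection nest can be sketched in JavaScript as follows. `toFeature` and `toFeatureCollection` are hypothetical helper names introduced only for illustration.

```javascript
// Hypothetical helpers (not from any library) showing how the three levels nest.

// Wraps a bare geometry and its properties into a Feature.
function toFeature(geometry, properties = {}) {
  return { type: "Feature", geometry, properties };
}

// Wraps an array of Features into a FeatureCollection.
function toFeatureCollection(features) {
  return { type: "FeatureCollection", features };
}

const nullIsland = toFeature(
  { type: "Point", coordinates: [0, 0] },
  { name: "null island" }
);
const collection = toFeatureCollection([nullIsland]);

// Read a property from every feature; properties may be null, hence the ?.
const names = collection.features.map((f) => f.properties?.name);
console.log(names); // [ 'null island' ]
```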

1.4 Bounding Box

A GeoJSON object may have a member called “bbox”, a bounding box that describes the coordinate range of its geometries, features, or feature collections. The values list all axes of the most southwesterly point followed by all axes of the most northeasterly point, each in longitude, latitude, elevation order. In two dimensions this gives [west, south, east, north], i.e. the minimum longitude and latitude followed by the maximum longitude and latitude.

{
       "type": "Feature",
       "bbox": [-10.0, -10.0, 10.0, 10.0],
       "geometry": {
           "type": "Polygon",
           "coordinates": [
               [
                   [-10.0, -10.0],
                   [10.0, -10.0],
                   [10.0, 10.0],
                   [-10.0, -10.0]
               ]
           ]
       }
}
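A two-dimensional bounding box can be derived from the coordinates themselves. This is a minimal sketch under stated assumptions: `computeBbox` is a hypothetical helper, it ignores elevation, it does not handle a GeometryCollection, and it does not treat geometries that cross the antimeridian specially.

```javascript
// Hypothetical helper (not from any library): computes [west, south, east, north]
// for a geometry that has a "coordinates" member.
function computeBbox(geometry) {
  let west = Infinity;
  let south = Infinity;
  let east = -Infinity;
  let north = -Infinity;

  // Recurse through nested arrays until a position (array of numbers) is found.
  const visit = (coords) => {
    if (typeof coords[0] === "number") {
      const [lon, lat] = coords;
      if (lon < west) west = lon;
      if (lon > east) east = lon;
      if (lat < south) south = lat;
      if (lat > north) north = lat;
    } else {
      coords.forEach(visit);
    }
  };

  visit(geometry.coordinates);
  return [west, south, east, north];
}

const triangle = {
  type: "Polygon",
  coordinates: [[[-10.0, -10.0], [10.0, -10.0], [10.0, 10.0], [-10.0, -10.0]]],
};

console.log(computeBbox(triangle)); // [ -10, -10, 10, 10 ]
```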

2. TopoJSON

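TopoJSON is an extension of GeoJSON that encodes topology. Instead of storing each geometry separately, it stitches geometries together from shared line segments called arcs, which eliminates redundancy and allows geometries to be stored more efficiently.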

According to the TopoJSON Format Specification, a topology must contain a “type” member with the value “Topology”, an “objects” member whose value is another object mapping names to geometry objects (named “example” in the examples below), and an “arcs” member holding the array of arcs. Point and MultiPoint geometry objects must have a “coordinates” member, while LineString, Polygon, MultiLineString and MultiPolygon must have an “arcs” member. Both “coordinates” and “arcs” are always arrays. “bbox” is optional, as is “transform”, which is used to construct a “quantized” topology. I use the simple examples from the GeoJSON section to illustrate TopoJSON.

//Point
{"type":"Topology","objects":{"example":{"type":"Point","coordinates":[0,0]}},"arcs":[],"bbox":[0,0,0,0]}

//LineString
{"type":"Topology","objects":{"example":{"type":"LineString","arcs":[0]}},"arcs":[[[0,0],[10,10]]],"bbox":[0,0,10,10]}

//Polygon
{"type":"Topology","objects":{"example":{"type":"Polygon","arcs":[[0]]}},"arcs":[[[0,0],[10,10],[10,0],[0,0]]],"bbox":[0,0,10,10]}

//MultiPoint
{"type":"Topology","objects":{"example":{"type":"MultiPoint","coordinates":[[100,0],[101,1]]}},"arcs":[],"bbox":[100,0,101,1]}

//MultiLineString
{"type":"Topology","objects":{"example":{"type":"MultiLineString","arcs":[[0],[1]]}},"arcs":[[[100,0],[101,1]],[[102,2],[103,3]]],"bbox":[100,0,103,3]}

//MultiPolygon
{"type":"Topology","objects":{"example":{"type":"MultiPolygon","arcs":[[[0]],[[1],[2]]]}},"arcs":[[[102,2],[103,2],[103,3],[102,3],[102,2]],[[100,0],[101,0],[101,1],[100,1],[100,0]],[[100.2,0.2],[100.8,0.2],[100.8,0.8],[100.2,0.8],[100.2,0.2]]],"bbox":[100,0,103,3]}

//GeometryCollection
{"type":"Topology","objects":{"example":{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[100,0]},{"type":"LineString","arcs":[0]}]}},"arcs":[[[101,0],[102,1]]],"bbox":[100,0,102,1]}

//Feature
{"type":"Topology","objects":{"example":{"type":"LineString","arcs":[0],"properties":{"prop0":"value0","prop1":"value1"}}},"arcs":[[[100,0],[101,1]]],"bbox":[100,0,101,1]}

//FeatureCollection
{"type":"Topology","objects":{"example":{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[0,0],"properties":{"name":"null island"}}]}},"arcs":[],"bbox":[0,0,0,0]}

As we can see, all the TopoJSON counterparts have a “type” member with the value “Topology”, and each stores its geometry under an “example” object. The differences start there, depending on the geometry type. Point and MultiPoint carry their positions directly in “coordinates”, so the top-level “arcs” array of the topology is empty, while LineString, Polygon, MultiLineString and MultiPolygon have an “arcs” member whose indexes point into the top-level “arcs” array.
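To make the role of the arc indexes concrete, here is a minimal sketch that turns a TopoJSON geometry object back into GeoJSON coordinates. It assumes an unquantized topology (no “transform” member), and `stitchArcs` and `toGeoJSONGeometry` are hypothetical names; in practice the topojson-client library's `feature` function does this job. Following the specification, a negative index `~i` refers to arc `i` in reverse order.

```javascript
// Hypothetical helpers (not from any library), for unquantized topologies only.

// Joins the arcs referenced by a list of indexes into one line of positions.
function stitchArcs(topology, arcIndexes) {
  const line = [];
  for (const index of arcIndexes) {
    // A negative index ~i means arc i traversed in reverse.
    const arc =
      index < 0 ? topology.arcs[~index].slice().reverse() : topology.arcs[index];
    arc.forEach((point, i) => {
      // The first point of each following arc repeats the end of the previous one.
      if (i > 0 || line.length === 0) line.push(point.slice());
    });
  }
  return line;
}

function toGeoJSONGeometry(topology, object) {
  switch (object.type) {
    case "Point":
    case "MultiPoint":
      return { type: object.type, coordinates: object.coordinates };
    case "LineString":
      return { type: object.type, coordinates: stitchArcs(topology, object.arcs) };
    case "MultiLineString":
    case "Polygon":
      return {
        type: object.type,
        coordinates: object.arcs.map((line) => stitchArcs(topology, line)),
      };
    case "MultiPolygon":
      return {
        type: object.type,
        coordinates: object.arcs.map((polygon) =>
          polygon.map((ring) => stitchArcs(topology, ring))
        ),
      };
    default:
      throw new Error("Unsupported geometry type: " + object.type);
  }
}

const polygonTopology = {
  type: "Topology",
  objects: { example: { type: "Polygon", arcs: [[0]] } },
  arcs: [[[0, 0], [10, 10], [10, 0], [0, 0]]],
};

const polygon = toGeoJSONGeometry(polygonTopology, polygonTopology.objects.example);
console.log(JSON.stringify(polygon.coordinates)); // [[[0,0],[10,10],[10,0],[0,0]]]
```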

3. From Raw Data to TopoJSON

In practice, we need to create our own TopoJSON files for D3 to consume from raw shapefiles. I will go through steps borrowed from Bostock’s series of blog posts 1, 2, 3 and 4, and another view from Ændrew Rininsland.

To start with, we need to install the packages used for data manipulation: shapefile for converting shapefiles to GeoJSON, topojson for converting GeoJSON to TopoJSON, and ndjson-cli for working with newline-delimited JSON.

npm install -g shapefile ndjson topojson ndjson-cli

I used the 2016 state shapefiles published by the US Census Bureau and unzipped them into my local directory.

shp2json cb_2016_us_state_5m.shp -o cb_2016_us_state_5m.json
geo2topo cb_2016_us_state_5m.json > cb_2016_us_state_5m.topo.json

For a quick check, the two commands above suffice to convert raw shapefiles into a TopoJSON file. If you check the size of each file, it is not hard to see that the TopoJSON file is only about 70% of the size of the original GeoJSON file.

[Image: file sizes of the GeoJSON and TopoJSON files]

The default conversion, however, usually does not take full advantage of TopoJSON’s capabilities to meet the particular needs of D3. We will dive deeper and test a few ways of optimizing the file conversion.

First of all, we convert the raw data into newline-delimited features, with one feature per line, which is easy for humans to read and lets us use the convenient ndjson-cli tool.

As a benchmark, we first convert the newline-delimited file into TopoJSON without any optimization.

shp2json -n cb_2016_us_state_5m.shp > cb_2016_us_state_5m.ndjson
geo2topo -n cb_2016_us_state_5m.ndjson > cb_2016_us_state_5m.topo1.json
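The newline-delimited format is simple enough to read back by hand. As a minimal sketch, the hypothetical helper `ndjsonToFeatureCollection` below parses such text, one feature per line, into a regular FeatureCollection; the sample text is made up for illustration and is not taken from the Census data.

```javascript
// Hypothetical helper (not from any library): each non-empty line is one
// complete JSON document holding a single Feature.
function ndjsonToFeatureCollection(text) {
  const features = text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { type: "FeatureCollection", features };
}

// Made-up sample with two features, one per line.
const sampleText = [
  '{"type":"Feature","properties":{"name":"a"},"geometry":{"type":"Point","coordinates":[0,0]}}',
  '{"type":"Feature","properties":{"name":"b"},"geometry":{"type":"Point","coordinates":[1,1]}}',
  "",
].join("\n");

const parsed = ndjsonToFeatureCollection(sampleText);
console.log(parsed.features.length); // 2
```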

Benchmarking TopoJSON:

Then we can optimize this benchmark TopoJSON file by quantizing and simplifying.

Quantizing basically reduces coordinate precision. It is implemented by topoquantize, which takes a number as its argument. As indicated by the TopoJSON API, it is typically a power of ten. The bigger the number, the more precise the coordinates.

topoquantize 1e5 < cb_2016_us_state_5m.topo1.json > cb_2016_us_state_5m.topo2.json
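What quantizing does to a single arc can be sketched as follows. This is an illustration of the idea described in the TopoJSON Format Specification (integer grid coordinates, delta-encoded, plus a “transform” with “scale” and “translate”), not the code topoquantize runs; `quantizeArc` and `decodeArc` are hypothetical names, and the tiny grid size of 11 is chosen only to keep the numbers readable.

```javascript
// Hypothetical helpers (not from any library).

// Snaps an arc to an n-by-n integer grid over the bounding box and stores
// each point as the difference from the previous one (delta-encoding).
function quantizeArc(arc, bbox, n) {
  const [x0, y0, x1, y1] = bbox;
  const kx = x1 > x0 ? (x1 - x0) / (n - 1) : 1; // grid step along x
  const ky = y1 > y0 ? (y1 - y0) / (n - 1) : 1; // grid step along y
  let px = 0;
  let py = 0;
  const deltas = arc.map(([x, y]) => {
    const qx = Math.round((x - x0) / kx);
    const qy = Math.round((y - y0) / ky);
    const delta = [qx - px, qy - py];
    px = qx;
    py = qy;
    return delta;
  });
  return { arc: deltas, transform: { scale: [kx, ky], translate: [x0, y0] } };
}

// Reverses the encoding: accumulate the deltas, then scale and translate.
function decodeArc(deltas, transform) {
  const [kx, ky] = transform.scale;
  const [tx, ty] = transform.translate;
  let x = 0;
  let y = 0;
  return deltas.map(([dx, dy]) => {
    x += dx;
    y += dy;
    return [x * kx + tx, y * ky + ty];
  });
}

const original = [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8]];
const quantized = quantizeArc(original, [100, 0, 101, 1], 11);

console.log(JSON.stringify(quantized.arc)); // [[2,2],[6,0],[0,6]]
console.log(decodeArc(quantized.arc, quantized.transform)); // close to the original positions
```

Small integers take fewer characters than long decimals, which is where most of the size reduction comes from. The price is that each decoded coordinate may differ from the original by up to half a grid step.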

Quantizing:

Simplifying basically reduces the number of points used to represent arcs. It is implemented by toposimplify with the -p option, which sets the minimum planar triangle area below which points are removed. Unlike topoquantize, the smaller the value, the more detail is kept. The -f option removes detached rings that are smaller than the simplification threshold after simplifying.

toposimplify -p 1e-1 -f < cb_2016_us_state_5m.topo2.json > cb_2016_us_state_5m.topo3.json
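The triangle-area idea behind this kind of simplification (Visvalingam's method) can be sketched on a single line. The code below is a deliberately naive illustration, not toposimplify's implementation: `triangleArea` and `simplifyLine` are hypothetical names, the loop rescans the whole line on every pass, and it works on one line instead of a topology's shared arcs.

```javascript
// Hypothetical helpers (not from any library).

// Area of the triangle formed by three planar points.
function triangleArea([ax, ay], [bx, by], [cx, cy]) {
  return Math.abs((ax - cx) * (by - ay) - (ax - bx) * (cy - ay)) / 2;
}

// Repeatedly removes the interior point whose triangle with its two
// neighbours is smallest, until every remaining triangle is at least minArea.
// The two endpoints are always kept.
function simplifyLine(points, minArea) {
  const line = points.slice();
  while (line.length > 2) {
    let minIndex = -1;
    let min = Infinity;
    for (let i = 1; i < line.length - 1; i++) {
      const area = triangleArea(line[i - 1], line[i], line[i + 1]);
      if (area < min) {
        min = area;
        minIndex = i;
      }
    }
    if (min >= minArea) break;
    line.splice(minIndex, 1);
  }
  return line;
}

const bumpy = [[0, 0], [1, 0.1], [2, 0], [3, 2], [4, 0]];

// The small bump at [1, 0.1] goes, the large peak at [3, 2] stays.
console.log(JSON.stringify(simplifyLine(bumpy, 0.5))); // [[0,0],[2,0],[3,2],[4,0]]
```

A larger threshold removes more points, which matches the behaviour of the -p value described above.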

Simplifying:

The size of the file after each conversion is as follows:

[Image: file sizes after each conversion step]

It is not hard to see that by quantizing the file, not only does the file size decrease tremendously, which makes for faster rendering, but the quality of the visualization is also kept.


References:

(1). TopoJSON API, https://github.com/topojson/topojson.
(2). The GeoJSON Specification (RFC 7946), https://tools.ietf.org/html/rfc7946.
(3). More than you ever wanted to know about GeoJSON, https://macwright.org/2015/03/23/geojson-second-bite.
(4). The TopoJSON Format Specification, https://github.com/topojson/topojson-specification.
(5). How To Infer Topology, https://bost.ocks.org/mike/topology/.
(6). Spatial data on a diet: tips for file size reduction using TopoJSON, http://zevross.com/blog/2014/04/22/spatial-data-on-a-diet-tips-for-file-size-reduction-using-topojson/.