
d3 javascript json

In this post, I start with the basics of GeoJSON and TopoJSON, compare the two formats and the improvements TopoJSON brings, and finally use simple examples to show how to reduce the size of a TopoJSON file by *quantizing* and *simplifying* it without losing visualization quality.

According to RFC 7946, published in 2016 by the IETF (*the Internet Engineering Task Force*), GeoJSON is a JSON-based format for encoding data about geographic features. A GeoJSON object may represent a region of space (a *Geometry*), a spatially bounded entity (a *Feature*), or a list of Features (a *FeatureCollection*). GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. A *Feature* contains a *Geometry* object and additional properties, and a *FeatureCollection* contains an array of *Feature* objects.

A *Geometry* object consists of a *type* and a set of coordinates that give the position of a geometry of that *type*. The building blocks are three simple units: *Point* (zero-dimensional), *LineString* (one-dimensional), and *Polygon* (two-dimensional). Every more complex GeoJSON geometry is built from these three types.

**Point**

A *Point* is a single position, given by its coordinates in longitude, latitude order by convention.

```
{ "type": "Point", "coordinates": [0, 0] }
```

**LineString**

A *LineString* is a line defined by an array of two or more positions, running from a starting point to an ending point.

```
{ "type": "LineString", "coordinates": [[0, 0], [10, 10]] }
```

**Polygon**

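A *Polygon* is more complicated than a *Point* or a *LineString* because it encloses an area. Its coordinates are an array of rings, and each ring is a closed line whose first and last positions are identical. There are two kinds of *Polygon*. The first has no holes.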

```
{
"type": "Polygon",
"coordinates": [
[
[0, 0], [10, 10], [10, 0], [0, 0]
]
]
}
```

And the other comes with holes. The first ring is the exterior boundary, and each following ring is an interior boundary (a hole).

```
{
"type": "Polygon",
"coordinates": [
[ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ],
[ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2] ]
]
}
```
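To make the ring structure a little more concrete, here is a minimal sketch that checks whether each ring is closed and computes its area with the shoelace formula. The helper names `isClosedRing` and `signedRingArea` are made up for this post, and the area is planar: it treats longitude and latitude as flat x and y, which is an assumption that is only reasonable for small illustrative shapes.

```javascript
// Hypothetical helpers for illustration; they are not part of any GeoJSON library.

// A linear ring needs at least four positions, and the first and last must match.
function isClosedRing(ring) {
  if (ring.length < 4) return false;
  const first = ring[0];
  const last = ring[ring.length - 1];
  return first[0] === last[0] && first[1] === last[1];
}

// Shoelace formula on planar coordinates.
// The result is positive for counterclockwise rings and negative for clockwise ones.
function signedRingArea(ring) {
  let sum = 0;
  for (let i = 0; i < ring.length - 1; i++) {
    const [x1, y1] = ring[i];
    const [x2, y2] = ring[i + 1];
    sum += x1 * y2 - x2 * y1;
  }
  return sum / 2;
}

const polygonWithHole = {
  type: "Polygon",
  coordinates: [
    [[100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0]],
    [[100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2]],
  ],
};

const [exteriorRing, holeRing] = polygonWithHole.coordinates;
console.log(isClosedRing(exteriorRing), isClosedRing(holeRing));

// Planar area of the polygon: exterior area minus hole area.
const polygonArea =
  Math.abs(signedRingArea(exteriorRing)) - Math.abs(signedRingArea(holeRing));
console.log(polygonArea.toFixed(2));
```

The sign of `signedRingArea` is also a quick way to inspect winding order. RFC 7946 asks for counterclockwise exterior rings and clockwise holes, but it also says parsers should not reject polygons that ignore this rule.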

On top of these three basic units, each type has a multi-part counterpart that holds several geometries of that type.

**MultiPoint**

An array of *Point* coordinates.

```
{
"type": "MultiPoint",
"coordinates": [
[100.0, 0.0], [101.0, 1.0]
]
}
```

**MultiLineString**

An array of *LineString* coordinate arrays.

```
{
"type": "MultiLineString",
"coordinates": [
[ [100.0, 0.0], [101.0, 1.0] ],
[ [102.0, 2.0], [103.0, 3.0] ]
]
}
```

**MultiPolygon**

An array of *Polygon* coordinate arrays.

```
{
"type": "MultiPolygon",
"coordinates": [
[
[ [102.0, 2.0], [103.0, 2.0], [103.0, 3.0], [102.0, 3.0], [102.0, 2.0] ]
],
[
[ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0], [100.0, 1.0], [100.0, 0.0] ],
[ [100.2, 0.2], [100.8, 0.2], [100.8, 0.8], [100.2, 0.8], [100.2, 0.2] ]
]
]
}
```

**GeometryCollection**

Any of the six geometry types above can be combined in a *GeometryCollection*.

```
{ "type": "GeometryCollection",
"geometries": [
{ "type": "Point",
"coordinates": [100.0, 0.0]
},
{ "type": "LineString",
"coordinates": [ [101.0, 0.0], [102.0, 1.0] ]
}
]
}
```

The names of all seven geometry types (Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection) are case-sensitive. Coordinates follow the longitude, latitude, elevation order, where elevation is optional.
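Both rules are easy to get wrong in practice, so here is a small sketch of how they could be guarded in code. `isGeometryType` and `pointFromLatLon` are hypothetical helper names invented for this example, not functions from any library.

```javascript
// The seven geometry type names, spelled exactly as GeoJSON requires.
const GEOMETRY_TYPES = new Set([
  "Point",
  "LineString",
  "Polygon",
  "MultiPoint",
  "MultiLineString",
  "MultiPolygon",
  "GeometryCollection",
]);

// Hypothetical helper: true only for an exact, case-sensitive match.
function isGeometryType(type) {
  return GEOMETRY_TYPES.has(type);
}

// Hypothetical helper: takes named latitude/longitude values and writes them
// in the longitude-first order that GeoJSON expects. Elevation is optional.
function pointFromLatLon(lat, lon, elevation) {
  const coordinates =
    elevation === undefined ? [lon, lat] : [lon, lat, elevation];
  return { type: "Point", coordinates };
}

console.log(isGeometryType("Point")); // true
console.log(isGeometryType("point")); // false
console.log(pointFromLatLon(40.7, -74.0).coordinates);
```

Many mapping APIs take latitude first, so a helper with named parameters keeps the swap in one place.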

A *Feature* is an object that combines a geometry with additional properties; both members are required, although either may be null. Specifically, a *Feature* has a *type* member with the value *Feature*, a *geometry* member, and a *properties* member.

```
{
"type": "Feature",
"geometry": {
"type": "LineString",
"coordinates": [
[100.0, 0.0], [101.0, 1.0]
]
},
"properties": {
"prop0": "value0",
"prop1": "value1"
}
}
```

Not surprisingly, a *FeatureCollection* is an object with a *type* member whose value is *FeatureCollection* and a *features* member that holds an array of *Feature* objects.

```
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [0, 0]
},
"properties": {
"name": "null island"
}
}
]
}
```
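As a rough sketch of how these pieces nest, the following wraps plain geometries into Features and then into a FeatureCollection. The function names `toFeature` and `toFeatureCollection` are made up for this post.

```javascript
// Hypothetical helper: wrap a geometry and its properties in a Feature.
function toFeature(geometry, properties = {}) {
  return { type: "Feature", geometry, properties };
}

// Hypothetical helper: wrap an array of Features in a FeatureCollection.
function toFeatureCollection(features) {
  return { type: "FeatureCollection", features };
}

const collection = toFeatureCollection([
  toFeature({ type: "Point", coordinates: [0, 0] }, { name: "null island" }),
  toFeature(
    { type: "LineString", coordinates: [[100.0, 0.0], [101.0, 1.0]] },
    { prop0: "value0", prop1: "value1" }
  ),
]);

// List each feature's geometry type together with its property names.
const summary = collection.features.map(
  (feature) =>
    `${feature.geometry.type}: ${Object.keys(feature.properties).join(", ")}`
);
console.log(summary);
```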

A GeoJSON object may have a member called “bbox”, a bounding box that gives the coordinate range of its geometries, features, or feature collections. It lists all the minimum values followed by all the maximum values, in longitude, latitude, elevation order. In two dimensions this is [west, south, east, north], that is, the left, bottom, right, and top edges of the underlying geographic data.

```
{
"type": "Feature",
"bbox": [-10.0, -10.0, 10.0, 10.0],
"geometry": {
"type": "Polygon",
"coordinates": [
[
[-10.0, -10.0],
[10.0, -10.0],
[10.0, 10.0],
[-10.0, -10.0]
]
]
}
}
```
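For illustration, a bounding box in that order can be derived from the coordinates themselves. The sketch below assumes two-dimensional positions and geometries that do not cross the antimeridian; `computeBbox` is a hypothetical name used only in this post.

```javascript
// Hypothetical helper: walk arbitrarily nested coordinate arrays and return
// [west, south, east, north]. Only longitude and latitude are considered.
function computeBbox(coordinates) {
  const bbox = [Infinity, Infinity, -Infinity, -Infinity];

  function visit(node) {
    if (typeof node[0] === "number") {
      // A single position: [longitude, latitude, ...].
      const [lon, lat] = node;
      bbox[0] = Math.min(bbox[0], lon);
      bbox[1] = Math.min(bbox[1], lat);
      bbox[2] = Math.max(bbox[2], lon);
      bbox[3] = Math.max(bbox[3], lat);
    } else {
      // An array of positions, rings, or polygons: recurse.
      node.forEach(visit);
    }
  }

  visit(coordinates);
  return bbox;
}

const triangle = {
  type: "Polygon",
  coordinates: [
    [[-10.0, -10.0], [10.0, -10.0], [10.0, 10.0], [-10.0, -10.0]],
  ],
};

console.log(computeBbox(triangle.coordinates)); // [ -10, -10, 10, 10 ]
```

Because the function only looks at nesting depth, the same call works for a Point, a LineString, or a MultiPolygon.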

TopoJSON is an extension of GeoJSON that encodes topology: geometries are stitched together from shared line segments called arcs, which eliminates redundancy and allows geometries to be stored more efficiently.

According to the TopoJSON Format Specification, a topology must contain a “type” member with the value “Topology”, an “objects” member that maps names to geometry objects (here a single object named “example”), and an “arcs” member. The geometry objects *Point* and *MultiPoint* must have a “coordinates” member, while *LineString*, *Polygon*, *MultiLineString* and *MultiPolygon* must have an “arcs” member. Both “coordinates” and “arcs” are always arrays. “bbox” is optional, as is “transform”, which is used to build a “quantized” topology. I reuse the simple examples from the GeoJSON section to illustrate TopoJSON.

```
//Point
{"type":"Topology","objects":{"example":{"type":"Point","coordinates":[0,0]}},"arcs":[],"bbox":[0,0,0,0]}
//LineString
{"type":"Topology","objects":{"example":{"type":"LineString","arcs":[0]}},"arcs":[[[0,0],[10,10]]],"bbox":[0,0,10,10]}
//Polygon
{"type":"Topology","objects":{"example":{"type":"Polygon","arcs":[[0]]}},"arcs":[[[0,0],[10,10],[10,0],[0,0]]],"bbox":[0,0,10,10]}
//MultiPoint
{"type":"Topology","objects":{"example":{"type":"MultiPoint","coordinates":[[100,0],[101,1]]}},"arcs":[],"bbox":[100,0,101,1]}
//MultiLineString
{"type":"Topology","objects":{"example":{"type":"MultiLineString","arcs":[[0],[1]]}},"arcs":[[[100,0],[101,1]],[[102,2],[103,3]]],"bbox":[100,0,103,3]}
//MultiPolygon
{"type":"Topology","objects":{"example":{"type":"MultiPolygon","arcs":[[[0]],[[1],[2]]]}},"arcs":[[[102,2],[103,2],[103,3],[102,3],[102,2]],[[100,0],[101,0],[101,1],[100,1],[100,0]],[[100.2,0.2],[100.8,0.2],[100.8,0.8],[100.2,0.8],[100.2,0.2]]],"bbox":[100,0,103,3]}
//GeometryCollection
{"type":"Topology","objects":{"example":{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[100,0]},{"type":"LineString","arcs":[0]}]}},"arcs":[[[101,0],[102,1]]],"bbox":[100,0,102,1]}
//Feature
{"type":"Topology","objects":{"example":{"type":"LineString","arcs":[0],"properties":{"prop0":"value0","prop1":"value1"}}},"arcs":[[[100,0],[101,1]]],"bbox":[100,0,101,1]}
//FeatureCollection
{"type":"Topology","objects":{"example":{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[0,0],"properties":{"name":"null island"}}]}},"arcs":[],"bbox":[0,0,0,0]}
```

As we can see, every TopoJSON counterpart has a “type” member with the value “Topology” and an “objects” member containing the “example” object; the differences start inside that object, depending on the geometry type. *Point* and *MultiPoint* objects carry their positions in “coordinates”, so the top-level “arcs” array of the topology is empty for them, while *LineString*, *Polygon*, *MultiLineString* and *MultiPolygon* objects have only an “arcs” member, whose indexes refer to the top-level “arcs” array.
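To show how arc indexes turn back into coordinates, here is a minimal sketch of a decoder. It only handles topologies without a “transform” member (that is, not quantized) and it does not support GeometryCollection; the function names are hypothetical. In a real project, `topojson.feature` from topojson-client does this job.

```javascript
// Resolve one arc index to its positions. Per the TopoJSON specification, a
// negative index ~i means "arc i, traversed in reverse".
function arcPositions(topology, index) {
  return index < 0
    ? topology.arcs[~index].slice().reverse()
    : topology.arcs[index];
}

// Stitch a list of arc indexes into one line or ring. Consecutive arcs share
// an endpoint, so the first position of every arc after the first is dropped.
function stitch(topology, indexes) {
  const line = [];
  indexes.forEach((index, i) => {
    const positions = arcPositions(topology, index);
    line.push(...(i === 0 ? positions : positions.slice(1)));
  });
  return line;
}

// Convert a TopoJSON geometry object back into a GeoJSON geometry.
function toGeoJsonGeometry(topology, object) {
  switch (object.type) {
    case "Point":
    case "MultiPoint":
      return { type: object.type, coordinates: object.coordinates };
    case "LineString":
      return { type: "LineString", coordinates: stitch(topology, object.arcs) };
    case "MultiLineString":
    case "Polygon":
      return {
        type: object.type,
        coordinates: object.arcs.map((ring) => stitch(topology, ring)),
      };
    case "MultiPolygon":
      return {
        type: "MultiPolygon",
        coordinates: object.arcs.map((polygon) =>
          polygon.map((ring) => stitch(topology, ring))
        ),
      };
    default:
      throw new Error(`Unsupported type in this sketch: ${object.type}`);
  }
}

// Illustrative topology: two objects reuse arc 0, one of them in reverse.
const topology = {
  type: "Topology",
  objects: {
    forward: { type: "LineString", arcs: [0] },
    backward: { type: "LineString", arcs: [-1] },
    triangle: { type: "Polygon", arcs: [[1]] },
  },
  arcs: [
    [[0, 0], [10, 10]],
    [[0, 0], [10, 10], [10, 0], [0, 0]],
  ],
};

console.log(toGeoJsonGeometry(topology, topology.objects.backward).coordinates);
```

The `forward` and `backward` objects store the shared line only once, which is where TopoJSON saves space when neighbouring shapes share a border.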

In practice, we need to create our own TopoJSON file for D3 to consume, starting from raw shapefiles. I will go through the steps, borrowed from Bostock’s series of blog posts 1, 2, 3 and 4 and from Ændrew Rininsland’s alternative take.

To start with, we need to install the packages used for data manipulation: **shapefile** for converting shapefiles to GeoJSON, **topojson** for converting GeoJSON to TopoJSON, and **ndjson-cli** for working with newline-delimited JSON.

```
npm install -g shapefile ndjson topojson ndjson-cli
```

I used the 2016 state shapefiles published by the US Census Bureau and unzipped them into a local directory.

```
shp2json cb_2016_us_state_5m.shp -o cb_2016_us_state_5m.json
geo2topo cb_2016_us_state_5m.json > cb_2016_us_state_5m.topo.json
```

For a quick check, the two commands above suffice to convert a raw shapefile into a TopoJSON file. If you compare the file sizes, it is not hard to see that the TopoJSON file is only about 70% of the size of the original GeoJSON file.

However, this default conversion usually does not take full advantage of TopoJSON’s capabilities for the particular needs of a D3 visualization. We will dive deeper and test a few ways of optimizing the file conversion.

First of all, we convert the raw data into newline-delimited JSON, with one feature per line, which is easier for humans to read and lets us use the convenient **ndjson-cli** tools.

We then convert the newline-delimited file into TopoJSON to serve as a benchmark.

```
shp2json -n cb_2016_us_state_5m.shp > cb_2016_us_state_5m.ndjson
geo2topo -n cb_2016_us_state_5m.ndjson > cb_2016_us_state_5m.topo1.json
```
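To illustrate what “one feature per line” means, here is a small sketch that converts between a FeatureCollection and newline-delimited text. The helpers `toNdjson` and `fromNdjson` are hypothetical and only mirror the file layout; they are not how shp2json is implemented.

```javascript
// Hypothetical helper: serialize each feature on its own line.
function toNdjson(featureCollection) {
  return (
    featureCollection.features
      .map((feature) => JSON.stringify(feature))
      .join("\n") + "\n"
  );
}

// Hypothetical helper: parse every non-empty line back into a feature.
function fromNdjson(text) {
  const features = text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
  return { type: "FeatureCollection", features };
}

const states = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      geometry: { type: "Point", coordinates: [0, 0] },
      properties: { name: "null island" },
    },
    {
      type: "Feature",
      geometry: { type: "Point", coordinates: [100, 0] },
      properties: { name: "another point" },
    },
  ],
};

const ndjsonText = toNdjson(states);
console.log(ndjsonText);
```

Since every line is a complete JSON value, line-oriented tools can filter or transform features one at a time without loading the whole file.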

Then we can shrink this benchmark TopoJSON file by quantizing and simplifying it.

*Quantizing* basically reduces coordinate precision. It is implemented by *topoquantize*, which takes a number as its argument. According to the TopoJSON API, this is typically a power of ten. The bigger the number, the more precise the result.

```
topoquantize 1e5 < cb_2016_us_state_5m.topo1.json > cb_2016_us_state_5m.topo2.json
```
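As a rough picture of what quantization does to the data, the sketch below snaps the positions of one arc onto an n-by-n integer grid spanning the bounding box and stores them as deltas, following the “transform” (scale and translate) scheme described in the TopoJSON specification. It is a simplified illustration with hypothetical function names, not the code behind topoquantize; for example, it handles a single arc and does not drop duplicate points.

```javascript
// Quantize one arc onto an n-by-n grid covering bbox = [x0, y0, x1, y1].
// The first entry is an absolute grid position; the rest are deltas.
function quantizeArc(arc, bbox, n) {
  const [x0, y0, x1, y1] = bbox;
  const kx = x1 - x0 ? (x1 - x0) / (n - 1) : 1;
  const ky = y1 - y0 ? (y1 - y0) / (n - 1) : 1;
  let previousX = 0;
  let previousY = 0;
  const deltas = arc.map(([x, y]) => {
    const qx = Math.round((x - x0) / kx);
    const qy = Math.round((y - y0) / ky);
    const delta = [qx - previousX, qy - previousY];
    previousX = qx;
    previousY = qy;
    return delta;
  });
  return { transform: { scale: [kx, ky], translate: [x0, y0] }, arc: deltas };
}

// Undo the delta encoding and map grid positions back to coordinates.
function decodeArc(deltas, transform) {
  const [kx, ky] = transform.scale;
  const [tx, ty] = transform.translate;
  let x = 0;
  let y = 0;
  return deltas.map(([dx, dy]) => {
    x += dx;
    y += dy;
    return [x * kx + tx, y * ky + ty];
  });
}

const originalArc = [[100.123456789, 0.987654321], [101.5, 0.5], [103, 3]];
const quantized = quantizeArc(originalArc, [100, 0, 103, 3], 1e5);
const decodedArc = decodeArc(quantized.arc, quantized.transform);

console.log(JSON.stringify(quantized.arc));
console.log(decodedArc);
```

The decoded positions differ from the originals by at most half a grid cell, and the integer deltas usually serialize to fewer characters than long decimal coordinates, which is where the size reduction comes from.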

*Simplifying* basically reduces the number of points used to represent arcs. It is implemented by *toposimplify*; its *-p* option sets the minimum planar triangle area that a point must span to be kept. In contrast to *topoquantize*, the smaller the value, the more detail is preserved. (A threshold between 0 and 1 applies to the quantile option *-P* rather than to *-p*.) *-f* removes detached rings that are smaller than the simplification threshold after simplifying.

```
toposimplify -p 1e-1 -f < cb_2016_us_state_5m.topo2.json > cb_2016_us_state_5m.topo3.json
```
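To give a feel for area-based simplification, here is a naive sketch in the spirit of Visvalingam's method, which is the approach topojson-simplify is based on. It repeatedly drops the interior point whose triangle with its two neighbours has the smallest area, until every remaining triangle reaches the threshold. The function names are hypothetical, and this quadratic-time version ignores the topology-preserving details of the real tool.

```javascript
// Area of the triangle formed by three planar positions.
function triangleArea(a, b, c) {
  return (
    Math.abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2
  );
}

// Naive area-based simplification of a single line.
// Endpoints are always kept; minArea plays the role of the -p threshold.
function simplifyLine(line, minArea) {
  const points = line.slice();
  while (points.length > 2) {
    let minIndex = -1;
    let smallest = Infinity;
    for (let i = 1; i < points.length - 1; i++) {
      const area = triangleArea(points[i - 1], points[i], points[i + 1]);
      if (area < smallest) {
        smallest = area;
        minIndex = i;
      }
    }
    // Stop once even the least important point spans enough area.
    if (smallest >= minArea) break;
    points.splice(minIndex, 1);
  }
  return points;
}

const wigglyLine = [[0, 0], [1, 0.1], [2, 0], [3, 2], [4, 0]];
console.log(simplifyLine(wigglyLine, 0.5));
```

With a threshold of 0.5, the barely noticeable bump at [1, 0.1] (triangle area 0.1) is removed while the pronounced peak at [3, 2] (triangle area 2) survives, which is why a moderate threshold can cut many points without visibly changing the shape.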

The size of each data conversion is as follows:

It is not hard to see that *quantizing* the file not only reduces the file size tremendously, which speeds up rendering, but also preserves the quality of the visualization.

(1). TopoJSON API, *https://github.com/topojson/topojson*.

(2). The GeoJSON Specification (RFC 7946), *https://tools.ietf.org/html/rfc7946*.

(3). More than you ever wanted to know about GeoJSON, *https://macwright.org/2015/03/23/geojson-second-bite*.

(4). The TopoJSON Format Specification, *https://github.com/topojson/topojson-specification*.

(5). How To Infer Topology, *https://bost.ocks.org/mike/topology/*.

(6). Spatial data on a diet: tips for file size reduction using TopoJSON, *http://zevross.com/blog/2014/04/22/spatial-data-on-a-diet-tips-for-file-size-reduction-using-topojson/*.