
Understand JSON - Part 3: Parse JSON with R

March 10, 2016, August 06, 2016

category: TECH
json r web

Besides JavaScript’s native support for parsing a JSON string into an object, and the efforts of other JavaScript libraries (see the comprehensive, if slightly outdated, list from JSON), several other languages also support JSON data. In this section, I will go through three R packages (“rjson”, “RJSONIO” and “jsonlite”) and how they work with JSON data.

0. Install R Packages

Before digging into how each R package manages JSON objects, the packages have to be installed.

#### Install the three packages (plus tidyjson) and their dependencies.

install.packages(c("rjson", "RJSONIO", "jsonlite", "tidyjson"), dependencies = TRUE)  

eg1 <- "[true,false,null]"
eg2 <- '{"a":true,"b":false,"c":null}'  

I also create two R character vectors, eg1 and eg2, that hold JSON text for use in later examples.

1. rjson

rjson was first implemented for R in 2007 by Alex Couture-Beil. It allows R users to convert JSON objects into R objects and vice versa. The rjson package provides three functions: fromJSON, toJSON and newJSONParser.

(1a). fromJSON - From JSON to R

#### rjson
library(rjson)

a <- fromJSON( "[true, false, null]" )
a
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] FALSE
## 
## [[3]]
## NULL
 
class(a)
## [1] "list"

b <- fromJSON( '{"a":true, "b":false, "c":null}' )
b
## $a
## [1] TRUE
## 
## $b
## [1] FALSE
## 
## $c
## NULL

class(b)
## [1] "list"  

(1b). toJSON - From R to JSON

A <- toJSON(a)
A
## [1] "[true,false,null]"

class(A)
## [1] "character"

A == "[true, false, null]"
## [1] FALSE

B <- toJSON(b)
B
## [1] "{\"a\":true,\"b\":false,\"c\":null}"
cat(B)
## {"a":true,"b":false,"c":null}

class(B)
## [1] "character"

B == '{"a":true, "b":false, "c":null}'
## [1] FALSE  

(1c). newJSONParser

newJSONParser creates a parser that converts a collection of JSON objects into R objects. The JSON text can be fed to the parser in pieces.
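
As a minimal sketch of how such a parser can be used, the example below feeds two JSON documents to it in three chunks. The sample strings are made up for illustration; addData and getObject are the functions on the object returned by rjson's newJSONParser.

```r
library(rjson)

# Two JSON documents arriving in three arbitrary chunks (made-up sample data).
part_1 <- '{"breakfast": ["milk", "juice"], '
part_2 <- '"lunch": ["sushi"]'
part_3 <- '} [1, 2, 3, 4, 5]'

parser <- rjson::newJSONParser()

parser$addData(part_1)
incomplete <- parser$getObject()   # NULL: the first object is not closed yet

parser$addData(part_2)
parser$addData(part_3)
food    <- parser$getObject()      # first complete object, as a named list
numbers <- parser$getObject()      # second object, the five-element array

names(food)
length(numbers)
```

Each call to getObject returns the next complete object, so a caller can keep adding data and polling until it gets something other than NULL.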


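(1d). Methods Used for Implementation

fromJSON takes a method argument that selects either the compiled C parser (the default) or the parser written in pure R. The timings below show how large the gap is when parsing a vector of 100,000 integers.

c <- toJSON(c(1:1e5))
system.time( C1 <- fromJSON(c, method = "C") )
##    user  system elapsed 
##    0.05    0.00    0.05 

system.time( C2 <- fromJSON(c, method = "R") )
##    user  system elapsed 
##   92.45    0.41   93.38 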

2. RJSONIO

RJSONIO started as a GitHub project by Duncan Temple Lang in 2010. It also provides facilities for reading and writing data in JSON: R objects can be inserted into JavaScript/ECMAScript/ActionScript code, and R programmers can read JSON content and convert it to R objects. It can be used as an alternative to the rjson package, which does not use S4/S3 methods. RJSONIO, by contrast, is extensible: others can define S4 methods for different R classes/types, and the caller can specify a different callback handler. Also unlike rjson, RJSONIO relies on a C++ library, libjson, rather than implementing yet another JSON parser, so parsing is faster than pure interpreted R code. There are three primary functions available in this package: fromJSON, toJSON and asJSVars.

(2a). fromJSON - Convert JSON content to R objects

#### RJSONIO
library(RJSONIO)

a <- fromJSON( "[true, false, null]" )
a
## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] FALSE
## 
## [[3]]
## NULL

class(a)
## [1] "list"

b <- fromJSON( '{"a":true, "b":false, "c":null}' )
b
## $a
## [1] TRUE
## 
## $b
## [1] FALSE
## 
## $c
## NULL

class(b)
## [1] "list"  

(2b). toJSON - Convert an R object to a string in JSON

A <- toJSON(a)
A
## [1] "[true,false,null]"

class(A)
## [1] "character"

A == "[true, false, null]"
## [1] FALSE

B <- toJSON(b)
B
## [1] "{\"a\":true,\"b\":false,\"c\":null}"
cat(B)
## {"a":true,"b":false,"c":null}

class(B)
## [1] "character"

B == '{"a":true, "b":false, "c":null}'
## [1] FALSE  

(2c). asJSVars - Serialize R objects as JavaScript/ECMAScript/ActionScript variables

cat(asJSVars(a = 1:10, myMatrix = matrix(1:15, 3, 5), qualifier = "protected", types = TRUE))
## protected a : Array = [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ] ;
##
## protected myMatrix : Array = [ [ 1, 4, 7, 10, 13 ],
##                              [ 2, 5, 8, 11, 14 ],
##                              [ 3, 6, 9, 12, 15 ] ] ;

3. jsonlite

jsonlite started in 2013 as a ‘fork’ of the RJSONIO package but has been completely rewritten in recent versions. Like RJSONIO, it provides functions such as fromJSON() and toJSON() to convert between JSON data and R objects. It can also interact with web APIs, build pipelines, and stream data between R and JSON.

(3a). fromJSON and toJSON - Convert Data between R and JSON

library(jsonlite)

jsonlite_a1 <- jsonlite::fromJSON(eg1)
jsonlite_a1
## [1]  TRUE FALSE    NA

class(jsonlite_a1)
## [1] "logical"
is(jsonlite_a1)
## [1] "logical" "vector"

jsonlite_a2 <- jsonlite::fromJSON(eg1, simplifyVector = FALSE)
jsonlite_a2
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] FALSE
## 
## [[3]]
## NULL

class(jsonlite_a2)
## [1] "list"
is(jsonlite_a2)
## [1] "list"   "vector"

jsonlite_b <- jsonlite::fromJSON(eg2)
jsonlite_b
## $a
## [1] TRUE
## 
## $b
## [1] FALSE
## 
## $c
## NULL  

class(jsonlite_b)
## [1] "list"
is(jsonlite_b)
## [1] "list"   "vector"  

jsonlite provides more options in its fromJSON function. In the example above, jsonlite converts a JSON array into an R vector by default, but with simplifyVector set to FALSE the JSON array is converted into an R list instead. A JSON object such as eg2 is returned as a named R list in either case.
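
The same switch also affects arrays of records. As a small sketch (the records string is a made-up sample, not part of the examples above), the default simplification turns an array of objects with matching keys into a data frame, while simplifyVector = FALSE leaves it as nested lists:

```r
library(jsonlite)

# Hypothetical sample: an array of two records with the same keys.
records <- '[{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]'

as_df   <- jsonlite::fromJSON(records)                          # simplified
as_list <- jsonlite::fromJSON(records, simplifyVector = FALSE)  # nested lists

class(as_df)
## [1] "data.frame"
class(as_list)
## [1] "list"
length(as_list)
## [1] 2
```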

jsonlite_A1 <- jsonlite::toJSON(jsonlite_a1)
jsonlite_A1
## [true,false,null]

class(jsonlite_A1)
## [1] "json"
is(jsonlite_A1)
## [1] "json"     "oldClass"

jsonlite_A2 <- jsonlite::toJSON(jsonlite_a2)
jsonlite_A2
## [[true],[false],{}] 

class(jsonlite_A2)
## [1] "json"
is(jsonlite_A2)
## [1] "json"     "oldClass"  

jsonlite_B <- jsonlite::toJSON(jsonlite_b)
jsonlite_B
## {"a":[true],"b":[false],"c":{}}

class(jsonlite_B)
## [1] "json"
is(jsonlite_B)
## [1] "json"     "oldClass"

jsonlite_B <- jsonlite::toJSON(jsonlite_b, null = 'null')
jsonlite_B
## {"a":[true],"b":[false],"c":null}

class(jsonlite_B)
## [1] "json"
is(jsonlite_B)
## [1] "json"     "oldClass"

jsonlite_B <- jsonlite::toJSON(jsonlite_b, null = 'list', pretty = TRUE)
jsonlite_B
## {
##   "a": [true],
##   "b": [false],
##   "c": {}
## } 

class(jsonlite_B)
## [1] "json"
is(jsonlite_B)
## [1] "json"     "oldClass"

toJSON converts an R object to JSON. Like fromJSON, it offers extra options that let the caller control the conversion explicitly, depending on the class of the input R object. It can also print the output in a ‘pretty’ way.
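
One option not shown above is auto_unbox, which writes length-one vectors as JSON scalars instead of one-element arrays. A short sketch, using a hand-built list x that mirrors the parsed eg2:

```r
library(jsonlite)

# Hand-built list for illustration, equivalent to the parsed eg2.
x <- list(a = TRUE, b = FALSE, c = NULL)

boxed   <- jsonlite::toJSON(x, null = "null")
unboxed <- jsonlite::toJSON(x, null = "null", auto_unbox = TRUE)

boxed
## {"a":[true],"b":[false],"c":null}
unboxed
## {"a":true,"b":false,"c":null}
```

With both options set, the output matches the original eg2 string character for character.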

(3b). serializeJSON and unserializeJSON - Convert Data between R and JSON Differently

In contrast to the class-based encoding used by the fromJSON and toJSON pair, serializeJSON and unserializeJSON implement a type-based encoding to convert data between R and JSON.

jsonlite_se_a1 <- jsonlite::serializeJSON(jsonlite_A1)
cat(jsonlite_se_a1)
## {"type":"character","attributes":{"class":{"type":"character","attributes":{},"value":["json"]}},"value":["[true,false,null]"]}

class(jsonlite_se_a1)
## [1] "character"

jsonlite_se_A1 <- jsonlite::unserializeJSON(jsonlite_se_a1)
jsonlite_se_A1
## [true,false,null]

class(jsonlite_se_A1)
## [1] "json"

jsonlite_se_a2 <- jsonlite::serializeJSON(jsonlite_A2)
cat(jsonlite_se_a2)
## {"type":"character","attributes":{"class":{"type":"character","attributes":{},"value":["json"]}},"value":["[[true],[false],{}]"]}

class(jsonlite_se_a2)
## [1] "character"

jsonlite_se_A2 <- jsonlite::unserializeJSON(jsonlite_se_a2)
jsonlite_se_A2
## [[true],[false],{}] 

class(jsonlite_se_A2)
## [1] "json"

In the examples above, serializeJSON converts an R object into JSON and captures the type, value and attributes of each storage type, so the object can be restored almost perfectly from its JSON representation. The cost is a lengthy and sometimes redundant result.
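
To make the difference concrete, here is a small sketch that sends a factor (a made-up sample) through both round trips. The class-based pair writes the factor as plain strings, while the type-based pair keeps the attributes needed to rebuild it:

```r
library(jsonlite)

# Made-up sample factor.
sizes <- factor(c("lo", "hi", "lo"))

# Class-based: the factor is written as strings and comes back as character.
via_tojson <- jsonlite::fromJSON(jsonlite::toJSON(sizes))

# Type-based: type and attributes (levels, class) are stored and restored.
via_serialize <- jsonlite::unserializeJSON(jsonlite::serializeJSON(sizes))

class(via_tojson)
## [1] "character"
class(via_serialize)
## [1] "factor"
```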

Caveat: Besides the difference in encoding between the class-based pair (fromJSON and toJSON) and the type-based pair (serializeJSON and unserializeJSON), there is a smaller practical difference in the input each function accepts. fromJSON and toJSON are independent of each other: fromJSON accepts any JSON file as well as the output of toJSON, and toJSON accepts any R object, including the result of fromJSON. unserializeJSON, however, only works on a JSON string created by serializeJSON.

bad_unse <- jsonlite::unserializeJSON(eg1)
## Error in switch(encoding.mode, `NULL` = NULL, environment = new.env(parent = emptyenv()),  : 
##  EXPR must be a length 1 vector

(3c). stream_in and stream_out - Streaming JSON input and output

As I mentioned before, a JSON file can carry a huge amount of data from the web, which is one of its advantages. However, since R stores and processes all data in memory, the amount of JSON it can handle is bounded by the memory of the machine running R. To address this bottleneck, the jsonlite package implements these two functions to process data over an http(s) connection, a pipe, or even from a NoSQL database. Unlike fromJSON and toJSON, however, streaming requires the ndjson format.

library(MASS)
stream_out(cats, stdout())
## {"Sex":"F","Bwt":2,"Hwt":7}
## {"Sex":"F","Bwt":2,"Hwt":7.4}
## {"Sex":"F","Bwt":2,"Hwt":9.5}
## {"Sex":"F","Bwt":2.1,"Hwt":7.2}
## {"Sex":"F","Bwt":2.1,"Hwt":7.3}
## ...

library(curl)
con <- curl("https://jeroenooms.github.io/data/diamonds.json")
mydata <- stream_in(con, pagesize = 1000)
## opening curl input connection.
## Imported 53940 records. Simplifying into dataframe...
## closing curl input connection.

head(mydata)
##   carat       cut color clarity depth table price    x    y    z
## 1  0.23     Ideal     E     SI2  61.5    55   326 3.95 3.98 2.43
## 2  0.21   Premium     E     SI1  59.8    61   326 3.89 3.84 2.31
## 3  0.23      Good     E     VS1  56.9    65   327 4.05 4.07 2.31
## 4  0.29   Premium     I     VS2  62.4    58   334 4.20 4.23 2.63
## 5  0.31      Good     J     SI2  63.3    58   335 4.34 4.35 2.75
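
The same round trip can be tried without a network connection. The sketch below is only an illustration: it writes the built-in iris data set to a temporary ndjson file (the file and variable names are arbitrary), reads it back in one go, and then reads it again page by page through a handler function, so that only one page is held in memory at a time.

```r
library(jsonlite)

# Write one JSON record per line (ndjson) to a temporary file.
tmp <- tempfile(fileext = ".ndjson")
jsonlite::stream_out(iris, file(tmp), verbose = FALSE)

# Read everything back into a single data frame.
back <- jsonlite::stream_in(file(tmp), verbose = FALSE)
dim(back)
## [1] 150   5

# Read page by page: the handler receives each page as a data frame.
row_counts <- integer(0)
jsonlite::stream_in(file(tmp), handler = function(page) {
  row_counts <<- c(row_counts, nrow(page))
}, pagesize = 50, verbose = FALSE)

sum(row_counts)
## [1] 150
```

In a real pipeline the handler would typically aggregate each page or pass it to stream_out rather than just count rows.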

Besides reading and writing data between JSON and R, which all three packages support, each package offers some functions of its own. For example, jsonlite provides base64_dec and base64_enc to convert between raw vectors and text, which the other two packages lack. Validation of strings in JSON format is provided by RJSONIO (isValidJSON) and jsonlite (validate) but not by rjson. jsonlite can also reformat a JSON file in two ways: 1). prettify adds indentation to the structure, 2). minify removes all unnecessary indentation and white space, an approach adopted by a lot of JavaScript libraries. In terms of parsing results, this paper gives readers a brief comparison of the three packages and is also worth reading.
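
A brief sketch of the jsonlite helpers mentioned above, run on a made-up input string:

```r
library(jsonlite)

# Made-up sample with irregular spacing.
messy <- '{ "a" : [1, 2],   "b" : "text" }'

# validate returns TRUE for well-formed JSON; the second string is truncated.
ok  <- jsonlite::validate(messy)
bad <- isTRUE(jsonlite::validate('{"a": '))

# minify strips the optional white space; prettify adds indentation.
compact  <- jsonlite::minify(messy)
indented <- jsonlite::prettify(messy)
compact
## {"a":[1,2],"b":"text"}

# base64_enc and base64_dec convert between raw bytes and base64 text.
encoded <- jsonlite::base64_enc(charToRaw("hello"))
decoded <- rawToChar(jsonlite::base64_dec(encoded))
encoded
## [1] "aGVsbG8="
```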


Reference:

(1). rjson reference manual, https://cran.r-project.org/web/packages/rjson/rjson.pdf.
(2). RJSONIO reference manual, https://cran.r-project.org/web/packages/RJSONIO/RJSONIO.pdf.
(3). jsonlite reference manual, https://cran.r-project.org/web/packages/jsonlite/jsonlite.pdf.
(4). tidyjson reference manual, https://cran.r-project.org/web/packages/tidyjson/tidyjson.pdf.
(5). A biased comparsion of JSON packages in R, https://rstudio-pubs-static.s3.amazonaws.com/31702_9c22e3d1a0c44968a4a1f9656f1800ab.html.
(6). Jeroen Ooms, 2014, The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects.