[Yellow Brackets]

JSON with duplicate keys

October 31, 2020 | 7 min read

[

The other day I came across a simple but interesting question. Is the following JSON valid or not?

{
    "foo": "bar",
    "foo": "qux"
}

My immediate response was ‘of course it’s not’. It turns out that the answer is a bit more complicated than that. In this post we’ll explore duplicate keys in JSON and how they left me scratching my head for a while.

How it started

At the time this happened I was responsible for some Kafka consumers - written in Kotlin - that would receive simple JSON strings, process them and forward them to various downstream systems.

The JSON strings also went through a JSON schema validation library, which accepts an org.json.JSONObject that has to be constructed from the raw JSON string:

import org.everit.json.schema.loader.SchemaLoader
import org.json.JSONObject
import org.json.JSONTokener

val jsonSchema = "{ ... }"
val schema = SchemaLoader.load(JSONObject(JSONTokener(jsonSchema)))

val jsonMessage = "{ ... }"
schema.validate(JSONObject(JSONTokener(jsonMessage)))

The validation worked nicely up until the point when JSONs with duplicate keys started to appear in the Kafka topic - by another team’s mistake - and suddenly our consumers blew up at the schema.validate method call with the following error:

org.json.JSONException: Duplicate key "foo"

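It turns out that the exception is raised while the JSONObject is being built from the JSONTokener, before the validation itself even starts: JSON-java refuses input with duplicate keys.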
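A minimal sketch that reproduces the failure in isolation, without Kafka or the schema validator. It assumes a JSON-java (org.json) version that rejects duplicate keys, as ours did; tryParse is a hypothetical helper name, not part of the library.

```kotlin
import org.json.JSONException
import org.json.JSONObject
import org.json.JSONTokener

// Hypothetical helper: returns the parsed object, or null when JSON-java rejects the input.
fun tryParse(json: String): JSONObject? =
    try {
        // The exception comes from constructing the JSONObject,
        // so it fires before any schema validation could run.
        JSONObject(JSONTokener(json))
    } catch (e: JSONException) {
        println("Rejected: ${e.message}")
        null
    }

fun main() {
    tryParse("""{"foo": "bar"}""")               // unique keys: parses fine
    tryParse("""{"foo": "bar", "foo": "baz"}""") // duplicate key: rejected
}
```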

Whenever we receive bogus data like this, we usually hop onto our Lenses Kafka monitoring tool and check the whole message to see the full picture.

// The JSON we saw in the Kotlin Kafka consumer
{
    "foo": "bar",
    "foo": "baz"
}

// The JSON we saw on the Lenses monitoring tool
{
    "foo": "baz"
}

On Lenses the duplication was nowhere to be found. What gives? Let’s have a look at what the specs have to say about duplicate keys.

What do the specs say?

JSON object notation

The JSON.org website doesn’t seem to say anything about duplicate keys, however if we have a look at the RFC for JSON we can see the following description:

An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique.

Emphasis on SHOULD: unique names are recommended, not required. The reality is that a JSON with duplicate keys is valid, although it doesn’t really make sense because there’s no way to distinguish the duplicate keys on retrieval.

That is why each JSON handling library has a bit of leeway in how it handles this scenario. Let’s have a look at how duplicate keys are handled by libraries used in Kotlin (Java) and in JavaScript.

JavaScript vs Kotlin (Java)

In JavaScript we don’t really use third party libraries to manipulate JSON. The most common way to parse is the built-in JSON.parse function, which, when it encounters duplicate keys, keeps only the last occurrence.

const json = JSON.parse(`{"foo": "bar", "foo": "baz"}`)

json.foo === "baz" // "foo": "bar" was dropped

This is the same result we saw in Lenses above. Now it makes sense why we don’t see the duplicate keys when checking the messages in the browser based monitoring tool - it is probably relying on the same JSON.parse implementation.

In Kotlin / Java I’ve used multiple libraries for parsing JSON. The first one was the JSON-java reference implementation, which threw the JSONException when given JSON with duplicate keys.

A more robust implementation is the FasterXML Jackson library, which lets you configure whether parsing should throw when duplicate keys are encountered (by default it doesn’t throw).
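As a rough sketch of that switch, the snippet below builds one mapper with default settings and one with Jackson’s STRICT_DUPLICATE_DETECTION parser feature enabled. It assumes Jackson 2.x on the classpath; lenientMapper, strictMapper, parseLenient and parseStrict are names made up for this illustration.

```kotlin
import com.fasterxml.jackson.core.JsonParser
import com.fasterxml.jackson.core.JsonProcessingException
import com.fasterxml.jackson.core.type.TypeReference
import com.fasterxml.jackson.databind.ObjectMapper

// Assumption: Jackson 2.x, where ObjectMapper.enable(JsonParser.Feature) is available.
// Default settings: duplicate keys are tolerated.
val lenientMapper = ObjectMapper()

// Strict settings: the parser remembers the names seen in each object
// and fails as soon as one of them repeats.
val strictMapper: ObjectMapper =
    ObjectMapper().enable(JsonParser.Feature.STRICT_DUPLICATE_DETECTION)

private val mapType = object : TypeReference<Map<String, Any?>>() {}

fun parseLenient(json: String): Map<String, Any?> = lenientMapper.readValue(json, mapType)

fun parseStrict(json: String): Map<String, Any?> = strictMapper.readValue(json, mapType)

fun main() {
    val json = """{"foo": "bar", "foo": "baz"}"""

    // No exception; the resulting map holds a single "foo" entry.
    println(parseLenient(json))

    try {
        parseStrict(json)
    } catch (e: JsonProcessingException) {
        // Jackson's parse and mapping exceptions both extend JsonProcessingException.
        println("Rejected: ${e.message}")
    }
}
```

Keeping both mappers around makes the choice explicit at the call site: a consumer that must never stall can use the lenient one, while a producer-side check could use the strict one to catch the mistake early.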

Since we had no control over the JSONs with duplicate keys coming from Kafka, we needed to swap the org.json.JSONTokener for a more resilient parser provided by Jackson:

import org.everit.json.schema.loader.SchemaLoader
import org.json.JSONObject
import com.fasterxml.jackson.databind.ObjectMapper

val objectMapper = ObjectMapper()

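// Parse with Jackson first (duplicate keys are tolerated by default),
// then wrap the resulting Map in the JSONObject the validator expects.
fun parseJSON(json: String): JSONObject {
    return JSONObject(objectMapper.readValue(json, Map::class.java))
}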

val jsonSchema = "{ ... }"
val schema = SchemaLoader.load(parseJSON(jsonSchema))

val jsonMessage = "{ ... }"
schema.validate(parseJSON(jsonMessage))

This way our Kafka consumer didn’t blow up anymore and it wasn’t blocked by the team providing the ‘faulty’ JSON.

Conclusion

The key takeaway here is that JSONs are allowed to have duplicate keys from the specs’ point of view, although it doesn’t really make sense. Whether the parser treats duplicate keys as errors depends entirely on each individual implementation:

  • The built-in JSON.parse in JavaScript doesn’t treat it as an error
  • The JSON-java library treats it as an error
  • The FasterXML Jackson library lets you configure whether to allow it (by default it does)

So if you need to handle JSONs in Kotlin / Java, I recommend the more resilient and configurable Jackson over JSON-java. I hope this piece of info helps someone out there and can spare them a bit of head scratching.


]