- Recipe 6: Serialize data as JSON data
- Recipe 7: Deserialize JSON data
- Recipe 8: Serialize array in JSON
- Recipe 9: Deserialize JSON array data
- Recipe 10: Deserialize data stream
- Summary
Recipe 6: Serialize data as JSON data
In Avro Cookbook: Part I, if you open the file /tmp/log created by Recipe 3, you will find that it is definitely not a human-readable text format. Avro provides an encoder/decoder mechanism that can serialize the data to a text format, namely JSON.
Actually, if I wanted to serialize POJOs to JSON, I would rather use Google Gson. Anyway, this is a post about Avro, right?
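The original listing is not reproduced here, so below is a minimal sketch of the idea: take the schema of a plain POJO via reflection and write records through a `JsonEncoder`. The `LogEntry3` fields (`name`, `resource`, `ip`) are assumed from the sample data used later in Recipe 10; your class from Part II may differ.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

public class JsonSerializeExample {
    // Hypothetical POJO standing in for LogEntry3 from Part II.
    public static class LogEntry3 {
        public String name = "";
        public String resource = "";
        public String ip = "";
        public LogEntry3() {}
        public LogEntry3(String name, String resource, String ip) {
            this.name = name;
            this.resource = resource;
            this.ip = ip;
        }
    }

    public static void main(String[] args) throws Exception {
        // Derive the Avro schema from the POJO by reflection.
        Schema schema = ReflectData.get().getSchema(LogEntry3.class);
        ReflectDatumWriter<LogEntry3> writer = new ReflectDatumWriter<>(schema);

        // The JSON encoder writes records as JSON text instead of binary.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
        writer.write(new LogEntry3("Jeff", "readme.md", "192.168.5.1"), encoder);
        writer.write(new LogEntry3("John", "readme.txt", "192.168.5.2"), encoder);
        encoder.flush();

        System.out.println(out.toString("UTF-8"));
    }
}
```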
Refer to Avro Cookbook: Part II for what the class LogEntry3 looks like. The output is the records rendered as JSON text, one JSON object per record.
Recipe 7: Deserialize JSON data
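Since the original listing is missing, here is a sketch of the deserialization side: a `GenericDatumReader` pulling records out of a `JsonDecoder`. The schema below is a simplified assumption (plain `string` fields, no unions), and the two input records are deliberately concatenated with no separator between them.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class JsonDeserializeExample {
    public static void main(String[] args) throws Exception {
        // Assumed schema matching the LogEntry3 records from earlier recipes.
        String schemaJson = "{\"type\":\"record\",\"name\":\"LogEntry3\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"resource\",\"type\":\"string\"},"
            + "{\"name\":\"ip\",\"type\":\"string\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);

        // Two records back to back, with no separator between them.
        String json = "{\"name\":\"Jeff\",\"resource\":\"readme.md\",\"ip\":\"192.168.5.1\"}"
            + "{\"name\":\"John\",\"resource\":\"readme.txt\",\"ip\":\"192.168.5.2\"}";

        // The generic reader is used here; see the notes below on why
        // a specific reader can fail.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        JsonDecoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        GenericRecord first = reader.read(null, decoder);
        GenericRecord second = reader.read(null, decoder);

        System.out.println(first.get("name"));
        System.out.println(second.get("name"));
    }
}
```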
If you look at the code carefully, you will find several interesting things.
First, there is no explicit separator between every JSON record. That means the Avro JSON decoder can decode the JSON data in the form of stream. This could be very helpful when you have to deserialize a whole bunch of JSON records without any explicit separator between records.
Second, when parsing the JSON data, the generic reader DatumReader&lt;GenericRecord&gt; is used instead of the specific reader DatumReader&lt;LogEntry3&gt;. I tried the specific reader, but it failed with this error:
org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
    at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
    at org.apache.avro.reflect.ReflectDatumReader.readField(ReflectDatumReader.java:230)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
It turned out that the Avro JSON decoder can only parse JSON data in which union-typed fields are wrapped in a type-tag object, roughly like this:

{"name": {"string": "Jeff"}, "resource": {"string": "readme.md"}, "ip": {"string": "192.168.5.1"}}
but will not work on plain JSON data like this:

{"name": "Jeff", "resource": "readme.md", "ip": "192.168.5.1"}
After googling the issue, I found this: Issue writing union in avro. So my suggestion is: don't use Avro for JSON serialization and deserialization if you have other choices.
Recipe 8: Serialize array in JSON
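The original listing is lost, so the following is a sketch of the recipe: build an array schema around a reflection-derived element schema and write a whole `List` of plain POJOs in one call. The `LogEntry3` class here is a hypothetical stand-in, not generated code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

public class JsonArraySerializeExample {
    // A plain Java class; no code generation involved.
    public static class LogEntry3 {
        public String name = "";
        public String resource = "";
        public String ip = "";
    }

    public static void main(String[] args) throws Exception {
        // Array schema whose element schema comes from reflection.
        Schema arraySchema =
            Schema.createArray(ReflectData.get().getSchema(LogEntry3.class));

        LogEntry3 e1 = new LogEntry3();
        e1.name = "Jeff"; e1.resource = "readme.md"; e1.ip = "192.168.5.1";
        LogEntry3 e2 = new LogEntry3();
        e2.name = "John"; e2.resource = "readme.txt"; e2.ip = "192.168.5.2";
        List<LogEntry3> entries = Arrays.asList(e1, e2);

        // Write the whole list as one JSON array.
        ReflectDatumWriter<List<LogEntry3>> writer = new ReflectDatumWriter<>(arraySchema);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        JsonEncoder encoder = EncoderFactory.get().jsonEncoder(arraySchema, out);
        writer.write(entries, encoder);
        encoder.flush();

        System.out.println(out.toString("UTF-8"));
    }
}
```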
In the code above, no code generation is required; LogEntry3 is just a normal Java class. The serialized data is a single JSON array of records.
However, if the element type of the array is generated from a schema definition (see Recipe 2), the output would be different.
Still, I would not choose Avro to serialize data to JSON if I could use Gson. However, this recipe is intended to demonstrate how to serialize an array without code generation from a predefined schema.
Recipe 9: Deserialize JSON array data
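Again the original listing is missing; a sketch of the reverse direction might look like this, parsing a JSON array into a `List` of generic records against an assumed array schema:

```java
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;

public class JsonArrayDeserializeExample {
    public static void main(String[] args) throws Exception {
        // Assumed element schema for LogEntry3 (plain string fields).
        String elementJson = "{\"type\":\"record\",\"name\":\"LogEntry3\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"resource\",\"type\":\"string\"},"
            + "{\"name\":\"ip\",\"type\":\"string\"}]}";
        Schema arraySchema = Schema.createArray(new Schema.Parser().parse(elementJson));

        String json = "[{\"name\":\"Jeff\",\"resource\":\"readme.md\",\"ip\":\"192.168.5.1\"},"
            + "{\"name\":\"John\",\"resource\":\"readme.txt\",\"ip\":\"192.168.5.2\"}]";

        // A generic reader with an array schema yields the whole array at once.
        DatumReader<List<GenericRecord>> reader = new GenericDatumReader<>(arraySchema);
        List<GenericRecord> records =
            reader.read(null, DecoderFactory.get().jsonDecoder(arraySchema, json));

        for (GenericRecord r : records) {
            System.out.println(r.get("name") + " -> " + r.get("resource"));
        }
    }
}
```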
Note that we can only use the generic reader here; I have not figured out how to use a specific reader yet. It might be fine for the binary decoder to use a specific reader.
Recipe 10: Deserialize data stream
The word stream means that the size of the source data is unknown. For example, suppose you want to deserialize data like this:
[
  { "name": "Jeff", "resource": "readme.md", "ip": "192.168.5.1" },
  { "name": "John", "resource": "readme.txt", "ip": "192.168.5.2" }
][
  { "name": "Joe", "resource": "readme.md", "ip": "192.168.5.3" },
  { "name": "James", "resource": "readme.txt", "ip": "192.168.5.4" }
]
The data has several unusual characteristics because of which Gson cannot simply be applied:
* The data as a whole is not valid JSON; it is a set of JSON records. Besides, the data could be very large.
* There is no explicit separator between records. If the records were separated by a character such as \n, then each record could be read and parsed easily.
A format this poor should be avoided in practice. Sometimes, however, we are just the consumers of the data. We can parse the data with Avro like this:
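The original listing is missing, so the following is a sketch of the approach: keep calling `read()` on the same `JsonDecoder`, consuming one top-level JSON array per call, until the input runs out. How end-of-input is signalled can vary by Avro version, so the catch clause below is a hedged assumption covering both `EOFException` and Avro's runtime exception.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class JsonStreamDeserializeExample {
    public static void main(String[] args) throws Exception {
        // Assumed element schema for LogEntry3 (plain string fields).
        String elementJson = "{\"type\":\"record\",\"name\":\"LogEntry3\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"resource\",\"type\":\"string\"},"
            + "{\"name\":\"ip\",\"type\":\"string\"}]}";
        Schema arraySchema = Schema.createArray(new Schema.Parser().parse(elementJson));

        // Two JSON arrays concatenated back to back, with no separator.
        String input = "[{\"name\":\"Jeff\",\"resource\":\"readme.md\",\"ip\":\"192.168.5.1\"},"
            + "{\"name\":\"John\",\"resource\":\"readme.txt\",\"ip\":\"192.168.5.2\"}]"
            + "[{\"name\":\"Joe\",\"resource\":\"readme.md\",\"ip\":\"192.168.5.3\"},"
            + "{\"name\":\"James\",\"resource\":\"readme.txt\",\"ip\":\"192.168.5.4\"}]";

        DatumReader<List<GenericRecord>> reader = new GenericDatumReader<>(arraySchema);
        JsonDecoder decoder = DecoderFactory.get().jsonDecoder(arraySchema, input);

        List<String> names = new ArrayList<>();
        while (true) {
            try {
                // Each read() consumes one top-level JSON array from the stream.
                for (GenericRecord r : reader.read(null, decoder)) {
                    names.add(r.get("name").toString());
                }
            } catch (java.io.EOFException | AvroRuntimeException end) {
                // End of input reached; stop reading.
                break;
            }
        }
        System.out.println(names);
    }
}
```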
Summary
This post is the last part of the three-part series on Avro. To be honest, I rarely use Avro myself and I am not an Avro expert, so the code examples may not be the most appropriate. They are just intended to help beginners start playing with Avro quickly. Anyone who uses Avro frequently should spend more time looking deeper into the Avro framework. It would be great if these posts are helpful to you!