- Recipe 1: Create a Maven Avro Project
  - Initialize the project structure
  - Tweak the pom.xml
  - Import the project into IntelliJ IDEA
- Recipe 2: Define a Schema
- Recipe 3: Serialize the Log Data to Disk File
- Recipe 4: Deserialize the Log Data from Disk File
Avro is a data serialization framework. It is an Apache project led by Doug Cutting, who is also the author of several other open source projects such as Hadoop and Lucene. I recently needed to use Avro to serialize and deserialize some data, but I found its documentation too thin, at least for newcomers like me who don’t have much experience with data exchange format frameworks.
In fact, it is easy to understand what Avro does: it converts Java objects into bytes and back. The key information the framework needs is the format of the data, which Avro calls a ‘schema’. In this article, I won’t spend any more time explaining what Avro is.

## Recipe 1: Create a Maven Avro Project

IntelliJ IDEA is my favorite Java IDE. The free Community edition has fewer features than the commercial Ultimate edition, but it works very well together with Maven; the two complement each other. So the examples in this article use Maven as the build tool and IntelliJ IDEA as the IDE. TestNG, rather than JUnit, is used as the test framework.
### Initialize the project structure
- Create a project with the quickstart archetype:
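The original command is not preserved here, so the following is only a sketch of a typical invocation of the Maven quickstart archetype. The `groupId` is inferred from the package name used later in this article, and the `artifactId` `avro-samples` is a hypothetical name chosen for illustration.

```shell
# Generate a skeleton project from maven-archetype-quickstart.
# groupId is inferred from the package used later; artifactId is a made-up example.
mvn archetype:generate \
  -DgroupId=me.jeffli.avrosamples \
  -DartifactId=avro-samples \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DinteractiveMode=false
```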
### Tweak the pom.xml
- Add the Avro dependency:
- Use the Avro Maven plugin:
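The two `pom.xml` changes listed above might look roughly like the fragment below. This is a sketch, not the author's exact file: the version number `1.11.3` and the `outputDirectory` are assumptions, so substitute the Avro release and output location you actually use. The `sourceDirectory` matches the schema directory this article uses.

```xml
<!-- Inside <dependencies>: the Avro runtime library (version is an assumption) -->
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.11.3</version>
</dependency>

<!-- Inside <build><plugins>: generate Java classes from .avsc schema files -->
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.11.3</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <!-- Assumed output location for the generated classes -->
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```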
Note that the directory `${project.basedir}/src/main/avro/` must be created even if it is empty at first. It holds the Avro schema files. The whole pom.xml has been posted as a GitHub gist.
### Import the project into IntelliJ IDEA
IDEA fully supports Maven, so it is easy to import the Maven project as an IDEA project. Click “Import Project” in the ‘Quick Start’ panel. I suggest enabling IDEA's Maven Auto-Import feature before completing the import.
## Recipe 2: Define a Schema
Assume that you want to log every access to your server. To keep it simple, we define only three attributes in a log entry: the username, the resource and the IP. The schema can then be defined as:
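The original schema listing is not preserved, so here is a minimal sketch of what a three-field record schema could look like. The record name and namespace follow the class name `me.jeffli.avrosamples.model.LogEntry` mentioned in this article; the field names `username`, `resource` and `ip` and their plain `string` types are assumptions based on the description above.

```json
{
  "type": "record",
  "name": "LogEntry",
  "namespace": "me.jeffli.avrosamples.model",
  "fields": [
    {"name": "username", "type": "string"},
    {"name": "resource", "type": "string"},
    {"name": "ip", "type": "string"}
  ]
}
```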
Save the content as `${project.basedir}/src/main/avro/LogEntry.avsc`. After running `mvn compile`, a Java class `me.jeffli.avrosamples.model.LogEntry` will be generated automatically thanks to the Avro Maven plugin.
## Recipe 3: Serialize the Log Data to Disk File
Assume we want to store the log data in a disk file, `/tmp/log`. The code would look like this:
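The original listing is not preserved, so the following is only a sketch of how records can be written to an Avro data file. To keep it compilable without the generated `LogEntry` class, it swaps in Avro's generic API (`GenericRecord` with an inline schema) rather than the generated class. The class name `LogWriterSketch`, the field names and the sample values are all assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class LogWriterSketch {
    // Inline copy of an assumed LogEntry schema; the field names are illustrative.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"LogEntry\","
        + "\"namespace\":\"me.jeffli.avrosamples.model\","
        + "\"fields\":["
        + "{\"name\":\"username\",\"type\":\"string\"},"
        + "{\"name\":\"resource\",\"type\":\"string\"},"
        + "{\"name\":\"ip\",\"type\":\"string\"}]}";

    // Build one log entry as a generic record that conforms to the schema.
    static GenericRecord entry(Schema schema, String username, String resource, String ip) {
        GenericRecord record = new GenericData.Record(schema);
        record.put("username", username);
        record.put("resource", resource);
        record.put("ip", ip);
        return record;
    }

    // Write two sample entries to the given file; the schema is stored in the file header.
    static void write(File file) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            writer.append(entry(schema, "jeff", "readme.txt", "192.168.1.1"));
            writer.append(entry(schema, "john", "readme.md", "192.168.1.2"));
        }
    }

    public static void main(String[] args) throws IOException {
        write(new File("/tmp/log"));
    }
}
```

With the class generated by the Avro Maven plugin, the same flow would typically use `SpecificDatumWriter<LogEntry>` in place of `GenericDatumWriter<GenericRecord>`, and `LogEntry` instances in place of generic records.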
## Recipe 4: Deserialize the Log Data from Disk File
Assume you need to parse the log data from the disk file `/tmp/log`. The code would then be:
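The original listing is not preserved, so this is only a sketch of reading an Avro data file back. It uses Avro's generic API instead of the generated `LogEntry` class so that it compiles on its own, and it assumes the records carry fields named `username`, `resource` and `ip`. Those field names and the class name `LogReaderSketch` are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class LogReaderSketch {
    // Read every record in the file and render it as "username resource ip".
    static List<String> read(File file) throws IOException {
        List<String> lines = new ArrayList<>();
        // No reader schema is supplied, so the writer's schema from the file header is used.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord record : reader) {
                // Field names are assumed; adjust them to match your schema.
                lines.add(record.get("username") + " "
                    + record.get("resource") + " "
                    + record.get("ip"));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : read(new File("/tmp/log"))) {
            System.out.println(line);
        }
    }
}
```

With the generated class, `SpecificDatumReader<LogEntry>` would typically replace `GenericDatumReader<GenericRecord>`, and the loop would iterate over `LogEntry` objects with typed getters.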