Java serialization is one of the most misused features of the language. If you have the choice not to use it, don't. Otherwise, soldier on.
This article examines Java serialization from the perspective of the serialized stream, in particular how various serialization approaches affect the content of that stream. I have found this perspective helpful when designing serialization.
Java Serialization Stream Format
A Java serialization stream is structured as follows:
Stream Header
A stream starts with a header of two short integers: the magic number STREAM_MAGIC (0xACED) followed by the version STREAM_VERSION (5).
See: ObjectOutputStream#writeStreamHeader
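To see the header on its own, here is a minimal sketch (the class name StreamHeaderDemo is hypothetical) that creates an ObjectOutputStream, writes nothing, and dumps the bytes. The ObjectOutputStream constructor writes the header immediately, so the stream holds exactly the two shorts.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;

public class StreamHeaderDemo {
    // Returns a stream that contains only the header.
    public static byte[] emptyStream() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // The ObjectOutputStream constructor writes the header immediately.
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.close();
        return bytes.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex, e.g. "AC ED".
    public static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hex(emptyStream()));  // prints "AC ED 00 05"
        System.out.printf("STREAM_MAGIC=%04X STREAM_VERSION=%d%n",
                ObjectStreamConstants.STREAM_MAGIC & 0xFFFF,
                ObjectStreamConstants.STREAM_VERSION);
    }
}
```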
Primitives
Primitives are stored as their raw byte representation without any type metadata. As a result, reads must match writes exactly, in both type and order.
See: ObjectOutputStream#writeLong, #writeShort, #writeByte
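A small sketch (PrimitiveDemo is a hypothetical name) makes the "no metadata" point concrete. Primitives written through ObjectOutputStream are grouped into a block-data record (a TC_BLOCKDATA marker plus a length byte), but the individual values carry no type tag. Reading the same bytes back with readInt instead of readLong therefore succeeds silently and returns the wrong value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class PrimitiveDemo {
    // Writes a single long to a fresh object stream.
    public static byte[] writeLong(long value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeLong(value);
        }
        return bytes.toByteArray();
    }

    // Reads the value back with the matching call.
    public static long readLong(byte[] data) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readLong();
        }
    }

    // Reads the same bytes with a mismatched call: nothing in the stream stops it.
    public static int readIntInstead(byte[] data) throws IOException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeLong(10L);
        // 4 header bytes + TC_BLOCKDATA (0x77) + block length (8) + 8 value bytes
        System.out.println(data.length);           // prints 14
        System.out.println(readLong(data));        // prints 10
        System.out.println(readIntInstead(data));  // prints 0: the high 4 bytes of the long
    }
}
```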
Strings
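Strings have a special format that sits between primitives and full-blown objects. A String starts with a marker constant (TC_STRING, or TC_LONGSTRING for very long strings) followed by its length and its modified UTF-8 bytes, but unlike other objects it does not need a class descriptor.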
See: ObjectOutputStream#writeString
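The sketch below (StringDemo is a hypothetical name) writes the String "abc" with writeObject and dumps everything after the 4-byte header. The record is just the marker, a 2-byte length and the characters; no class name appears.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class StringDemo {
    public static byte[] writeString(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);  // a String written as an object, not via writeUTF
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = writeString("abc");
        StringBuilder sb = new StringBuilder();
        // Skip the 4-byte stream header and dump the String record
        for (int i = 4; i < data.length; i++) {
            sb.append(String.format("%02X ", data[i] & 0xFF));
        }
        // TC_STRING (0x74), 2-byte length (00 03), then the bytes of "abc"
        System.out.println(sb.toString().trim());  // prints "74 00 03 61 62 63"
    }
}
```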
Objects
There are several variations of the serialization format for objects. In the most common and simplest case, a serializable class implements the Serializable marker interface. For example,
import java.io.Serializable;

public class Foo implements Serializable {
    private static final long serialVersionUID = 8999613412220288700L;
    private long size;
    private String name;

    public Foo(long size, String name) {
        this.size = size;
        this.name = name;
    }

    // Optional: Serializable does not need a no-arg constructor here;
    // only the first non-serializable superclass (Object) must have one
    public Foo() { }
}
The serialization stream of the above object consists of two main sections:
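Class descriptor, including the fully qualified class name, the serialVersionUID, and the serializable field descriptions
Serial data (the field values)
* The marker constants in the stream (TC_OBJECT, TC_CLASSDESC, etc.) are defined in ObjectStreamConstants.
See: ObjectOutputStream#writeOrdinaryObject, #writeNonProxyDesc, #writeSerialData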
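A sketch of the default case follows, using a nested copy of Foo inside a hypothetical DefaultSerializationDemo class (so the exact byte count differs from a top-level Foo, as the class name is part of the stream). It checks the two leading markers of the object record and shows that the class name travels in the stream as plain text, which is the descriptor half of the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class DefaultSerializationDemo {
    // Same shape as Foo in the text, nested so the sketch is self-contained.
    public static class Foo implements Serializable {
        private static final long serialVersionUID = 8999613412220288700L;
        private long size;
        private String name;
        public Foo(long size, String name) { this.size = size; this.name = name; }
        public long getSize() { return size; }
        public String getName() { return name; }
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Foo(10, "abc"));
        // data[4] is TC_OBJECT (0x73), data[5] is TC_CLASSDESC (0x72)
        System.out.printf("%02X %02X, %d bytes%n", data[4], data[5], data.length);
        // The descriptor embeds the class name (and field names) as text
        String text = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(text.contains(Foo.class.getName()));  // prints true
        Foo copy = (Foo) deserialize(data);
        System.out.println(copy.getSize() + " " + copy.getName());  // prints "10 abc"
    }
}
```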
We can see that Java serialization mixes metadata and data together. The original motivation of Java serialization was ambitious: to serialize any Java object, transfer it across the wire, and have the recipient honor any object sent over. Moreover, by keeping the metadata as part of the payload, it tries to offer cross-version compatibility: you can serialize an old version of a Java object and deserialize it with a new version, and vice versa. Unfortunately, this leads to many thorny, even fatal, problems:
The stream is not compact, as it includes the schema as part of the payload. If performance matters to you, Java serialization should be handcrafted or, better yet, avoided.
The scheme is a security minefield if the stream cannot be trusted: reading an object can instantiate, and run the deserialization hooks of, any serializable class on the classpath.
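If you cannot avoid reading untrusted streams, one mitigation available since Java 9 is an allow-list ObjectInputFilter. The sketch below (FilterDemo and the chosen pattern are assumptions for illustration, not a recommended production policy) accepts java.lang classes and ArrayList and rejects everything else, so a java.util.Date in the stream fails with InvalidClassException before it is instantiated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class FilterDemo {
    // Allow-list: java.lang classes and ArrayList; "!*" rejects all other classes.
    static final ObjectInputFilter FILTER =
            ObjectInputFilter.Config.createFilter("java.lang.*;java.util.ArrayList;!*");

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object deserializeFiltered(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(FILTER);  // must be set before the first readObject
            return in.readObject();
        }
    }

    // Returns false when the filter rejects a class found in the stream.
    public static boolean accepted(Object obj) throws IOException, ClassNotFoundException {
        try {
            deserializeFiltered(serialize(obj));
            return true;
        } catch (InvalidClassException rejected) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(accepted(new ArrayList<>(List.of(1, 2, 3))));  // prints true
        System.out.println(accepted(new Date()));                          // prints false
    }
}
```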
Custom writeObject and readObject
The serialization specification supports custom reading and writing through private writeObject and readObject methods. We can change class Foo as follows:
private transient long size;
private transient String name;

private void writeObject(ObjectOutputStream out) throws IOException {
    out.defaultWriteObject();  // writes no field data here: every field is transient
    out.writeLong(size);
    out.writeUTF(name);        // writeUTF rejects null and strings over 65535 encoded bytes
}

private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    size = in.readLong();
    name = in.readUTF();
}
The methods defaultWriteObject() and defaultReadObject() write and read the class's non-static, non-transient fields. In this example, since every field is now transient, defaultWriteObject() writes no field data.
The serialization stream of the object is as follows:
Our custom writeObject writes the serial data more efficiently for two reasons:
making all serializable fields transient reduces the size of the class descriptor, which no longer lists any fields
writeUTF is slightly more compact than writing a String object, since it omits the TC_STRING marker
In the end, it reduces the stream size from 93 to 62 bytes (with size = 10, name = “abc”).
Custom writeObject and readObject are typically used as an add-on to the default Java serialization: the default write and read take care of the serializable fields, and the custom code handles the non-serializable ones.
Of course, you can make all fields transient as in this example. Serialization is then the sole responsibility of your custom write and read. However, such a use case is better served by Externalizable.
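The size difference can be reproduced with a sketch like the one below. DefaultFoo and CustomFoo are hypothetical nested variants of Foo inside an assumed CustomWriteObjectDemo class; because nested class names are longer than a top-level Foo, the absolute byte counts will not match the 93 and 62 above, but the custom form should still come out noticeably smaller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CustomWriteObjectDemo {
    // Default form: both fields are described in the class descriptor.
    public static class DefaultFoo implements Serializable {
        private static final long serialVersionUID = 1L;
        private long size;
        private String name;
        public DefaultFoo(long size, String name) { this.size = size; this.name = name; }
    }

    // Custom form: all fields transient, written as primitives by writeObject.
    public static class CustomFoo implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient long size;
        private transient String name;
        public CustomFoo(long size, String name) { this.size = size; this.name = name; }
        public long getSize() { return size; }
        public String getName() { return name; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();  // no field data: every field is transient
            out.writeLong(size);
            out.writeUTF(name);        // assumes name is non-null
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            size = in.readLong();
            name = in.readUTF();
        }
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        int defaultSize = serialize(new DefaultFoo(10, "abc")).length;
        int customSize = serialize(new CustomFoo(10, "abc")).length;
        System.out.println(defaultSize + " bytes (default) vs " + customSize + " bytes (custom)");
        CustomFoo copy = (CustomFoo) deserialize(serialize(new CustomFoo(10, "abc")));
        System.out.println(copy.getSize() + " " + copy.getName());  // prints "10 abc"
    }
}
```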
Externalizable
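Interface Externalizable explicitly shifts the serialization responsibility from the Java platform to the class author via the methods writeExternal and readExternal. An Externalizable class must also have a public no-arg constructor, which deserialization calls before readExternal.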
For example
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Bar implements Externalizable {
    private long size;
    private String name;
    …
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(size);
        out.writeUTF(name);
    }

    @Override
    public void readExternal(ObjectInput in)
            throws IOException, ClassNotFoundException {
        // No defaultReadObject() here: ObjectInput has no such method,
        // and an Externalizable class reads all of its own state
        size = in.readLong();
        name = in.readUTF();
    }
}
A sample serialization stream is as follows:
The stream is almost the same as the previous custom Serializable one where all fields are marked as transient. If your class is designed to entirely take care of its serialization, you should use Externalizable.
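A self-contained sketch of Bar follows, nested in a hypothetical ExternalizableDemo class with constructors and getters filled in as assumptions. It also includes a hypothetical NoDefaultCtorBar to show the constructor requirement: writing it succeeds, but reading it back fails with InvalidClassException because there is no public no-arg constructor to call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableDemo {
    public static class Bar implements Externalizable {
        private static final long serialVersionUID = 1L;
        private long size;
        private String name;
        public Bar() { }  // called by deserialization before readExternal
        public Bar(long size, String name) { this.size = size; this.name = name; }
        public long getSize() { return size; }
        public String getName() { return name; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(size);
            out.writeUTF(name);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            size = in.readLong();
            name = in.readUTF();
        }
    }

    // Same idea, but without a public no-arg constructor.
    public static class NoDefaultCtorBar implements Externalizable {
        private static final long serialVersionUID = 1L;
        private long size;
        public NoDefaultCtorBar(long size) { this.size = size; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException { out.writeLong(size); }

        @Override
        public void readExternal(ObjectInput in) throws IOException { size = in.readLong(); }
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // True if the object can be written and read back; false if reading is refused.
    public static boolean canRoundTrip(Object obj) throws IOException, ClassNotFoundException {
        try {
            deserialize(serialize(obj));
            return true;
        } catch (InvalidClassException noValidConstructor) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Bar copy = (Bar) deserialize(serialize(new Bar(10, "abc")));
        System.out.println(copy.getSize() + " " + copy.getName());      // prints "10 abc"
        System.out.println(canRoundTrip(new NoDefaultCtorBar(10)));    // prints false
    }
}
```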
Serialization Proxy
The serialization proxy pattern gives a class better control over its serialized form by writing a proxy object, returned from writeReplace, in place of the instance itself. For example,
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.Serializable;

public class ProxFoo implements Serializable {
    // Not needed: ProxFoo itself is never written, only its proxy
    // private static final long serialVersionUID = 7484655704105000312L;
    private final long size;
    private final String name;

    public ProxFoo(long size, String name) {
        this.size = size;
        this.name = name;
    }

    // A no-arg constructor is not needed with the proxy
    // public ProxFoo() { }

    // Rejects streams that try to bypass the proxy
    private void readObject(ObjectInputStream stream) throws InvalidObjectException {
        throw new InvalidObjectException("Proxy required");
    }

    // Provides the proxy object to be serialized in place of this instance
    private Object writeReplace() {
        return new SerializationProxy(this);
    }

    private static class SerializationProxy implements Externalizable {
        private static final long serialVersionUID = 5726340402515774393L;
        private long size;
        private String name;

        // Externalizable requires a public no-arg constructor
        public SerializationProxy() { }

        SerializationProxy(ProxFoo p) {
            size = p.size;
            name = p.name;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeLong(size);
            out.writeUTF(name);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            size = in.readLong();
            name = in.readUTF();
        }

        // Constructs the expected type during deserialization
        private Object readResolve() {
            return new ProxFoo(size, name);
        }
    }
}
A sample stream of the class is as follows:
The benefits of the serialization proxy pattern include:
to avoid extralinguistic construction of an object, as readResolve can construct the deserialized type with a normal constructor
to better enforce invariants, since every deserialized instance passes through that constructor
to make it possible to deserialize final fields.
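The invariant benefit can be checked with a condensed ProxFoo, nested in a hypothetical ProxyDemo class. The size >= 0 check in the constructor is an assumed invariant added for illustration. Only the proxy's class appears in the stream, and readResolve rebuilds a genuine ProxFoo through the validating constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class ProxyDemo {
    public static final class ProxFoo implements Serializable {
        private final long size;
        private final String name;

        public ProxFoo(long size, String name) {
            if (size < 0) throw new IllegalArgumentException("size < 0");  // assumed invariant
            this.size = size;
            this.name = name;
        }
        public long getSize() { return size; }
        public String getName() { return name; }

        private void readObject(ObjectInputStream stream) throws InvalidObjectException {
            throw new InvalidObjectException("Proxy required");
        }

        private Object writeReplace() { return new SerializationProxy(this); }

        private static class SerializationProxy implements Externalizable {
            private static final long serialVersionUID = 1L;
            private long size;
            private String name;

            public SerializationProxy() { }
            SerializationProxy(ProxFoo p) { size = p.size; name = p.name; }

            @Override
            public void writeExternal(ObjectOutput out) throws IOException {
                out.writeLong(size);
                out.writeUTF(name);
            }

            @Override
            public void readExternal(ObjectInput in) throws IOException {
                size = in.readLong();
                name = in.readUTF();
            }

            // Goes through the public constructor, so the invariant is re-checked
            private Object readResolve() { return new ProxFoo(size, name); }
        }
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new ProxFoo(10, "abc"));
        String text = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(text.contains("SerializationProxy"));      // prints true
        System.out.println(deserialize(data).getClass().getSimpleName());  // prints ProxFoo
    }
}
```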
Object and Type References
While Java serialization is not a compact format, it does try to minimize the serialization overhead:
If an object has already been written, later occurrences are written as a reference to its handle.
If a class has already been encountered, its class descriptor is written as a reference to its handle.
A sample serialization structure of an object with a referenced class descriptor is as follows:
In other words, each object with a referenced descriptor spends an extra 1+1+4+1 = 7 bytes on markers and the metadata reference. Take List<Long> as an example: each long value takes 8 bytes, but each element also needs about 7 additional bytes for markers and metadata. The scheme is very inefficient.
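Both effects can be measured with the sketch below (ReferenceDemo and Point are hypothetical names). Writing the same instance twice costs only a TC_REFERENCE marker plus a 4-byte handle the second time, and the reader returns the identical instance. Because the exact per-element overhead depends on the element class, the sketch only compares a List<Long> against the same values written as a count followed by raw longs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ReferenceDemo {
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int x;
        private final int y;
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Bytes added to the stream by writing the same instance a second time.
    public static int costOfRepeatWrite() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        Point p = new Point(1, 2);
        out.writeObject(p);
        out.flush();
        int afterFirst = bytes.size();
        out.writeObject(p);  // TC_REFERENCE (1 byte) + 4-byte handle
        out.flush();
        int afterSecond = bytes.size();
        out.close();
        return afterSecond - afterFirst;
    }

    // Handles are honored on read as well: both reads yield the same instance.
    public static boolean sharedAfterRoundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Point p = new Point(1, 2);
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
            out.writeObject(p);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject() == in.readObject();
        }
    }

    // Stream size for n boxed Longs in an ArrayList.
    public static int listSize(int n) throws IOException {
        List<Long> values = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            values.add(1_000L + i);  // outside the Long.valueOf cache: each element is its own object
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(values);
        }
        return bytes.size();
    }

    // Stream size for the same values written as primitives: a count, then raw longs.
    public static int primitiveSize(int n) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(n);
            for (int i = 0; i < n; i++) {
                out.writeLong(1_000L + i);
            }
        }
        return bytes.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(costOfRepeatWrite());     // prints 5
        System.out.println(sharedAfterRoundTrip());  // prints true
        System.out.println(listSize(1000) + " bytes (List<Long>) vs " + primitiveSize(1000) + " bytes (primitives)");
    }
}
```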
Serialization by Primitives
Based on what we have learned, when you handcraft serialization, you should aim to serialize the primitive values of the contained fields directly, instead of relying on the default object serialization. This has several advantages:
It lets you serialize the logical data model instead of implementation details, making it possible to change the class implementation later.
It is much more compact.
It is more secure, because the stream no longer names the classes to instantiate for the contained data.
It minimizes the scope of serialization, as it does not force the contained objects to implement Serializable.
Here is an example,
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Container implements Externalizable {
    private static final long serialVersionUID = 5174256763653270387L;
    private List<Bar> bars;

    public Container(List<Bar> bars) {
        this.bars = (bars == null) ? Collections.emptyList() : bars;
    }

    // Externalizable requires a public no-arg constructor
    public Container() { }

    public List<Bar> getBars() {
        return bars;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        int count = (bars == null) ? 0 : bars.size();
        out.writeInt(count);
        if (count > 0) {
            for (Bar bar : bars) {
                // serialize Bar using primitive data
                out.writeLong(bar.getSize());
                out.writeUTF(bar.getName());
            }
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        int count = in.readInt();  // must mirror writeInt in writeExternal
        bars = (count == 0) ? Collections.emptyList() : new ArrayList<>(count);
        for (int index = 0; index < count; index++) {
            long size = in.readLong();
            String name = in.readUTF();
            bars.add(new Bar(size, name));
        }
    }
}
Compared with the default scheme to serialize Bar objects directly, this approach reduces the stream size by 50%.
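A self-contained way to compare the two forms is sketched below, assuming a simple Bar with a (size, name) constructor and getters that is Serializable only so the default form can be measured. ContainerDemo is a hypothetical name, and the percentage saved will vary with the data, so the sketch asserts only that the primitive form is smaller and round-trips correctly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ContainerDemo {
    // Assumed Bar shape; Serializable only so the default form can be measured.
    public static class Bar implements Serializable {
        private static final long serialVersionUID = 1L;
        private final long size;
        private final String name;
        public Bar(long size, String name) { this.size = size; this.name = name; }
        public long getSize() { return size; }
        public String getName() { return name; }
    }

    // Writes the logical content (count, then size/name pairs) as primitives.
    public static class Container implements Externalizable {
        private static final long serialVersionUID = 1L;
        private List<Bar> bars = Collections.emptyList();
        public Container() { }  // required by Externalizable
        public Container(List<Bar> bars) { this.bars = (bars == null) ? Collections.emptyList() : bars; }
        public List<Bar> getBars() { return bars; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(bars.size());
            for (Bar bar : bars) {
                out.writeLong(bar.getSize());
                out.writeUTF(bar.getName());
            }
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int count = in.readInt();
            List<Bar> result = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                result.add(new Bar(in.readLong(), in.readUTF()));
            }
            bars = result;
        }
    }

    public static List<Bar> sampleBars(int n) {
        List<Bar> bars = new ArrayList<>();
        for (int i = 0; i < n; i++) bars.add(new Bar(i, "bar" + i));
        return bars;
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Bar> bars = sampleBars(100);
        int defaultSize = serialize(new ArrayList<>(bars)).length;
        int primitiveSize = serialize(new Container(bars)).length;
        System.out.println(defaultSize + " bytes (default) vs " + primitiveSize + " bytes (primitives)");
        Container copy = (Container) deserialize(serialize(new Container(bars)));
        System.out.println(copy.getBars().size());  // prints 100
    }
}
```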
Summary
This article examines how serialization approaches impact the serialization stream, including
Default Serializable
Custom writeObject and readObject
Externalizable
Serialization proxy
Referenced objects and types
When you handcraft serialization, you should serialize via primitives.
Reference
Effective Java, 3rd Ed, Joshua Bloch