DIS Technology & JPEG Push

History

JPEG push, also referred to as Server Push, is a technology pioneered by Netscape in 1995. It was originally implemented natively in the Netscape Navigator browser as a means of serving an image stream to the browser. Support for the format spread to other browsers including Firefox, Opera and Safari. Subsequently, email clients utilising these browser engines to render HTML content also supported the format. The format has never been directly supported by Internet Explorer, though there was rumoured to be a scriptless hack for versions before IE6 that is no longer functional.

Server Push relies on a MIME content type, multipart/x-mixed-replace, which informs the browser that it should expect a series of images, each replacing the last. Images are typically JPEG encoded, though this is not a requirement of the format. Each frame is preceded by a boundary delimiter and its own part headers, indicating to the browser that the stream is not complete. The browser continues accepting and drawing updated content until a closing boundary marker is received or the connection to the server is broken.
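The framing described above can be sketched in a few lines. This is a minimal illustration of a multipart/x-mixed-replace stream, not DIS source code: the boundary token "frame" and the function names response_headers and push_stream are assumptions made for the example, and the frames are treated as opaque byte strings.

```python
from typing import Iterable, Iterator

# Hypothetical boundary token; any string that cannot occur in the payload works.
BOUNDARY = b"frame"


def response_headers(boundary: bytes = BOUNDARY) -> bytes:
    """HTTP response headers announcing a replaceable multipart stream."""
    return (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: multipart/x-mixed-replace; boundary=" + boundary + b"\r\n"
        b"\r\n"
    )


def push_stream(frames: Iterable[bytes], boundary: bytes = BOUNDARY) -> Iterator[bytes]:
    """Yield the body of a server-push response, one chunk per frame."""
    for jpeg in frames:
        # Each part: delimiter, part headers, blank line, image bytes.
        yield (
            b"--" + boundary + b"\r\n"
            b"Content-Type: image/jpeg\r\n"
            b"Content-Length: " + str(len(jpeg)).encode("ascii") + b"\r\n"
            b"\r\n" + jpeg + b"\r\n"
        )
    # Closing delimiter: tells the browser the stream is complete.
    yield b"--" + boundary + b"--\r\n"
```

A server would write response_headers() once and then each chunk from push_stream() as frames become available; a live stream simply never reaches the closing delimiter.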

The technology was most commonly used with webcams to create live video streams. However, with dial-up connections being typical and computers of the time being relatively underpowered, performance could often be measured in seconds per frame rather than frames per second.

With broadband internet access becoming ubiquitous and the power of PCs and mobile devices constantly increasing, the viability of JPEG push as a video delivery medium has improved dramatically. It is now possible to serve HD-quality image streams at high frame rates.

DIS

Style Campaign's DIS server is a proprietary high-performance, high-concurrency server written entirely in C. It was designed from the ground up to serve dynamically generated content in the form of static JPEG images and JPEG image streams for display in email client programs.

DIS has several unique features to maximise performance of JPEG stream delivery:

The server monitors bandwidth availability by means of the average packet delivery rate and can dynamically adjust compression and frame rate to maintain an optimal stream to the client. There is no buffering on the client, so all frame timing is governed by the server to maintain smooth, correctly timed playback.
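The kind of feedback loop described above might look something like the following. This is only a sketch under stated assumptions: the exponential moving average, the step sizes, the quality and frame-rate limits, and the class name StreamController are all invented for illustration and are not taken from DIS.

```python
from dataclasses import dataclass


@dataclass
class StreamController:
    """Illustrative only: smoothing factor, limits and steps are assumptions."""
    min_quality: int = 30
    max_quality: int = 90
    min_fps: float = 2.0
    max_fps: float = 25.0
    alpha: float = 0.2      # weight given to the newest rate sample
    rate_bps: float = 0.0   # smoothed delivery rate, in bytes per second
    quality: int = 75
    fps: float = 15.0

    def record_delivery(self, nbytes: int, seconds: float) -> None:
        """Fold one measured write (bytes sent, time taken) into the average."""
        if seconds <= 0:
            return
        sample = nbytes / seconds
        if self.rate_bps == 0:
            self.rate_bps = sample
        else:
            self.rate_bps = self.alpha * sample + (1 - self.alpha) * self.rate_bps

    def adjust(self, last_frame_bytes: int) -> None:
        """Move quality and frame rate toward what the link appears to sustain."""
        needed = last_frame_bytes * self.fps
        if needed > self.rate_bps:
            # Over budget: compress harder first, then drop frames.
            if self.quality > self.min_quality:
                self.quality = max(self.min_quality, self.quality - 5)
            else:
                self.fps = max(self.min_fps, self.fps * 0.8)
        elif needed < 0.7 * self.rate_bps:
            # Comfortable headroom: restore frame rate first, then quality.
            if self.fps < self.max_fps:
                self.fps = min(self.max_fps, self.fps * 1.25)
            else:
                self.quality = min(self.max_quality, self.quality + 5)
```

Because the client does no buffering, a loop like this would run on the server after each frame is written, with the resulting fps value determining how long the server waits before sending the next frame.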

The server utilises a specially modified JPEG encoder to allow direct writing of JPEG data to the socket, decreasing both latency and memory consumption on the server. This is an important feature, allowing relatively high concurrency on modestly specified hardware. JPEG encoded frames are never cached on the server and each frame is entirely unique. Encoded frames are written in slices, as is typical for JPEG compression, and images are sliced to produce TCP slow-start-compatible packet sizes (a typical initial congestion window, CWND, is 4380 bytes, or three 1460-byte segments) for optimal network transport.
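The arithmetic behind the 4380-byte figure, and the idea of sizing writes to it, can be illustrated as follows. This sketch slices an already-encoded byte string, whereas DIS is described as writing slices straight from the encoder; the function name slice_for_cwnd and the 1460-byte segment size (a 1500-byte Ethernet MTU less 40 bytes of IP and TCP headers) are assumptions for the example.

```python
from typing import List

MSS = 1460              # assumed TCP payload per segment: 1500 - 20 (IP) - 20 (TCP)
INITIAL_CWND = 3 * MSS  # 4380 bytes, the initial window quoted in the text


def slice_for_cwnd(encoded: bytes, window: int = INITIAL_CWND) -> List[bytes]:
    """Split encoded image data into writes no larger than one initial window."""
    return [encoded[i:i + window] for i in range(0, len(encoded), window)]
```

For a 10,000-byte frame this yields writes of 4380, 4380 and 1240 bytes, so the first write can leave the host in a single burst without waiting for an acknowledgement.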

A more predictive* approach was considered but would have required buffering and resulted in a decrease in overall throughput. It would also have increased the server's memory footprint while under load.

DIS has many dynamic capabilities including time-based content, location-based content, and personalisation parameters including, but not limited to, name, email address and gender. Anything rendered to a single static frame can be rendered to the image stream and can be updated in real time.

DIS can provide a unique stream depending on the requesting client type. Default client-targeted behaviours can be overridden with client-specific scripts within the project. URL redirection can also be done based on the requesting client's device type and specific email program, as well as any other available input parameters.

We experimented with streams generated from live video sources pushed to the server, and that functionality remains within the engine. On a closed LAN this proved very effective and we were able to composite on the fly into the live stream. However, when using colocated servers within a distributed system, where uplink bandwidth is typically limited, pushing frames to the server proved suboptimal in real-world scenarios.

DIS was originally conceived to run on dedicated hardware. A version of the server was built to run as a service which can be easily deployed on virtualised instances. This version of the server was tested on AWS. However, when we compared its performance against bare-metal instances, there appeared to be no significant benefit beyond the ability to deploy instances on demand. Overall cost of deployment would typically be greater for virtualised instances at the same scale.

Limitations

DIS is effectively a clientless technology: you cannot install plugins or scripts within the client environment (the email program). This imposes severe limitations on strategies for controlling and optimising the stream. We cannot bait and switch the socket connection to implement MTU discovery, for example, which is a typical technique in streaming media players. We are stuck with a dumb TCP/IP pipe for delivery.

JPEG push is not supported across all clients. We have implemented a fallback strategy that includes both animated GIFs and static JPEGs, depending on the client. As with the JPEG stream, the GIF stream has variable compression properties. There is a choice of two quantising routines to provide the best possible image quality at the lowest bitrate depending on the source. The GIF animation streams produced by DIS tend to be roughly 20% more efficient, for the same visual quality, than the output of the GIF processing found in popular image editing software packages.
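The fallback order described above amounts to a simple preference chain. The sketch below is purely illustrative: the two capability flags and the names Delivery and choose_delivery are hypothetical, and how DIS actually detects a client's capabilities is not covered in this document.

```python
from enum import Enum


class Delivery(Enum):
    """Delivery modes named in the text, keyed by the content type served."""
    JPEG_PUSH = "multipart/x-mixed-replace"
    ANIMATED_GIF = "image/gif"
    STATIC_JPEG = "image/jpeg"


def choose_delivery(supports_push: bool, supports_gif_animation: bool) -> Delivery:
    """Pick the richest mode the client can display (flags are hypothetical)."""
    if supports_push:
        return Delivery.JPEG_PUSH
    if supports_gif_animation:
        return Delivery.ANIMATED_GIF
    # Last resort: a single still frame that every client can render.
    return Delivery.STATIC_JPEG
```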

Despite efforts to minimise per-frame file size, JPEG push is still significantly less efficient than video compression codecs such as H.264, the most widely supported codec for HTML5 video (the HTML5 specification itself does not mandate a codec). Bandwidth consumption when using JPEG push is therefore significantly greater. As DIS is a true live streaming technology with no caching on either client or server, bandwidth consumption can become a real issue for large-scale deployment.

Being a raw image stream in JPEG format, there is no means to provide a sound channel. All DIS video streams are silent, though this may be seen as an advantage within the target environment. DIS does, however, have the unique ability to provide subtitling in real time without the need to edit the source content.

The Future

While DIS was still under development we did consider the possibility of implementing H.264 encoding of the stream for clients where it was supported. With HTML5 support becoming more prevalent, this appears a more attractive option. As DIS encodes all streams live, implementing a codec would prove relatively trivial. However, H.264 is encumbered with patent restrictions, so development for commercial deployment would incur additional licensing costs.

Predictive* compression would also have been a possible future development, particularly in conjunction with H.264 encoding, and would result in smoother playback and less noticeable fluctuation in frame rate and compression. This would be implemented as an option rather than the default due to the performance implications stated above.

However, DIS is no longer under development and there are no plans to implement new features.

* Not all frames are created equal, and the content of the image determines the effectiveness of the chosen compression level with respect to output size. Predictive compression would involve analysing the image and adjusting the compression rate accordingly to match the available or target bandwidth. The specifics of implementation are beyond the scope of this document.
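To make the idea of matching output size to a bandwidth target concrete, here is one simple stand-in technique. It is not the image analysis described above: instead of predicting the size, it trial-encodes the frame and binary-searches the quality setting. The encode callback and the name fit_quality are hypothetical, and the search assumes encoded size does not decrease as quality rises.

```python
from typing import Callable


def fit_quality(encode: Callable[[int], bytes], budget: int,
                lo: int = 1, hi: int = 100) -> int:
    """Return the highest quality in [lo, hi] whose output fits the byte budget.

    Falls back to the lowest quality if nothing fits.
    """
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        if len(encode(mid)) <= budget:
            best = mid      # fits: remember it and try a higher quality
            lo = mid + 1
        else:
            hi = mid - 1    # too large: try a lower quality
    return best
```

Each probe is a full encode held in memory, which illustrates the buffering and throughput cost that led to the predictive approach being left out of the default path.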