5/08/2012

Ad-hoc camera sharing from Android to multiple parallel viewers on Android and Desktop -- part 2 of 2



As part of our ongoing "Ad-hoc Mobile Ecosystem" project we researched the “Ad-hoc camera sharing” use-case.

Following our user-centric concept, which tries to minimize context switches for the user, we decided to integrate ad-hoc real-time camera sharing from other mobile users into the Augmented Reality view as the next building block.

This is part two of two.

Results of the review and evaluation steps

(1) SipDroid-based steps:

  • First iteration of our GitHub RTSP-Camera project
  • running with the H263-1998 packetizer from SipDroid
  • newly implemented feature: the same payload is sent out to a list of RTP/UDP clients in parallel (see the sketch after this list)
  • First implementation of our "Android embedded RTSP Video Server" (part of the GitHub RTSP-Camera project)
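
The fan-out to multiple clients can be reduced to a simple loop over a subscriber list. A minimal sketch of that sender pattern (class and field names are illustrative, not the actual RTSP-Camera code):

    // Minimal fan-out sketch: the same RTP packet goes to every subscribed client.
    // Names are illustrative, not the actual RTSP-Camera code.
    import java.io.IOException;
    import java.net.DatagramPacket;
    import java.net.DatagramSocket;
    import java.net.InetSocketAddress;
    import java.util.List;
    import java.util.concurrent.CopyOnWriteArrayList;

    public class MultiClientRtpSender {
        // CopyOnWriteArrayList lets RTSP SETUP/TEARDOWN add and remove
        // clients while the sender loop is iterating.
        private final List<InetSocketAddress> clients = new CopyOnWriteArrayList<InetSocketAddress>();
        private final DatagramSocket socket;

        public MultiClientRtpSender() throws IOException {
            socket = new DatagramSocket();
        }

        public void addClient(InetSocketAddress client)    { clients.add(client); }
        public void removeClient(InetSocketAddress client) { clients.remove(client); }

        // Called once per packetized RTP frame.
        public void send(byte[] rtpPacket, int length) {
            for (InetSocketAddress client : clients) {
                try {
                    socket.send(new DatagramPacket(rtpPacket, length, client));
                } catch (IOException e) {
                    removeClient(client); // drop unreachable clients
                }
            }
        }
    }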

Result:

  • VLC is able to show the video stream with an estimated delay of 1.5 seconds
  • The Android Media API “VideoView” buffers an incoming video stream for 10-12 seconds before playback starts!
  • There are no configuration options or tricks to circumvent this!
  • This delay is not acceptable for our use-case.

(2) SpyDroid-based steps:

  • Next step: H264 encoding, to re-evaluate the VideoView delay
  • Integrated the H264 packetizer from SpyDroid
  • Adapted the H264 packetizer to our multi-client RTSP Video Server RTP/UDP sender pattern (the core of the RFC 3984 handling is sketched after this list)
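
For reference, the core of the RFC 3984 handling is the FU-A fragmentation of NAL units that exceed the network packet size. A condensed sketch of that header logic (not SpyDroid's actual code; RTP header wrapping and the fan-out send happen elsewhere):

    import java.util.ArrayList;
    import java.util.List;

    // Condensed FU-A (RFC 3984) sketch: splits one H264 NAL unit into RTP payloads.
    public class FuAPacketizer {
        static final int MAX_PAYLOAD = 1400; // conservative WLAN packet size assumption

        public static List<byte[]> fragment(byte[] nal) {
            List<byte[]> payloads = new ArrayList<byte[]>();
            if (nal.length <= MAX_PAYLOAD) {
                payloads.add(nal);                   // fits into a single NAL unit packet
                return payloads;
            }
            byte nalHeader   = nal[0];
            byte fuIndicator = (byte) ((nalHeader & 0xE0) | 28); // keep F+NRI bits, type 28 = FU-A
            int offset = 1;                                      // skip the original NAL header byte
            while (offset < nal.length) {
                int chunk = Math.min(MAX_PAYLOAD - 2, nal.length - offset);
                byte fuHeader = (byte) (nalHeader & 0x1F);       // original NAL type
                if (offset == 1)                  fuHeader |= 0x80; // S bit marks the first fragment
                if (offset + chunk == nal.length) fuHeader |= 0x40; // E bit marks the last fragment
                byte[] p = new byte[chunk + 2];
                p[0] = fuIndicator;
                p[1] = fuHeader;
                System.arraycopy(nal, offset, p, 2, chunk);
                payloads.add(p);
                offset += chunk;
            }
            return payloads;
        }
    }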

Results:

  • VideoView RTSP does not support H264 decoding
  • VLC needs detailed “sprop-parameter-sets” in the SDP file to define the SPS (Sequence Parameter Set) & PPS (Picture Parameter Set) according to level / profile / width / height / framerate (see the sample SDP after this list).
  • Running VLC with debugging enabled is essential for understanding the different errors
    • VideoLAN\VLC\vlc.exe --extraintf=http:logger --verbose=6 --file-logging --logfile=vlc-log.txt
  • For example, a missing or wrong “sprop-parameter-sets” in the SDP ends up with a continuous:
    • [112aded0] packetizer_h264 packetizer warning: waiting for SPS/PPS
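
For illustration, a minimal SDP of the kind VLC expects; the “sprop-parameter-sets” base64 values are placeholders that have to match the actual encoder output (profile / level / resolution), and host/port values are examples only:

    v=0
    o=- 0 0 IN IP4 192.168.1.10
    s=RTSP-Camera
    c=IN IP4 192.168.1.10
    t=0 0
    m=video 5006 RTP/AVP 96
    a=rtpmap:96 H264/90000
    a=fmtp:96 packetization-mode=1;profile-level-id=42800D;sprop-parameter-sets=<base64 SPS>,<base64 PPS>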

(3) IP-Webcam-based steps:

  • Next step: MJPEG encoding, streamed through an HTTP server
  • GitHub HTTP-Camera project

Results:

  • Straightforward implementation as an alternative to the RTSP stream based approach, but with shortcomings regarding bandwidth and client support
  • The HTML <img> tag in modern desktop browsers (Firefox, Chrome, …) does support streamed MJPEG rendering.
  • The HTML <img> tag in the Android WebView doesn’t support streamed MJPEG rendering!

(4) IMSDroid-based steps:

  • Tested a streaming session between two Android phones with a locally installed OpenSIPS server
    • Result: delay is under one second

Results:

  • Held back as a fallback in case the Orange code base fails

(5a) Orange RCS/IMS H263 steps:

We preferred the Orange Labs native encoding project over IMSDroid due to its maturity level, code structure and active maintenance.
  • Extracted the H263-2000 native encoder for RTSP-Camera
  • Adapted it to our multi-client RTSP Video Server RTP/UDP sender pattern
  • Extracted the H263-2000 native decoder for RTSP-Viewer

Results:

  • Smooth streaming to both VLC and our Android RTSP-Viewer app

(5b) Orange RCS/IMS H264 steps:

  • Extracted the H264 native encoder for RTSP-Camera
  • Adapted it to our multi-client RTSP Video Server RTP/UDP sender pattern
  • Extracted the H264 native decoder for RTSP-Viewer

Results:

The main difference between H264 and H263 is the need for SPS/PPS information. There are two options for sending these parameters:
  • out-of-band, which means either encoded in the SDP “sprop-parameter-sets” or at the very beginning of the stream
  • in-band, which means with each key-frame (IDR)
The native H264 decoder from PV OpenCORE does not support SDP files, at least not with the JNI interface provided by Orange. It relies instead on SPS/PPS embedded as NAL units within the stream.

Switching the default “out-of-band” parameter handling to “in-band” was the result of a very deep debugging and learning session. Our use-case has to support a “mid-stream” decoder start, which means that a client can join a running stream. This differs from the use-case supported by Orange, an initiated session between two partners who share their view. The latter works with a one-time SPS/PPS package (out-of-band); with it, our use-case failed.

Interestingly enough, the underlying PV OpenCORE code supports a parameter to switch exactly that handling, called “out_of_band_param_set”.

The obvious upfront Internet research ended with zero source code snippets showing an example or project with the right usage of “in-band” parameters.

Changing the logic on our own, namely
  1. encoding with continuous injection of SPS/PPS NAL units before each IDR NAL unit (see the sketch after this list), and
  2. bringing decoding with in-band SPS/PPS to life,
took two days of work.
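
The encoder-side change boils down to caching the SPS and PPS NAL units once and replaying them in front of every IDR NAL unit, so that a client joining mid-stream can initialize its decoder. A condensed sketch of that logic (illustrative names, not the actual RTSP-Camera code; it reuses the fan-out sender and FU-A packetizer sketched in earlier sections):

    // Continuous in-band SPS/PPS injection: cache the parameter sets once,
    // replay them before every IDR frame for the benefit of late joiners.
    public class InBandParamInjector {
        private byte[] sps;
        private byte[] pps;

        // Called for every NAL unit coming out of the encoder, in stream order.
        public void onNalUnit(byte[] nal, MultiClientRtpSender sender) {
            switch (nal[0] & 0x1F) {      // NAL type: low 5 bits of the header byte
                case 7: sps = nal; break; // cache Sequence Parameter Set
                case 8: pps = nal; break; // cache Picture Parameter Set
                case 5:                   // IDR: prepend SPS/PPS before the key-frame
                    if (sps != null && pps != null) {
                        sendNal(sps, sender);
                        sendNal(pps, sender);
                    }
                    break;
            }
            sendNal(nal, sender);
        }

        private void sendNal(byte[] nal, MultiClientRtpSender sender) {
            for (byte[] payload : FuAPacketizer.fragment(nal)) {
                // the RTP header (sequence number, timestamp, marker bit, ...)
                // would be prepended here before the fan-out send
                sender.send(payload, payload.length);
            }
        }
    }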

In the end we found the right encoder logic to build the correct order of NAL units, and we found a bug deep down in the C++ decoding sources. This convinced us that the bug was the reason our research turned up zero code snippets: we seem to be the first to bring this use-case to life based on this library.

At the same time it became clear that the Android Media APIs, based on the same Stagefright / OpenCORE libraries, are not able to support mid-stream decoding.

In our case, VLC and our Android RTSP-Viewer are now able to start rendering in the middle of a running H264-encoded stream.

Summary

We learned the pros & cons of all the different approaches through this project. Based on the lessons learned we are now able to estimate risk and effort for further steps accurately. Even more, we are now able to discuss architectures and evaluate other solutions.

The confusion that still surrounds discussions of mobile video-conferencing on the net is now mostly cleared up for us.

A critical element of our development approach was the very challenging low-level debugging that was necessary for this project. For example, none of the extracted encoder solutions or TCP/UDP-based communications worked instantly. Every evaluation and coding step required serious debugging and an understanding of the communication definitions and the protocol stack.

"Ad-hoc Expertise" has been proven again as a vital part of our core competencies.

We solved every single encoder / decoder extraction and adaptation to our multi-client RTSP Video Server framework. But to get there we had to learn the following new technologies "on the fly":
  • RFCs (RTSP, H263/H264 over RTP, H263 and H264 encoding/decoding, ...),
  • TCP/UDP/RTP/H26x protocol analysis with Wireshark,
  • Native C++ library debugging based on the Android NDK

Our working proof-of-concept code skeleton for the “Ad-hoc video sharing” use-case (with parts of the intermediate steps, like Android Media API based versus native JNI encoding/decoding) is provided for the benefit of all under GPLv3 on GitHub:

References

Ad-hoc camera sharing from Android to multiple parallel viewers on Android and Desktop -- part 1 of 2

As part of our ongoing "Ad-hoc Mobile Ecosystem" project we researched the “Ad-hoc camera sharing” use-case.

Following our user-centric concept, which tries to minimize context switches for the user, we decided to integrate ad-hoc real-time camera sharing from other mobile users into the Augmented Reality view as the next building block.

This is part one of two.

The camera sharing should be available on demand to multiple viewers in parallel, which leads us to a new "Android Embedded Video Server" capability based on the Real Time Streaming Protocol (RTSP), with streaming over the Real-time Transport Protocol (RTP).

We have to make a clear differentiation between existing "Video Conference" solutions and our new "Embedded Video Server" architecture, to clarify why the "Embedded Video Server" is the one we decided to build our solution on in our given context of an "Ad-hoc Mobile Ecosystem".

Video Conference Architecture

  • Centralized video server which 
    • receives multiple participant camera streams, 
    • merges all streams into one video conference view 
    • and delivers that merged video back to all participants.
  • "Call" Sharing pattern: 
    • Additional conference partners need to be actively "called" into the conference by the conference owner.
  • A typical implementation of that "Call" is the Session Initiation Protocol (SIP), which is an active coupling of two or more conference participants.
  • A typical implementation for stream delivery is the Real-time Transport Protocol (RTP).

Embedded Video Server Architecture

  • Decentralized video servers, embedded within each Android mobile as an app
  • "Pub/Sub" sharing pattern:
    • Decoupled sender/client relationship.
    • A new client can subscribe to any published stream on demand, without interfering with other participants in that stream.
  • A typical provider for the Pub/Sub infrastructure is XMPP.
  • The implementation for a stream subscription is the Real Time Streaming Protocol (RTSP).
  • The implementation for stream delivery is the Real-time Transport Protocol (RTP).

Because we could find neither a ready-to-use Android feature nor a ready-to-use Open Source project, but instead a lot of uncertainty, questions and partial successes documented over the last years, we started to investigate existing solutions, to compose a solution of our own and to implement the missing links.

We don’t want to point you straight to our solution on GitHub, but instead document our approach and the milestones on the way to our solution.

We want to demonstrate that without existing Open Source projects we wouldn't have a solution at all. But even with all the resources around, a very high level of knowledge and understanding of all the related technologies is still needed to fill in the missing links.

Nobody in a small team of two has all that knowledge readily available. But evaluating the following list of enterprise-level Open Source projects, based on a stack of protocols and standards, requires exactly such skills.

Building up “Ad-hoc Expertise” in previously unknown topics (technology, standards, architecture, ...) is an essential part of our core competency, and it makes us a perfect partner for your demanding innovative projects.

Use-case summary

Ad-hoc camera sharing from Android (provider) 
to multiple parallel on-demand viewers (consumers)
on Android Apps and PCs
based on a decoupled Publish & Subscribe pattern 
between provider and consumers

Requirements

Quality

  • QCIF (176 x 144) and CIF (352 x 288) resolution
  • minimal delay for video playback (near real-time streaming)
  • support for WLAN bandwidth

Open Standards

  • H263 / H264 encoder and decoder for video streams
  • RTSP server for multiple parallel on-demand streaming support (http://www.ietf.org/rfc/rfc2326.txt)
  • Support for mid-stream decoder start (technical background: continuous in-band sending of the SPS/PPS parameters)
  • H263 and H264 over RTP/UDP
    • RTP Payload Format for H.263 Video Streams (http://www.ietf.org/rfc/rfc2190.txt)
    • RTP Payload Format for H.264 Video Streams (http://www.ietf.org/rfc/rfc3984.txt)
  • MJPEG over HTTP

Reuse of Open Source to minimize development time and risk

Android embedded Media API

  • Camera
  • MediaPlayer
  • MediaRecorder
  • VideoView
  • WebView

Desktop / Browser embedded media capabilities

  • HTML <img> tag, capable of consuming and displaying MJPEG
  • HTML5 <video> tag, capable of consuming and displaying RTSP streams
  • VLC Media player (http://www.videolan.org/), which is Open Source itself and built with strong support for video/audio codecs. Important for our scenario: it is capable of consuming and displaying RTSP streams with different encodings.

Open Source review candidates from

  • Video conferencing / telephony
  • Web Cameras
  • PC and Android browser support for the <img> and <video> tags

Research and Evaluation overview

Our main goal was to compose two functional apps: an Android RTSP-Camera and an Android RTSP-Viewer. Only essential code should be extracted from other projects to provide a working skeleton for further development. The following chronological list of projects shows the most important milestones of our research, evaluation and composition of our own solution.

(0) RTSP Server / Client

  • Real Time Streaming Protocol, or RTSP, is an application-level protocol for control over the delivery of data with real-time properties (http://www.ietf.org/rfc/rfc2326.txt)
  • RTSPClientLib (http://code.google.com/p/rtsplib-java/)
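
The on-demand subscription maps to a short RTSP handshake per client before the RTP packets start flowing. A condensed example exchange (host, ports and session id are illustrative):

    C->S: DESCRIBE rtsp://192.168.1.10:8554/camera RTSP/1.0
          CSeq: 1
    S->C: RTSP/1.0 200 OK                    (body: the SDP describing the stream)
    C->S: SETUP rtsp://192.168.1.10:8554/camera RTSP/1.0
          CSeq: 2
          Transport: RTP/AVP;unicast;client_port=5006-5007
    S->C: RTSP/1.0 200 OK                    (confirms transport, assigns a Session id)
    C->S: PLAY rtsp://192.168.1.10:8554/camera RTSP/1.0
          CSeq: 3
          Session: 12345678
    S->C: RTSP/1.0 200 OK                    (RTP/UDP packets start flowing)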

(1) SipDroid (http://code.google.com/p/sipdroid/)

  • Video encoding done through the Android Media API “MediaRecorder” (3GPP / H263-1998)
  • H263-1998 packetizer for RTP
  • Video decoding done through the Android Media API “VideoView”, started with an rtsp:// URL
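
Getting a live byte stream out of MediaRecorder, which otherwise only writes to a file descriptor, is typically done with a local socket pair: the recorder writes into one end, the packetizer reads from the other. A minimal sketch of that trick (error handling and camera setup omitted; names are illustrative):

    import java.io.InputStream;
    import android.media.MediaRecorder;
    import android.net.LocalServerSocket;
    import android.net.LocalSocket;

    // MediaRecorder streams into the sender socket; the packetizer reads the
    // 3GPP byte stream from the receiver end, skips the container header and
    // packetizes the raw frames for RTP.
    public class LiveRecorder {
        public InputStream start() throws Exception {
            LocalServerSocket server = new LocalServerSocket("rtsp-camera");
            LocalSocket sender = new LocalSocket();
            sender.connect(server.getLocalSocketAddress());
            LocalSocket receiver = server.accept();

            MediaRecorder recorder = new MediaRecorder();
            recorder.setVideoSource(MediaRecorder.VideoSource.CAMERA);
            recorder.setOutputFormat(MediaRecorder.OutputFormat.THREE_GPP);
            recorder.setVideoEncoder(MediaRecorder.VideoEncoder.H263);
            recorder.setVideoSize(176, 144);                    // QCIF
            recorder.setOutputFile(sender.getFileDescriptor()); // socket instead of file
            recorder.prepare();
            recorder.start();

            return receiver.getInputStream();
        }
    }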

(2) SpyDroid (http://code.google.com/p/spydroid-ipcamera/)

  • Video encoding done through the Android Media API “MediaRecorder” (3GPP / H264)
  • H264 packetizer for RTP
  • RTP wrapped up with Flash Video (FLV)
  • Video decoding done through the Flash Player

(3) IP-Webcam (https://play.google.com/store/apps/details?id=com.pas.webcam)

  • Closed source, but the main idea is to send an HTTP-based “multipart/x-mixed-replace” stream of full-frame JPEGs, which can be captured with “onPreviewFrame()” from the Android Camera
  • with the help of some additional resources, an HTTP server sending the right protocol was a simple task (see the sketch below) (http://www.damonkohler.com/2010/10/mjpeg-streaming-protocol.html)
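
A condensed sketch of the server side of that protocol, assuming JPEG frames arriving from onPreviewFrame() through a hypothetical FrameSource (names are illustrative):

    import java.io.OutputStream;
    import java.net.Socket;

    // MJPEG-over-HTTP sender: one multipart response per connected client,
    // one JPEG part per camera frame.
    public class MjpegStreamer {
        static final String BOUNDARY = "frame";

        // Hypothetical frame source fed by Camera.PreviewCallback.onPreviewFrame()
        public interface FrameSource {
            byte[] nextJpegFrame() throws InterruptedException;
        }

        public void serveClient(Socket client, FrameSource frames) throws Exception {
            OutputStream out = client.getOutputStream();
            out.write(("HTTP/1.0 200 OK\r\n"
                    + "Content-Type: multipart/x-mixed-replace;boundary=" + BOUNDARY
                    + "\r\n\r\n").getBytes("US-ASCII"));
            while (true) {
                byte[] jpeg = frames.nextJpegFrame();   // blocks until the next frame
                out.write(("--" + BOUNDARY + "\r\n"
                        + "Content-Type: image/jpeg\r\n"
                        + "Content-Length: " + jpeg.length + "\r\n\r\n").getBytes("US-ASCII"));
                out.write(jpeg);
                out.write("\r\n".getBytes("US-ASCII"));
                out.flush();
            }
        }
    }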

(4) IMSDroid (http://code.google.com/p/imsdroid/)

  • Native encoder and decoder provided by a native JNI sub project
  • An IMS (IP Multimedia Subsystem) proof-of-concept spin-off from Doubango project (http://www.doubango.org/)
  • Native packetizer for RTP
  • Native renderer to Android Media API “VideoSurface”

(5) android-rcs-ims-stack (http://code.google.com/p/android-rcs-ims-stack/)

  • Part of RCS-e initiative (http://en.wikipedia.org/wiki/Rich_Communication_Suite#RCS-e)
  • provided and maintained by the French telco research lab “Orange Labs”
  • Native encoder and decoder provided by a native JNI sub project
  • Native renderer to Android Media API “VideoSurface”
  • JNI sub project is directly based on Android internal implementation (PacketVideo OpenCORE, Stagefright), which is hidden from Java APIs. Orange made a small JNI wrapper for their very own use-cases.
Read on with part two in the next blog entry.