Tuesday 26 April 2011

New transport protocols for a better user experience

In the 1980s the telecom industry decided it needed a “broadband standard” and started defining protocols under the project name “Asynchronous Transfer Mode”. Today most people know ATM as something completely different, because the project, which actually led to significant deployment in the telecom gear of various countries, came to a halt in the face of a competing technology called Internet Protocol.
A decade later IP, which did not require an access speed of 155 Mbit/s like ATM, started being deployed at bitrates that on average did not even reach one thousandth of ATM's. IP seemed to provide a level of speed that could make customers happy while keeping the Plain Old Telephone Service (POTS) in place, whereas ATM required reaching millions of subscriber homes with optical fibers.

No one can blame telcos for trying to save the trillions of dollars that optical fibers would have cost by choosing the less costly Asymmetric Digital Subscriber Line (ADSL). The reality, though, is that our society is more and more video dependent, while the fixed telecommunication infrastructure cannot provide the bandwidth its users require. There is a lot of talk these days around the “Next Generation Network” (NGN) acronym and, in due time, something is bound to come out of it, but few prospects exist for the mobile network, which is squeezed by a terrestrial broadcasting industry that sticks to its Ultra High Frequency (UHF) legacy while the need to carry video on mobile networks multiplies by the day.

Video is a strange beast. I regularly receive the question: how many bit/s are required to transmit video? My standard answer is: as many as you want, even no bits at all. I agree that some may see this answer as unhelpful, but it contains a profound truth, namely that video is a remarkably flexible beast, because you can decide how many bit/s to use to transmit it. No matter how few bits you use, your correspondent will always see “something”.

Operators have exploited this flexibility to cope with the wide dynamics of network characteristics. If the transmitter is informed that the receiver is unable to receive all the bits it needs to decode a video, it can switch to a version of the video encoded at a lower bitrate. The user at the receiving side will see a less crisp picture, and may complain about the shortsightedness of telcos that did not invest in ATM (though if they had, what would the phone bill be today?), but that is still better than a picture that keeps freezing.

The problem is that operators have independently decided to use their own transmitter–receiver protocols. This was acceptable at a time when video was a pastime of the few, but it is no longer a solution today, when video is so pervasive.

MPEG has spotted this problem and is close to releasing a new standard called DASH. The acronym stands for Dynamic Adaptive Streaming over HTTP and is almost self-explanatory: the pervasive Hypertext Transfer Protocol is used to stream video, and the bitrate is dynamically adapted to network conditions using a standard protocol that any implementer can use to build interoperable solutions.
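To make the idea concrete, here is a minimal sketch in Python of the client-side adaptation loop that a DASH-like scheme enables. The representation bitrates are invented and the throughput measurements are simulated; this is not the actual MPD syntax, only the logic:

import random

# Bitrates (bit/s) of hypothetical alternative encodings of the same video.
REPRESENTATIONS = [250_000, 500_000, 1_000_000, 2_500_000]

def pick_representation(measured_bps, safety=0.8):
    """Pick the highest-bitrate version that fits the measured throughput,
    keeping a safety margin so the buffer does not drain on small dips."""
    usable = measured_bps * safety
    candidates = [r for r in REPRESENTATIONS if r <= usable]
    return max(candidates) if candidates else min(REPRESENTATIONS)

# In a real client every HTTP segment download doubles as a bandwidth probe;
# here the measured throughput is simulated with random values.
bitrate = min(REPRESENTATIONS)
for segment in range(8):
    measured_bps = random.uniform(200_000, 3_000_000)
    bitrate = pick_representation(measured_bps)
    print(f"segment {segment}: measured {measured_bps / 1e6:.2f} Mbit/s"
          f" -> fetch the {bitrate // 1000} kbit/s version")

Because ordinary HTTP is used, the same content can be served by any web server or CDN, which is much of DASH's appeal.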

See a technical explanation at http://mpeg.chiariglione.org/technologies/mpeg-b/mpb-dash/index.htm

Leonardo Chiariglione

Wednesday 9 March 2011

SAF, the aggregation of LASeR and audiovisual material

SAF (Simple Aggregation Format) is the part of the LASeR standard that defines tools to fulfill the requirements of rich-media service design at the interface between scene representation and transport mechanisms. SAF provides the following functionality:
- simple aggregation of any type of media stream (MPEG or non-MPEG), resulting in a SAF stream with a low-overhead multiplexing scheme suitable for low-bandwidth networks, and
- the possibility to cache SAF streams.

The result of the multiplexing of media streams is a SAF stream which can be delivered over any delivery mechanism: download-and-play, progressive download, streaming or broadcasting.
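As a rough illustration of what low-overhead aggregation amounts to, here is a toy multiplexer in Python. The per-packet header (a one-byte stream id and a four-byte length) is invented for the example and is not the actual SAF packet syntax, which also carries timing information:

import struct

def mux(streams):
    """streams maps a stream id to a list of access-unit byte strings.
    Each unit is written behind a 5-byte header: 1-byte id, 4-byte length."""
    out = bytearray()
    for stream_id, units in streams.items():
        for unit in units:
            out += struct.pack(">BI", stream_id, len(unit)) + unit
    return bytes(out)

def demux(data):
    """Recover (stream id, access unit) pairs from the aggregated stream."""
    pos, units = 0, []
    while pos < len(data):
        stream_id, length = struct.unpack_from(">BI", data, pos)
        pos += 5
        units.append((stream_id, data[pos:pos + length]))
        pos += length
    return units

packed = mux({1: [b"scene-AU-0", b"scene-AU-1"], 2: [b"audio-frame-0"]})
print(demux(packed))

A real multiplexer would interleave the packets in decoding order rather than stream by stream; the point here is only that the per-packet overhead can be kept to a few bytes.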
The purpose of the LASeR Systems decoder model is to provide an abstract view of the behaviour of the terminal. It may be used by the sender to predict how the receiving terminal will behave in terms of buffer management and synchronization when decoding data received in the form of elementary streams. The LASeR systems decoder model includes a timing model and a buffer model. The LASeR systems decoder model specifies:
- the conceptual interface for accessing data streams (Delivery Layer),
- decoding buffers for coded data for each elementary stream,
- the behavior of elementary stream decoders,
- composition memory for decoded data from each decoder, and
- the output behavior of composition memory towards the compositor.

Each elementary stream is attached to a single decoding buffer.
A multimedia presentation is a collection of a scene description and media (zero, one or more). A media is an individual piece of audiovisual content of one of the following types: image (still picture), video (moving pictures), audio and, by extension, font data. A scene description consists of text, graphics, animation, interactivity and spatial, audio and temporal layout. The sequence of a scene description and its timed modifications is called a scene description stream; a scene description stream is called a LASeR stream.
Modifications to the scene are called LASeR Commands. A command is used to act on elements or attributes of the scene at a given instant in time. LASeR Commands that need to be executed at the same time are grouped into one LASeR Access Unit (AU).
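The grouping rule is simple enough to sketch in a few lines of Python; the field names below are illustrative and do not follow the normative LASeR syntax:

from dataclasses import dataclass
from itertools import groupby

@dataclass
class Command:
    time_ms: int   # the instant at which the command must be executed
    target: str    # id of the scene element or attribute it acts on
    action: str    # e.g. "Insert", "Delete", "Replace"
    payload: str   # the new element or attribute value

commands = [
    Command(0,   "root",  "Insert",  "<rect id='logo'/>"),
    Command(0,   "root",  "Insert",  "<text id='title'/>"),
    Command(500, "title", "Replace", "Hello"),
]

# Commands that must execute at the same instant form one Access Unit.
commands.sort(key=lambda c: c.time_ms)
for t, au in groupby(commands, key=lambda c: c.time_ms):
    print(f"AU @ {t} ms:", [f"{c.action} {c.target}" for c in au])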

A scene description specifies four aspects of a presentation:
- how the scene elements (media or graphics) are organised spatially, e.g. the spatial layout of the visual elements;
- how the scene elements (media or graphics) are organised temporally, i.e. if and how they are synchronised, when they start or end;
- how to interact with the elements in the scene (media or graphics), e.g. when a user clicks on an image;
- and if the scene is changing, how the scene changes happen.
Mattia Donna Bianco

Friday 4 March 2011

LASeR Standard

LASeR (Lightweight Application Scene Representation) is the MPEG rich-media standard dedicated to the mobile, embedded and consumer electronics industries. LASeR provides a fluid user experience of enriched content, including audio, video, text and graphics, on constrained networks and devices.
The LASeR standard is specified in MPEG-4 Part 20.

The LASeR standard specifies the coded representation of multimedia presentations for rich-media services. In the LASeR specification, a multimedia presentation is a collection of a scene description and media (zero, one or more). A media is an individual piece of audiovisual content of one of the following types: image (still picture), video (moving pictures), audio and, by extension, font data. A scene description is composed of text, graphics, animation, interactivity and spatial and temporal layout.
A LASeR scene description specifies four aspects of a presentation:
  • how the scene elements (media or graphics) are organized spatially, e.g. the spatial layout of the visual elements;
  • how the scene elements (media or graphics) are organized temporally, i.e. if and how they are synchronized, when they start or end;
  • how to interact with the elements in the scene (media or graphics), e.g. when a user clicks on an image;
  • and if the scene is changing, how these changes happen.
The sequence of a scene description and its timed modifications is called a LASeR stream.
LASeR handles access units, i.e. self-contained chunks of data, which may be adapted for transmission over a variety of protocols. LASeR streams may be packaged with some or all of their related media into files of the ISO base media file format family (e.g. MP4) and delivered over reliable protocols.

LASeR:
  • Brings smart and pleasurable navigation within streamed and real-time AV content,
  • Is compatible with existing business models, and
  • Allows increased ARPU (Average Revenue Per User) by boosting service subscriptions thanks to interactivity.
Thanks to the LASeR standard, operators can enrich their service offers and generate user addiction with next-generation rich-media technology, leveraging existing infrastructures and deploying easily across multiple devices and networks.
Mattia Donna Bianco

Tuesday 22 February 2011

Video on the web

Traditional television is a consolidated business that has existed for about three quarters of a century. It is a complex environment, but it can be roughly divided into two blocks: the upstream block, made up of content producers, syndicators and distributors, and the downstream block, the TV channels that distribute content to end users. The two blocks are well connected, and content produced upstream is licensed piece by piece to the downstream part of the environment.
In commercial TV, advertisers and ad agencies help feed the chain that brings content from producers to end users by footing the bill.

Similarly, “video on the web” can be summarised as follows: an upstream block, typically made up of individual videomakers, provides content to a downstream block, which serves end users.
Unlike traditional TV, here there is far more content and ever more people watching it, yet it looks like only a few players are doing real business, and to an extent that is largely unknown. Consider a new web TV that appears online with 500 videos. End users start watching it, but after some time they want new content, and the web TV itself cannot renew its catalogue quickly enough. Some videos are self-produced, but not enough. How can it find the right content providers for its audience? The upstream and downstream blocks are weakly connected: content producers are many, but it is difficult to establish relationships with them.
On top of that, web TVs with a small audience are not appealing to advertisers, and it is not easy for web TVs to find other sources of revenue.

wim.tv is creating an alternative environment. In wim.tv there are content producers, syndicators, web TVs, advertisers, ad agencies and end users: both the upstream and downstream blocks exist.
Creators can upload their videos and license them to syndicators and web TVs on their own terms. Web TVs can easily source videos for their catalogues from creators and syndicators, thus reducing the cost of establishing business relationships with content producers.
Advertisers can create their own video ad campaigns, define a target and find the right web TVs to reach it.
Every time a video ad is served by a web TV to an end user, the corresponding advertiser pays an amount that remunerates all those who took part in the presentation of the ad to the end user (web TV, video creator and intermediaries).
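As a toy example of how such a per-impression payment might be split, the Python snippet below divides a hypothetical ad price among the participants. The price and the shares are invented for illustration and are not wim.tv's actual terms, which are negotiated between the parties:

# Hypothetical price paid by the advertiser for one served impression (EUR).
AD_PRICE = 0.010
SHARES = {"web_tv": 0.50, "creator": 0.30, "intermediaries": 0.20}

payouts = {actor: round(AD_PRICE * share, 6) for actor, share in SHARES.items()}
print(payouts)  # {'web_tv': 0.005, 'creator': 0.003, 'intermediaries': 0.002}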
Riccardo Chiariglione

Tuesday 15 February 2011

The cross-browser challenge

Web standards have been growing fast thanks to the World Wide Web Consortium and the community around it. It is amazing to see what a browser can do with a few lines of HTML and JavaScript. But what if you need to deliver a new feature to users and you can't implement it using current web standards?

This often means you will have to develop your own "browser plugin" using native code. What you would like is to have a plugin that runs on every browser, on every operating system, on every hardware architecture.

In order to deliver encrypted RTSP content inside the user's browser, WimTV had to face (and is still facing) these very same problems. Just like Adobe Flash, we had to implement our own piece of software that runs side by side with the browser, rendering content in a child window.

While there are a lot of browsers available, it is possible to group them by the rendering engine they use:

- Webkit: Chrome, Safari, Android Mobile Browser, many others
- Gecko: Firefox, Camino
- Trident: Internet Explorer, other Microsoft products

Luckily, most Webkit and Gecko browsers implement NPAPI for their plugins.

NPAPI is a cross-platform plugin architecture that was introduced by Netscape and has since received many contributions from Mozilla and Google. Basically it defines a set of APIs that must be implemented on both the browser and the plugin side in order for the two to interact. Mozilla currently offers the best SDK and documentation to help developers write their plugins in C or C++.

The WimTV Browser Plugin uses NPAPI to support Webkit- and Gecko-powered browsers, and an NPAPI-ActiveX wrapper in order to work on Microsoft Internet Explorer.

The wrapper is called PluginHostController. It is open source and the code is available from Mozilla's servers. It consists of an ActiveX control written in C++ that implements some basic functions of NPAPI. Unfortunately it was written some years ago and is no longer maintained. In the meantime NPAPI has evolved, adding features we use in our plugin, such as the scriptable APIs that allow JavaScript interaction. In order to use the wrapper we had to develop the missing parts ourselves and then bundle wrapper and plugin inside a CAB file.
Davide Bertola

Tuesday 8 February 2011

What is a machine-readable license?

In recent years the use of digital technologies has increased dramatically: we find them in all the contexts in which media content is created, distributed and consumed. Nowadays, for example, it is possible to buy a single song on the internet using only digital technologies. The broad use of these technologies has created a clash between those who create content, those who distribute it and those who consume it.
Licenses may be employed to regulate the use and distribution of digital content.
A license is a tool that aims to express what users can and cannot do with the licensed content (a video, a song, an ebook...).
To be machine-readable, licenses must be expressed in a particular way. For this reason languages called Rights Expression Languages (RELs) have been formalized.
One of the most important RELs is MPEG REL (MPEG-21 Part 5). MPEG REL adopts a simple and extensible data model for many of its key concepts and elements. This data model consists of four entities and the relationship among them, captured by the basic MPEG REL assertion, the “grant”. An MPEG REL grant consists of the following: the principal to whom the grant is issued, the right that the grant specifies, the resource to which the right applies and, finally, the conditions that must be met before the right can be exercised.
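A minimal sketch of that four-part data model in Python could look as follows; the class and field names are illustrative, not the normative MPEG-21 Part 5 schema, which is expressed in XML:

from dataclasses import dataclass

@dataclass
class Grant:
    principal: str   # to whom the grant is issued
    right: str       # e.g. "play", "print", "adapt"
    resource: str    # identifier of the licensed content
    condition: str   # what must hold before the right can be exercised

def authorize(grant, user, action, content, condition_met):
    """Allow the action only if all four parts of the grant are satisfied."""
    return (grant.principal == user and grant.right == action
            and grant.resource == content and condition_met)

g = Grant("urn:user:alice", "play", "urn:video:42", "payment of 0.50 EUR")
print(authorize(g, "urn:user:alice", "play", "urn:video:42", condition_met=True))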
In wim.tv the REL model is used for all transactions, be they B2C or B2B. The former occur between Web TVs and End Users, the latter between the other actors. In these transactions it is possible to express payment and reissuing conditions.
Edoardo Radica

Tuesday 1 February 2011

Behind the scene, the Web APIs

Once upon a time the web used to be a lot different. If you think about it for a second, you will surely recall that just a few years ago the web was all about email, static web sites and file sharing (legal or not).
It is fascinating how much it has evolved, and how close web applications are getting to the old-fashioned applications you used to install on your laptop.
This is surely due in part to the evolution of the network itself (growing bigger and faster all over the world), but mainly to new paradigms, such as Software as a Service and Web Services, that came along and opened a whole new perspective on how software can be delivered.

The Wim.tv project started by taking these new paradigms deeply into account and focused on delivering a rich user experience both to its business partners and to the consumers joining the platform. But it wasn't until recently that we started to implement a key feature of a successful web platform: a Web API.
In the context of web development, an API is typically a defined set of Hypertext Transfer Protocol (HTTP) request messages, along with a definition of the structure of the response messages, which is usually in Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format. While “Web API” is virtually a synonym for “web service”, the recent trend (so-called Web 2.0) has been moving away from Simple Object Access Protocol (SOAP) based services towards more direct Representational State Transfer (REST) style communications. Web APIs allow the combination of multiple services into new applications known as mashups. This is a key feature of modern web applications because it lets web developers exploit the APIs in ways never even imagined by the API's own development team, letting the web evolve.
Wim.tv strongly believes in openness and interoperability, and it is working hard to provide a simple and reliable Web API that allows other web developers to interact with the platform as much as possible and to integrate our technologies into their web applications.
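To give a flavor of the REST style, here is a short Python sketch of a client calling a JSON-returning endpoint; the URL and the response fields are hypothetical, not the actual wim.tv API:

import json
import urllib.parse
import urllib.request

def list_videos(base_url, query):
    """GET a JSON list of videos matching `query` from a hypothetical endpoint."""
    url = f"{base_url}/videos?q={urllib.parse.quote(query)}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# A mashup could combine this call with any other service's API, e.g.:
# videos = list_videos("https://api.example.com/v1", "sailing")
# print([v["title"] for v in videos])

Because the exchange is nothing more than HTTP plus a documented payload format, any language or framework able to issue HTTP requests can join the mashup.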
Alberto Aresca

Tuesday 25 January 2011

How does your PC play wim.tv videos?

When you first come to the wim.tv page you are asked to download a browser plugin; after that, you are able to watch videos. But what really happens in this process? What you download is a GStreamer-based RTSP player that receives multimedia content from the network and lets the browser render it within a web page.

Real Time Streaming Protocol and wim.tv

RTSP (http://tools.ietf.org/html/rfc2326), coupled with the Real-time Transport Protocol (http://tools.ietf.org/html/rfc3550), is the network protocol used by wim.tv to deliver videos to the end user. RTSP is a truly live protocol: the media supplier can create the video at the very same time it streams it. This works by sending one frame of the video at a time, so at every moment the server can decide which frame to send next. In other web-based video streaming platforms each video is an indivisible file that must be streamed as-is.

RTSP therefore allows advertisements to be inserted within the video in a totally smooth way, without overlays, switching between different streams or other tricks. In fact, when an advertisement starts, the player just keeps on receiving frames, without ever noticing whether those frames belong to the video or to the advertisement. Besides improving playback performance, this assures advertisers that end users cannot skip the advertisements.
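A toy Python sketch of this frame-level splicing is below. Frames are plain strings standing in for the RTP payloads of the real system, and the cue point is an invented parameter:

def content_frames():
    for i in range(6):
        yield f"content-frame-{i}"

def ad_frames():
    for i in range(3):
        yield f"ad-frame-{i}"

def spliced(content, ad, cue=3):
    """Server-side decision: after `cue` frames, switch to the ad, then resume."""
    content = iter(content)
    for _ in range(cue):
        yield next(content)
    yield from ad        # the player just keeps receiving frames...
    yield from content   # ...and seamlessly returns to the main video

for frame in spliced(content_frames(), ad_frames()):
    print(frame)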

Another thing the wim.tv player is able to do is decrypt encrypted videos. Pay-per-view premium videos are streamed in encrypted form and can be played only by authorized users. If a malevolent user tries to save the video data from the network, the only thing he can retrieve is a video of solid grey frames. But if you are allowed to play the video, the wim.tv browser plugin will decrypt it for you: each frame is received, decrypted and played. The whole operation is fast enough that you will never notice it happening while you watch a video.
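The per-frame decrypt-and-play loop can be sketched as follows, using the pyca/cryptography package. The cipher choice (AES in CTR mode with a per-frame counter block) and the key handling are illustrative assumptions; the actual wim.tv encryption scheme is not specified here:

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)  # in the real system, available only to authorized users

def frame_nonce(index: int) -> bytes:
    # A fresh counter block per frame, so that no keystream is ever reused.
    return index.to_bytes(16, "big")

def encrypt_frame(frame: bytes, index: int) -> bytes:
    enc = Cipher(algorithms.AES(key), modes.CTR(frame_nonce(index))).encryptor()
    return enc.update(frame) + enc.finalize()

def decrypt_frame(blob: bytes, index: int) -> bytes:
    dec = Cipher(algorithms.AES(key), modes.CTR(frame_nonce(index))).decryptor()
    return dec.update(blob) + dec.finalize()

for i in range(3):
    on_the_wire = encrypt_frame(f"frame-{i}".encode(), i)  # what a sniffer sees
    print(on_the_wire.hex(), "->", decrypt_frame(on_the_wire, i))  # what is played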
Alessio Lagonigro

Wednesday 19 January 2011

A micropayment transaction system

wim.tv allows the streaming of video content supporting different business models. In one of these models an End User pays a specific amount to a Web TV in order to enjoy a premium video content. Even for premium content, it is unlikely that someone will be willing to pay to watch it unless a very small amount is requested.
Hence the need for a new technology providing an alternative micropayment method: iPay.

                   What is iPay?

iPay is a new payment method based on LETS (Local Exchange Trading System) principles, allowing users to pay for digital content by exchanging a virtual currency named the iPay point. This virtual currency does not substitute for real money; rather, it complements it. iPay users, called Subscribers, create a virtual account held by a VASP (Virtual Account Service Provider). This virtual account is bound to a real method of payment, such as a bank account, a credit card or a PayPal account, so that Subscribers operating in the iPay system are able to synchronise their virtual money with real money. Transaction costs are thus drastically cut, since many small virtual transactions collapse into a single “large” real transaction.
This makes it possible to express even very small amounts of money and so to monetise the services contributed by each player, obviously including end users, however small that contribution may be. On the one hand this meets the needs of creators, who are now able to profit from their work without being forced to go through classic publishers or major distribution circuits; on the other hand, users at last have a fast, fair and reliable method to purchase digital content.
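The netting idea behind this is easy to sketch in Python: many tiny virtual transfers are recorded in accounts held by the VASP, and only the net positions are eventually settled in real money. Names, amounts and the settlement threshold are invented for illustration:

from collections import defaultdict

balances = defaultdict(float)  # virtual iPay-point balances held by the VASP

def pay(payer, payee, points):
    balances[payer] -= points
    balances[payee] += points

# Hundreds of micro-purchases cost nothing extra to record...
for _ in range(300):
    pay("viewer", "web_tv", 0.01)
    pay("web_tv", "creator", 0.004)

# ...and only the net positions are synchronised with real money.
settlement = {who: round(bal, 2) for who, bal in balances.items() if abs(bal) >= 0.01}
print(settlement)  # {'viewer': -3.0, 'web_tv': 1.8, 'creator': 1.2}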


The iPay project began in the form of a textual specification (http://dmin.it/specifiche/summary.htm#12_iPay_specification) produced by the dmin.it group (Digital Media in Italia - http://dmin.it), later transformed into an XML specification whose purpose is to provide a standard communication protocol for the various elements of the system and a common starting point and reference for software implementations. The XML messages in the specification cover, in particular, subscription, payment, cashing and information retrieval. These were converted into Java classes belonging to the Core library. The library has been released under an Open Source license (MPL v. 1.1) in order to enable the development of further software implementations on top of it.
An experimental Java-based software implementation, starting from the iPay specification, was realized in 2009.

                   How is iPay used inside wim.tv?

Wim.tv allows users to play many roles. One of these roles is the Web Bank, the wim.tv name for the VASP of iPay; other roles, such as Creators and WebTVs, can be seen as iPay Subscribers maintaining business relations among themselves.
A peculiarity of wim.tv is the adoption of a license model for rights management. This model is based on MPEG standards, namely MPEG-21 REL (Rights Expression Language) and DIDL (Digital Item Declaration Language), which define XML dialects for representing Digital Items (DIs) and the rights related to resources. A license contains all the information about what a user can or cannot do with a specific content, and it is bound in the DI together with metadata.
wim.tv allows a creator to upload a video and to specify rights to his content. In the case of a professional license he will also be able to define the amount he wants other users to pay for the usage of the content.
In other words, this associates a license with the specific content involved. A component of the wim.tv platform called the License Authorizer then parses this license in order to verify whether an action on the content can take place (e.g. a user X wishing to view a content Y issued by a Web TV Z).
Moreover, since many actors take part in the wim.tv value chain, some hidden fields, called Encrypted Grants, are inserted within the license. These fields ensure that all the actors involved in the value chain receive what is due to them.
Basically, by parsing a license the wim.tv platform retrieves all the information related to payment and calls a REST API with everything the Web Bank web services need to start a transaction of Wim Cents.
This is just one of the business models implemented in the wim.tv platform. By adopting a micropayment system and REL licenses together, several digital value chains may be defined in which all the actors are sure to be rewarded, even for small amounts of money.

Sergio Matone

Monday 10 January 2011

A roadmap to converging video services

Despite the rosy pictures we are often shown of the encroachment of new media on TV's turf, and despite statistical evidence suggesting that more people spend more time with non-TV video, TV is as healthy as ever. In a recent Nielsen report Americans are said to have watched more TV in 2010 than ever before: total viewing of broadcast networks and basic cable channels is up about 1 percent, to roughly 34 hours per person per week.

“Conservative” extensions of TV to the web, like Hulu, Netflix or Apple TV, are reported to fare rather well. On the other hand, “innovative” attempts at integrating the television and “web video” experiences, like Google TV, receive mixed reviews and see their deployment delayed.

The issue is further complicated by the underground battle around the enabling technologies to be adopted for streaming video to the end user via the internet. In the “analogue TV” age Consumer Electronics (CE) has thrived by adhering to established standards. In the now consolidated “digital TV” age CE has kept on thriving based on established standards. Should the “TV on the web” age be dominated by a handful of behemoths brandishing their technologies as a weapon to preserve and extend their walled gardens?

Judging from the number of initiatives addressing the need for standards in this space, one would say that the relevant industries do think that proprietary technologies should not be the only game in town. Unfortunately, most initiatives have issued, or are in the process of issuing, specifications that appear to be driven by the industries' desire to protect their existing businesses by adding new features while keeping out potential new competitors. Whether this is what consumers are interested in is another story, and one that may very well not be high on those industries' priority lists.

ISO/IEC JTC 1/SC 29/WG 11 (MPEG) has been working for the last few years – and keeps on doing so – to develop the key technologies that will enable, as happened for digital TV, the creation of a level playing field on which the third generation of CE can flourish. Some of these technologies target:
  • New video and audio compression for more rewarding user experiences while keeping down the bitrate
  • Media composition and presentation
  • More attractive ways for the user to interact with services
  • More effective ways to deliver content to end users when the network is unreliable
  • Multichannel distribution of content
  • New ways to do business with content
This collection of basic technologies is very important for a smooth transition from “digital TV” to “TV on the web” based on standards. To make this happen, however, industry needs comprehensive specifications that integrate these technologies so that they can be seamlessly embedded in products and deployed to provide interoperable services.

In 2008 the Digital Media Project (DMP), an industry association based in Geneva, launched a new project on a “Digital media platform for the 2nd decade of the 21st century” (P21-2). The goal of this project is to integrate all the technologies required to provide a solution that is attractive for consumers, profitable for content creators, secure for service providers and rewarding for device manufacturers.

A precursor of P21-2 is wim.tv, a service on the web that lets different types of entrepreneurs do business with video and advertisement content. wim.tv is enabled by CEDEO’s Platform for Digital Asset Trading (PDAT), designed to offer users all services required to do business with video content on the web effectively and profitably, e.g.
  • Describe content
  • Negotiate terms
  • Request/generate/process events
  • Issue licences
  • Associate content/ads
  • Stream video securely
  • Interact with content
  • Pay/cash
PDAT is an early implementation of the emerging MPEG-M standard (ISO/IEC 23006 Multimedia Service Platform Technologies). Its modular architecture allows for the easy replacement and introduction of existing and new modules to extend the range of services offered to its users.

Currently PDAT supports the following browsers: IE, Firefox, Chrome and Safari, running on Android, Linux, Mac OS (10.5 onward) and Windows (XP onward). The wim.tv player is a PDAT plugin.
Wim.tv is an ideal platform for the convergence of television services: it is based on international standards, has a growing community, and its player is easily portable to environments such as web, IP and mobile TV.

Initiatives such as wim.tv can give the video ecosystem the means to move to the next level, thanks to the existence of standard APIs to access services.

Leonardo Chiariglione