Wednesday, January 30, 2013

What is the length of a continuously replayed part of a video?

As we mentioned in our previous post, we filter out continuously replayed segments (views) shorter than a magic length. So let us talk a bit about this magic length.

We monitor which part of a video is being played. Each time you seek in a video, click on a slide or follow a search result, you start replaying a new segment of the video.

Now, please take a look at the following figure:


As you can see, there are 4 segments. First, the user started watching the video from the beginning. After a while, he probably found it boring, so he jumped a little further ahead. Then he went through the slides, found an interesting one, clicked on it and jumped into the last quarter of the video. Finally, an interesting keyword came to his mind, so he searched for it and found an occurrence of the keyword in this video. He jumped to the part of the video where the keyword was spoken and watched for a while.
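To make the notion of a segment concrete, here is a minimal Python sketch of how a playback log might be split into segments. It is not our production code: it assumes a hypothetical log in which each jump (seek, slide click or search result) is recorded as a `(wall_time, video_position)` pair in seconds, and that playback runs at normal speed without pauses, so a segment lasts until the next jump. The names `Segment` and `split_into_segments` are illustrative, and the session below only loosely mimics the four jumps in the figure with made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # video position (seconds) where continuous playback began
    length: float  # seconds of playback before the next jump or the stop


def split_into_segments(jumps, stop_time):
    """Split one viewing session into continuously replayed segments.

    jumps: chronological list of (wall_time, video_position) pairs, one per
           seek, slide click or search-result click (hypothetical log format).
    stop_time: wall time at which playback stopped.
    Assumes normal playback speed and no pauses.
    """
    segments = []
    for i, (wall_time, position) in enumerate(jumps):
        # A segment ends at the next jump, or when playback stops.
        end_time = jumps[i + 1][0] if i + 1 < len(jumps) else stop_time
        segments.append(Segment(start=position, length=end_time - wall_time))
    return segments


# Made-up session: start at 0 s, skip ahead, click a slide, follow a search hit.
session = [(0, 0), (95, 240), (110, 2700), (180, 1500)]
segments = split_into_segments(session, stop_time=400)
print([s.length for s in segments])  # [95, 15, 70, 220]
```

Under these assumptions, the length of a segment is simply the wall-clock time between two consecutive jumps; a real player log would also need to account for pauses and buffering.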

We monitor such events. When we collect all the segments and plot a histogram of their lengths, we get a graph like the following one:

We group segment lengths into bins 5 seconds wide. The result is a nice example of a long-tail distribution. The most frequent segments are those shorter than 5 seconds: we found nearly 30 thousand of them, which is about 25% of all segments replayed so far on SuperLectures.com. There are about 20 thousand segments between 5 and 10 seconds long, and so on.
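The binning itself is short with `collections.Counter`. The sketch below assumes half-open bins, so bin k covers [5k, 5k + 5) seconds and the first bin holds exactly the segments shorter than 5 seconds. The function names are hypothetical and the sample lengths are toy values, not our data.

```python
from collections import Counter

BIN_WIDTH = 5  # seconds


def histogram(lengths, bin_width=BIN_WIDTH):
    """Count segments per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = Counter(int(length // bin_width) for length in lengths)
    return dict(sorted(counts.items()))


def share_shorter_than(lengths, limit):
    """Fraction of all segments whose length is below limit seconds."""
    return sum(1 for length in lengths if length < limit) / len(lengths)


lengths = [2, 3, 4, 7, 9, 12, 31, 95]  # toy segment lengths in seconds
print(histogram(lengths))               # {0: 3, 1: 2, 2: 1, 6: 1, 19: 1}
print(share_shorter_than(lengths, 5))   # 0.375
```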

When we add up the lengths of the segments, from the longest ones to the shortest ones, we get a graph like this:


We highlighted the points at which each successively shorter bin of segments is added (from left to right). The y-axis is the normalized cumulative video playback time. Now, look at the segments shorter than 20 seconds. Although they make up more than half of all segments (62k out of 110k), they account for only 2% of the total replayed video time.
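The cumulative curve can be sketched in the same spirit: total the playback time in each 5-second bin, add the bins starting from the longest one, and divide by the overall playback time. A second helper compares the share of segments with the share of playback time below a threshold, which is the kind of comparison made above for the 20-second mark. The function names and the toy lengths are illustrative assumptions, not our actual pipeline.

```python
def cumulative_time_curve(lengths, bin_width=5):
    """Normalized cumulative playback time, adding bins from longest to shortest.

    Returns (bin_lower_bound_seconds, cumulative_fraction) points; the last
    point is always 1.0 because every bin has been added by then.
    """
    per_bin = {}
    for length in lengths:
        k = int(length // bin_width)
        per_bin[k] = per_bin.get(k, 0) + length  # total playback time per bin
    total = sum(per_bin.values())
    curve, running = [], 0
    for k in sorted(per_bin, reverse=True):  # longest bins first
        running += per_bin[k]
        curve.append((k * bin_width, running / total))
    return curve


def below_threshold(lengths, threshold=20):
    """Share of segments and share of playback time below threshold seconds."""
    short = [length for length in lengths if length < threshold]
    return len(short) / len(lengths), sum(short) / sum(lengths)


lengths = [2, 4, 6, 8, 10, 170]  # toy segment lengths in seconds
print(cumulative_time_curve(lengths))  # [(170, 0.85), (10, 0.9), (5, 0.97), (0, 1.0)]
print(below_threshold(lengths))        # (0.8333333333333334, 0.15)
```

Even in this tiny example, most segments fall below 20 seconds while they carry only a small part of the playback time, which is the pattern the graph shows at scale.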

As you can see, a huge number of segments reflect users seeking through the video: jumping from one time to another, clicking on slides and the like. However, we want to provide you with reliable statistics. That is why we chose a 20-second limit to filter out segments in which users are seeking and probing rather than watching the video.
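Applying the limit then amounts to a simple filter. This minimal sketch keeps segments of at least 20 seconds (we assume a segment of exactly 20 seconds counts as watching) and also reports how much playback time survives, so the trade-off can be checked on any list of lengths. The name `filter_watched` is hypothetical and the lengths are toy values.

```python
MIN_SEGMENT_SECONDS = 20  # the limit chosen in this post


def filter_watched(lengths, min_length=MIN_SEGMENT_SECONDS):
    """Drop segments shorter than min_length seconds (seeking and probing).

    Returns the kept lengths and the fraction of total playback time they cover.
    """
    kept = [length for length in lengths if length >= min_length]
    total = sum(lengths)
    return kept, (sum(kept) / total if total else 0.0)


kept, time_kept = filter_watched([3, 8, 12, 20, 45, 312])
print(kept)  # [20, 45, 312]
```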