GSoC 2017: Phase 2 Evaluation


By the end of Phase 2, I have completed the following tasks:

  1. Created the Micro window and transferred the MediaElement to it
  2. Implemented the video transition mechanism reacting to speaker changes
  3. Developed basic toolbar buttons embedded in Micro mode
  4. Modularized and refactored the Micro mode code

https://github.com/jitsi/jitsi-meet-electron/pull/12/commits

I created two modules for this project: ‘p2pconnection’ and ‘micromode’.

The ‘p2pconnection’ module is responsible for transferring the Jitsi-meet MediaElement from one Electron BrowserWindow to another using WebRTC. More details can be found in my previous post.

The ‘micromode’ module is simply a compilation of all the code I have written on top of the existing codebase to make Micro mode work. Instead of writing everything in the main.js and render.js files, I refactored it out as a module for better readability. Currently, the Jitsi-meet-electron app’s code consists of three main parts: the main.js, render.js and micro.js files. ‘main.js’ is the Main Process part of the Electron framework, while ‘render.js’ and ‘micro.js’ belong to the Renderer Process part, each responsible for running a BrowserWindow.

In each process, the respective part of the ‘micromode’ module is imported, initialized, and disposed of once the application is closed.

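Roughly, the main-process wiring looks like this (a minimal sketch; the module path, window options and event choices are assumptions, not the project’s actual code):

// main.js (Main Process) -- minimal sketch; paths and options assumed
const { app, BrowserWindow } = require('electron');
const micromode = require('./modules/micromode');

let mainWindow;

app.on('ready', () => {
  mainWindow = new BrowserWindow({ width: 800, height: 600 });
  mainWindow.loadURL(`file://${__dirname}/index.html`);

  // Create the micro BrowserWindow and wire up its IPC channels.
  micromode.init(mainWindow);

  // Show the micro window while the main window is minimized.
  mainWindow.on('minimize', () => micromode.show());
  mainWindow.on('restore', () => micromode.hide());
});

app.on('window-all-closed', () => {
  micromode.dispose();
  app.quit();
});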

As shown above, the main process simply has to require the ‘micromode’ module and call its init, show, hide and dispose methods whenever necessary.

There are a few potential areas of improvement over the current version:

1. Add more features to Micro mode’s toolbar

Micro mode currently has audio mute, video mute and hangup features, but there are plenty of other functionalities in the Jitsi-meet application that Micro mode could also provide, such as chat and screen sharing. Some features are not suitable for Micro mode, such as live streaming, shared documents and shared YouTube videos. Screen sharing in particular goes well with Micro mode, because the user might want to show their desktop and watch the remote video at the same time.

2. Optimize the video element in Micro mode.

Micro mode occasionally has a lagging issue: the video runs a fraction of a second behind the original, and the transition animation is sometimes clunky. This is probably due to the WebRTC video transmission overhead, or to the Micro window consuming too many resources. One thing I can try at the moment is to lower the video resolution in Micro mode.

3. Switch to more reliable WebRTC technologies.

Currently, Micro mode’s modules use several experimental WebRTC functionalities which are quite unstable, some of which are deprecated. I had no choice, because there are no practical alternatives. As WebRTC matures, there should be follow-up maintenance work to switch to newer and more stable technologies.

Conclusion

The more I work on this project, the more I notice the lack of IPC support provided by the Electron framework, which leads to spaghetti code and runtime overhead, and has given me a tremendous amount of headache during development. Nonetheless, the current version of Micro mode finally serves its purpose, so the rest of the development will mainly be optimization work and adding more functionality.



GSoC 2017: Log #4


After resolving the largeVideo transition problem, the next step was to build a toolbar for the micro window. I used Jitsi-meet’s ExternalAPI as the interface between the main Jitsi-meet iframe and Micro mode.

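In outline, the wiring looks like this (a minimal sketch, assuming the 2017-era positional-argument constructor; the require path, room name, IPC channel name and command strings are placeholders):

// render.js (Renderer Process) -- sketch; names and paths assumed
const { ipcRenderer } = require('electron');
const JitsiMeetExternalAPI = require('./external_api');

const api = new JitsiMeetExternalAPI(
    'meet.jit.si',          // domain
    'MyConferenceRoom',     // room name
    800, 600,               // iframe width and height
    document.body);         // parent node for the generated iframe

// Toolbar presses in the micro window arrive over IPC and are
// forwarded to the iframe as ExternalAPI commands.
ipcRenderer.on('micro-toolbar', (event, action) => {
  if (action === 'mute-audio') api.executeCommand('toggleAudio');
  else if (action === 'mute-video') api.executeCommand('toggleVideo');
  else if (action === 'hangup') api.executeCommand('hangup');
});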

The JitsiMeetExternalAPI class creates a Jitsi-meet iframe on initialization and provides several methods to control its components. The most basic features I added to Micro mode first are ‘mute-audio’, ‘mute-video’ and ‘hangup’.

(Screenshot: the micro window’s toolbar with audio mute, video mute and hangup buttons)

Once a toolbar button in the micro window is pressed, a message is sent to the Jitsi-meet iframe through Electron’s IPC channel, and JitsiMeetExternalAPI then toggles the audio or video component accordingly.

Whenever the micro window pops up, the main window has to send the media status to the micro window: whether the user’s audio or video is muted. For now I used a ‘hacky’ way, which is to access the ‘audio-mute’ and ‘video-mute’ buttons in the iframe and check whether their className includes the ‘toggled’ attribute. The main window sends the status through the IPC channel, and the micro window switches its toolbar button layout accordingly. Since accessing an iframe’s DOM elements directly is not really good practice, a better way would be to use the JitsiMeetExternalAPI interface instead; it could provide ‘get’ methods reporting whether the video or audio inside the Jitsi-meet iframe is muted.
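For reference, the ‘hacky’ check looks roughly like this (a sketch; the button ids inside the iframe and the IPC channel names are assumptions):

// main window (render.js) -- sketch of the 'hacky' mute-status check
const { ipcRenderer } = require('electron');

function getMediaStatus() {
  const doc = document.querySelector('iframe').contentWindow.document;
  return {
    // A 'toggled' class on the toolbar buttons marks the muted state.
    audioMuted: doc.getElementById('toolbar_button_mute')
        .className.includes('toggled'),
    videoMuted: doc.getElementById('toolbar_button_camera')
        .className.includes('toggled')
  };
}

// Forward the current status whenever the micro window pops up.
ipcRenderer.on('micro-window-shown', () => {
  ipcRenderer.send('media-status', getMediaStatus());
});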



GSoC 2017: Log #3


Previously, I managed to transfer the conference video from the main BrowserWindow to the micro BrowserWindow using the RTCPeerConnection API. However, one problem that emerged was that it does not react to the dominantSpeakerChanged event, which triggers the transition of the largeVideo to another speaker in the Jitsi-meet application. This problem caused an immense amount of trouble, because it is very hard to solve without directly accessing Jitsi-meet’s components.

At first, I thought of importing Jitsi-meet’s APP object and then listening to the dominantSpeakerChanged event, but I got feedback from my mentor that directly accessing components inside the iframe is not a good design. So I am limited to working outside the iframe.

Another recommended method was to use the HTMLMediaElement.captureStream() API, which returns a MediaStream object containing a real-time capture of the content being rendered in the media element. Surely, this seemed to be the ideal solution, except that it does not work in an Electron BrowserWindow. The largeVideo DOM object does not have a “src” attribute; instead it has a “srcObject” attribute, which is the MediaStream received from the current dominant speaker. Unfortunately, Chromium does not allow calling captureStream() on a video without a “src” attribute, and throws the following error.

Uncaught NotSupportedError: Failed to execute ‘captureStream’ on ‘HTMLMediaElement’: The media element must have a source.

This has been resolved in Chromium version 59.x.x.x, but even the latest version of Electron, v1.7.5 beta, uses Chromium version 58.x.x.x. So there is no legitimate way to use HTMLMediaElement.captureStream() on Jitsi-meet’s largeVideo in the Electron environment.

One workaround I tried was to make a canvas copy of the largeVideo and apply HTMLCanvasElement.captureStream().

/**
 * Create a copy of an HTML video element on an HTML canvas by
 * redrawing the current video frame at a fixed rate.
 * Takes a reference to the video, its width, height and a target
 * frame rate as parameters, and returns the canvas object.
 */
function copyVideo(video, width, height, frameRate) {
  let canvas = document.createElement('canvas');
  canvas.width = width;
  canvas.height = height;
  let ctx = canvas.getContext('2d');
  // setInterval expects a delay in milliseconds, so convert the
  // frame rate (frames per second) accordingly.
  setInterval(function() {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  }, 1000 / frameRate);
  return canvas;
}

This approach worked, and I was able to capture Jitsi-meet’s largeVideo onto a canvas element that reacts to the video transition event properly.

I really hoped this canvas approach would solve the video transition problem once and for all, but it did not. Once the window gets minimized, the canvas stops capturing the video, probably because in most cases there is no point in capturing visual data when the window itself is not visible.

After some desperate research, with every alternative exhausted, I posted a question on StackOverflow and got an unexpected solution to this long-dragging problem.

https://stackoverflow.com/questions/45156701/html-canvas-drawimage-when-window-minimized

It turned out that the HTMLVideoElement durationchange event (via the ondurationchange handler) can detect a change in the srcObject of a video element. This way, I don’t have to access the Jitsi-meet iframe’s APP component and can still react to the dominantSpeakerChanged event. After discovering this, the rest of the process was very easy: whenever the durationchange event fires, the main window removes the existing MediaStream object from the p2pChannel and attaches the new MediaStream object.
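In sketch form (assuming the peer connection and the largeVideo reference are already set up as in Log #1, and using the deprecated stream-based API that Chrome 58 supports):

// main window -- sketch; pc is the RTCPeerConnection to the micro window
const iframe = document.querySelector('iframe');
const largeVideo =
    iframe.contentWindow.document.getElementById('largeVideo');

largeVideo.ondurationchange = function() {
  // The srcObject has just been swapped to the new dominant
  // speaker's stream, so replace it on the peer connection.
  pc.getLocalStreams().forEach(stream => pc.removeStream(stream));
  pc.addStream(largeVideo.srcObject);
  // Renegotiation (offer/answer over IPC) follows here.
};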

I struggled for so long to solve this problem, yet the answer was somewhat disappointingly simple. One thing I realized is that WebRTC is a fairly new technology and a lot of its parts are still under development. Even the addStream() method I used to transfer the MediaStream object is deprecated, with the addTrack() method recommended instead. However, addTrack() is not supported in Chrome, so I was forced to use a deprecated method. This is not the only case of using experimental WebRTC features, so I foresee a lot of upcoming maintenance work as some of these features become unsupported in the near future.



Game AI: Snake Game


(Screen capture: the Snake game agent in action)

https://github.com/leook0209/Genetic-Algorithm-Snake

Snake is a simple game in which the player moves the head of the snake up, down, left or right to eat randomly generated food. The snake grows by one unit every time it eats the food, and dies once its head hits any part of its body. This project is about training a utility-based snake game agent using a genetic algorithm with a number of heuristics.

Fitness

The goal (fitness) of each game is to make the snake as long as possible while taking as few turns as possible to finish the game. Fitness is calculated in the following way:

fitness = (Length of Snake) − α × (Number of Turns Taken)
α: weight for the Number of Turns Taken

The reason for minimizing the number of turns is that the snake game has a very easy winning strategy, which is to circle the snake around the edge and eat the food on the inner side of the field in a safe manner. Hence, α should be set to a reasonable value in order to prevent the agent from taking the easy way out.

At each turn, the agent calculates a heuristic value for moving in each direction: up, down, left and right. If a direction leads to the snake’s death, the heuristic value is NEGATIVE INFINITY. Even if a position next to the head contains the food, the snake might decide not to take it if doing so leads to a less desirable future state (e.g. creating a dead end). To prevent the snake from repeating the same motion over and over, it is designed to become more attracted to the food over time. A sketch of this decision step is shown below.
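The helper functions, weight names and the time-based food increment in this sketch are assumptions rather than the actual implementation; it simply combines the six heuristics introduced in the next section:

const DIRECTIONS = ['up', 'down', 'left', 'right'];

function chooseDirection(state, weights, turn) {
  let best = null;
  let bestValue = -Infinity;
  for (const dir of DIRECTIONS) {
    const next = simulateMove(state, dir);   // hypothetical helper
    let value;
    if (leadsToDeath(next)) {                // hypothetical helper
      value = -Infinity;
    } else {
      // Weighted sum of the six heuristics. The food term grows with
      // the turn count so the snake chases the food more over time.
      value = -(weights.food + 0.01 * turn) * foodDistance(next)
          - weights.center * centerDistance(next)
          - weights.square * squareness(next)
          + weights.compact * compactness(next)
          - weights.connect * connectivity(next)
          - weights.deadEnd * deadEnd(next);
    }
    if (value > bestValue) {
      bestValue = value;
      best = dir;
    }
  }
  return best;
}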

Heuristics

There are 6 heuristics the agent uses to calculate this value:

  1. Manhattan distance between the Snake’s head and the Food
  2. The position of the Snake’s head relative to the center of the field
  3. Squareness of the Snake
  4. Compactness of the Snake
  5. Connectivity of the field
  6. Dead End Indicator

The first two heuristics are quite intuitive to understand, while the next four are not; they are heuristic concepts I created for this agent.

Squareness

Squareness is an indicator of how closely the Snake’s body is oriented in a square/rectangular manner.

O – Empty Space, S – Snake, X – Blank Space, H – Snake’s Head
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O S S S H O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O

The snake in the above example is oriented in a perfectly rectangular manner. In this case, the squareness value is 0.

O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O S X X X O O O O O O O O O
O O O O O O O S X X X O O O O O O O O O
O O O O O O O S X X H O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O

The squareness value is the number of blank spaces that are within the Snake’s square boundaries but not filled by the snake. The square boundaries refer to the bounding rectangle spanned by the leftmost, rightmost, uppermost and lowermost parts of the snake. For the above case, the squareness value is the number of Xs, which is 8.
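A sketch of the computation, assuming the snake’s body is an array of {x, y} cells:

function squareness(body) {
  const xs = body.map(cell => cell.x);
  const ys = body.map(cell => cell.y);
  const width = Math.max(...xs) - Math.min(...xs) + 1;
  const height = Math.max(...ys) - Math.min(...ys) + 1;
  // Cells inside the bounding rectangle that the snake does not fill.
  return width * height - body.length;
}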

Compactness

Compactness is an indicator of how compactly the Snake’s body is oriented. It is the number of cases where one body part of the Snake is placed next to another body part, counted without double counting any pair.

O – Empty Space, S – Snake, H – Snake’s Head
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O S S S H O O O O O O O O O
O O O O O O O S S S S O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O
O O O O O O O O O O O O O O O O O O O O

For the above case, the compactness of the Snake is 10.
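A sketch of the computation, again assuming an array of {x, y} cells; counting only the right and down neighbor of each cell avoids double counting:

function compactness(body) {
  const occupied = new Set(body.map(cell => `${cell.x},${cell.y}`));
  let pairs = 0;
  for (const cell of body) {
    if (occupied.has(`${cell.x + 1},${cell.y}`)) pairs++;
    if (occupied.has(`${cell.x},${cell.y + 1}`)) pairs++;
  }
  return pairs;
}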

Connectivity

Connectivity is an indicator of how connected each part of the field is, and whether the Snake is separating one part of the field from another.

O – Empty Space, S – Snake, H – Snake’s Head, X – Space Chosen by Agent
O O O O O O O O O S S H O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O X O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O S S O O O O O O O O O O

At each turn, the agent picks a random empty space in the field and counts how many spaces are disconnected from it, i.e. blocked off by the Snake’s body. For the above case, the connectivity is 148.
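A sketch of the computation as a flood fill from the randomly chosen cell (the grid dimensions and cell representation are assumptions):

function connectivity(body, width, height) {
  const blocked = new Set(body.map(c => `${c.x},${c.y}`));
  const empty = [];
  for (let x = 0; x < width; x++)
    for (let y = 0; y < height; y++)
      if (!blocked.has(`${x},${y}`)) empty.push({ x, y });

  // Flood-fill from one randomly chosen empty cell.
  const start = empty[Math.floor(Math.random() * empty.length)];
  const seen = new Set([`${start.x},${start.y}`]);
  const queue = [start];
  while (queue.length > 0) {
    const { x, y } = queue.pop();
    for (const [nx, ny] of [[x + 1, y], [x - 1, y], [x, y + 1], [x, y - 1]]) {
      const key = `${nx},${ny}`;
      if (nx >= 0 && nx < width && ny >= 0 && ny < height &&
          !blocked.has(key) && !seen.has(key)) {
        seen.add(key);
        queue.push({ x: nx, y: ny });
      }
    }
  }
  // Empty cells the fill never reached are cut off by the snake.
  return empty.length - seen.size;
}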

Dead End Indicator

Dead End Indicator represents how many spaces are unreachable by the snake based on the current orientation.

O – Empty Space, S – Snake, H – Snake’s Head
O O O O O O O O O S S H O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O O S O O O O O O O O O O
O O O O O O O O S S O O O O O O O O O O

The Dead End Indicator is calculated in a similar manner to Connectivity, except that instead of choosing a random empty space, it checks connectivity from the Snake’s head. For the above case, the left side of the field is unreachable from the Snake’s head, hence the Dead End Indicator value is 134.
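A sketch using the same flood fill, but seeded from the empty cells next to the head (assumed here to be body[0]):

function deadEnd(body, width, height) {
  const head = body[0];
  const blocked = new Set(body.map(c => `${c.x},${c.y}`));
  const seen = new Set();
  const queue = [];
  // Seed the fill with the empty neighbors of the head.
  for (const [nx, ny] of [[head.x + 1, head.y], [head.x - 1, head.y],
                          [head.x, head.y + 1], [head.x, head.y - 1]]) {
    if (nx >= 0 && nx < width && ny >= 0 && ny < height &&
        !blocked.has(`${nx},${ny}`)) {
      seen.add(`${nx},${ny}`);
      queue.push({ x: nx, y: ny });
    }
  }
  while (queue.length > 0) {
    const { x, y } = queue.pop();
    for (const [nx, ny] of [[x + 1, y], [x - 1, y], [x, y + 1], [x, y - 1]]) {
      const key = `${nx},${ny}`;
      if (nx >= 0 && nx < width && ny >= 0 && ny < height &&
          !blocked.has(key) && !seen.has(key)) {
        seen.add(key);
        queue.push({ x: nx, y: ny });
      }
    }
  }
  // Every empty cell the head can never reach counts as a dead end.
  const totalEmpty = width * height - body.length;
  return totalEmpty - seen.size;
}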

After a few rounds of training, the genetic algorithm shows a trend of maximizing Compactness and minimizing the Distance from Food, Squareness, Connectivity and Dead End, while it does not care much about the Distance from the center of the field.

Genetic Algorithm

The genetic algorithm for the Snake Game agent has a population size of 500, a mutation rate of 0.05 and a survival rate of 0.5. For each weight set, the game is played 3 times and the arithmetic average is taken, in order to minimize the effect of randomly generated food positions.

Crossover

At each generation, the population is sorted by fitness value, and two parents are chosen from the surviving 50% of the population, with a Snake of higher fitness having a proportionally higher chance of being chosen. A child is created by taking the weighted average of each of the parents’ heuristic weights.
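A sketch of the selection and crossover steps; fitness-proportional roulette selection and fitness-weighted averaging are assumptions about the exact scheme (and the sketch assumes non-negative fitness values):

function pickParent(survivors) {
  // Roulette wheel: higher fitness => proportionally higher chance.
  const total = survivors.reduce((sum, s) => sum + s.fitness, 0);
  let r = Math.random() * total;
  for (const s of survivors) {
    r -= s.fitness;
    if (r <= 0) return s;
  }
  return survivors[survivors.length - 1];
}

function crossover(a, b) {
  const childWeights = {};
  const t = a.fitness / (a.fitness + b.fitness);
  for (const key of Object.keys(a.weights)) {
    // Weighted average of the parents' heuristic weights.
    childWeights[key] = t * a.weights[key] + (1 - t) * b.weights[key];
  }
  return { weights: childWeights };
}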

Mutation

Each heuristic weight of every Snake in the population has a 5% chance of being mutated by ±0.2.
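In code, the mutation step could look like this (a sketch under the same representation assumptions):

const MUTATION_RATE = 0.05;

function mutate(weights) {
  const mutated = {};
  for (const key of Object.keys(weights)) {
    mutated[key] = weights[key];
    if (Math.random() < MUTATION_RATE) {
      // Shift the weight up or down by 0.2 with equal probability.
      mutated[key] += Math.random() < 0.5 ? 0.2 : -0.2;
    }
  }
  return mutated;
}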

The biggest learning point was that it is a bad idea to build a computation-intensive program on the Web (JavaScript). Due to the performance limitations of the Chrome browser, it took a very long time to train the agent with a genetic algorithm of population 500. Even after training, the elite Snake still could not finish the game. The most difficult part is that the food can be generated at an unreachable position. Hence, it is wise to minimize the number of holes created by the Snake’s body (Connectivity), but then it takes too many steps to clear the game.



GSoC 2017: Phase 1 Evaluation


In the first month of the project, I have attempted the following tasks:

  1. Capture the large video embedded inside the Jitsi-meet iframe in the main renderer BrowserWindow
  2. Transmit it to the micro mode’s BrowserWindow
  3. Display it on an HTML video element

1. Capture the HTML Video Element inside iframe


The Jitsi-meet’s largeVideo can be extracted directly from its iframe, using

iframe.contentWindow.document.getElementById('largeVideo');

I can subsequently retrieve the source MediaStream from the video’s srcObject attribute. However, if I simply display that MediaStream in the micro mode’s window, it does not react when the dominant speaker changes in the Jitsi-meet application, because the original HTML video’s srcObject attribute switches to another MediaStream.

There are two possible options to implement video transition in Micro Mode:

  1. A ‘hacky’ way. Import Jitsi-meet’s APP object and listen to the dominantSpeakerChanged event. Once the speaker changes, re-extract the largeVideo from the iframe.
  2. Capture the largeVideo displayed on the main BrowserWindow, convert it to MediaStream and send to the Micro Mode’s window. When the speaker changes, it automatically captures the video transition.

So far, I have attempted the second approach, but it has several problems. The existing version of the HTMLMediaElement.captureStream() method does not work, because the largeVideo extracted from the iframe lacks the ‘currentSrc’ attribute and hence keeps throwing the “The media element must have a source.” error. I need to find an alternative to HTMLMediaElement.captureStream() that captures an HTML video element in real time and produces a MediaStream object.

One approach I tried was using an HTML canvas to take a snapshot of each frame of the largeVideo and render it like a video.

2. Transmit the Video to Micro Window using WebRTC

The inherent difficulty of this task is that each Electron BrowserWindow is an independent Chromium page. There is virtually no direct way to transfer media data from one window to another. After long research, I concluded that setting up a MediaStream peer connection between the main window and the micro window using webkitRTCPeerConnection is the most feasible approach.

Reference: https://www.tutorialspoint.com/webrtc/webrtc_video_demo.htm

The details of how the peer connection is implemented are shown in GSoC 2017: Log #1.
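In outline, the sender side looks roughly like this (a sketch using the legacy callback API available in Chrome 58; the IPC channel names are placeholders and the signaling exchange is simplified):

// main window -- sketch; signaling is relayed through the Main Process
const { ipcRenderer } = require('electron');

const largeVideoStream = document.querySelector('iframe')
    .contentWindow.document.getElementById('largeVideo').srcObject;

const pc = new webkitRTCPeerConnection(null);

// addStream() is deprecated, but addTrack() is unsupported in Chrome 58.
pc.addStream(largeVideoStream);

pc.onicecandidate = e => {
  if (e.candidate) ipcRenderer.send('p2p-candidate', e.candidate);
};

pc.createOffer(offer => {
  pc.setLocalDescription(offer);
  ipcRenderer.send('p2p-offer', offer);
}, console.error);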

Since the main audio keeps playing in the background after the Jitsi-meet window is minimized, there is no need to transmit the audio to the micro window.

One major concern is performance. Running a background peer connection between the main window and the micro window might cause a performance drop in the Jitsi-meet conference.

3. Retrieve the Video and Display on Micro Window

After the MediaStream of the largeVideo is received on the micro window’s side, it can simply be displayed by setting the srcObject attribute of an HTML video element. Then, the micro mode’s window is positioned in the top right corner of the screen, with the frameless and always-on-top options enabled. The end product looks like this…

(Screenshot: the micro window pinned to the top right corner of the screen)
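The window placement itself is straightforward BrowserWindow configuration (a sketch; the dimensions are assumptions):

// main.js (Main Process) -- sketch of the micro window configuration
const { BrowserWindow, screen } = require('electron');

// The screen module is only usable after the app's ready event.
const { width } = screen.getPrimaryDisplay().workAreaSize;

const microWindow = new BrowserWindow({
  width: 320,
  height: 180,
  x: width - 320,        // top right corner of the screen
  y: 0,
  frame: false,          // frameless
  alwaysOnTop: true
});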

However, several problems emerged after testing minimization of the main window.

Using an HTML canvas to capture the large video works only when both windows (main and micro) are active and visible. When I tried minimizing the main window, the video in the micro window just froze. I am guessing that the HTML canvas’s snapshot method works only when the target video is active (not minimized). I need to find out whether it is possible to keep playing the target video in the background after the window is minimized.

Furthermore, when I tested sending a local video (mp4) from the main window to the micro window, the videos in both windows ran smoothly. However, once I minimized the main window, the frame rate of the micro window’s video immediately started dropping. It keeps playing, but the frame rate drops so seriously that it does not look like a video anymore. I am going to research how the Chromium browser handles background media, and whether there is a way to keep a target MediaElement active in the background.

Conclusion

Although the video transmission appears to be working, there are numerous internal problems I have to resolve before I move on to the next step. A lot of time is consumed for researching possible methods to implement the features, and only a fraction of time is used for the actual code writing. I used to only look up StackOverFlow for my programming problems, but for this project I had to read on many official API documentations, issue & bug trackers, and discussion threads. This is because the problems I used to solve were a kind that had one simple solution which worked cleanly. But for the Jitsi-meet-electron project, there are many different approaches I can take to solve the same problem, and in worst case scenario, the problem is actually unsolvable / not supported.