
Motivation

Things have become more stable lately: we have clients who pay on time, the projects are mostly interesting, and we have a defined schedule and proven methods to achieve results. No more crashes, sleepless nights and broken builds that should have been in production a day ago :) We have not introduced TDD or a really good QA process yet, but things are stable enough to make our life much less stressful. Are we satisfied? Some of us are, but others want something more thrilling. Programming should be fun (I think so too), but we cannot turn our commercial projects into playgrounds, because we need to guarantee quality. At the same time, there are a lot of interesting technologies to try that we don’t deal with on a regular basis. Ruslan, our client-side developer, whose hobby technologies are RoR and low-level C++ and Assembler coding, proposed holding a kind of hackathon.

Previously we had a few ideas and started working on them, but could not find enough time to finish them, so this time we wanted to pick a problem interesting enough to keep us coding for 48 hours with little sleep. We wanted to start on Friday and release on Sunday morning.

The task

Very easy: make an in-browser voice chat with private and public sessions. To achieve this, we decided to use the following technologies:

  • Voice recording – the C++ framework FireBreath, which allows creating browser plugins (that work in FF, Chrome and IE). Yes, we know this can be done easily with Flash, but that is not interesting :)
  • Voice transmission – we were choosing between implementing this in the plugin and using JS to send the raw data to the server
  • Streaming server – node.js. Yes, we know there are ready-made streaming servers for video and audio (e.g. the open-source Red5 in Java), but we wanted to try node.js for this
  • Streaming audio player – the HTML5 audio tag. Again, we know Flash is better here, but we wanted to try this variant

This seemed pretty straightforward: use WinAPI to record audio, open an upload stream to the server (an HTTP session), and upload each chunk as a part of a multipart HTTP request. On the client side, embed an audio tag with src set to the stream URL and have the node.js server publish each portion of audio it receives to the clients. Easy? Yes! Working? No! :)

21:00 15 Jul – Let’s start

We started with the interface. At that moment we did not even think about how far we were from building the interface :) It took us 20 minutes to agree on all the features we wanted and start development.
We split the task into C++ and node.js coding. On the node.js side the tasks were to check the streaming, to accept the multipart request, and to write the JS code that works with our server and starts/stops streaming.

I was working with node.js and found it really great. I had tried it back in January 2011, when I wanted to make a simple application with it and mongodb, and was really frustrated by the async programming.
This time I was writing a lot of asynchronous code, but it seemed natural to me; everything was really nice. Maybe it was a problem of attitude :) I also found that there are a lot of modules for different tasks (streaming uploads, frameworks, session support) and that they are really easy to install. If someone asked me to compare the gem system to npm in node.js, I’d say npm wins hands down. I can’t say I’m a guru in either of them; these are more like first impressions. Handling connections, identifying clients and combining requests from the same client was really easy. The event system allows the natural JS programming you’re used to in ExtJS, for example. The inheritance approaches are a bit clumsy compared to ExtJS, but that’s fine. You just need to get used to them.

While I was enjoying node.js, my developers were working on the C++ code. They insist it was fun, but I can’t say I liked it. I know C++ quite well, but that collection of pointers, data structures, clumsy WinAPI, raw WAV format structures (we decided to use WAV for simplicity, just to get it working), STL templates and type casts cannot look as nice as my clean JS :)

You know, with node.js the standard JS drawbacks go away: it turns out to be really fast, has access to the system API, can read files in raw mode (I managed to read WAV headers, identify the bitrate and calculate the streaming speed, all in PURE JS), can manage a bunch of HTTP connections, etc. It is no longer possible to say JS is only a browser language; it is a universal language now!
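To illustrate the WAV header reading mentioned above, here is a minimal sketch. It assumes the canonical 44-byte PCM header with no extra chunks before the data; the function names parseWavHeader and chunkDurationMs are made up for this example and are not the code we actually wrote.

```javascript
// Parse the canonical 44-byte PCM WAV header.
// Assumption: the "fmt " chunk directly follows "WAVE" and there are no extra chunks.
function parseWavHeader(buf) {
  if (buf.length < 44) throw new Error('need at least 44 bytes');
  if (buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  return {
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    byteRate: buf.readUInt32LE(28),      // bytes of audio per second
    bitsPerSample: buf.readUInt16LE(34),
  };
}

// How long a chunk of `chunkBytes` takes to play, in milliseconds.
// This is the delay the server should keep between chunks to stream in real time.
function chunkDurationMs(header, chunkBytes) {
  return (chunkBytes / header.byteRate) * 1000;
}
```

For 16-bit stereo audio at 44100 Hz the byte rate is 176400, so a chunk of 88200 bytes corresponds to half a second of sound.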

00:01 16 Jul – Too optimistic?

So far we could stream some audio, but only to one client; a second client that connected received something, but nothing played. I used ReadStream to read from a file. It was really easy to do: you open the stream, attach an event handler to the ‘data’ event, and your handler is called with each new portion of data. However, it did not allow me to manage the streaming speed according to the bitrate, and the content went out to the browser too fast. Good for the client, but bad for the server, and unrealistic because we’ll get the recorded sound in real time. So we needed to control the reading speed to make it work properly.

The C++ guys managed to compile the test plugin and made it work in Chrome and FF on Windows. They wrapped it as an XPI and made it install natively in any browser as a plugin. They also started working on audio recording, but that is not so easy to do with raw WinAPI. We could not link other libraries, because those are DLLs, and the plugin can contain only one DLL, which is our code.

Some of us feel sleepy, but beer and code make it fun, so we go ahead :)



05:00 16 Jul – Exhausted, but happy

I discovered the problem with streaming to any number of clients: upon connection I need to send the correct WAV header. According to this manual, it is the first 44 bytes of the file. Since I am experimenting with a static file so far, I read the first 44 bytes when the server starts and, when a new client connects, send those 44 bytes as the start of the HTTP response. Then, when a new chunk is read from the file (now using async fs.read, which allowed me to use setTimeout for delays to match the streaming speed to the playback speed), I send it to all connected clients. I’m a bit concerned that sending is synchronous and can take a long time if clients’ connections are slow, but for the hack this is fine, because we were testing on a LAN.
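A minimal sketch of that idea follows; it is not our actual code. The names Broadcaster and streamPaced are hypothetical, and the clients are assumed to be objects with a write() method, such as node.js HTTP responses.

```javascript
const fs = require('fs');

// Keeps the set of connected listeners and fans audio chunks out to them.
class Broadcaster {
  constructor(wavHeader) {
    this.wavHeader = wavHeader;      // the first 44 bytes of the WAV file
    this.clients = new Set();
  }
  // `res` is anything with write(); in a server it would be an HTTP response.
  addClient(res) {
    res.write(this.wavHeader);       // every new listener gets the header first
    this.clients.add(res);
  }
  removeClient(res) {
    this.clients.delete(res);
  }
  broadcast(chunk) {
    for (const res of this.clients) res.write(chunk);
  }
}

// Read an open file descriptor chunk by chunk, pausing between reads
// so that data leaves the server at playback speed.
function streamPaced(fd, byteRate, chunkBytes, onChunk, onEnd) {
  const delayMs = (chunkBytes / byteRate) * 1000;
  let position = 44;                 // skip the header, it is sent separately
  (function readNext() {
    const buf = Buffer.alloc(chunkBytes);
    fs.read(fd, buf, 0, chunkBytes, position, (err, bytesRead) => {
      if (err || bytesRead === 0) return onEnd(err);
      position += bytesRead;
      onChunk(buf.subarray(0, bytesRead));
      setTimeout(readNext, delayMs);
    });
  })();
}
```

The two pieces would be wired together as streamPaced(fd, byteRate, chunkBytes, chunk => hub.broadcast(chunk), done). As for slow clients: write() on a node.js stream buffers the data and returns false when its internal buffer is full, so a more careful version could check that return value and skip or drop listeners that fall behind.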

So now the whole office can listen to the Magic Melody at the same time and we can say we have an internet radio now :)

The C++ guys finished recording sound; they write correct headers and it works fine. For now they have to put the recording into the user’s home folder, because they did not find any way to read audio into memory; they could only read it and immediately write it to a file. OK, fine for now, let’s go ahead, we’ll do optimizations later.

We’re quite exhausted, so we decide to leave in 30-40 minutes and get some sleep at home.

Day 2. 16:30 16 Jul – Let’s go ahead!

Earlier I had discovered that the HTML5 audio tag cannot really handle streaming audio yet. Support varies from browser to browser, but in reality only FF could handle streaming audio (supplied with no content-length). There are specification drafts right now that propose several audio sources and allow combining them, but these are only drafts and proposals. To make Chrome work with streaming audio, I had to make the stream look like a fixed-size file while actually streaming. Here’s how it is possible:

  1. Client opens an HTTP connection to http://server/stream
  2. Server starts streaming and sets content-length to around 5 MB
  3. Client handles the ‘canplay’ event from the audio tag and starts playing the data
  4. Client asks the server for the refresh timeout. Basically it is the time that will pass until the 5 MB stream ends
  5. Client waits for the specified timeout (actually a bit less) and creates a new audio tag. This initiates another HTTP connection
  6. When the second stream is ready for playing (canplay event), client deletes the first <audio> tag and starts playing the second one
  7. Client notifies the server that it can close the first connection (which is still streaming data)
  8. The process repeats all the time. There is a little sound glitch when the audio tag is switched, but overall it is fine. I can’t say the quality is worse than Skype or other VoIP. We’ll see how it performs later, but for now it is fine
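The steps above can be sketched roughly as follows. This is an illustration under assumptions, not our actual class: the names refreshTimeoutMs and AudioSwitcher are made up, and the audio element factory and the "close the old connection" call are passed in as parameters so that the sketch does not depend on a browser.

```javascript
// Server side (step 4): time until a stream with the declared
// Content-Length runs out, in milliseconds.
function refreshTimeoutMs(declaredBytes, byteRate) {
  return Math.floor((declaredBytes / byteRate) * 1000);
}

// Client side: swap audio elements shortly before the current stream ends.
// `createAudio(url)` is injected; in a page it could be url => new Audio(url).
// `notifyClose(oldAudio)` is a hypothetical callback that tells the server
// it can close the previous connection (step 7).
class AudioSwitcher {
  constructor(streamUrl, createAudio, notifyClose) {
    this.streamUrl = streamUrl;
    this.createAudio = createAudio;
    this.notifyClose = notifyClose;
    this.current = null;
  }
  // Steps 1-3 and 5-7: open a new connection, wait for canplay, then swap.
  startNext() {
    const next = this.createAudio(this.streamUrl);
    next.addEventListener('canplay', () => {
      const previous = this.current;
      this.current = next;
      next.play();
      if (previous) {
        previous.pause();            // in a page, also remove the element
        this.notifyClose(previous);
      }
    }, { once: true });
    return next;
  }
  // Step 5: schedule the swap a little before the declared length is used up.
  scheduleNext(timeoutMs, marginMs) {
    return setTimeout(() => this.startNext(), Math.max(0, timeoutMs - marginMs));
  }
}
```

With a declared length of 5 MB (5 * 1024 * 1024 bytes) and a byte rate of 176400 bytes per second, the refresh timeout comes out at a bit under 30 seconds, so the tag would be switched about twice a minute.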

Did you get the feeling that the above process is a dirty hack? Yes, you’re completely right! After this I really started thinking that a Flash-based streaming player or a C++-based one would be better. But we wanted to try HTML5! We did, we made it work, but we did not enjoy the solution.

Just a warning if someone wants to repeat the above. You cannot send cookies with the first response that carries the stream. Actually you can, but they will be processed only after the client receives the full response, and since it is a stream, this will never happen. Since you cannot set a cookie, you will be unable to determine that two requests are coming from the same client. So the scheme above lacks an important step: a handshake connection between client and server, whose only task is to set the cookie on the client.
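A handshake of that kind could look like the sketch below. The cookie name clientId and the function names are assumptions made for this example; req and res are expected to behave like node.js HTTP request and response objects.

```javascript
const { randomBytes } = require('crypto');

// Extract the value of a named cookie from a Cookie request header, or null.
function readCookie(cookieHeader, name) {
  for (const part of (cookieHeader || '').split(';')) {
    const [key, ...rest] = part.trim().split('=');
    if (key === name) return rest.join('=');
  }
  return null;
}

// Handshake endpoint: a short, complete response whose only job is to set
// the cookie. The cookie name "clientId" is an assumption for this sketch.
function handleHandshake(req, res) {
  let clientId = readCookie(req.headers.cookie, 'clientId');
  if (!clientId) {
    clientId = randomBytes(16).toString('hex');
    res.setHeader('Set-Cookie', 'clientId=' + clientId + '; Path=/; HttpOnly');
  }
  res.statusCode = 200;
  res.end('ok');   // the response finishes at once, unlike the stream
  return clientId;
}
```

The page would call the handshake URL once before creating the first audio tag. Every later request to the stream then carries the cookie, and the server can use readCookie to match the old and the new connection of the same listener.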

Now we have fully functional streaming with small artifacts. The main problem so far: everybody is tired of Magic Melody, so we switched to the Whisper

The C++ guys rock! They successfully record sound within a browser (the plugin is embedded into the page as an object and one can work with it using JS). They started working on posting to the server using winsock2 (a low-level socket library for Windows). The basics are simple: they open a socket to http://server/in, write the multi-part request headers and then post each chunk of data as soon as they read it. The server knows the audio quality they are going to produce and sends the correct WAV headers to its clients. The rest (streaming) is already done. However, the multi-part request has some problems and nothing is posted to the server…
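For reference, this is roughly what the framing of a multipart/form-data upload with a single file part looks like. The plugin itself was written in C++; the sketch below is in JS only to show the bytes on the wire, and the field name, file name and audio/wav content type are assumptions made for the example.

```javascript
// Opening of a multipart/form-data body with one file part.
// Every line ends with CRLF, the boundary line starts with "--", and an
// empty line separates the part headers from the data.
function multipartPreamble(boundary, fieldName, fileName) {
  return Buffer.from(
    '--' + boundary + '\r\n' +
    'Content-Disposition: form-data; name="' + fieldName +
      '"; filename="' + fileName + '"\r\n' +
    'Content-Type: audio/wav\r\n' +
    '\r\n', 'ascii');
}

// Closing delimiter, sent once after the last audio chunk.
function multipartEpilogue(boundary) {
  return Buffer.from('\r\n--' + boundary + '--\r\n', 'ascii');
}

// Request headers that must agree with the body framing.
function multipartHeaders(boundary) {
  return {
    'Content-Type': 'multipart/form-data; boundary=' + boundary,
    'Transfer-Encoding': 'chunked',  // total length is unknown while recording
  };
}
```

The body is the preamble, then the raw audio chunks, then the epilogue. Note that when writing to a raw socket with Transfer-Encoding: chunked, the sender also has to add the chunked framing itself (the chunk size in hex, CRLF, the data, CRLF, and a final zero-sized chunk), so that is one more place where such a request can go wrong.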

04:00 17 Jul – WTF?!

By this time the server is ready and all the problems we found are solved. Streaming upload is done as described in this article: Parsing file uploads at 500 mb/s with node.js. It works fine if we output the streaming upload both to a file and to the clients. Now we need only the last part, a correct multi-part request to the server, but for some reason it does not work. The headers are fine, but the body cannot be parsed. We could not find where to see the raw request body before parsing, so debugging was pretty hard and we could not do anything. So we decided to go to sleep (at around 06:00).

As a result, we ended up with a good plugin that records audio from the browser: cool but rather useless. We also have a simple streaming server based on node.js and a browser JS class that manages the audio tag switching described in the previous section.
We spent 26 hours coding (x 5 developers), along with 10 liters of beer and countless cookies and coke cans. We got a lot of fun, interesting code and positive emotions.

Yes, we want to do it again!

Despite the failure, we liked the format of this coding fest, we liked each other’s company, we liked the new technologies and the feeling of drive. As for me, I got excellent experience and learned more about my employees. As for the developers, I think they had a nice time! At least they said it was COOL :)
