Thursday, February 17, 2011

Slow Week This Week

Project progress has been pretty slow this week because of midterms and papers. Also, DXUT (the DirectX equivalent of GLUT) decided it didn't want to work nicely with my laptop, so I've been spending some precious time troubleshooting it.

I went to an excellent GRASP lecture by Zhengyou Zhang from Microsoft Research. The title of the talk was "Human Activity Understanding with Depth Sensors" and it covered how a lot of the functionality provided with the Kinect SDK works. The researchers at Microsoft are able to derive skeletons for up to 4 people in front of the Kinect in roughly 2 ms of processing per frame. This gives me hope that the image processing segment of my application will not be the performance drain that I'm fearful of.

The talk was also interesting because it confirmed that even if I had access to the Kinect SDK, I would still have to do the majority of the work I'm currently planning to do, as the Kinect SDK does not provide the level of detail regarding hands that I require. It would fully take care of head tracking and would give me a rough idea of where the hands are, but I'd still need more precision to create a good user experience.

The Alpha Review is next Friday, and I'm aiming to have head tracking fully working by then.

Thursday, February 10, 2011

Ultimate Design Post

This week I’ve been getting my hands dirty with the different software components of my project, and I’ve spent a lot of time designing how the final thing is going to run.

The Big Picture

This picture depicts a side view of the effect the player will experience. An actual observer at this angle would just see the player waving their hands in the air in front of their screen, but because of the head tracking, it appears to the player that they are grabbing objects floating in the air in front of them. This effect is best illustrated in Johnny Chung Lee’s video about head tracking using the Wiimote:



Software Details
I’m working with a lot of different software libraries, and I need to tie them together in a way that works, but is also fast and efficient, so that the user experience isn’t degraded by lag. The biggest issue is that the Kinect’s camera only runs at 30 fps, so input is never going to be instantaneous.

I’ve decided to make my application threaded using the pthreads library, assigning rendering and input processing to different threads. My main goal with this approach is to keep one thread calling into DirectX as much as possible, so that even though input is only coming in and being processed at a 30 fps maximum, the rendering rate will be much higher.
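In code, that split looks roughly like the sketch below (just a sketch; the loop bodies are stubs standing in for the real input and render code):

    #include <pthread.h>

    // Stub loop bodies -- the real versions will poll the Kinect (input)
    // and call into DirectX (rendering) respectively.
    void *inputLoop(void *)  { /* busy-wait on Kinect frames, 30 fps max */ return 0; }
    void *renderLoop(void *) { /* render as fast as the GPU will allow */ return 0; }

    int main()
    {
        pthread_t inputThread, renderThread;

        // Each manager gets its own thread, so a slow Kinect frame never
        // stalls the render loop.
        pthread_create(&inputThread, 0, inputLoop, 0);
        pthread_create(&renderThread, 0, renderLoop, 0);

        pthread_join(inputThread, 0);
        pthread_join(renderThread, 0);
        return 0;
    }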

The program will consist of the following class structure:


Both the input manager and the game manager will be given a minimum of 1 thread each, with more available depending on the platform (my laptop’s CPU supports 2 hardware threads, while the Xbox 360’s supports 6).

Input Manager
The input manager busy-waits on the Kinect until the next frame is ready. It then processes the frame’s data and places the results into a thread-safe structure so that the game manager can access them and update the game state accordingly. When the input manager has more than one thread available to it, it will spawn helper threads to process the data from the Kinect, since there is obvious parallelism there (face detection and hand detection can easily be done concurrently).
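For the thread-safe structure, I’m picturing a simple mutex-guarded “latest results” slot along these lines (a sketch only; the exact fields are my current guess at what the game will need, not a final design):

    #include <pthread.h>

    // Processed per-frame results that the input manager publishes.
    struct TrackingResults {
        float headPos[3];
        float handPos[2][3];
        bool grabbing[2];
        unsigned frameId; // lets the game manager detect fresh data
    };

    class SharedInput {
    public:
        SharedInput() : mLatest() { pthread_mutex_init(&mLock, 0); }
        ~SharedInput() { pthread_mutex_destroy(&mLock); }

        // Called by the input manager whenever a Kinect frame is processed.
        void publish(const TrackingResults &r) {
            pthread_mutex_lock(&mLock);
            mLatest = r;
            pthread_mutex_unlock(&mLock);
        }

        // Called by the game manager once per update; copies the latest
        // results out so the lock is held as briefly as possible.
        TrackingResults snapshot() {
            pthread_mutex_lock(&mLock);
            TrackingResults copy = mLatest;
            pthread_mutex_unlock(&mLock);
            return copy;
        }

    private:
        pthread_mutex_t mLock;
        TrackingResults mLatest;
    };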

Game Manager
The game manager checks the thread-safe structure for changes, then updates the game state based on elapsed time and sends the new scene to DirectX for rendering. The update is broken up into fixed time steps so that the physics behaves consistently no matter how long an individual frame takes.
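A sketch of how that loop might look, reusing the SharedInput structure from above (timeGetTime is the Windows timer I have in mind; updatePhysics and render are stand-ins, not final names):

    #include <windows.h> // timeGetTime; link against winmm.lib

    // Stand-ins for the real game-state update and DirectX draw call.
    void updatePhysics(const TrackingResults &input, float dt) { /* ... */ }
    void render() { /* ... */ }

    // Runs on the game manager's thread; 'shared' is the structure the
    // input manager publishes into.
    void gameLoop(SharedInput &shared, volatile bool &running)
    {
        const float STEP = 1.0f / 60.0f; // fixed physics time step
        float accumulator = 0.0f;
        DWORD prev = timeGetTime();

        while (running) {
            DWORD now = timeGetTime();
            accumulator += (now - prev) / 1000.0f;
            prev = now;

            // Latest processed Kinect results, whenever they last arrived.
            TrackingResults input = shared.snapshot();

            // Consume elapsed time in fixed steps so the physics stays
            // stable even though frame times vary.
            while (accumulator >= STEP) {
                updatePhysics(input, STEP);
                accumulator -= STEP;
            }

            render(); // the DirectX call runs every pass, not capped at 30 fps
        }
    }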

Thursday, February 3, 2011

Prototype Technology

In this post I'm going to outline all of the technology I will be using in my initial prototype, which I will be developing over the next few weeks.

Input:

libfreenect: I will be using the libfreenect library to extract the color and depth information from the Kinect. The library will also be used during the calibration phase of the application to change the Kinect's viewing angle. I will be wrapping the library in a KinectHandler class so that in the future I can easily replace it with an official SDK without changing the entire structure of the program.
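A rough sketch of what the KinectHandler wrapper might look like against libfreenect's C API (error handling omitted, and my method names are provisional):

    #include <libfreenect.h>

    // Thin wrapper so the rest of the program never calls libfreenect
    // directly; swapping in an official SDK later should only mean
    // rewriting this class.
    class KinectHandler {
    public:
        bool open() {
            if (freenect_init(&mCtx, 0) < 0) return false;
            return freenect_open_device(mCtx, &mDev, 0) >= 0;
        }

        // Used during calibration to aim the Kinect's motorized tilt.
        void setViewingAngle(double degrees) {
            freenect_set_tilt_degs(mDev, degrees);
        }

        // Pump libfreenect so its depth/video callbacks get invoked.
        void pumpEvents() { freenect_process_events(mCtx); }

        void close() {
            freenect_close_device(mDev);
            freenect_shutdown(mCtx);
        }

    private:
        freenect_context *mCtx;
        freenect_device *mDev;
    };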

OpenNI: I will be using the OpenNI library to determine the user's hand positioning and what they are doing with their hands (e.g. grabbing an object). I chose this library because it is open source under the LGPL, has excellent community support, and is well suited to processing the images extracted from the Kinect.

fdlib: I will be using the fdlib library for face detection. I chose this library because it is both fast and simple to integrate, which makes it a very strong candidate for incorporation in the prototype. The fact that it is proprietary and closed-source means that I will most likely be replacing it later in development, but for the purpose of rapid prototyping and getting to a play-testable state, I think it will do nicely for now. I will be wrapping the library in a class so that it is easy to replace without altering the program structure.

Output:

DirectX: I will be using DirectX to handle my output. I'm choosing DirectX over OpenGL because I think my game would ultimately be released on the Xbox 360 platform (and perhaps Windows), which means I must use DirectX for rendering; since I don't foresee releases on platforms that lack DirectX support, implementing OpenGL seems unnecessary. I'd also like to get more experience working with DirectX, as it is the current game industry standard. I will be writing a wrapper around my output so that I can swap out DirectX for something else if I change my mind.
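That wrapper would look something like the sketch below (the interface methods are provisional, not a final design):

    class Scene; // the game's scene representation, defined elsewhere

    // The game only ever talks to IRenderer; all DirectX-specific code
    // lives in one concrete subclass, so switching APIs later means
    // writing a new subclass rather than touching the rest of the game.
    class IRenderer {
    public:
        virtual ~IRenderer() {}
        virtual bool init(int width, int height) = 0;
        virtual void drawScene(const Scene &scene) = 0;
        virtual void shutdown() = 0;
    };

    class D3DRenderer : public IRenderer {
    public:
        bool init(int width, int height); // creates the D3D device
        void drawScene(const Scene &scene); // issues the actual draw calls
        void shutdown(); // releases D3D resources
    };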

That's an overview of the software I'll be using in my project. I'm sure there will be future posts that further explore the good and bad of each library.

Thursday, January 27, 2011

Getting Started with the Kinect

We got a Kinect in the lab this week and so I restructured my schedule a little bit in order to get my hands on it as quickly as possible.

I've got my laptop set up with the libfreenect open source library for interfacing with the Kinect device, and have begun to extract some data:



These images are screen grabs from one of the libfreenect sample projects. They show the extracted RGB image from the Kinect and a depth image that has been colorized (red is close to the camera, dark blue is far away, and black indicates a lack of depth information).
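For reference, that colorization boils down to mapping each raw 11-bit depth value onto a color gradient. A simplified version of the idea (the actual sample uses a fancier gamma-adjusted lookup table) might look like:

    #include <stdint.h>

    // Map one raw 11-bit Kinect depth sample (0-2047) to an RGB pixel:
    // near readings tend toward red, far readings toward blue, and
    // missing data comes out black.
    void depthToColor(uint16_t depth, uint8_t rgb[3])
    {
        if (depth == 0 || depth >= 2047) { // no usable depth information
            rgb[0] = rgb[1] = rgb[2] = 0;
            return;
        }
        float t = depth / 2047.0f; // 0 = near, 1 = far
        rgb[0] = (uint8_t)(255 * (1.0f - t)); // red fades with distance
        rgb[1] = 0;
        rgb[2] = (uint8_t)(255 * t); // blue grows with distance
    }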

From this output, I've determined a couple of important details about the Kinect. The user has to be at least about two feet away from the Kinect or else the device will not be able to accurately determine their depth. This happens because the Kinect uses an infrared projector to cast a pattern of rays into the environment for the camera to pick up, and objects that are too close to the Kinect are hit by so many rays that the readings get washed out. There is also a depth "shadow": when one object is in front of another, the device cannot determine the depth of the background object around the left border of the foreground object.

I don't think these artifacts should cause any significant problems down the road. I only need to track the hands with precision (and they should be in front of everything else), and I only need a general position for the head (no facial features or other specific data), so even if the player's hands are obscuring their face I shouldn't have trouble gathering the information I need.

It seems like working with the libfreenect library moving forward should be fairly straightforward. The library provides a set of C-style function calls to the Kinect device, and I think that wrapping a class around these should allow me to integrate the library into my application in a suitably abstract fashion.
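To give a flavor of those C-style calls, here is roughly what a minimal depth-grabbing program looks like with libfreenect (based on the API as I currently understand it; exact signatures may differ between versions):

    #include <libfreenect.h>
    #include <stdint.h>
    #include <stdio.h>

    // libfreenect invokes this whenever a new 640x480 depth frame arrives.
    void depthCallback(freenect_device *dev, void *depth, uint32_t timestamp)
    {
        uint16_t *pixels = (uint16_t *)depth;
        printf("frame %u: center depth = %u\n",
               (unsigned)timestamp, (unsigned)pixels[240 * 640 + 320]);
    }

    int main()
    {
        freenect_context *ctx;
        freenect_device *dev;

        if (freenect_init(&ctx, 0) < 0) return 1;
        if (freenect_open_device(ctx, &dev, 0) < 0) return 1;

        freenect_set_depth_callback(dev, depthCallback);
        freenect_start_depth(dev);

        // Pump USB events; the callback fires from inside this loop.
        while (freenect_process_events(ctx) >= 0)
            ;

        freenect_stop_depth(dev);
        freenect_close_device(dev);
        freenect_shutdown(ctx);
        return 0;
    }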

Next week I'll be researching the hand/head recognition algorithms/software that I want to use, and getting something resembling the above demo working using DirectX for rendering.

Friday, January 21, 2011

Senior Project Kickoff Post

This blog will be detailing the progress of my senior design capstone project. I will be completing the project over the course of the spring 2011 semester.

My project is the creation of a virtual building block experience utilizing the Microsoft Kinect hardware system; here is the abstract:
I plan to simulate a building block experience using the Microsoft Kinect hardware system. I will use the Kinect to track the player's hands and head in order to present them with objects that appear to project out of the screen, which they can pick up and move around as if they were real. The objects will be given real-world physical properties to create a realistic building block experience, where unstable structures crumble and improperly placed pieces can bring the entire structure crashing down. The objects will also be given interesting shapes to encourage user creativity and experimentation. The project will use the libfreenect open source library to extract information from and send orders to the Kinect device, and graphics output will be handled by the Microsoft DirectX libraries.

My anticipated work for next week:
  1. Complete background research
    1. Research face/head recognition algorithms
    2. Research hand recognition algorithms
    3. Determine whether to use existing physics system or roll my own 
  2. Begin framework setup, integrating at least DirectX
  3. Figure out how I'm going to obtain a Kinect :)