Machine Vision 2004 course project

1. What was planned

1.1. Project Description

The design and implementation of a stereo vision system capable of determining the 3D-coordinates of a red laser dot within the field of vision.

The system has two cameras in a fixed known position relative to each other. From both images the system will identify if there is a red laser dot visible. A red laser is selected because it seems to stand out very well, at least with human perception. The system will then determine the coordinates of the dot in both images and calculate an estimate of the 3D-coordinates of the dot, knowing the camera configuration.
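The last step, going from the dot's coordinates in the two images to a 3D estimate, can be sketched for the simplest possible camera configuration. The sketch below assumes two identical, rectified pinhole cameras mounted side by side with parallel optical axes, which is only one possible "fixed known position". The StereoRig struct, its fields and the triangulate function are hypothetical names used for illustration and are not part of the project. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>

// Hypothetical description of an idealized, rectified stereo rig.
struct StereoRig {
    double focalPx;   // focal length in pixels, assumed equal for both cameras
    double baseline;  // distance between the optical centres (any length unit)
    double cx, cy;    // principal point in pixels, assumed equal for both cameras
};

struct Point3 {
    double x, y, z;   // same unit as the baseline, in the left camera's frame
};

// Triangulates a dot seen at (xLeft, yLeft) in the left image and at column
// xRight in the right image. Returns nothing when the disparity is not
// positive, i.e. when the two detections cannot belong to a point in front
// of the cameras.
std::optional<Point3> triangulate(const StereoRig& rig,
                                  double xLeft, double yLeft, double xRight) {
    assert(rig.focalPx > 0.0 && rig.baseline > 0.0);
    const double disparity = xLeft - xRight;
    if (disparity <= 0.0) {
        return std::nullopt;
    }
    const double z = rig.focalPx * rig.baseline / disparity;  // depth
    return Point3{(xLeft - rig.cx) * z / rig.focalPx,
                  (yLeft - rig.cy) * z / rig.focalPx,
                  z};
}
```

For example, with focalPx = 500, baseline = 0.1 and principal point (320, 240), a dot at column 370 in the left image and column 320 in the right image has a disparity of 50 pixels, so its depth is 500 * 0.1 / 50 = 1.0 in the unit of the baseline. Real webcams would additionally need calibration and lens distortion correction, which this sketch ignores.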

The system will use two standard webcams as cameras, connected via USB to an x86 Linux PC. The final program will be coded in C++ and should run in near real time.

1.2. Project Phases

  1. Build image capturing framework and develop automatic white balance/exposure level adjustment.
  2. Capture images with and without the laser dot in the field of view, for developing the needed image transforms.
  3. With Matlab, develop a method for finding the coordinates of the laser dot.
  4. Code the method in C++.
  5. Develop the maths required to calculate 3D-coordinates from two images, code it in C++.
  6. Testing, and finalization of the program and documentation.

2. What really happened

2.1. Timeline

I started the project on time, bought the cameras and started coding the image capturing and processing framework in C++. That lasted for a while, but then I had to finish my report for my Information Technology Project GMMBayes. It took two months.

After that I continued coding and got the framework quite complete during my summer vacation. So now I am at the point where the real project would start, but the deadline was yesterday (15.7.2004, postponed twice by a month) and I should start doing real work. I will stop working on this project because I really have no time. If someone, perhaps on the next Machine Vision course, would like to continue my work, I would be happy to assist.

2.2. Software

The software is called Doublesight. It is written in C++ and has the following requirements for compilation and use:

PWC Linux driver
This software is driver specific and it uses a header from the PWC driver sources. I used the 8.x and 9 beta versions of the driver. Of course, you will need cameras to go with this driver. The hardware specific part of the code can be removed, but it will require some work.
Gnu readline library
The Gnu readline library is used to read commands from the user.
Gnu Common C++ 2 library
The Gnu Common C++ 2 library provides many helpful things, e.g., threading abstraction, mutexes and configuration file handling. Note that I used the 1.1.0 version of the library; previous versions are not compatible.
C++ compiler
This software was mainly developed with GCC, though I did test it once with the Intel C++ Compiler for Linux version 7.1.006. It will probably not compile with icc anymore.
Gnu Make
For the Makefile.

Doublesight has its own command line interface: rather than taking command line parameters, it presents its own command prompt. The core of Doublesight is the Controller class, which has its own thread of execution. The user interface runs in the main thread and sends commands to the Controller. This keeps the user interface responsive.
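The split between a user interface thread and a Controller thread can be illustrated with a small command queue. This is only a sketch of the general pattern, written with the standard C++11 threading facilities rather than the Gnu Common C++ 2 library the program actually uses; the member names (post, stop, handled) and the string commands are assumptions made for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the Controller: a worker thread fed through a queue.
class Controller {
public:
    Controller() : worker_([this] { run(); }) {}
    ~Controller() { stop(); }
    Controller(const Controller&) = delete;
    Controller& operator=(const Controller&) = delete;

    // Called from the user interface thread; returns immediately.
    void post(std::string command) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!stopping_ && "post() called after stop()");
            pending_.push(std::move(command));
        }
        wakeup_.notify_one();
    }

    // Lets the worker drain the queue, then joins it. Safe to call twice.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wakeup_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

    // Copy of the commands handled so far, in the order they were handled.
    std::vector<std::string> handled() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return handled_;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wakeup_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stopping and nothing left to do
            std::string command = std::move(pending_.front());
            pending_.pop();
            lock.unlock();
            // A real controller would capture images and run the processors here.
            lock.lock();
            handled_.push_back(std::move(command));
        }
    }

    mutable std::mutex mutex_;
    std::condition_variable wakeup_;
    std::queue<std::string> pending_;
    std::vector<std::string> handled_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

The user interface only ever calls post, which takes the lock for the time needed to push one string, so a slow capture in the worker does not block the prompt.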

Theoretically Doublesight can handle any number of cameras, but in practice I found that I could not plug more than one camera into a single USB bus. This is probably due to hardware problems. Images are captured in the Controller thread.

Doublesight has certain image handling components called ViewProcessors, which derive from the ViewProcessor abstract class. Any number of ViewProcessors can be used and they are executed in the Controller thread. For a single shot, images from all configured cameras are captured, labelled with the camera name, and sent as a vector to each active ViewProcessor. ViewProcessors can do whatever they like with the raw image data, which is in YUV420P format, but they should not change the original image.

(On the other hand, you could create filtering processors that do change the original image. Since the ViewProcessors are stored in a list, they are executed one by one in the order they were added, so you could create deterministic filter chains and still use even the original ViewProcessors. The filters could not reallocate the bitmap memory area, but they could even change the color format. Of course, the current ViewProcessors always expect to get YUV420P, so the results might be quite funny.)
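The dispatch described above can be sketched roughly as follows. This is not the actual Doublesight code: Frame, MeanLuminance and dispatch are hypothetical names, and only the abstract ViewProcessor base class, the per-camera labelling and the YUV420P plane order (full-resolution Y, then quarter-resolution U and V) come from the description. Passing the shot by const reference mirrors the rule that processors should not change the original image.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical frame type: one planar YUV420P image labelled with its camera.
struct Frame {
    std::string camera;
    int width;
    int height;
    std::vector<unsigned char> data;  // Y plane, then U plane, then V plane

    const unsigned char* y() const { return data.data(); }
    const unsigned char* u() const { return data.data() + width * height; }
    const unsigned char* v() const { return u() + (width / 2) * (height / 2); }
};

// Abstract base class: every processor receives all images of one shot.
class ViewProcessor {
public:
    virtual ~ViewProcessor() = default;
    virtual void process(const std::vector<Frame>& shot) = 0;
};

// Example processor (an assumption, not one of the real views): records the
// mean of the Y plane for each camera.
class MeanLuminance : public ViewProcessor {
public:
    void process(const std::vector<Frame>& shot) override {
        for (const Frame& frame : shot) {
            const std::size_t pixels =
                static_cast<std::size_t>(frame.width) * frame.height;
            assert(pixels > 0 && frame.data.size() >= pixels + pixels / 2);
            unsigned long sum = 0;
            for (std::size_t i = 0; i < pixels; ++i) sum += frame.y()[i];
            mean[frame.camera] = static_cast<double>(sum) / pixels;
        }
    }
    std::map<std::string, double> mean;
};

// Runs the processors one by one, in the order they were added to the list.
void dispatch(const std::list<std::unique_ptr<ViewProcessor>>& chain,
              const std::vector<Frame>& shot) {
    for (const auto& processor : chain) processor->process(shot);
}
```

A filtering chain as described in the parenthetical would need a non-const variant of process; that is left out here to keep the sketch short.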

An abstract class called WindowedView is derived from the ViewProcessor class. WindowedView uses the TXWindow class to offer image display capabilities by creating an X window for each image (camera) in the vector. The TXWindow class is threaded and keeps displaying the image once it has been set. All current ViewProcessors are derived from WindowedView, but the interface of the WindowedView class is not suitable for the 3D application.

The currently implemented ViewProcessors are:

LuminanceView
Shows a gray scale image from the Y-channel (luminance) directly.
LumiHistView
Shows a luminance histogram.
LumiThreshView
Shows a binary luminance image according to a fixed threshold.
ColorView
(Available only in the GPL-licensed package.)
Shows the true color image.
ChrominanceView
(Available only in the GPL-licensed package.)
Shows only the color information; luminance is ignored. ChrominanceView discards the Y-channel by using a constant luminance value in the color space conversion from YUV to RGB. The RGB triplets are then scaled so that the greatest component is saturated.
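The per-pixel idea behind ChrominanceView can be sketched as below. The conversion coefficients are the common full-range BT.601 (JPEG-style) ones and the constant luminance of 128 is an arbitrary choice; both are assumptions, since the coefficients and the constant actually used by the program are not stated here. The function name chrominanceToRgb is likewise hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Rgb = std::array<unsigned char, 3>;  // red, green, blue

// Converts one (U, V) pair to RGB with a fixed luminance, then scales the
// triplet so that its greatest component becomes 255.
Rgb chrominanceToRgb(unsigned char u, unsigned char v, double fixedY = 128.0) {
    assert(fixedY >= 0.0 && fixedY <= 255.0);
    const double cb = u - 128.0;  // blue-difference chrominance
    const double cr = v - 128.0;  // red-difference chrominance

    // Assumed full-range BT.601 conversion with Y replaced by a constant.
    double rgb[3] = {
        fixedY + 1.402 * cr,
        fixedY - 0.344136 * cb - 0.714136 * cr,
        fixedY + 1.772 * cb,
    };
    for (double& component : rgb) component = std::max(component, 0.0);

    const double peak = std::max({rgb[0], rgb[1], rgb[2]});
    if (peak <= 0.0) return Rgb{0, 0, 0};  // nothing to scale

    const double scale = 255.0 / peak;
    return Rgb{static_cast<unsigned char>(std::lround(rgb[0] * scale)),
               static_cast<unsigned char>(std::lround(rgb[1] * scale)),
               static_cast<unsigned char>(std::lround(rgb[2] * scale))};
}
```

With this scaling a neutral pixel (U = V = 128) comes out white, and any pixel with a red-dominated chrominance gets a red component of exactly 255, which is what makes weakly colored areas easy to see.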

A UML class diagram is here (34kiB).

2.3. Downloads

3. Images


3.1. Color images taken with Xawtv

webcam shot 1 webcam shot 2 webcam shot 3 webcam shot 4 webcam shot 5 webcam shot 6 webcam shot 7

Particularly in pictures 4 and 5 you can see the effect of the automatic exposure algorithm of the webcam. Picture 4 is taken "normally" and it is heavily over-exposed. Picture 5 is taken after the camera looked directly into an ultrabright white LED. The autoexposure seems to "lag" badly. Also see the difference between pictures 1 and 7. Picture 2 is taken with a sun shade on :-)

3.2. Images taken with my own software

Luminance image: this is the Y-channel (luminance) directly from the camera. The small bright spot in the picture is the red laser dot. The picture was taken in normal indoor fluorescent lighting with the camera shutter setting (exposure) fixed by hand.

3.3. Screen captures

screen capture 1 Screen capture 1, 699kiB, 1280x960px.
This image shows the four ViewProcessors I had implemented by 14.7.2004. The top left window shows the true color image (YUV converted to RGB, ColorView), and the top right window shows the gray scale image (LuminanceView, Y-channel). The histogram window shows the luminance histogram (LumiHistView). The fourth window is ChrominanceView, where the Y-channel is discarded, the color information of the U- and V-channels is converted to RGB-space and then each RGB-triplet is scaled so that the greatest value is saturated (255).
If you wonder about the red dot on my neck, it is the laser pointer :-)
screen capture 2 Screen capture 2, 719kiB, 1280x960px.
Another view with the same setup as in Screen capture 1.
screen capture 3 Screen capture 3, 824kiB, 1280x960px.
True color, chrominance and histogram views for two cameras at the same time. The frame rate is about 1-3 fps, but it is not processor bound, as the program takes only 40% of CPU time on a 1100 MHz AMD Athlon machine.
screen capture 4 Screen capture 4, 901kiB, 1280x960px.
Thresholded luminance, true color and chrominance views showing the laser dot in good conditions.

4. Where to go from here

My idea was to first find bright spots in the luminance image by thresholding and then selecting the segmented areas by size (and shape). The chrominance values of the spots found would then be checked to see if each really is a red dot. This should have located the red laser pointer. The reason why I would not do this in RGB color space is that it would require a conversion from YUV space, and in the YUV space the brightness (luminance, Y-channel) and color (chrominance, U- and V-channels) are already separated.

I think screen capture 4 shows the potential of my approach. You can clearly see that the laser dot is found in the thresholded image as a nice dot, and that it is the only dot-like area that is red according to the chrominance view.
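The idea was never implemented, but a minimal sketch of it could look like the following. It thresholds the Y plane, grows 4-connected bright regions with a flood fill, rejects regions that are too large, and accepts the first region whose average V (red-difference) value is high. The function name findRedDot, the Dot struct and all three threshold values are assumptions chosen for illustration, not tuned values; shape checks are omitted. It needs C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

struct Dot {
    double x, y;  // centroid in pixel coordinates
    int area;     // number of bright pixels in the region
};

// yuv holds one planar YUV420P image: Y plane, then U plane, then V plane.
std::optional<Dot> findRedDot(const std::vector<unsigned char>& yuv,
                              int width, int height,
                              unsigned char minY = 200,  // brightness threshold
                              int maxArea = 100,         // largest accepted spot
                              unsigned char minV = 160)  // "red enough" threshold
{
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    assert(yuv.size() >= pixels + pixels / 2);
    const unsigned char* yPlane = yuv.data();
    const unsigned char* vPlane = yuv.data() + pixels + pixels / 4;

    std::vector<bool> seen(pixels, false);
    std::vector<std::pair<int, int>> stack;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};

    for (int sy = 0; sy < height; ++sy) {
        for (int sx = 0; sx < width; ++sx) {
            const std::size_t start = static_cast<std::size_t>(sy) * width + sx;
            if (seen[start] || yPlane[start] < minY) continue;

            // Flood fill one bright region, accumulating centroid and mean V.
            long sumX = 0, sumY = 0, sumV = 0;
            int area = 0;
            seen[start] = true;
            stack.push_back({sx, sy});
            while (!stack.empty()) {
                const std::pair<int, int> p = stack.back();
                stack.pop_back();
                ++area;
                sumX += p.first;
                sumY += p.second;
                // Chrominance is subsampled by two in both directions.
                sumV += vPlane[(p.second / 2) * (width / 2) + p.first / 2];
                for (int k = 0; k < 4; ++k) {
                    const int nx = p.first + dx[k];
                    const int ny = p.second + dy[k];
                    if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                    const std::size_t idx = static_cast<std::size_t>(ny) * width + nx;
                    if (seen[idx] || yPlane[idx] < minY) continue;
                    seen[idx] = true;
                    stack.push_back({nx, ny});
                }
            }
            if (area <= maxArea && sumV / area >= minV) {
                return Dot{static_cast<double>(sumX) / area,
                           static_cast<double>(sumY) / area, area};
            }
        }
    }
    return std::nullopt;  // no bright, small, red region found
}
```

Because the test on V works directly on the subsampled chrominance plane, no YUV to RGB conversion is needed, which is the point made above about staying in YUV space.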

I did have trouble adjusting the cameras. I found no way to manually set the exposure time without losing all color information. That means the system cannot cope with bright environments because of over-exposure, unless the cameras are first fooled with a bright lamp or fitted with sun shades or other filter optics. Even then, you have to remember that these cameras are toys and you cannot expect too much from them.

Information on 3-dimensional optical measurement can be found in the Master's Thesis by Kari Jyrkinen (not yet published, contact me for more information).

News

19.7.2004

Released the two source code packages.

16.7.2004

Finishing the documentation.

14.7.2004

The ChrominanceView got a lot better when I decided to scale the RGB-values, so that the greatest value is always 255. When I discard the Y-channel I still need some value for Y in the color conversion. I made an image (97kiB, 320x1200px) of the effect.

12.7.2004

After about 3300 lines of C++ code I am finally getting to the image processing. LuminanceView (Y-channel) is already working and I can get gray scale "live" images from the two webcams. This is the first time the real intended engine structure of the program is working. Next I am going to implement ColorView (RGB-color) and ChrominanceView (UV-channels converted to RGB, constant luminance). After those, some nice luminance and chrominance histogram views so that I can start working towards the ultimate goal: locating the laser dot.

The program has a command line user interface through which I can set the camera parameters, take single shots, set it to run (in a separate thread) taking shots by itself, and load ViewProcessors like the LuminanceView. All X-windows have their own threads and the UI has its own thread (the main thread, actually), so the user interfaces should be quite responsive.

If you are interested in the source code, I will make it available some day under some free licence.

3.7.2004

I finally configured a cam at home. I have Linux 2.6.7 and the 8.x versions of the driver are out of the question. Fortunately the version 9 beta 2 seems to work fine. It's been a long time since I worked on this project, but now I'm coding again.

27.4.2004

Yes! Even though the cam&drivers only support capturing YUV420P format images, I've managed to read the components from a raw file I produced into Matlab and visualize them. My own code can now capture images and adjust all video4linux and Philips specific parameters. Just as I thought, the camera overexposes the picture too easily. When I set the shutter speed manually I can get the laser dot clearly visible in indoor lighting.

13.4.2004

I have finally got the webcams. I hooked one up on my workstation at work. After recompiling (and updating) the kernel I got the Linux driver working and got a live image out of the cam at 5 fps with Xawtv. After fetching the latest pwcx decompression kernel module version 8.4 I think I also got a VGA resolution image. The camera has some problems adjusting to the light level; it tends to overexpose. I hope I can somehow manage that when I start coding my own program for it.

UPDATE: I got 15 fps with VGA resolution, not bad.

28.3.2004

I have decided to order two Logitech QuickCam Pro 4000 webcams from Verkkokauppa.com (code 6171, price 80.90€ each). The camera has a 640x480 pixel CCD sensor and presumably good image quality. It should be supported under Linux according to the "Linux support for Philips USB webcams" page.

22.3.2004

The project proposal has been accepted. It is task 14 on the Practical Assignment page of the course.