In the past few weeks, I have taken a whirlwind tour through the worlds of 3D scanning & position/orientation sensing. The product of this study is still a work in progress.
Here is my project concept statement:
If Photorealism is only one of many possible styles of painting, then the “realism” of our natural vision must similarly be one of many possible modes of realism. There is no reason to suspect that our natural vision presents the world as it really is. We are abstractly aware of this and yet are limited in our ability to experience other modes. It is the goal of this project to enable the user to experience a physical environment in a visual style which differs from his or her natural mode of vision. This is achieved through the use of a 3D scanner, which images a physical space and provides a software system with a cloud of three-dimensional coordinates. The software uses geometric analysis to transform these points into an alternate view of the space – a view which may be described as a blocky, Lego-like version of the physical environment. This alternate view is then sent to the user’s head-mounted display. A 3D positioning and orientation sensor is used to locate the user within the depicted space. This system is intended for use in an interactive cinema experience, entitled “Parallax Digitalis,” which will use prerecorded and interactive sequences to convey the story of a young man entrapped by the false notion that he is a deity. The current implementation of this project serves as a proof-of-concept for my longtime theoretical exploration of the possible uses of descriptive geometry in the telling of cinematic narratives.
Here are the results so far:
And the steps taken to reach these results…
Let’s start from the beginning.
There are a number of methods for scanning a 3D object. I began by trying to think through what I believed might be a new approach. The method I devised works geometrically, but I do not think there is any practically advantageous way to implement it in a real environment. It involves comparing orthographic and perspective projections of the same scene, so an implementation would perhaps require either an orthographic lens or some sort of laser XY machine. Even then, the scanned environment would have to remain still for a long period of time, and extensive image analysis would be required.
So I moved onto the development of a laser scanner, which uses a camera to measure the displacement of the laser line across the contour of an object. The laser is pointed at a mirror, which is attached to a servo motor. The laser beam is directed to sweep horizontally across the room, taking an image of a single vertical contour at each interval in the sweep.
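For reference, the math behind this kind of scanner reduces to intersecting the camera’s viewing ray with the laser plane. Below is a minimal triangulation sketch in Python, with illustrative parameter names and an assumed geometry (camera at the origin looking down +Z, laser offset along +X), not the scanner’s actual code:

```python
import math

def depth_from_displacement(pixel_offset, focal_px, baseline_m, laser_angle_deg):
    """Triangulate the depth of one laser-line sample.

    Assumed geometry (an illustration, not the scanner's actual code):
    the camera sits at the origin looking down +Z; the laser sits
    baseline_m away along +X and is swept laser_angle_deg off the
    optical-axis direction, back toward the camera.
    """
    cam_angle = math.atan2(pixel_offset, focal_px)  # ray angle off the optical axis
    laser_angle = math.radians(laser_angle_deg)
    # Camera ray: x = z * tan(cam_angle); laser beam: x = baseline_m - z * tan(laser_angle).
    # Solving for the intersection gives the depth z.
    return baseline_m / (math.tan(cam_angle) + math.tan(laser_angle))
```

With the sweep locked (laser_angle_deg = 0), a stripe seen 100 px off-center by a camera with a 1000 px focal length on a 10 cm baseline triangulates to a depth of 1 m.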
This device worked well for a single contour slice when the beam sweep was locked at 0°. But synchronizing the serial read of the laser sweep angle with the capture of each photograph demands an exacting level of precision, and falling short of it made the device quite inaccurate in scanning a room. Additionally, this method requires the scene to remain still for a fair duration. Nevertheless, I believe it will be worthwhile to continue pursuing this method, and I plan to build another version of it.
Here is the scan of a lamp. The red image was taken with the laser angle locked. The blue image is a software interpolation of the depth axis slice which formed it.
The measured displacement method employed in my 3D laser scanner is somewhat similar in concept to a technique called Structured Light scanning, which uses a camera to measure the deformation of a series of bands projected onto the object to be scanned. The horizontal bands are generated by a sine wave. In the method I employed, the projected sine wave’s phase is shifted with each of three photographic captures. The software I used to analyze the images is the only code used in this project that I did not write from scratch. I modified a version of the code available here. Here is a screenshot of the 3D model of my first successful structured light scan. The scan is of a mannequin.
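For the curious, the per-pixel math of three-phase structured light is compact: with the sine pattern shifted by 120° between the three captures, the wrapped phase at each pixel falls out of a single atan2. Here is a minimal sketch of that identity (my own illustration, not the modified code mentioned above):

```python
import math

def wrapped_phase(i1, i2, i3):
    """Recover the wrapped phase in (-pi, pi] at one pixel from three
    captures of a sine pattern, each shifted by 120 degrees.

    If I_k = A + B*cos(phi + 2*pi*k/3), then
    I1 - I3 = sqrt(3)*B*sin(phi) and 2*I2 - I1 - I3 = 3*B*cos(phi),
    so phi comes straight out of atan2.
    """
    return math.atan2(math.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# Synthetic check: one pixel lit by A + B*cos(phi + shift).
A, B, phi = 0.5, 0.3, 1.2
i1 = A + B * math.cos(phi - 2 * math.pi / 3)
i2 = A + B * math.cos(phi)
i3 = A + B * math.cos(phi + 2 * math.pi / 3)
print(round(wrapped_phase(i1, i2, i3), 6))  # recovers phi: 1.2
```

The wrapped phase only determines depth up to a multiple of the stripe period, which is why the unwrapping step discussed below is needed.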
Getting a working scan with this method can be a bit tricky. For one thing, the above implementation uses a flood-fill algorithm which propagates from the center of the image. Therefore, if the fill does not start at an active pixel, it may produce an incoherent image. Of greater concern is that if the object moves from one photograph to the next, the scan will be completely incoherent. This is because the analysis is based on the comparison of each particular pixel in the matrix across the three out-of-phase projections. So if an object moves, the pixels in each phase image do not correspond to the same point. With practice, I managed to improve my ability to take a viable scan.
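The flood-fill idea can be sketched as a breadth-first walk that propagates phase outward from the center pixel, choosing the 2π branch nearest each neighbor. This is only my reconstruction of the approach (the borrowed code itself differs), but it shows why an inactive center pixel stalls the whole fill:

```python
import math
from collections import deque

def unwrap_from_center(wrapped, valid):
    """Flood-fill phase unwrapping, propagating from the image center.

    wrapped: 2D list of wrapped phases; valid: 2D list of booleans
    marking active pixels. If the center pixel is not active, the
    fill never starts -- the failure mode described above.
    """
    h, w = len(wrapped), len(wrapped[0])
    cy, cx = h // 2, w // 2
    out = [[None] * w for _ in range(h)]
    if not valid[cy][cx]:
        return out  # incoherent scan: nothing to propagate from
    out[cy][cx] = wrapped[cy][cx]
    queue = deque([(cy, cx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < h and 0 <= nx < w and valid[ny][nx] and out[ny][nx] is None:
                d = wrapped[ny][nx] - wrapped[y][x]
                d -= 2 * math.pi * round(d / (2 * math.pi))  # nearest 2*pi branch
                out[ny][nx] = out[y][x] + d
                queue.append((ny, nx))
    return out
```

As long as the true phase changes by less than π between neighboring pixels, the relative unwrapped values reproduce the true phase differences.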
My best scan is of this bust:
The detail of this scan was quite high (27,000 points) and very accurate. Though the scene must be extremely still for the duration of three photographic impressions, this method is extremely viable and could most likely be miniaturized or packaged. I have recently seen cheap ($100) mini digital projectors. I would like to try this technique with those. Further improvement of the analysis software would be required. My intention for this project was to do real-time scanning and analysis of a physical space. Unfortunately, the technical restrictions made this impossible for the time being. In furthering this project, that will be one of my main goals. As you can see in this video, it is possible to do real-time structured light scanning. However, there are a great number of unsightly artifacts in those scans. Getting a completely clean scan is quite difficult and may require some post-processing along the lines described in the section below to detect and remove artifacts. In this version of the project, I wrote limited artifact cleaning procedures and focused the majority of my volumetric analysis work on the artistic stylization of the 3D model.
Once the physical scene has been scanned, I perform a series of volumetric procedures on the data in order to stylize it. In this version of the project, my stylistic design principle was to make the 3D objects look like they are made of cubic building blocks or Legos. I achieved this through a procedure which divided the scene into cubic subregions and then hashed each point in the scene into the subregion which contained it. If a certain number of points fall within a particular subregion, that subregion becomes “active.” By changing the scale of each subregion or by filtering the display of active subregions by the number of other active ones they touch, different levels of structural decomposition are imposed upon the model. If a region is touched on all sides by other actives, then it is encapsulated and can therefore be discarded. If only three of a region’s sides are touched, it may be considered a corner. If four, an edge. If five, a face. Determining whether the touched faces are adjacent or parallel to one another allows further pruning.
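The hashing and pruning steps can be sketched in a few lines of Python (cell size and point threshold below are illustrative values, not the ones in my code):

```python
from collections import Counter

def voxelize(points, cell=0.05, min_points=4):
    """Hash each 3D point into the cubic subregion containing it;
    a subregion becomes 'active' once it holds at least min_points points."""
    counts = Counter(
        (int(x // cell), int(y // cell), int(z // cell)) for x, y, z in points
    )
    return {c for c, n in counts.items() if n >= min_points}

def prune_enclosed(active):
    """Discard cells touched on all six faces by other active cells --
    they are fully encapsulated and can never be seen."""
    faces = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return {
        c for c in active
        if any((c[0] + dx, c[1] + dy, c[2] + dz) not in active
               for dx, dy, dz in faces)
    }
```

Counting how many faces of each cell are touched (and whether the touched faces are adjacent or opposite) extends the same neighbor test to the corner/edge/face classification described above.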
Here are a few of the possibilities under these parameters:
In line with the concept statement above, it was my goal to create various decompositions of a physical space. The series immediately above was inspired in part by these Matisse sculptures:
In future versions of this software, it might be interesting to incorporate physical dynamics into the model so that, for instance, the building blocks will sway or fall if they are not stable.
Position & Orientation
Once the Volumetric Stylizer has finished processing the scene, it should prepare it for a head-mounted display (HMD). To properly situate the user within the scanned space, the software must know the position and orientation of the HMD. This is a difficult thing to calculate with any accuracy and so I was fortunate to have access to a Flock of Birds device, which uses a magnetic field sensor to calculate position and orientation in relation to a magnetic transmitter. I was less fortunate in getting the Flock to work. The Flock of Birds is from the mid-’90s era of Virtual Reality and looks something like this:
After many hours of reading the manual, programming and testing the device, I briefly managed to get a stream of garbage data or perhaps just signal artifacts. Having graphed this data, I am inclined to say the data was not misinterpreted position/orientation data and was at best some passive data leakage from the flock. Whatever this data was, it lasted for just one night. The next morning, I brought the equipment to school for a meeting with Dano, but was not able to get any signal from it whatsoever. There is some chance the device is broken. There is a greater chance that the problem was that my “call” signals to the device were malformed. There is the greatest chance that the problem was caused by something I didn’t think to try correcting. In any case, I learned some interesting serial communication and bit-shifting concepts in the process. But without the Flock of Birds, I was a bit out of luck for the HMD position/orientation portion of the project.
Knowing that an accelerometer would not do the trick and with time running out, I decided to get as far as I could in writing image analysis software that could find the user’s position/orientation with respect to a camera through the use of some identifying marker. In my first attempts, I used a panel with three color LEDs, which were stationed to form a right angle. In a dark room, this method worked somewhat reliably. But partly because of their brightness, the color of each LED was difficult to isolate. So, I decided to switch to non-illuminated markers. In the last 24 hours, I wrote some software to do this analysis, which would have taken me weeks to write at the beginning of the semester. The basic idea is that a camera stream is analyzed to find the region of the screen that has the greatest number of drastic pixel-to-pixel brightness shifts. Once this region is identified, its edges are calculated. As you see in the image below, a high brightness-shift pattern is inscribed within a square on an index card and is placed on the front of the HMD.
The software identifies this square and then finds its bounding lines (represented in blue):
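The core of that search can be sketched as a block scan that counts drastic horizontal brightness jumps and keeps the busiest block (the block size and jump threshold here are illustrative guesses, not the values I used):

```python
def busiest_region(gray, block=8, jump=60):
    """Return the (x, y) corner of the block with the most drastic
    horizontal pixel-to-pixel brightness shifts.

    gray is a 2D list of 0-255 intensities; block and jump are
    illustrative parameters, not the real detector's values.
    """
    h, w = len(gray), len(gray[0])
    best, best_count = None, -1
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            count = sum(
                1
                for y in range(by, by + block)
                for x in range(bx, bx + block - 1)
                if abs(gray[y][x + 1] - gray[y][x]) > jump
            )
            if count > best_count:
                best, best_count = (bx, by), count
    return best
```

A high-contrast striped marker lights up one block with dozens of such jumps, while flat regions of the room contribute almost none.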
Though blob detection libraries such as OpenCV are relatively easy to integrate, I decided to write all of my marker analysis software from scratch in an effort to learn as much about the subject as possible. As of tonight, what you see above is the current state of this software. I’m still working towards the full implementation of the function that is the marker’s actual purpose: to analyze the perspectival contour of the marker square in order to determine its position and orientation in relation to the camera. This can be achieved with projective geometry by orienting a plane in three-dimensional space which satisfies the empirical conditions of the marker. For example, if the marker is rotated along the depth axis in relation to the camera, its two-dimensional representation will have non-equal sides. A recursive backtracking algorithm may be applied which solves the plane’s 3D orientation such that its perspectival appearance matches the 2D image while its contour in the 3D projection is equivalent to the physical marker.
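As a worked example of the “non-equal sides” observation: under a weak-perspective simplification (my assumption, which the full projective solver described above would not make), a square marker rotated by θ about its vertical axis keeps its apparent height but shrinks in apparent width by cos θ, so the aspect ratio alone recovers the angle:

```python
import math

def yaw_from_foreshortening(observed_w, observed_h):
    """Estimate rotation about the vertical axis from foreshortening.

    Weak-perspective simplification (an assumption for illustration):
    a square marker rotated by theta appears with width shrunk by
    cos(theta) while its height is unchanged.
    """
    ratio = max(0.0, min(1.0, observed_w / observed_h))
    return math.acos(ratio)

# A square marker seen at half its frontal width is turned 60 degrees.
print(round(math.degrees(yaw_from_foreshortening(50.0, 100.0)), 1))  # 60.0
```

The real problem is harder because perspective also skews the sides unequally with distance, which is exactly what the backtracking solver has to account for.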
As a temporary measure, I have implemented a functional LED tracking system. The LEDs are attached to the front of the HMD, which is situated in front of a webcam. The user’s head movements currently control X- and Y-axis pan as well as Z-axis (depth) tracking. The visualizer is set to recalculate the number of “building blocks” in relation to the user’s Z-axis position. The further the user is from the webcam, the more blocky the scene appears, representing a sort of focus. Here is a screenshot from the current version:
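The distance-to-blockiness mapping can be as simple as a linear interpolation between two grid resolutions (all parameter values below are illustrative, not the ones in my code):

```python
def block_scale(z, z_near=0.5, z_far=2.5, cells_near=48, cells_far=8):
    """Map the user's distance from the webcam (z, in meters) to a
    voxel-grid resolution: farther away -> fewer, larger blocks.
    All parameter values are illustrative assumptions."""
    t = max(0.0, min(1.0, (z - z_near) / (z_far - z_near)))  # clamp to [0, 1]
    return round(cells_near + t * (cells_far - cells_near))
```

Re-running the voxelization at the new resolution whenever this value changes produces the focus-like effect described above.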
Final Notes & Plans for Continuation
Over the winter break, I will be working to integrate AR markers into the system for a higher level of position/orientation sensing. After researching a number of approaches, I believe AR markers will almost certainly prove to be the best all-around solution for positional sensing because of their extremely low cost and high accuracy. My second version of a 3D laser scanner is nearly complete. More on that to come. Though the system is still not a fully integrated one, I believe I can reach that goal by the start of next semester. By working with several approaches to each of the constituent elements of this project, I learned a great deal about the aesthetic, technical and monetary challenges facing the implementation of a project of this kind. This process has enabled me to make informed choices in designing a 2.0 implementation. In general, I like to approach a problem more or less from scratch, so that as I move towards more advanced and artistic questions in the implementation, I have a full grasp on the underlying technical structures, which play a huge role in determining the artistic outcome of the project. For this reason, I wrote all of my software for this project from scratch (with the single exception of Processing’s video frame capture library). I will be rewriting my software in Objective-C++ with OpenFrameworks for greater performance and developing a UI which integrates with my DEcomp software suite, which I was developing before coming to ITP. Please check back for source code, data sets, etc.