Structure From Motion For Free



Hi everyone!

Some of you may remember me as the intern who was super interested in computer vision. I recently joined the OpenROV team full-time to help @badevguru and @charlesdc with the OpenROV software, but I have also been working in my free time on a variety of computer vision projects.

Specifically, I have brought this amazing thread back to life by helping some of the community members with structure from motion compute rigs and software. In fact, if you peek into that thread, you can see that I recently posted a long list of open-source photogrammetry and structure from motion software. Compiling that list got me very excited about the possibility of creating 3D meshes from my photos, so I had to try it out! Here are my results.

Caveat: While VisualSFM is free to download, I recommend using it for personal projects only. I am not a lawyer, but last time I checked, the VisualSFM license restricts it to personal, non-commercial use.

I started off by borrowing a camera from someone in the lab and taking around 60 images of a 2.8 ROV sitting on a table, from various angles. Here is an example of one of the images:

Then, I followed the instructions here for installing VisualSFM. I am using Ubuntu. Here is the result from the command lsb_release -a:

Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.1 LTS
Release:	16.04
Codename:	xenial

And the result of the command uname -r:

If your computer is similar to mine, I can definitely help you with installation! The installation steps detailed in the link above are thorough, though, so I doubt you will have trouble installing. If you have any questions, please feel free to ping me.

After installing VisualSFM and its dependencies, I followed this blog post on how to set up a usable VisualSFM workflow. I highly recommend reading through the post; it is very useful and even provides some interesting technical details.

One warning, though: I have a fairly decent computing platform with an Nvidia GTX 1080 GPU. Structure from motion is computationally expensive, and you will need a decent rig to compute these reconstructions in a reasonable timeframe. Moreover, structure from motion projects tend to be memory-bound, so make sure you have plenty of RAM. Please take a look at my post here, which goes into more detail.
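To make the RAM point concrete, here is a back-of-envelope sketch in Python. The feature counts, descriptor size, and image resolution are all assumptions I picked for illustration, not numbers measured from VisualSFM, but they show how quickly the working set grows with the size of the data set:

```python
# Rough lower bound on the SfM working set, in MB. All numbers below are
# illustrative assumptions, not measurements from VisualSFM.

def sfm_memory_estimate_mb(n_images, features_per_image=10_000,
                           descriptor_bytes=128, width=4000, height=3000):
    """Very rough estimate of in-memory data for matching + dense steps."""
    # SIFT-style descriptors kept in memory for pairwise matching
    descriptors = n_images * features_per_image * descriptor_bytes
    # Decoded 8-bit RGB images the dense step may need at once
    images = n_images * width * height * 3
    return (descriptors + images) / 1e6

print(f"{sfm_memory_estimate_mb(60):.0f} MB")   # a ~60-image set like mine
print(f"{sfm_memory_estimate_mb(500):.0f} MB")  # an "overzealous" set
```

Even with these conservative assumptions the 60-image set wants a couple of gigabytes, and a few hundred images pushes well past what many laptops have.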

I then imported the dense 3D reconstruction into MeshLab, where I was able to view and manipulate the point cloud. MeshLab is super cool. For example, here is a screenshot showing the computed poses of the camera relative to the ROV. As you can see, I didn’t do a very good job of photographing the top of the ROV (I never claimed to be a professional!)

And here is the ROV up close and personal, with MeshLab’s UI elements.

Notice in the above image that some of the reflections were captured as surfaces in the reconstruction. Luckily, MeshLab provides a suite of tools to clean up these “mistakes”. Here is a picture of the ROV cleaned up a bit.

Pretty cool, right? All things considered, it took me around an hour and a half from placing the ROV on the table to generating the above image. Not too bad for free software. Here are some more angles of the ROV:

Some takeaways from this evening of fun:

  • Increasing the number of pictures in your data set will increase the fidelity of your mesh, but it will also increase the computation time required to generate the dense reconstruction. Focus on overlap between shots, and be sure to take photos from multiple angles. That said, 60 images is not a very large data set, and you should probably aim for more; the larger the scene or object you are reconstructing, the more pictures you will need.

  • Regarding the “holes” in the mesh: from a technical standpoint, reconstructing single-color objects with little texture (like a wall or the side of an ROV) is extremely difficult for any feature detection/matching algorithm. As an aside, it would be interesting to see what effect, if any, filling the scene with feature-rich symbols or stickers would have on the reconstruction.

  • If you get overzealous, you will almost certainly run out of RAM at some point in the pipeline. I know I did.
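To see why a uniform surface starves the feature detector, here is a toy Python sketch. Real pipelines use detectors like SIFT, but a simple patch-variance score (a hypothetical stand-in I wrote for illustration, not anything from VisualSFM) makes the point: a flat patch has nothing a matcher can lock onto.

```python
# Toy illustration of why textureless surfaces defeat feature detection.
# patch_score is a made-up stand-in for "how distinctive is this patch?"
import random

def patch_score(patch):
    """Intensity variance of a patch: ~0 means nothing to match against."""
    n = len(patch)
    mean = sum(patch) / n
    return sum((p - mean) ** 2 for p in patch) / n

flat_wall = [128] * 64                                   # uniform ROV panel
textured = [random.randint(0, 255) for _ in range(64)]   # busy sticker

print(patch_score(flat_wall))  # 0.0 — a matcher finds nothing here
print(patch_score(textured))   # large — plenty to lock onto
```

That zero-variance patch is exactly what the side of the ROV looks like to the matcher, which is why those regions come back as holes.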

That’s about it! This was a super fun digression that turned out to be very informative. I suspect I will get a lot of questions along the lines of “how do we fill in the holes that appear when reconstructing a uniform surface?”, and I think an interesting approach would be any of the current SLAM methods. It would be interesting to see how the inherent temporal nature of SLAM algorithms (coupled with IMU data) would fare on this problem. I also found this paper, High-quality Depth from Uncalibrated Small Motion Clip, which could also work very well (it is worth the read, I promise!).

I thought this was really cool. See if you can guess the name of the ROV I photographed!


Hi @gilbert

Thanks for picking up on SfM; it’s a great tool for documenting underwater sites.

There is a bit more about how to do it in the later thread, and I appreciate what you have been doing.

And I can add a bit more about it in a couple of days (I have been at sea for the last couple of weeks documenting some shipwrecks from the Battle of the Java Sea using SfM, have only just got into port, and will be back home in Sydney in a couple of days, so I can reply fully then).



Hi @gilbert

Thanks for the patience. I'm back home now and finished causing international incidents :hushed: :sob: (unfortunately I didn’t get the OROV in the water; I just multibeamed the trenches at what were the wreck sites).

Some of the publicity the last trip generated:

Dutch Wrecks

British Wrecks

Indonesian Point of View

So now I can hopefully give you a bit of a hand (it’s also worthwhile catching up with @Jim_N on some of this stuff).
I see OpenROV and SfM as great ways to document underwater sites and would love to hear your thoughts on what you are looking to do.

It is also (as you sort of suggest with your SLAM comment) worth having a look at some of the real-time mosaicking tools like BoofCV; watch some of the videos in the post.

Or some of the visual odometry stuff



Hey, awesome, Gil! I used VisualSFM to model my son’s couch a few months ago as a test when I was pinging Scott about models. Nice. I’m jumping back into this as well. I’ve been thinking about a rig for the Trident too. Let’s chat more. I have an expedition planned for Dec 3rd and 4th out to a dive quarry to test some of these techniques. I’m in the middle of a move, but once this wraps up over the holiday, I’ll populate this thread with the plan and software bits.