Structure from Motion for documenting for Citizen / Community Science and Marine Archaeology



I second that, fantastic


Hello Scott,
Thank you again for this valuable information.

I have 2 more questions:

  1. What image resolution have you been using?
  2. Have you been using a filter on the lens? Does it have an impact on the resulting model?

I will be testing this technique in Greece in 10 days to try to rebuild a model of an archaeological site. The only problem is that I have limited time and limited space on the SD card to try all possible combinations :smile:


Hi Achraf

I have found that the higher the resolution the better. I have used a GoPro Hero3 Black and a Hero4 Silver (both of which have a 16MP resolution @ 1 frame/second). I have also tried lower resolutions, only to get poor or no results (e.g. taking stills from HD 1080P video, which is basically only 2MP). So generally I would say the more resolution the better. With a 32 Gig card in the camera and a 16MP image at approx 2MB, that’s in round figures 4 hours @ 1 frame/second, which should be heaps for any day’s recording (and you can empty the card at night and start again).
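As a quick check of the card-capacity estimate above, here is a minimal sketch. It assumes decimal units (1 GB = 1000 MB) and a constant file size per still; the function name is made up for illustration, and real JPEG sizes vary with scene content.

```python
def recording_hours(card_gb, image_mb, frames_per_second):
    """Hours of continuous shooting before the card fills (decimal units assumed)."""
    images = (card_gb * 1000) / image_mb   # how many stills fit on the card
    seconds = images / frames_per_second   # time needed to shoot that many stills
    return seconds / 3600


# Figures quoted in the post: 32 GB card, ~2 MB per still, 1 frame/second
hours = recording_hours(card_gb=32, image_mb=2, frames_per_second=1)
print(f"{hours:.1f} hours")
```

Larger files per still shorten the recording time proportionally, so it is worth checking the actual file sizes your camera produces.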

For some reason I have found the old GoPro Hero3 Black has given better results than the newer camera. With the Hero4 Silver I seem to get a lot more motion blur - so much so that I have been thinking of resetting the camera to factory defaults. However, it may also be because the wrecks have been deeper, with less ambient light.

I have not used any filters or anything else; it’s been an off-the-shelf unit, either mounted on the ROV, mounted on a dive scooter, or just carried by a diver.

As Stretch suggests, the more “trials” you can do to iron out bugs before you’re forced into a time/place-sensitive situation (e.g. being on an expedition), the better. PS: remember to post your progress over at OpenExplorer. I had more good luck than planning on my first attempt: I took the images about 12 months before I mastered, or even knew of, the software, as I knew I would only get one chance to dive on a wreck where the harbour master had to redirect all the shipping around us, and I just happened to get enough over coverage.

Finally, Achraf, good luck on your Greek archaeological site. I hope it goes well. I also know a few people who have been using this technique on a few other sites over there.



Here’s a very simple test I conducted of a piling next to where my boat is moored. I used a Sony Action HD camera and Deep Trekker ROV. The sample is roughly 0.7m x 1.5m in size. I only did one side of the piling for the test to get a feel for the process. It took a couple of tries before I found the “sweet spot” that resulted in an actual model. A couple things I learned:

  1. You can’t take too many photos.
  2. Lighting and visibility are critical.
  3. Your photos not only have to be closely spaced but also from every conceivable angle.
  4. Be patient, this stuff doesn’t process very quickly.


Hi Stretch

Fantastic, how about a few more details: what software? What resolution, stills or captured from video? How many shots?


@Scott_W - Post edited. I spent the better part of the day trying every possible setting with every piece of software I could get my hands on. I settled on the highest resolution the Sony camera could handle and some software I hacked together. Since the fastest the camera would take stills was something like once every 10 seconds, I had to make several passes and go very slowly (the slight current made this even more difficult).


I plan to try this out on the glass sponge reef. Obviously I will have to start small.
I might try using a raspberry pi camera since they are cheap and I am able to pot them to go 200+ m. The raspi cam is only 5 megapixels. Will that be a major drawback? It can do a reasonably fast frame rate, so if I don’t use a very wide angle lens, it would increase the effective pixel density.
A raspberry pi can only use one raspi cam at a time, but you could use 4, or maybe more, USB web cams on one raspberry pi to get a wide spread or multiple angles all at once. That could get expensive, but possibly still less than one GoPro…

What do you think?


Good to hear you’re interested, Darcy. It will be interesting to see some results from your deep reefs.

FYI I just bought a Xiaomi Yi Sport Camera today (sub $100) to see how they will go (I’m thinking about multiples down the track and thought I would give one a try first). They say they are good to around 60m (not good enough for what you are chasing).

I had also been tossing around some of the raspberry pi style solutions, and the 5MP has held me back, but I think it will be OK. Either just run closer to the bottom with more images, as you suggest, or live with lesser total resolution. How much resolution do you really need for a complete wreck? 5MP @ 2 meters above the bottom is 1.6mm per pixel.
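For anyone wanting to redo the mm-per-pixel figure for their own camera, here is a rough sketch. It assumes a simple pinhole model looking straight down, and it assumes the 5MP image is 2592 pixels wide; the function names are made up for illustration. Working backwards, 1.6mm per pixel at 2 meters implies a fairly wide lens, so a narrower lens would give a finer figure. A flat port underwater also narrows the field of view compared with the in-air value, so treat the lens angle as something to measure rather than read off a spec sheet.

```python
import math


def ground_sample_distance_mm(altitude_m, hfov_deg, image_width_px):
    """Size of one pixel on the seabed (mm), pinhole camera pointing straight down."""
    swath_m = 2 * altitude_m * math.tan(math.radians(hfov_deg) / 2)
    return swath_m / image_width_px * 1000


def implied_hfov_deg(altitude_m, gsd_mm, image_width_px):
    """Horizontal field of view that would produce a given mm-per-pixel figure."""
    swath_m = gsd_mm / 1000 * image_width_px
    return math.degrees(2 * math.atan(swath_m / (2 * altitude_m)))


WIDTH_PX = 2592  # assumed width of a 5MP frame

# Lens angle implied by the figure quoted in the post (1.6mm/pixel at 2m)
fov = implied_hfov_deg(altitude_m=2.0, gsd_mm=1.6, image_width_px=WIDTH_PX)
gsd = ground_sample_distance_mm(2.0, fov, WIDTH_PX)

# Hypothetical narrower lens at the same height, for comparison
narrow = ground_sample_distance_mm(2.0, 54.0, WIDTH_PX)

print(f"implied horizontal FOV: {fov:.0f} degrees")
print(f"narrower lens: {narrow:.2f} mm/pixel")
```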

I don’t think super fast frames will help much. As long as you get a good image without motion blur and with a fair amount of over coverage, it should be fine.

Have a look at the Pi cameras, their field of view and the expected distance from the bottom; this can help in the decision making process. Also consider using multiple cameras spaced say 300-600mm apart, still with say 60-70% over coverage.
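The spacing and over coverage numbers tie together through the width of seabed each camera sees. A minimal sketch, assuming identical cameras pointing straight down at the same height; the 2.0m footprint is a hypothetical value and the function names are made up for illustration:

```python
def overlap_fraction(footprint_m, spacing_m):
    """Fraction of one camera's footprint shared with its neighbour."""
    return max(0.0, 1 - spacing_m / footprint_m)


def max_spacing_m(footprint_m, overlap):
    """Largest camera spacing that still gives the requested overlap."""
    return footprint_m * (1 - overlap)


footprint = 2.0  # hypothetical seabed width seen by one camera, in meters

for spacing in (0.3, 0.6):
    print(f"{spacing} m spacing -> {overlap_fraction(footprint, spacing):.0%} overlap")

print(f"70% overlap allows up to {max_spacing_m(footprint, 0.70):.2f} m spacing")
```

The same relation applies along the track: distance travelled between shots plays the role of the camera spacing.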

I would also consider the possibility of implementing Stereo Depth Perception on the Raspberry Pi (think bottom lock rather than depth lock).

GoPro underwater Structure from Motion (SfM) / Photogrammetry

Thanks for the links! Very interesting work.


Here is a model of a shipwreck in Lake Tahoe. I used a downward-facing GoPro mounted to the OpenROV to take photos twice a second. It was very difficult to make straight transects from multiple heights, which is probably why the model quality is poor. Originally I used Visual Structure from Motion; this model was generated with Autodesk ReCap 360. I plan to try modeling this wreck again within the next few weeks.


@Carey_McCachern were you using an IMU for this collection? I’ve had pretty good experiences with my ROVs keeping a proper heading and depth when the IMU was installed.


Good to hear of your results @Carey_McCachern

The software shouldn’t worry too much about the straight transects

The transects are more for you, so you know whether you have had good coverage or not. I have tended to run along wrecks, stepping a bit across on each pass, and then finish with a few zig zags across the wreck.

If you had trouble stitching the images together, how close were you to the wreck? At a guess I would have tried to be about 1.5-2m above the structure (dependent on water clarity).

Did you end up trying to stitch the full 10,000 images, or did you cull them down (getting rid of the head and tail of the dive) and take every ?4? image? :confused:
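Culling like that is easy to script. A minimal sketch, where the file names, the number of frames trimmed from each end, and the step of 4 are all hypothetical values to adjust for the dive:

```python
def cull(frames, skip_head, skip_tail, step):
    """Drop the start and end of the dive, then keep every `step`-th frame."""
    end = len(frames) - skip_tail
    return frames[skip_head:end:step]


# Hypothetical file names standing in for a 10,000 image dive
frames = [f"G{i:07d}.JPG" for i in range(10_000)]
kept = cull(frames, skip_head=500, skip_tail=500, step=4)
print(len(kept), kept[0], kept[-1])
```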

No matter what, it’s fantastic that you are giving it a go and at least getting some sort of result.

Hope you continue at it :thumbsup:



I did have an IMU on my ROV for the dive. Heading hold was very helpful, but my depth sensor was not functioning. I was also testing a couple of hardware and software changes that made my video choppy, and my e-tube fogged up. I should be able to get better results next time by fixing these problems so that the ROV is easier to fly.


I manually chose the best images to stitch together. I had several thousand images that were either completely blue or were of the bottom of the boat so I tried to pick clear images from every angle. I’m still playing with how to get the best model. I think one advantage of ReCap 360 is the ability to manually stitch images together. It seems like I have decent coverage of the boat, but a larger group of images often makes the model worse. However, I get better results by manually stitching these images together.

I’ll post any new developments that I make.


This is wicked cool. Nice work.


Hi everybody,
I stumbled into this great topic thanks to @Scott_W, and I’ll share our experience using SfM + Meshlab for 3D reconstruction of transects in shallow water. I hope it will be of use to anyone studying rocky and coral reefs. Eventually we will be testing our setup on some wreck sites (at least, I hope so!)

For the image acquisition we compared still pictures and videos for 3D modelling. When working with 50 meter transects, we have attained robust results even using HD 1080P video from an Intova Sport HD Edge camera, but we had mixed results with a JVC Everio GZ-HD3 (1440 x 1080i). In both cases, the video capture was performed diving at 1.5 meters from the surface. The average video length was 5:30 minutes (thus giving a speed of 0.15 meters per second).

Later, each video was “stripped” into still frames using the ffmpeg or avconv command line tool. We decided to use a frame rate of 6 fps, while preserving aspect ratio and picture quality. At this point we have employed two strategies:

  • Generating a 3D model from 2000 to 3000 frames using VisualSFM, relying on its internal undistortion based on “automatic calibration” and its camera model. VisualSFM was configured to use “Shared Calibration” and “Radial Distortion”
  • Undistorting externally, using the OpenCV camera model and extracting camera parameters with the classic chessboard method. Later, those frames were fed into VisualSFM with “Radial Distortion” disabled, as suggested by the VisualSFM author (Changchang Wu)
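For anyone wanting to reproduce the frame-stripping step that comes before either strategy, here is one possible sketch. It only builds an ffmpeg command line; the file names, the output pattern and the JPEG quality value are assumptions for illustration, not the exact command used in the post. The `fps` filter resamples to the requested rate without rescaling, so the aspect ratio is untouched, and a low `-qscale:v` value keeps JPEG quality high.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video, out_dir, fps=6, jpeg_quality=2):
    """Build an ffmpeg command that writes `fps` still frames per second of video."""
    pattern = str(Path(out_dir) / "frame_%05d.jpg")  # hypothetical naming pattern
    return [
        "ffmpeg",
        "-i", str(video),
        "-vf", f"fps={fps}",               # resample to the requested frame rate
        "-qscale:v", str(jpeg_quality),    # lower value = higher JPEG quality
        pattern,
    ]


cmd = ffmpeg_extract_cmd("transect.mp4", "frames")  # hypothetical file names
print(" ".join(cmd))
# To actually run it (needs ffmpeg installed and the output folder created):
# subprocess.run(cmd, check=True)
```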

In both cases, because each model is a linear transect, we used sequential matching (each frame against about 25 to 30 neighbouring frames), thus reducing computing cost. This should remain valid for transect videos; however, it won’t hold true for a lawnmower pattern.
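To give a feel for the saving, here is a small sketch that counts the image pairs to be matched with a sequential window versus matching every frame against every other. The frame count and window size are taken from the ranges mentioned above; the function name is made up for illustration. It also hints at why a lawnmower pattern breaks the assumption: frames on neighbouring passes overlap on the seabed but sit hundreds of positions apart in the sequence, so a short window never pairs them.

```python
def sequential_pairs(n_frames, window):
    """Yield (i, j) pairs where frame j follows frame i by at most `window` frames."""
    for i in range(n_frames):
        last = min(i + window, n_frames - 1)
        for j in range(i + 1, last + 1):
            yield i, j


n, w = 2000, 30
windowed = sum(1 for _ in sequential_pairs(n, w))
exhaustive = n * (n - 1) // 2

print(f"sequential: {windowed} pairs, exhaustive: {exhaustive} pairs")
print(f"sequential matching does {windowed / exhaustive:.1%} of the work")
```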

In almost all cases, VisualSFM (with enough picture overlap) generates a single model. In case of poor matching, we extend the image matching range and the number of desired control points (SIFT). The matching quality can be easily verified with the “View match matrix” option. At this point, we proceed to generate the dense model (PMVS+CMVS), to be later exported as a PLY file.

For the postprocessing we have been using Meshlab, mainly for noise removal, explicit deletion of outlying points, and surface reconstruction as mesh layers. We have tested the classic Poisson reconstruction filter and the Ball Pivoting algorithm; the choice depends on what kind of output is desired. Your mileage may vary, mainly due to surface complexity.

The final step is to reproject each single camera+picture pair onto the mesh surface, as any texture transfer algorithm can do. Many of the steps here described can be found in other sites such as

The main drawback observed with this setup, is the appearance of radial deformation of the resulting 3D transect model, as described by C.C. Wu for critical configurations for radial distortion auto-calibration at . We are exploring the possibility of using GIS tools for GPS data inclusion in order to correct this noticeable distortion.

If anybody is interested, we can share further details and implementations (they are basically tailored implementations of OpenCV algorithms)

J. Cappelletto



First up, thanks for sharing your experiences so we can all learn from each other.

A couple of questions and comments about your project

From your comments

I assume you are studying the health of your Venezuelan Reefs and are using the SfM for base line studies (species counts and / or comparative analysis of the same location over time). How are you finding the reef health over there?

Seeing an image you have achieved would be great as well

Good to hear you have been using

I have been attacking it with a brute force approach, but due to larger and larger data sets I have been thinking of splitting it up into chunks to save computational time. I might have to give this a go and see how it goes.

You mention using captured video frames 1440 x 1080 [1.5MP]

I would suggest a comparative trial of a model from your 6fps 1440 x 1080 [1.5MP] images against a model from stills @ 3840 x 2880 [11MP] taken at 1 per second, at the same transit speed (0.15 meters per second), as I believe you may achieve better results. (PS I have typically been using faster transit speeds of just under 1m/sec.)
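To put numbers on that comparison, here is a small sketch of the pixel counts and of how far the camera travels between consecutive frames in each setup. It uses only the figures quoted in this thread; the function names are made up for illustration.

```python
def megapixels(width_px, height_px):
    """Pixel count of one frame, in millions."""
    return width_px * height_px / 1e6


def frame_spacing_m(speed_m_per_s, frames_per_second):
    """Distance travelled between two consecutive frames."""
    return speed_m_per_s / frames_per_second


# Transect speed from the earlier post: 50 meters in 5:30 minutes
transect_speed = 50 / (5 * 60 + 30)

video_mp = megapixels(1440, 1080)
stills_mp = megapixels(3840, 2880)

print(f"transect speed: {transect_speed:.2f} m/s")
print(f"video frames: {video_mp:.1f} MP, stills: {stills_mp:.1f} MP")
print(f"6 fps video at 0.15 m/s: {frame_spacing_m(0.15, 6) * 100:.1f} cm between frames")
print(f"1 fps stills at 0.15 m/s: {frame_spacing_m(0.15, 1) * 100:.1f} cm between frames")
print(f"1 fps stills at 1 m/s: {frame_spacing_m(1.0, 1) * 100:.0f} cm between frames")
```

The stills carry roughly seven times the pixels per frame, while the video gives six times as many frames over the same stretch of transect, which is the trade-off the trial would test.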

No matter what, thanks for telling us how you have found the technique, and wishing you success.



Thanks for your wishes and comments @Scott!

Indeed, we have been using SfM and photomosaics as part of a ZSL funded project for base line studies in Archipielago Los Roques National Park, covering a couple of endangered coral species (Dendrogyra cylindrus and Acropora cervicornis). This project is part of a PhD and an MSc thesis at the University.

I’m not an expert on coral reefs (actually I’m the engineer of the team: I reconstruct the models, and they go out into the field, hehe), but it can be said that the reef health could be better. If you are interested, I can put you in contact with the experts in this field working on the project.

About the pictures, I promise I’ll be posting a couple of images of the mosaics and the sparse/dense SfM models next week.

Finally, on the next field trip I will test your suggestion about the Intova setup. Our first trials with 11MP had too much noise, even under standard natural light conditions, rendering the pictures useless for an acceptable mosaic or 3D model. I think it was due to the ISO settings at that moment.
A drawback I found with the still pictures approach is that correct focus cannot be guaranteed for each picture. When using videos as input, you always have the option to pick enough frames, removing those with focus/motion blur problems. We are automating the “focused frame” extraction based on an image focus quality estimate for each frame, reducing operator intervention. Once it works, I could share it.
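As an illustration of the idea only, here is a minimal sketch of one common focus measure, the variance of the Laplacian: sharp frames have strong local intensity changes and score high, blurred frames score low. This is not necessarily the estimator used in our pipeline. It is written in plain Python on tiny synthetic images, and the threshold is a hypothetical value that would need tuning on real footage.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Focus score for a grayscale image given as a list of rows of intensities."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]                      # 4-neighbour Laplacian
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def keep_sharp(frames, threshold):
    """Return the names of frames whose focus score reaches the threshold."""
    return [name for name, gray in frames if laplacian_variance(gray) >= threshold]


# Synthetic stand-ins: a crisp checkerboard and a featureless gray frame
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]

selected = keep_sharp([("frame_a", sharp), ("frame_b", flat)], threshold=100)
print(selected)
```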

J. Cappelletto


Hi again everybody,
As promised, I’m sharing a couple of 3D models obtained from videos and pictures.
The first one is a sparse reconstruction of a 10 meter transect at Los Roques, from an Intova video. Here we are aiming to obtain a general idea of the substrate; this is still work in progress.

The second model corresponds to an Orbicella faveolata coral colony, generated from 9 pictures taken while diving in Chuspa, Venezuela. The dense model was generated in VisualSFM, with meshing and texture reprojection in Meshlab.

Hope you enjoy it,
Once we finish our pending work we will be posting other samples.



This is great, Jose. I can’t believe you got that kind of resolution from just 9 photos.


Thanks @David_Lang
Curiously, with bigger models (such as the 50 m transect) the quality is far lower. There is an important dependence on the noise level in the video / pictures employed as inputs. Also, if there is a “closure” in the input sequence (last frame matching with the first one), a correct reconstruction can be ensured.
I’m experiencing some distortion problems with large linear models; opinions and suggestions are welcome!