In April 2017, I wore a low-quality body camera and a GPS logger while taking a short road trip through the American Southwest. Despite the brevity of the trip, the number of images captured was overwhelming. Geographic statistics from the GPS (distance between photos, etc.) were insufficient for making sense of the data, so I attempted to use a poorly trained neural-network captioning algorithm.
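A minimal sketch of the kind of per-photo statistic mentioned above: great-circle distance between consecutive GPS fixes. The coordinates and list structure here are invented for illustration; the actual logger format isn't described in this post.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

# Hypothetical fixes, one per photo
fixes = [(36.17, -115.14), (36.11, -115.17), (35.20, -114.03)]
gaps = [haversine_km(fixes[i], fixes[i + 1]) for i in range(len(fixes) - 1)]
print(gaps)  # distance between each pair of consecutive photos
```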
While an isolated image caption is frequently boring and incorrect, captions can also be evocative, relatable, and technically correct. In aggregate, they provide a fascinating index into patterns in the photoset. For example, a search for "clock" reveals periods of time spent driving at night, and a search for "pizza" finds meals. This bulk approach to image captioning also raises questions about patterns in the model's training dataset. Does its affinity for skateboards stem from the skewed camera angle of the body-cam images?
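A sketch of the bulk-caption indexing described above, assuming the captions have already been generated. The filenames and caption text are stand-ins; the real model output isn't reproduced here.

```python
# Hypothetical mapping from photo filename to machine-generated caption
captions = {
    "img_0412.jpg": "a clock on the side of a building",
    "img_0977.jpg": "a pizza sitting on top of a table",
    "img_1530.jpg": "a man riding a skateboard down a street",
}

def search(captions, term):
    """Return filenames whose caption mentions the given word."""
    return [name for name, text in captions.items() if term in text.lower()]

print(search(captions, "clock"))  # e.g. frames captured while driving at night
print(search(captions, "pizza"))  # e.g. frames from meals
```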
I also tried another, simpler method of visualizing these photographs. Warning: it is slow to load.