While many people think of security in terms of operating systems or tools, few consider the data that users release in the form of metadata.
True, today a picture downloaded from Facebook is unlikely to contain a user’s GPS data. The question is: is the picture we upload free from sensitive data?
The typical response from a user would be: “I don’t really mind, I have nothing to hide, and I can’t see how metadata can be used against me.” The truth is that a lot of information can be used against a user, and every trace left online can help profile a person. It is a sort of digital “dumpster diving”, performed by companies that have access to some of the traces we leave behind.
Once the hints are collected, it is possible to propose the best product to a user, statistically predict his or her actions (to a certain degree, of course), or even gain access to parts of his or her private life.
Going back to pictures: today it is very common for devices to embed additional data in each picture. This data is part of the image’s “EXIF” block, which (as in the JPEG case) is organized as tags and can easily be retrieved by any big-data software.
I wrote a sample website to verify the metadata of JPEG pictures. Alongside the normal EXIF data, there is often vendor-specific information (the MakerNote tag, 0x927c), which sometimes adds even more sensitive information.
So, what kind of sensitive information can we find?
- Type of camera
- Serial number
- Location at the time of shooting (GPS)
- In some cases private information (user name, email)
- Image shooting date
There are many reasons why this information could be dangerous. I can try to think of a few…
- Registering a device, or retrieving information about it (through social engineering), given its serial number
- Once your name, location and camera are known to another person, how easy would it be to replicate that metadata on other pictures? How would those pictures be considered during a criminal investigation?
- A serial number, if linked to a person, could be used in broad online searches that dig deep into that person’s life: home location, preferences, profiling. What would happen if one day this information were used in employment or mortgage decisions?
Of course, some of these hypotheses are dramatized, but they are intentionally thought-provoking.
So let’s put theory into practice and open up our trusted Linux box. Suppose we have identified a site whose images still carry their EXIF data.
The first thing to do is to download the site’s JPEG images into a local directory:
wget -nd -r -P ./saveIMG -A jpeg,jpg www.sitetobeusedasanexample.com
This saves all the JPEGs in our ./saveIMG folder. Now it is time to retrieve the juicy information, from inside saveIMG:
for i in *.jpg *.jpeg; do [ -e "$i" ] || continue; exiftool -a -G1 -H "$i"; done
Once all the data is extracted, it can be grouped and saved, or fed into a large database.
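As a rough illustration of the grouping step, here is a minimal sketch that counts how many of the downloaded pictures come from the same camera. The function names dump_cameras and summarise_cameras and the output file cameras.txt are my own assumptions for the example; the sketch also assumes exiftool is installed and that the cameras involved actually write the Model and SerialNumber tags (many do not).

```shell
#!/bin/sh
# Sketch: count how many downloaded pictures share the same camera.

# Print one "model<TAB>serial" line per JPEG found in directory $1.
# -T prints tab-separated values; a missing tag is printed as "-".
dump_cameras() {
    exiftool -T -Model -SerialNumber -ext jpg -ext jpeg "$1"
}

# Read those lines on stdin and print each distinct pair with its count,
# most frequent first.
summarise_cameras() {
    sort | uniq -c | sort -rn
}

# Example (hypothetical output file):
#   dump_cameras ./saveIMG | summarise_cameras > cameras.txt
```

Keeping the grouping separate from the extraction means the same summary can be run on any tab-separated dump, whichever tags were chosen.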
So, next time we upload a picture somewhere, it might be good to strip it of our data first, since we never know how it can be used!
exiftool -all= *.jpg
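To double-check the result before uploading, something like the following sketch could be used. It is only a crude heuristic of my own (the names has_exif_marker and list_unstripped are hypothetical): it looks for the “Exif” identifier near the start of each file, so it says nothing about other metadata formats such as XMP or IPTC, and running exiftool on the file again remains the more reliable check.

```shell
#!/bin/sh
# Sketch: heuristic check for leftover EXIF data after stripping.

# Succeed if the first 64 KiB of file $1 still contain the "Exif"
# identifier that opens a JPEG EXIF segment. Heuristic only.
has_exif_marker() {
    head -c 65536 "$1" | LC_ALL=C grep -a -q 'Exif'
}

# Print the JPEG files in directory $1 that still appear to carry EXIF data.
list_unstripped() {
    for f in "$1"/*.jpg "$1"/*.jpeg; do
        [ -e "$f" ] || continue
        if has_exif_marker "$f"; then
            printf '%s\n' "$f"
        fi
    done
}

# Example: list_unstripped .
```

Note also that exiftool, by default, keeps the unmodified picture next to the stripped one with _original appended to its name; those backup copies still contain all the metadata and should not be the ones uploaded.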