Sunday, February 14, 2010

Metadata analysis is cewl!

The idea for this post came about while using the program "cewl", which was created by Robin Wood, aka @digininja. I initially started using it to harvest the email addresses on my company's website so I could compare them to a known list of Exchange public folders and correct any discrepancies. If you would like to give cewl a try, you can find a nice install guide over at @joswr1ght's website.
However, after using some of cewl's functions to download documents while analyzing the email addresses, I went a step further and used Larry Pesce's paper as a guide to analyze some of those documents, and the results were shocking. This brought on the realization that a lot of companies just post PDFs and Word documents online without thinking about sanitizing them first, and that's just making it easy for the bad guys. I am not going to get into all the details of my findings, but I would say that if you haven't used cewl or read Larry's paper, you are doing yourself a great injustice.
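For anyone who wants to try the comparison step by hand, here is a minimal sketch in Python. It is not how cewl works internally; the function names, the sample addresses, and the deliberately simple regular expression are all my own assumptions for illustration.

```python
import re

# Deliberately simple pattern (an assumption for this sketch);
# it will not match every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return the set of lowercased email addresses found in text."""
    return {match.lower() for match in EMAIL_RE.findall(text)}


def compare_addresses(harvested, known):
    """Return two sorted lists: addresses seen on the site but missing
    from the known list, and known addresses never seen on the site."""
    harvested = {address.lower() for address in harvested}
    known = {address.lower() for address in known}
    return sorted(harvested - known), sorted(known - harvested)


if __name__ == "__main__":
    # Hypothetical page text and known list, for illustration only.
    page_text = "Contact sales@example.com or Support@example.com."
    known_list = ["sales@example.com", "hr@example.com"]
    unknown, unseen = compare_addresses(find_emails(page_text), known_list)
    print("On the site but not in the known list:", unknown)
    print("In the known list but not on the site:", unseen)
```

Lowercasing both sides keeps the comparison from flagging addresses that differ only in case.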
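To give a feel for what a downloaded document can reveal, here is a rough sketch that reads the core properties from a .docx file using only the Python standard library. This is not the method from the paper or from cewl; it assumes the newer zip-based Office format (.docx, .xlsx, .pptx) and will not work on the older binary .doc files. The function name and the choice of fields are mine.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output label -> element to look for (a small selection, not exhaustive).
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(path_or_file):
    """Return a dict of the core properties present in the package.

    Returns an empty dict when the package has no docProps/core.xml.
    """
    with zipfile.ZipFile(path_or_file) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            return {}
    root = ET.fromstring(xml_bytes)
    found = {}
    for label, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[label] = node.text
    return found
```

Calling read_core_properties on a downloaded file (the path is whatever you saved it as) returns whichever of these fields the author left behind, and user names in the creator field are exactly the kind of thing an attacker is after.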

Now, before you start panicking, there are a few things you can do to limit the exposure of personal data your company might inadvertently leak on the internet. The National Security Agency published a paper back in 2008 which I believe is still very useful today. You can use this paper as a guide for sanitizing your PDFs and other documents before publishing them online.
I decided to do something new this time and ask both authors a few questions:

Robin Wood Q&A:

What prompted you to create cewl?
Cewl is based on a blog post by Larry "HaxorTheMatrix" Pesce from http://www.pauldotcom.com/. He used command line tools, and I put it all in one place.

Do you think that the area of metadata research is not getting a lot of attention?
I think there is a bit of research going on; Larry does some, and FOCA is a great app. It is defense that is lacking.

What was the main usage you had in mind for this tool, and is that goal being met?
The main usage was creating dictionaries for dictionary attacks and it seems to be working from the feedback I've been getting.

What other metadata analysis tools are you working on?
I'm not working on any metadata projects at the moment; however, I tend to be a spur-of-the-moment developer, so if I have an idea you might see a tool the next day.

How can someone contribute or help out with this tool or any of your other projects?
If anyone wants to contribute they can mail me ideas or send code patches. I'm always happy to listen to ideas.

Where can people follow your work and find out more about what you are doing?
For more details on my projects, visit http://www.digininja.org/ or follow me on Twitter @digininja.

Larry Pesce Q&A:

What made you decide to focus your research and write a paper on the evils of metadata?
It started with the MySpace 1.6 gig picture leak. I wanted to see if any of the images contained GPS info so that I could tie the picture to a location.

Do you think everyone is doing their part to bring awareness to this issue?
I think that folks are just starting to come around on the whole "detailed recon" aspect of a test and are starting to educate themselves.

Would you say every company that publishes documents on the web should have a policy in place that addresses sanitizing documents?
I would not say that every company needs to have a policy on it. I know, shocking! For some, the effort put forth to sanitize public documents brings no reward in reduced risk. But I think that if you do the analysis, any mid-sized or larger company can easily and adequately address the risk that it introduces.

Since the paper, have you done any additional work or research in this area?
I have done some, such as looking at some other common stuff for information gathering and recon. I have looked at SIM cards and other document types, such as streaming video (stuff like YouTube for GPS-tagged videos), and of course automating a lot of the work.

Do you think that more people should be doing research in this field?
Yes, inasmuch as the attackers are doing the same thing. I think that most of us don't realize how much info is out there with a little bit of digging.

I would like to end with a quote which I am sure I picked up from the PDC crew: "no need for a zero day, when all your personal information is in the wild".

Reference links:

http://www.digininja.org/
http://pauldotcom.com/wiki/index.php/Episode129
http://pentestit.com/2009/05/16/foca-fingerprinting-organisation-collected-archives/
http://www.sans.org/reading_room/whitepapers/privacy/document_metadata_the_silent_killer_32974
http://www.willhackforsushi.com/?p=410
http://www.fas.org/sgp/othergov/dod/nsa-redact.pdf
http://www.nsa.gov/ia/_files/app/pdf_risks.pdf
