Indexing PDF’s - The Why?

January 26th, 2008

After I published my short article, “Indexing PDF Documents with Zend_Search_Lucene”, I was surprised to find it on the Zend Developer Zone blog. I had no idea it would get the attention that it did, and I thank everyone for checking it out. So now that you know how to index a PDF, you may be asking: why the heck would you do this?

Many companies, large and small, have support centers, whether internal or external help desks. In addition to the help desk, many companies publish PDF documents such as manuals, specs, service guides, and setup/connection guides. So instead of a help desk employee (or anyone) remembering which manual covers what and which page everything is on, you can simply index these PDF files for easy searching. Think about it this way: say you have 50 products with 5 manuals each. That’s 250 manuals to keep track of (not counting how many pages each manual has). The easy way is to index the PDFs, add the necessary metadata for each manual, build a search form on a web page, and voilà: you have an easy way to search PDF files, find information quickly for a customer or whoever, and save loads of time otherwise spent searching page by page for the same information.
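To make the idea concrete, here is a minimal sketch of the index-plus-metadata approach. It is not the Zend_Search_Lucene code from the earlier article: it is a toy in-memory inverted index in Python, and it assumes the page text has already been extracted from the PDFs. The class name `ManualIndex`, its methods, and the sample product and manual names are all hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ManualIndex:
    """Toy in-memory inverted index over manual pages (illustrative only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of page ids
        self.pages = []                   # page id -> metadata dict

    def add_page(self, product, manual, page, text):
        """Store the page metadata and index every word of its text."""
        page_id = len(self.pages)
        self.pages.append({"product": product, "manual": manual, "page": page})
        for word in tokenize(text):
            self.postings[word].add(page_id)

    def search(self, query):
        """Return metadata for pages that contain every word in the query."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return [self.pages[i] for i in sorted(hits)]


# Hypothetical sample data standing in for text extracted from PDFs.
index = ManualIndex()
index.add_page("Router X", "Setup Guide", 12,
               "Connect the power cable before the network cable.")
index.add_page("Router X", "Service Guide", 3,
               "Replace the power supply if the LED stays off.")

results = index.search("power cable")
for hit in results:
    print(hit["product"], hit["manual"], "page", hit["page"])
```

A search form would simply pass the user’s query to something like `search` and show the product, manual, and page for each hit, which is exactly the lookup a help desk employee would otherwise do by hand.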

Many companies do this, and many companies boast about how they are the best at doing it. So the next time you are looking for a searchable PDF solution, remember that anyone can do this, and it’s easy to do yourself.


One comment to “Indexing PDF’s - The Why?”


  1. amin2u said:

    Yeah, I’m still wondering whether I can implement this on the Windows platform.
