SharePoint 2007 and PDF indexing


Introduction

By default, SharePoint 2007 Search indexes only the metadata of a PDF document. Installing and configuring a PDF IFilter makes Search index the contents of the PDF document as well, which allows users to find documents based on the text inside them. This process is called full-text indexing.

[Indexing Server]: the server(s) in the SharePoint farm that have the "Indexing" role assigned. In a small farm, this can be a single server for all roles.

[Web Front End Server]: the server(s) in the SharePoint farm that have the "Web Front End" role assigned. In a small farm, this can be a single server for all roles.

Windows SharePoint Services 3.0

[Indexing Server]

  1. Install the PDF IFilter (see below for a list of available IFilters)
  2. Add the .pdf file type to the index list:
    1. Open the Registry Editor (Start > Run > regedit)
    2. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Applications\<GUID>\Gather\Search\Extensions\ExtensionList
    3. Add a new String Value
      1. Value name: <next value in line>
      2. Value data: pdf
  3. [This step only applies to 64-bit servers]
    1. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
    2. Change the data of the (Default) value
      1. Old value: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
      2. (Foxit x64 PDF IFilter) New value: {987F8D1A-26E6-4554-B007-6B20E2680632}
      3. (Adobe x64 PDF IFilter) New value: {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  4. Perform an iisreset
  5. Perform a Full Update on the Search content indexes
    1. Open a Command Prompt on the Indexing Server
    2. net stop spsearch
    3. net start spsearch
    4. cd "C:\Program Files\Common Files\Microsoft Shared\Web server extensions\12\BIN"
    5. stsadm.exe -o spsearch -action fullcrawlstop
    6. stsadm.exe -o spsearch -action fullcrawlstart
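The WSS 3.0 registry changes above can also be captured in a .reg file, which is handy when several indexing servers need the same change. The Python sketch below builds one. It assumes the existing ExtensionList value names are plain numbers, so "the next value in line" is the highest number plus one, and it leaves the search application <GUID> for you to supply. `next_value_name`, `build_wss_reg` and `X64_CLSIDS` are hypothetical helpers, not part of SharePoint; review the generated file before importing it.

```python
"""Sketch: build a .reg file for the WSS 3.0 PDF IFilter registry steps.

Assumptions (not from SharePoint itself): the ExtensionList value names are
decimal numbers, and the caller supplies the search application GUID.
"""

WSS_ROOT = (r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Shared Tools"
            r"\Web Server Extensions\12.0\Search")

# x64 IFilter CLSIDs as listed in step 3 above.
X64_CLSIDS = {
    "foxit": "{987F8D1A-26E6-4554-B007-6B20E2680632}",
    "adobe": "{E8978DA6-047F-4E3D-9C78-CDBE46041603}",
}


def next_value_name(existing_names):
    """Return the next numeric value name, e.g. ["0", "1", "38"] -> "39"."""
    numbers = [int(name) for name in existing_names if name.isdigit()]
    return str(max(numbers) + 1) if numbers else "0"


def build_wss_reg(app_guid, existing_names, x64_ifilter=None):
    """Return .reg text that adds 'pdf' to ExtensionList and, on x64 servers,
    points the .pdf filter extension at the chosen IFilter CLSID."""
    ext_key = (WSS_ROOT + "\\Applications\\" + app_guid
               + r"\Gather\Search\Extensions\ExtensionList")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + ext_key + "]",
        '"%s"="pdf"' % next_value_name(existing_names),
    ]
    if x64_ifilter is not None:
        filter_key = WSS_ROOT + r"\Setup\ContentIndexCommon\Filters\Extension\.pdf"
        # "@" is the .reg syntax for the (Default) value of a key.
        lines += ["", "[" + filter_key + "]", '@="%s"' % X64_CLSIDS[x64_ifilter]]
    return "\r\n".join(lines) + "\r\n"
```

For example, `build_wss_reg(app_guid, names, x64_ifilter="adobe")` also sets the Adobe x64 CLSID. Writing the result with `open("pdf.reg", "w", encoding="utf-16", newline="")` produces a Unicode file that regedit accepts; `newline=""` keeps the `\r\n` line endings from being doubled on Windows.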

[Web Front End Server]

  1. Copy the ICPDF.GIF file to "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images"
  2. Edit the file "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Xml\DOCICON.XML"
    1. Add an entry for the .pdf extension to the <ByExtension> section:
      <Mapping Key="pdf" Value="icpdf.gif"/>
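If you prefer to script the DOCICON.XML change, for example to apply it on every Web Front End Server, a minimal Python sketch could look like the one below. It assumes the per-extension entries live under the <ByExtension> element and that you pass in the file contents as a string; `add_pdf_mapping` is a hypothetical helper. ElementTree drops XML comments and the XML declaration when it re-serializes the document, so back up DOCICON.XML and compare the result before overwriting it.

```python
"""Sketch: add the pdf -> icpdf.gif mapping to DOCICON.XML.

Assumption: per-extension <Mapping> entries sit under a <ByExtension> element.
"""
import xml.etree.ElementTree as ET


def add_pdf_mapping(xml_text):
    """Return (new_xml_text, changed); the input is returned unchanged
    if a pdf mapping already exists."""
    root = ET.fromstring(xml_text)
    by_ext = root.find("ByExtension")
    if by_ext is None:
        raise ValueError("no <ByExtension> element found")
    for mapping in by_ext.findall("Mapping"):
        if mapping.get("Key", "").lower() == "pdf":
            return xml_text, False  # already mapped: leave the file alone
    ET.SubElement(by_ext, "Mapping", {"Key": "pdf", "Value": "icpdf.gif"})
    return ET.tostring(root, encoding="unicode"), True
```

Returning the `changed` flag lets a deployment script skip writing the file on servers that were already updated, and remember the iisreset afterwards.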

Microsoft Office SharePoint Server 2007

[Indexing Server]

  1. Install the PDF IFilter (see below for a list of available IFilters)
  2. Add the .pdf file type to the index list:
    1. Go to Central Administration, open the Shared Services Administration site of the current SSP, go to Search Settings and then to File Types
    2. Add a new file type: pdf
  3. [This step only applies to 64-bit servers]
    1. Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Setup\ContentIndexCommon\Filters\Extension\.pdf
    2. Change the data of the (Default) value
      1. Old value: {4C904448-74A9-11D0-AF6E-00C04FD8DC02}
      2. (Foxit x64 PDF IFilter) New value: {987F8D1A-26E6-4554-B007-6B20E2680632}
      3. (Adobe x64 PDF IFilter) New value: {E8978DA6-047F-4E3D-9C78-CDBE46041603}
  4. Perform an iisreset
  5. Perform a Full Update on the Search content indexes
    1. Open a Command Prompt on the Indexing Server
    2. net stop osearch
    3. net start osearch
    4. Go to Central Administration, open the Shared Services Administration site of the current SSP, go to Search Settings and start a full crawl of all content sources that contain PDF files
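The restart part of both procedures differs only in the search service name (spsearch for WSS 3.0, osearch for MOSS 2007), and only the WSS steps start the full crawl from the command line. The Python sketch below collects the commands from the steps above per product. It only prints them unless you call `run(..., execute=True)` on a Windows server; `refresh_commands`, `run` and the `STSADM` constant (the path from step 5 of the WSS procedure) are illustrative assumptions, not a supported tool.

```python
"""Sketch: the iisreset / search service restart / crawl commands from the
WSS 3.0 and MOSS 2007 procedures above, grouped per product."""
import subprocess
import sys

# Path from the WSS 3.0 procedure (default install location assumed).
STSADM = (r"C:\Program Files\Common Files\Microsoft Shared"
          r"\Web server extensions\12\BIN\stsadm.exe")

SEARCH_SERVICE = {"wss": "spsearch", "moss": "osearch"}


def refresh_commands(product):
    """Return the commands for 'wss' or 'moss' as argument lists."""
    service = SEARCH_SERVICE[product]
    commands = [
        ["iisreset"],
        ["net", "stop", service],
        ["net", "start", service],
    ]
    if product == "wss":
        # WSS 3.0: stop and restart the full crawl with stsadm.
        commands.append([STSADM, "-o", "spsearch", "-action", "fullcrawlstop"])
        commands.append([STSADM, "-o", "spsearch", "-action", "fullcrawlstart"])
    # MOSS 2007: the full crawl is started from the SSP Search Settings page.
    return commands


def run(commands, execute=False):
    """Print each command; execute it only when asked and running on Windows."""
    for cmd in commands:
        print(subprocess.list2cmdline(cmd))
        if execute and sys.platform == "win32":
            # check=True stops at the first command that returns an error.
            subprocess.run(cmd, check=True)
```

Calling `run(refresh_commands("moss"))` simply prints the three MOSS commands, which is a safe way to review them before running anything on a production indexing server.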

[Web Front End Server]

  1. Copy the ICPDF.GIF file to "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Images"
  2. Edit the file "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\Template\Xml\DOCICON.XML"
    1. Add an entry for the .pdf extension to the <ByExtension> section:
      <Mapping Key="pdf" Value="icpdf.gif"/>

Available IFilters

Adobe PDF IFilter 6.0 / x64

  • free (always good!)
  • 32 bit (v6.0) and 64 bit (the x64 IFilter, version 9, was released recently; applies to the [Indexing Server])

Foxit PDF IFilter v1.0

  • free for desktops, servers require a license
  • 32 bit and 64 bit (IA64 currently being tested, applies to the [Indexing Server])

Conclusion

Using the above procedure for either WSS 3.0 or MOSS 2007, you can have the contents of your PDF documents indexed by SharePoint Search.



Comments

Thursday, 27 Dec 2007 07:53 by BruceN
Also consider iFilter Shop's PDF+ iFilter. Like FoxIt, not free for servers, but unlike FoxIt, PDF+ can expose custom embedded metadata for indexing by MOSS. Plus iFilter Shop has great support.

Thursday, 27 Dec 2007 07:53 by Brett Campbell
This is GREAT! What about TIFF files?

Thursday, 27 Dec 2007 07:53 by Steven Van de Craen
The out of the box TIFF indexing from SPS2003 (http://support.microsoft.com/kb/837847) has been removed from MOSS 2007. I have picked up the following rumor: http://blogs.msdn.com/ifilter/archive/2006/11/30/recent-ifilter-implementation-and-deployment-questions.aspx#4161780

Thursday, 27 Dec 2007 07:53 by Göran Johansson
After hours of troubleshooting the Foxit x64 PDF IFilter is finally indexing the PDF files on our WSS 3.0 installation. Thanks a lot!

Sunday, 2 Mar 2008 06:56 by Theo Leclaire
Thanks, Steven - That really helped a lot.

Saturday, 5 Apr 2008 08:29 by shival khanna
Thanks, Steven, for this post

Thursday, 5 Jun 2008 09:26 by KHugh
Worked like a charm - thanks for the post!

Thursday, 3 Jul 2008 11:42 by John Manning
I don't understand the Foxit step for WSS: the old and new GUID values are the same??

Tuesday, 22 Jul 2008 09:35 by Steven Van de Craen
Good catch, John! I've updated it, but what matters is the new value anyway :)

Tuesday, 22 Jul 2008 09:09 by Peter Kurth
I've followed all the instructions I can find on this topic and am still not seeing any PDF results returned from my search. The difference for me is that the PDFs I want scanned are on a separate file server, which has been defined as a Search Content Source. Does the IFilter also have to be installed on the file server where the PDFs reside? Any help much appreciated.

Thursday, 24 Jul 2008 09:17 by Steven Van de Craen
Do you get results (from the file share) when searching on filename? If not, then the search is misconfigured.

Thursday, 24 Jul 2008 08:37 by Peter Kurth
Yes, I do get results when searching for a filename or part of a filename within that search scope.

Friday, 1 Aug 2008 03:00 by Steven Van de Craen
Peter, did you ever find a solution?

Thursday, 4 Sep 2008 06:18 by michael Gibson
I get PDFs with icons returned in searches, for example when searching for 'pdf' or for the PDF document name, but none of their content is indexed.

Monday, 8 Sep 2008 08:57 by Steven Van de Craen
If they were indexed before the iFilter installation, they might not be picked up again. Try resetting all crawled content and doing a full crawl. (Not the best solution if you have a lot of indexed data, but yeah.)

Tuesday, 16 Sep 2008 01:10 by Joel Hodes
Steven, thank you for your advice here. I have one problem: SharePoint is indexing the PDF files but not their content. The iFilter seems to be installed correctly, because I ran ifilttst on my system and it extracted all data. However, the search still does not return any results unless I search for the file name itself. Do you know why it is not searching the content? Do I need to map any metadata properties for PDFs? Thanks, Joel

Wednesday, 17 Sep 2008 11:37 by Steven Van de Craen
Joel, did you add the 'pdf' extension to the crawled file types? Also, if the files were already indexed before the iFilter installation, you might need to reset all crawled content and do a full crawl. No additional action is required.

Wednesday, 8 Oct 2008 05:55 by Michael Colaianne
Can anyone tell me whether SECURED PDFs are crawled? That is, I have uploaded several PDFs into a MOSS doc library that were generated with PDF security enabled to prevent selection, copy, print, etc. Will this preclude the indexing service (with the PDF iFilter installed and configured properly) from crawling the content of the secured PDFs? Many thanks in advance, Michael C.

Wednesday, 12 Nov 2008 07:31 by ningin
Thank you for sharing such detailed steps, very useful!!

Friday, 14 Nov 2008 07:48 by Joel Hall
Great Post, worked like a champ! Thanks!

Tuesday, 20 Jan 2009 05:47 by Ravinder Jamgotre
Hi, I have tried everything you described above, but the icon is still plain white and PDF files are opening in Internet Explorer. Can you please, please help me?

Thursday, 22 Jan 2009 10:00 by Steven Van de Craen
Make sure you followed the instructions and don't forget to do an iisreset.

Wednesday, 4 Feb 2009 01:57 by Vijay
Hi, Is there any way to run the full crawl for Microsoft Office SharePoint Server 2007 from the command prompt?

Thursday, 5 Mar 2009 05:47 by mike g
Hi, Joel or Michael or anyone else have you found your solution to index/crawling content within PDF files? I have everything configured as to what is documented here http://www.adobe.com/special/acrobat/configuring_pdf_ifilter_for_ms_sharepoint_2007.pdf. thanks

Friday, 6 Mar 2009 05:02 by Steven Van de Craen
Mike, what do the crawl logs say about a random PDF file in your environment after a full crawl?

Wednesday, 18 Mar 2009 02:25 by Peter
Thanks for the icon :D Working great with Adobe Reader 9 8-)

Wednesday, 1 Apr 2009 01:48 by Marcin
Hello, my problem is strange. After installing the iFilter (32-bit, v9), search finds the content of PDFs only after a full crawl. When I upload a file and run an incremental crawl, I see this PDF as 'searched' in the crawl logs, but when I search for any phrase from it, the results are empty. After a full crawl, I can search all of its content.

Thursday, 23 Apr 2009 10:40 by Amira
I have made these edits, and I can search inside PDF content. But I have a problem with the returned search results: the PDF document is returned as "Microsoft Word - pdfFileName.doc", although it has the PDF icon and opens in Acrobat Reader when I click it. Thanks in advance

Monday, 27 Apr 2009 10:37 by Steven Van de Craen
Amira, maybe the title or name (in the PDF or in SharePoint) is set to 'Microsoft Word...' ?

Thursday, 18 Jun 2009 11:03 by Sander
Hi, you say that without installing any iFilter SharePoint indexes the metadata. Because this is exactly what we need, I have tried this, but unfortunately I could not get it working. I just added the pdf file type and started a full crawl. Do I need to do anything else? (I have MOSS 2007 SP2 with lots of PDFs and only need metadata search through managed properties.) Any advice much appreciated!! Sander

Tuesday, 20 Oct 2009 10:15 by Trine
We plan to install a PDF iFilter. Is it possible to install it in a Windows Server 2008 R2 environment with MOSS 2007 SP2? Are there any experiences, or things to note? Thanks for your answer!

Thursday, 22 Oct 2009 08:41 by Steven Van de Craen
Trine, the Adobe PDF iFilter v6 is the only one that doesn't explicitly mention support for Windows 2008, but even so I believe it will be no real issue. If you have x64 there's no problem, because that requires either the v9 iFilter from Adobe or the Foxit one, and both support W2008. Installation instructions are identical to those described above. Good luck!

Friday, 23 Oct 2009 09:32 by Trine
Hi Steven, thanks a lot for your answer! Yes, we'll try and see... :-)

Thursday, 3 Dec 2009 02:42 by carl halle
Please note that an upgrade process such as SP2 or a cumulative update will reset the GUID to the Adobe 32-bit iFilter, even on a 64-bit machine. After the upgrade, the iFilter will stop working on 64-bit because of this. You will need to reset the registry key as per the procedure in this post.
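Carl's warning lends itself to a quick check after every service pack or cumulative update. The Python sketch below only classifies the CLSID you read from the (Default) value of the ...\Filters\Extension\.pdf key (for example with `reg query` and its `/ve` switch); it uses the CLSIDs listed in the post, and `check_pdf_filter` is a hypothetical helper.

```python
"""Sketch: after SP2 or a cumulative update, check whether the .pdf filter
CLSID was reset to the 32-bit Adobe IFilter (CLSIDs taken from the post)."""

ADOBE_32BIT = "{4C904448-74A9-11D0-AF6E-00C04FD8DC02}"
X64_CLSIDS = {
    "{987F8D1A-26E6-4554-B007-6B20E2680632}": "Foxit x64 PDF IFilter",
    "{E8978DA6-047F-4E3D-9C78-CDBE46041603}": "Adobe x64 PDF IFilter",
}


def check_pdf_filter(current_clsid, is_64bit):
    """Describe the (Default) CLSID read from the .pdf filter extension key."""
    clsid = current_clsid.strip().upper()  # registry GUIDs may differ in case
    if clsid in X64_CLSIDS:
        return "ok: " + X64_CLSIDS[clsid]
    if clsid == ADOBE_32BIT:
        if is_64bit:
            return "reset: points at the 32-bit Adobe IFilter, reapply the x64 value"
        return "ok: 32-bit Adobe IFilter"
    return "unknown CLSID: " + clsid
```

A result starting with "reset" means the x64 value from step 3 of the procedure needs to be applied again, followed by an iisreset and a full crawl.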

Thursday, 18 Feb 2010 07:32 by Mike Jackson
I have the icon for PDFs, but when the PDFs are displayed in a doc library that shows the title instead of the file name, the title is blank, even though the PDFs have titles. Any ideas what may be causing this?

Monday, 22 Feb 2010 06:03 by Mrinmoy
Hi Steven, I am a SharePoint developer working on a project where the client wants to index the content inside PDF documents. I don't have much experience with SharePoint Search, but I understand that using an iFilter I can search content inside PDF documents. I am using 64-bit Windows Server 2008 (MOSS 2007 with Enterprise Search) and I am aware iFilter version 9 is available. Can you give me a few tips, please? Should I install the new version of the iFilter? If yes, do I need to change the registry the same way you explain, or is there something new I have to do? Do I need the latest version of Adobe Reader if I use the latest iFilter on the server, or does the version on the clients' machines not matter? I will really appreciate it if you can reply. It will help me a lot. Thanks a lot. Cheers, Mrin

Tuesday, 2 Mar 2010 09:24 by Steven Van de Craen
Mike, the Title field must be filled in as SharePoint metadata, they are not automatically promoted from the title property in the PDF document, that functionality only works with Office file formats. Doesn't relate to Search or iFilter by the way.

Tuesday, 2 Mar 2010 09:29 by Steven Van de Craen
Mrin, the iFilter must be installed on the SharePoint server following the above instructions. Some steps only apply if you have an x64 server environment, and others differ depending on the iFilter you are using. An iFilter does nothing more than read the text from a PDF document and hand it to the indexer so that it is stored; it works with both older and newer PDF file versions. You don't need to match the versions of the iFilter and Reader for it to work. HTH, Steven

Thursday, 8 Apr 2010 05:34 by Phil
Thanks! I couldn't find ICPDF.GIF so I made my own with Paint.Net.

Thursday, 15 Apr 2010 11:29 by Archana
Very informative...solved my problem. Thanks!!

Monday, 17 May 2010 05:45 by Eric
Great tip about SP2 resetting the GUID for the .pdf extension. I got burned by this one.... Thanks Carl

Friday, 20 Aug 2010 11:01 by Sandra
Well, how do you see the status of the WSS 3.0 crawl? I don't see some of the PDF documents in search if they are in a subsite, though I see many PDF documents in the site collection. http://www.ekhichdi.com/a/WSS-3.0-PDF-search-results-not-working-106.html

Monday, 20 Sep 2010 10:53 by sameer
Thanks, it’s really useful.

Friday, 12 Nov 2010 11:02 by Tejas
Hi Steve, I followed the complete process listed above. Now when I crawl, the PDFs do get crawled, but I get no results when I try to search content from any PDF file. I restarted the machine as well. What could be the issue? Thanks, Tejas

Friday, 12 Nov 2010 05:16 by Steven Van de Craen
Hi Tejas, it really depends. Do new PDF files get crawled? What do the crawl logs say? Which SP version? x64?

Thursday, 19 May 2011 07:36 by Sameer
Does iFilter require Adobe Reader pre-installed on the Indexing or WFE?

Monday, 23 May 2011 09:37 by Steven Van de Craen
Hi Sameer, installing the iFilter is all that's required. Then do the configuration steps.

Monday, 30 Jan 2012 04:11 by Geetha
Hi Steven, my environment is MOSS 2007 and I have installed the iFilter and followed the configuration steps. Now PDF documents show up in search results. But I observed that my document library has 200,000+ PDF docs, of which only 100,000+ are getting indexed. Kindly let me know whether I'm missing something.

Tuesday, 31 Jan 2012 07:27 by Steven Van de Craen
Hi Geetha, if you're getting results the PDF iFilter is working. Search the Crawl Logs for information on those 'missing' documents. If they show with a warning or error message that would be helpful.

Tuesday, 31 Jan 2012 09:39 by Geetha
Hi Steven, thanks for your prompt reply. I think I should rephrase my question. My document library has some 200k PDF documents, of which only 100k show up in the results. I wrote a test web part to count the number of documents shown for each document type. I checked the crawl logs: there were nearly 50 errors, none related to PDF documents. The errors are shown for Word documents, and even those documents are searchable. I have configured a content source to search my SharePoint site collection, and I defined a scope for this specific document library. This scope returns 100k+ results, whereas my entire output should be 210k results. Let me know if you need further info. In addition, my document library contains several file types, including jpg, png, pdf, doc and xls, of which PDF is the majority. The jpg and png documents come up in the search results as desired. I also observed that a few doc and xls files are not coming up in the results, but the difference is very small. Only PDF has this problem. Also let me know how I can cross-check whether all my documents are coming up in search results.

Thursday, 2 Feb 2012 08:45 by Steven Van de Craen
Hi Geetha, are you sure they're not filtered out as duplicates? It might be best to post your issue to the MSDN forums to reach a broader audience.

Friday, 5 Jul 2013 06:36 by Viveka
Hi. Is it possible to search within the contents of secured PDF documents (that is, copy- and print-protected, with restrictions on copying and pasting content)?

Tuesday, 9 Jul 2013 02:31 by Steven Van de Craen
If a document is copy protected, an iFilter cannot extract text from it to make it searchable.

Friday, 13 Sep 2013 10:53 by Viveka
Thanks for the response, Steven! Is there any way or workaround to search within the contents of secured PDFs?? I also want to refer to a post above by Michael (Wednesday, 8 Oct 2008 05:55 by Michael Colaianne): "Can anyone tell me whether SECURED PDFs are crawled? That is, I have uploaded into a MOSS doc library several PDFs that were generated with PDF security enabled to prevent selection, copy, print, etc. Will this preclude the indexing service (with PDF iFilter installed and configured properly) from crawling content the secured PDFs? Many thanks in advance, Michael C." Michael! Have you found the solution? Or has anyone got this working?

Monday, 16 Sep 2013 08:39 by Steven Van de Craen
Depends on what protection is active I guess. You could try http://www.pdflib.com/products/tet-pdf-ifilter/ which states that it is able to do this, but I have no experience with it.
