[Tiki-devel] Native PDF handler for File Galleries


Luis Henrique Fagundes
Hi,

I've just implemented a native handler for extracting text content
from PDF files. Until now, that feature has depended on either
poppler-utils or pstotext being installed on the system, and it
doesn't work on Windows.

The native handler is based on https://github.com/christian-vigh-phpclasses/PdfToText.

In my tests, PdfToText was as effective as poppler-utils for the
purpose of indexing files for search. The text formatting of
poppler-utils is much better, but since it's not displayed in search
results, this looks irrelevant.

My question is: should this be optional? Is there any case in which
a user would prefer to use the former external tools to parse PDF files?
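
One way to frame the optional-versus-mandatory question is as a handler-selection step. Below is a minimal sketch in Python (not Tiki's PHP code; the function name, the `prefer_external` setting and the handler labels are all hypothetical), assuming the external tools are the `pdftotext` binary from poppler-utils and `pstotext`:

```python
import shutil

# Hypothetical labels; Tiki's real handlers are PHP and are not shown in this thread.
EXTERNAL_TOOLS = ("pdftotext", "pstotext")  # pdftotext ships with poppler-utils


def choose_pdf_handler(prefer_external=False, which=shutil.which):
    """Return the name of the extractor to use when indexing a PDF.

    prefer_external models an "optional" setting: when True, the first
    external tool found on PATH wins; otherwise the native handler is used.
    """
    if prefer_external:
        for tool in EXTERNAL_TOOLS:
            if which(tool) is not None:
                return tool
    # The native handler needs no system packages, so it is always available.
    return "native"
```

If the feature is not optional, the `prefer_external` branch simply disappears and every install, including Windows, takes the same code path.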

asa

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
TikiWiki-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/tikiwiki-devel

Re: [Tiki-devel] Native PDF handler for File Galleries

Dr. Sassafras
Not optional. Assuming:

* The library is being included as part of Tiki, not through packages.

* It works :) It's a funny thing to say, but I've tested variants of PDFtoText that didn't render spacing properly, leading to words being combined.
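
A quick way to check for the combined-words problem is to compare the native output against a reference extraction of the same file. Here is a rough sketch in Python, assuming both outputs are already available as strings (the function name is hypothetical, and this is only a heuristic, not how Tiki tests the handler):

```python
import re


def merged_words(reference_text, candidate_text):
    """Return candidate tokens that look like two adjacent reference words glued together."""
    ref_tokens = re.findall(r"\w+", reference_text.lower())
    ref_vocab = set(ref_tokens)
    # Every pair of neighbouring reference words, joined without a space.
    glued_pairs = {a + b for a, b in zip(ref_tokens, ref_tokens[1:])}
    cand_tokens = re.findall(r"\w+", candidate_text.lower())
    # Keep only tokens that match a glued pair and are not real reference words.
    return sorted({t for t in cand_tokens if t in glued_pairs and t not in ref_vocab})
```

An empty result does not prove the spacing is right, but a non-empty one points at tokens that a word-based search index would typically fail to match against either original word.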

Overall it's much better to rely less on system integration whenever possible. So thanks ;)

Brendan

> On Aug 2, 2017, at 1:04 PM, Luis Henrique Fagundes <[hidden email]> wrote:
>
> Hi,
>
> I've just implemented a native handler for extracting text content
> from PDF files. Until now, that feature has depended on either
> poppler-utils or pstotext being installed on the system, and it
> doesn't work on Windows.
>
> The native handler is based on https://github.com/christian-vigh-phpclasses/PdfToText.
>
> In my tests, PdfToText was as effective as poppler-utils for the
> purpose of indexing files for search. The text formatting of
> poppler-utils is much better, but since it's not displayed in search
> results, this looks irrelevant.
>
> My question is: should this be optional? Is there any case in which
> a user would prefer to use the former external tools to parse PDF files?
>
> asa
