Class ExtractText
java.lang.Object
  io.outofprintmagazine.corpus.batch.CorpusBatchStep
    io.outofprintmagazine.corpus.batch.impl.ExtractText

All Implemented Interfaces: ICorpusBatchStep
public class ExtractText extends CorpusBatchStep implements ICorpusBatchStep

Field Summary
Fields inherited from class io.outofprintmagazine.corpus.batch.CorpusBatchStep
dictionaryPOS

Constructor Summary
Constructor      Description
ExtractText()

Method Summary
Modifier and Type                                Method                                                                  Description
protected String                                 parseToString(InputStream stream)
com.fasterxml.jackson.databind.node.ArrayNode    runOne(com.fasterxml.jackson.databind.node.ObjectNode inputStepItem)
-
Methods inherited from class io.outofprintmagazine.corpus.batch.CorpusBatchStep
copyInputToOutput, copyInputToOutput, getAuthor, getAuthor, getData, getDate, getDate, getDateFormat, getDefaultProperties, getDocID, getExtensionFromMimeType, getJsonNodeFromStorage, getJsonNodeFromStorage, getJsoupDocumentFromStorage, getJsoupDocumentFromStorageNormalized, getLink, getMapper, getMimeTypeFromExtension, getOutputScratchFilePath, getOutputScratchFilePath, getOutputScratchFilePathFromInput, getParameterStore, getStorage, getStorageLink, getText, getText, getTextDocumentFromStorage, getTextDocumentFromStorage, getTextWithSelector, getThumbnail, getTitle, getTitle, isDictionaryWord, run, setAuthor, setAuthor, setData, setDate, setDate, setDate, setDocID, setLink, setParameterStore, setStorage, setStorageLink, setThumbnail, setThumbnail, setTitle, setTitle
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface io.outofprintmagazine.corpus.batch.ICorpusBatchStep
getData, getDefaultProperties, run, setData, setParameterStore, setStorage
-
Method Detail
-
parseToString
protected String parseToString(InputStream stream) throws IOException, SAXException, org.apache.tika.exception.TikaException
- Throws:
IOException
SAXException
org.apache.tika.exception.TikaException
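
The signature takes an InputStream and returns its text as a String. The declared exceptions (IOException, SAXException, TikaException) match those thrown by Apache Tika's Parser.parse, so the method presumably delegates to a Tika parser, although this page does not show the method body. As a dependency-free illustration of the same stream-in, string-out contract only, the sketch below reads a plain-text stream as UTF-8 and trims it. The class name PlainTextExtractSketch, the UTF-8 charset, and the trimming are assumptions made for the example; they are not taken from ExtractText, and extracting text from PDF, HTML, or Word input would require a real parser such as Tika.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, not the ExtractText implementation.
class PlainTextExtractSketch {

    // Same shape as parseToString(InputStream): consume the stream, return its text.
    // Assumes the bytes are UTF-8 plain text; no format detection is performed.
    static String parseToString(InputStream stream) throws IOException {
        try (InputStream in = stream) {
            byte[] bytes = in.readAllBytes(); // requires Java 9 or later
            return new String(bytes, StandardCharsets.UTF_8).trim();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = "  Hello, corpus.  \n".getBytes(StandardCharsets.UTF_8);
        String text = parseToString(new ByteArrayInputStream(raw));
        System.out.println(text);
    }
}
```

The try-with-resources block closes the stream once it has been read, so the sketch assumes the caller hands over ownership of the stream.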
-
runOne
public com.fasterxml.jackson.databind.node.ArrayNode runOne(com.fasterxml.jackson.databind.node.ObjectNode inputStepItem) throws Exception
- Specified by:
runOne in interface ICorpusBatchStep
- Specified by:
runOne in class CorpusBatchStep
- Throws:
Exception