The Stanford output is included mainly as a convenience and a debugging aid, but it is also available if you need a more reliable implementation.

This is the result of the Stanford CoreNLP analysis of the text using the following annotators:

  • tokenize: recognizes individual words.
  • ssplit: recognizes sentence boundaries.
  • pos: identifies the token's part of speech.
  • lemma: identifies the root form of the token.
  • parse: creates a syntax tree of the sentence. Main clause, independent clause, noun phrase, verb phrase, etc.
  • depparse: creates a dependency graph of the sentence. Subject, verb, object, modifier, auxiliary, etc.
  • ner: recognizes named entities. Is this a person? Is this a place?
  • coref: links pronouns to antecedents.
  • quote: recognizes quotations and links quotes to speakers.
  • sentiment: classifies the emotional content of a sentence on a scale from most negative to most positive.
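
For readers who want to reproduce a comparable analysis themselves, the annotator list above can be passed to a CoreNLP server as a comma-separated `annotators` property. The sketch below only builds the request URL and does not contact any server. The base URL `http://localhost:9000` and the helper names `build_properties` and `build_request_url` are assumptions for illustration, not part of this project.

```python
import json
from urllib.parse import urlencode

# Annotators in the order listed above.
ANNOTATORS = [
    "tokenize", "ssplit", "pos", "lemma", "parse",
    "depparse", "ner", "coref", "quote", "sentiment",
]


def build_properties():
    """Return the properties dict a CoreNLP server expects (hypothetical helper)."""
    return {"annotators": ",".join(ANNOTATORS), "outputFormat": "json"}


def build_request_url(base_url="http://localhost:9000"):
    """Build the URL to POST raw text to; base_url is an assumed local server."""
    query = urlencode({"properties": json.dumps(build_properties())})
    return base_url + "/?" + query


url = build_request_url()
print(url)
```

The text to analyze would be sent as the POST body to that URL. Annotators that depend on others (for example, `lemma` needs `pos`) must come after them in the list, which is why the order above is preserved.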

The structure of the STANFORD JSON is as follows:

  • document
    • various document level annotations
    • corefs
    • quotes
    • sentences
      • various sentence level annotations
      • tokens
        • various token level annotations
        • word
        • lemma
        • pos
        • ner
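
As a rough illustration of walking this structure, the sketch below flattens the tokens of each sentence into rows. It assumes the document is the top-level JSON object and that the keys are named exactly as in the outline (`sentences`, `tokens`, `word`, `lemma`, `pos`, `ner`). The sample data is hand-written for the example and is not real CoreNLP output, and `iter_tokens` is a hypothetical helper.

```python
import json

# Hypothetical sample shaped like the outline above; not real CoreNLP output.
sample = json.loads("""
{
  "corefs": {},
  "quotes": [],
  "sentences": [
    {
      "tokens": [
        {"word": "Alice", "lemma": "Alice", "pos": "NNP", "ner": "PERSON"},
        {"word": "runs", "lemma": "run", "pos": "VBZ", "ner": "O"}
      ]
    }
  ]
}
""")


def iter_tokens(document):
    """Yield (sentence_index, token) pairs from a document-level object."""
    for sentence_index, sentence in enumerate(document.get("sentences", [])):
        for token in sentence.get("tokens", []):
            yield sentence_index, token


# One row per token: sentence index, then the four token-level fields.
rows = [
    (i, t["word"], t["lemma"], t["pos"], t["ner"])
    for i, t in iter_tokens(sample)
]
for row in rows:
    print(row)
```

Using `.get` with an empty default keeps the loop from failing on documents or sentences that lack the expected key; the token fields are indexed directly so that a missing field is reported rather than silently skipped.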