oopcorenlp – Functionality - Aggregates

Aggregates

These are the OOP document-level scores, enriched with relevant statistical calculations. The structure of the JSON document is as follows:

document
- annotation
  - name (same as annotation)
  - aggregatedScores (list)
    - name (subannotation name)
    - score
      - raw (integer, sum of subannotation scores in text)
      - normalized (decimal, raw divided by number of tokens in text)
      - count (integer, count of subannotation scores in text, frequently equal to raw)
    - aggregateScore
      - rank (integer, ordinal position of this subannotation within the annotation)
      - percentage (decimal, subannotation normalized divided by annotation normalized)
      - percentile (integer, subannotation rank plus annotation count divided by annotation count)
  - scoreStats (object)
    - score
      - raw (integer, sum of subannotation scores in text)
      - normalized (decimal, raw divided by number of tokens in text)
      - count (integer, count of subannotations)
    - stats
      - min (decimal, lowest subannotation normalized)
      - max (decimal, highest subannotation normalized)
      - mean (decimal, annotation normalized divided by annotation count)
      - median (decimal, midpoint of subannotation normalized)
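
To make the structure above concrete, here is a minimal Python sketch that builds one annotation entry from per-subannotation scores. It is only an illustration under stated assumptions, not the oopcorenlp implementation: the function name `aggregate`, the annotation name `ExampleAnnotation`, the subannotation names and the token count are all hypothetical. `rank` and `percentile` are left out because the list does not pin down their ordering and rounding.

```python
import json
from statistics import median


def aggregate(name, sub_scores, token_count):
    """Build one annotation entry in the shape described above.

    Hypothetical helper: sub_scores maps a subannotation name to the list
    of individual scores found in the text. Assumes sub_scores is
    non-empty and token_count is greater than zero.
    """
    aggregated = []
    for sub_name, scores in sub_scores.items():
        raw = sum(scores)
        aggregated.append({
            "name": sub_name,
            "score": {
                "raw": raw,                       # sum of scores in text
                "normalized": raw / token_count,  # raw / number of tokens
                "count": len(scores),             # number of scores in text
            },
        })

    total_raw = sum(entry["score"]["raw"] for entry in aggregated)
    total_normalized = total_raw / token_count
    count = len(aggregated)  # count of subannotations

    for entry in aggregated:
        # percentage: subannotation normalized / annotation normalized
        share = entry["score"]["normalized"] / total_normalized if total_normalized else 0.0
        entry["aggregateScore"] = {"percentage": share}

    normalized = [entry["score"]["normalized"] for entry in aggregated]
    return {
        "name": name,
        "aggregatedScores": aggregated,
        "scoreStats": {
            "score": {"raw": total_raw, "normalized": total_normalized, "count": count},
            "stats": {
                "min": min(normalized),
                "max": max(normalized),
                "mean": total_normalized / count,  # annotation normalized / count
                "median": median(normalized),
            },
        },
    }


# Made-up input: three subannotations in a 100-token text.
result = aggregate(
    "ExampleAnnotation",
    {"a": [1, 1, 1, 1], "b": [1, 1], "c": [1, 1, 1, 1, 1, 1]},
    token_count=100,
)
print(json.dumps(result, indent=2))
```

In this made-up input every individual score is 1, so `count` equals `raw` for each subannotation, which is the common case the list mentions.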