PERPOS: Information Extraction

Modified: April 9th, 2007

  1. The First Experiment
    • A corpus of 50 Presidential Records (Corpus1)
    • GATE Platform and Development Environment
    • ANNIE Application in default configuration
    • Human Markup vs. ANNIE Response
    • Annotation Difference and Corpus Benchmark Tools
    • Precision/Recall and F-Measure Results
    • Analysis
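
In the first experiment the ANNIE response is scored against the human markup (the key). GATE's Annotation Diff tool classifies each annotation as correct, partially correct, missing or spurious, and precision, recall and F-measure are derived from those counts. A minimal sketch of that arithmetic follows, assuming the common strict/lenient convention in which a partial match counts as wrong (strict) or right (lenient). The class `IeScores`, its methods and the sample counts are hypothetical; this is not GATE's own evaluation API and the numbers are not PERPOS results.

```java
/**
 * Hypothetical sketch: strict and lenient precision, recall and F-measure
 * computed from annotation comparison counts (key = human markup,
 * response = ANNIE output). Not part of the GATE API.
 */
final class IeScores {

    /** Counts from comparing response annotations with key annotations. */
    record Counts(int correct, int partial, int missing, int spurious) {}

    /** Guarded division: returns 0 when the denominator is 0. */
    private static double ratio(double num, double den) {
        return den == 0 ? 0.0 : num / den;
    }

    /** Strict precision: partial matches are treated as errors. */
    static double precisionStrict(Counts c) {
        return ratio(c.correct(), c.correct() + c.partial() + c.spurious());
    }

    /** Lenient precision: partial matches are treated as correct. */
    static double precisionLenient(Counts c) {
        return ratio(c.correct() + c.partial(), c.correct() + c.partial() + c.spurious());
    }

    /** Strict recall: only exact matches recover key annotations. */
    static double recallStrict(Counts c) {
        return ratio(c.correct(), c.correct() + c.partial() + c.missing());
    }

    /** Lenient recall: partial matches also recover key annotations. */
    static double recallLenient(Counts c) {
        return ratio(c.correct() + c.partial(), c.correct() + c.partial() + c.missing());
    }

    /** Weighted F-measure; beta = 1 gives the harmonic mean of P and R. */
    static double fMeasure(double p, double r, double beta) {
        double b2 = beta * beta;
        return ratio((1 + b2) * p * r, b2 * p + r);
    }

    public static void main(String[] args) {
        // Illustrative counts only, not results from the Presidential Records corpora.
        Counts c = new Counts(80, 10, 10, 10);
        double p = precisionStrict(c);   // 80 / (80 + 10 + 10)
        double r = recallStrict(c);      // 80 / (80 + 10 + 10)
        System.out.printf("strict  P=%.2f R=%.2f F1=%.2f%n", p, r, fMeasure(p, r, 1.0));
        double pl = precisionLenient(c); // (80 + 10) / (80 + 10 + 10)
        double rl = recallLenient(c);    // (80 + 10) / (80 + 10 + 10)
        System.out.printf("lenient P=%.2f R=%.2f F1=%.2f%n", pl, rl, fMeasure(pl, rl, 1.0));
    }
}
```

With beta = 1 the F-measure is the harmonic mean of precision and recall; a beta greater than 1 weights recall more heavily than precision.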

  2. The Second Experiment
    • A different corpus of 50 Presidential Records (Corpus2)
    • Precision and Recall were raised to over 90% on Corpus1 before the run on Corpus2
    • Modified pipeline: "OntoText Hash Gazetteer" in place of "ANNIE Gazetteer"
    • Modified pipeline: "Montreal Transducer" in place of "ANNIE Named Entity Transducer"
    • Modified word lists to better represent the Bush41 Presidential Records domain
    • Modified rules in the default pattern/action grammar
    • The default configuration performs statistically the same on Corpus2 as it did on Corpus1
    • The modified configuration achieves a 10% increase in both Precision and Recall
    • Many errors remain in the annotated response files
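
A gazetteer marks every span of text that matches an entry in its word lists, which is why tuning those lists to the Bush41 domain affects the results. The OntoText Hash Gazetteer, as its name suggests, performs this lookup through hash tables. The sketch below illustrates only the general idea: a longest-match lookup of token sequences in a `HashMap` built from word lists. It assumes simple whitespace tokenization (punctuation attached to a word prevents a match) and case-insensitive matching; the class `HashGazetteerSketch`, the list entries and the "major:minor" labels are hypothetical, and this is not the OntoText implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical hash-based gazetteer sketch (not OntoText's code).
 * Each word-list entry maps a lower-cased phrase to a "major:minor" label.
 */
final class HashGazetteerSketch {

    /** One match: token span [start, end) and the label of its word list. */
    record Lookup(int start, int end, String label) {}

    private final Map<String, String> entries = new HashMap<>();
    private int maxPhraseLength = 0; // longest entry, in tokens

    /** Adds every phrase of one word list under a single label. */
    void addList(String label, List<String> phrases) {
        for (String phrase : phrases) {
            String key = String.join(" ", phrase.trim().toLowerCase().split("\\s+"));
            if (key.isEmpty()) continue;
            entries.put(key, label);
            maxPhraseLength = Math.max(maxPhraseLength, key.split(" ").length);
        }
    }

    /** Scans tokens left to right, taking the longest matching phrase at each position. */
    List<Lookup> annotate(String text) {
        String[] tokens = text.trim().split("\\s+");
        List<String> tokenList = List.of(tokens);
        List<Lookup> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            Lookup match = null;
            int limit = Math.min(tokens.length, i + maxPhraseLength);
            for (int end = limit; end > i; end--) {
                String candidate = String.join(" ", tokenList.subList(i, end)).toLowerCase();
                String label = entries.get(candidate); // constant-time hash lookup
                if (label != null) {
                    match = new Lookup(i, end, label);
                    break;
                }
            }
            if (match != null) {
                found.add(match);
                i = match.end(); // continue after the matched phrase
            } else {
                i++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        HashGazetteerSketch g = new HashGazetteerSketch();
        // Hypothetical word lists standing in for domain-specific gazetteer lists.
        g.addList("person:full", List.of("George Bush", "Brent Scowcroft"));
        g.addList("person:first", List.of("George"));
        g.addList("location:city", List.of("Kennebunkport"));
        for (Lookup l : g.annotate("President George Bush met Brent Scowcroft in Kennebunkport")) {
            System.out.println(l);
        }
    }
}
```

Preferring the longest match means "George Bush" is labelled as a full name rather than "George" alone as a first name; adding domain phrases to the lists changes which spans are found without touching the lookup code.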

  3. The Public Papers Experiment
    • The George Bush Presidential Library and Museum Public Papers
    • HTML and plain-text versions were processed by an IE application built on the GATE API
    • The pipeline properties and resources were the same as those used for Corpus2
    • Best viewed in the Internet Explorer browser
    • Please email questions to matthew.underwood@gtri.gatech.edu