ACCCA Manuals and Guides
Introduction
ACCCA (A Combined Clinical Concept Annotator) is a UIMA-based combination
system for clinical record concept annotation. It annotates three types of
clinical concept: problem, treatment, and test.
Currently ACCCA combines 6 systems (ABNER 1.5, Lingpipe 3.8, OpenNLP Chunker 2.1,
JNET 2.3, Peregrine 2009, StanfordNER 1.1), and has been trained on the training
corpus of the 2010 i2b2/VA clinical record concept annotation challenge.
ACCCA combines the annotations of the different
systems using a simple voting scheme to establish a combined annotation. We participated in the i2b2 clinical
record concept annotation challenge, and ranked third out of 22 teams
(F-score 82.2% for exact match, 90.6% for inexact match).
ACCCA is written in Java (1.6). It provides an online web service, supports
a training mode, and allows users to easily add more systems to the combination framework.
The system is open source and can be downloaded
below.
How to download and use ACCCA
ACCCA can be used in 4 different ways:
1. Web page mode:
Just paste a clinical record and view the annotations directly.
It is not possible to do batch processing, or to use the training mode
with this method.
2. Web service client mode:
Access the ACCCA web service to process multiple clinical
records automatically. Batch processing is possible with
this method, but not the training mode.
The WSDL file is located at
http://www.biosemantics.org/ACCCA/AnnotationWebservicePort?wsdl.
If you use Java, you can simply download this web service
client, with sample code included in the jar file.
The web service uses JAX-WS, so apart from the jar file,
you will also need the JAX-WS 2.1 Runtime Library and the JAX-WS 2.1
API Library.
The web service uses a plain text clinical record as input,
and produces concept annotations as output.
/**
 * Generates the concept annotations for the input clinical record.
 *
 * @param document the clinical record as plain text
 * @return the concept annotations found in the record
 * @throws Exception if the annotation fails
 */
public AnnotationSet getAnnotation(String document)
Below is sample code that shows how to use the web service in Java.
public static void main(String[] args) throws Exception_Exception {
    String document = "He 'd been having lower abdominal pain for approximately " +
            "the past week , a symptom for which he 's been admitted in the past . " +
            "His PCP had recently started ciprofloxacin for a UTI .";
    // Initialize the web service client (these classes come from the client jar)
    AnnotationWebserviceService annotationWebserviceService = new AnnotationWebserviceService();
    AnnotationWebserviceDelegate annotationWebserviceDelegate = annotationWebserviceService.getAnnotationWebservicePort();
    // Call the web service to annotate the document
    AnnotationSet annotationSet = annotationWebserviceDelegate.getAnnotation(document);
    // Requires: import java.util.List;
    List<SimpleAnnotation> simpleAnnotationList = annotationSet.getSimpleAnnotationList();
    // Print each annotated concept together with its concept type
    for (SimpleAnnotation simpleAnnotation : simpleAnnotationList) {
        System.out.println("Annotation: " + simpleAnnotation.getConceptText()
                + " --- Concept Type: " + simpleAnnotation.getConceptType());
    }
}
3. Web service server mode:
Besides accessing our web service, users can also set up local
web services. To do this, please download the war file
and the training models, unzip the training
models and place the "trainingModels" folder in
/home/public/CRCAnnotationWebservice/ (this is the default
folder). If you would like to use a
different location, then change the path in the
analysis engine description files. After placing the folder
in the desired location, deploy the war file to Tomcat.
4. UIMA mode:
Download the system and load it into the UIMA framework (all libs are located in WebRoot\WEB-INF\lib\).
It is possible to use a training mode and do batch processing with this method.
The source code of ACCCA can be downloaded here, and the
training models can be downloaded here. To load the system
pipeline into the UIMA framework, please open the UIMA CPE
(Collection Processing Engine) configurator, and load
the CPE descriptor (I2B2_UIMA\resources\accca_pipeline_CPE.xml).
You can change the related parameters, such as location of
the clinical records, output concept directory, etc.
It is also possible to add more systems to the pipeline.
[Screenshot of the pipeline CPE]
To retrain ABNER,
OpenNLPChunker, and StanfordNER, go to
I2B2_UIMA\src\main\java\org\biosemantics\util,
where you will find their training classes. To retrain Lingpipe
and JNET, load the CPE descriptor
(ACCCA_UIMA\resources\accca_pipeline__train_CPE.xml).
All trainers use
default training file formats, and the training models
folder includes some sample training files.
[Screenshot of the trainer pipeline CPE]
System architecture and annotation steps of ACCCA
[Figure: system architecture of ACCCA]

The following processing steps have been executed
to generate the concept annotations for the i2b2 corpus:
1. All tools except Peregrine were trained on the i2b2
training corpus (349 clinical records with concept annotations).
The corpus was converted to an appropriate input format
(IOB format with concept annotations) for training.
For systems that also need part-of-speech (POS) information,
we used the OpenNLP POS module to generate the POS tags,
and combined them with the concept annotations
to generate the appropriate training files.
2. All tools were integrated into the UIMA framework.
3. To annotate the clinical records, the UIMA Collection Reader read the clinical records,
and the UIMA Analysis Engine executed each of the tools.
4. The annotation results of the systems were combined using a
simple voting scheme. If two or more of the six systems produced
the same concept annotation (i.e., the same start and end
position, with the same concept type), that annotation became
the combined annotation. A threshold of two was selected
because it gave optimal performance on the training data.
5. The combined annotations were converted to the i2b2
annotation file format.
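The voting in step 4 can be sketched in a few lines of Java. This is a minimal illustration, not ACCCA's own code: the class VotingSketch, the record Span, and the character offsets are all hypothetical, and the sketch assumes Java 16 or later for records. An annotation counts as "the same" only when start, end, and concept type all match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of threshold voting over exact-match annotations. */
final class VotingSketch {

    /** Hypothetical annotation type: character offsets plus concept type. */
    record Span(int start, int end, String type) {}

    /**
     * Returns every annotation proposed by at least {@code threshold} systems.
     * Each inner list holds the output of one system for the same document.
     */
    static List<Span> combine(List<List<Span>> systemOutputs, int threshold) {
        Map<Span, Integer> votes = new LinkedHashMap<>();
        for (List<Span> output : systemOutputs) {
            // A system votes at most once for a given annotation.
            for (Span span : new LinkedHashSet<>(output)) {
                votes.merge(span, 1, Integer::sum);
            }
        }
        List<Span> combined = new ArrayList<>();
        for (Map.Entry<Span, Integer> entry : votes.entrySet()) {
            if (entry.getValue() >= threshold) {
                combined.add(entry.getKey());
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // Made-up outputs of three systems for one document.
        List<List<Span>> outputs = List.of(
                List.of(new Span(19, 39, "problem"), new Span(134, 147, "treatment")),
                List.of(new Span(19, 39, "problem")),
                List.of(new Span(13, 39, "problem"), new Span(134, 147, "treatment")));
        for (int threshold = 1; threshold <= 3; threshold++) {
            System.out.println("threshold " + threshold + ": " + combine(outputs, threshold));
        }
    }
}
```

Raising the threshold can only shrink the combined set, which is why precision tends to rise and recall to fall as more systems are required to agree.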
Frequently asked questions
How do I adjust the precision and recall of the combined system?
You can adjust the precision, recall, and F-score by changing the parameter
(i2b2VotingThresholdNumber) in ACCCA_UIMA\desc\i2b2Annotator.xml.
This parameter refers to the voting threshold, i.e., the minimum number of
systems that have to agree on an annotation. For the i2b2 test corpus, the
voting threshold was varied from one to six, resulting in an increase in
precision from 60.7% to 97.6%, and a concomitant drop in recall from 84.0%
to 19.3%. The highest F-score (82.2%) was achieved when the voting threshold was 2.
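To see why the F-score peaks at an intermediate threshold, it helps to work through the arithmetic for the two extremes. The sketch below assumes that the reported F-score is the balanced F-score (the harmonic mean of precision and recall); the class name FScoreSketch is hypothetical, and the two F-values it prints are derived here from the precision and recall figures above, not taken from the evaluation itself.

```java
/** Illustrative F-score arithmetic for the extreme voting thresholds. */
final class FScoreSketch {

    /** Balanced F-score: harmonic mean of precision and recall (both in [0, 1]). */
    static double fScore(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0; // avoid dividing by zero when both are zero
        }
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Precision and recall reported for voting thresholds one and six.
        System.out.printf("threshold 1: F = %.3f%n", fScore(0.607, 0.840));
        System.out.printf("threshold 6: F = %.3f%n", fScore(0.976, 0.193));
    }
}
```

Under this assumption the F-score works out to roughly 70% at a threshold of one and roughly 32% at a threshold of six, both well below the 82.2% reached at a threshold of two.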
How do I add new systems into the ACCCA framework?
To add a new system into the ACCCA framework, you need to
follow three steps:
1. Write a new analysis engine Java class file. See the class files under
ACCCA_UIMA\src\main\java\org\biosemantics\ae\i2b2 for examples.
2. Write a new analysis engine description xml file. See the xml files under
ACCCA_UIMA\desc for examples.
3. Add the xml file to the annotation pipeline (i2b2_pipeline_cpe.xml).
How do I output the combined annotation result to file?
There are 2 parameters (outputCombinedConcept2file, outputCombinedConceptDirectory)
in the analysis engine file
ACCCA_UIMA\desc\i2b2Annotator.xml. Change the value of those 2 parameters to
output the combined annotation result to file in the i2b2 corpus format (see
ACCCA_UIMA\src\test\resources\0001.con as an example).
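For orientation, the sketch below formats one annotation as a line of an i2b2 concept file. It assumes the layout commonly described for the 2010 i2b2/VA concept files: the concept text, the start and end positions as line:token pairs, and the concept type. Please check it against 0001.con before relying on it. The class I2b2ConLineSketch, its method, and the example annotation are hypothetical and are not part of ACCCA.

```java
/** Illustrative formatter for one line of an i2b2-style concept (.con) file. */
final class I2b2ConLineSketch {

    /**
     * Builds a line such as: c="text" startLine:startToken endLine:endToken||t="type".
     * The layout is an assumption based on the 2010 i2b2/VA concept files.
     */
    static String format(String conceptText, int startLine, int startToken,
                         int endLine, int endToken, String conceptType) {
        return String.format("c=\"%s\" %d:%d %d:%d||t=\"%s\"",
                conceptText, startLine, startToken, endLine, endToken, conceptType);
    }

    public static void main(String[] args) {
        // Hypothetical annotation: tokens 4 to 6 on line 1 of a record.
        System.out.println(format("lower abdominal pain", 1, 4, 1, 6, "problem"));
    }
}
```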
I have a gold standard corpus. How do I evaluate the performance of the combined
annotations against the gold standard corpus?
There are 2 parameters (includeGoldStandardAnnotation, goldStandardDirectory)
in the analysis engine file
ACCCA_UIMA\desc\i2b2Annotator.xml. Change the value of those 2 parameters to
include the gold standard annotations in the pipeline. Note that the gold standard
corpus must be in the i2b2 corpus format.
To find out the performance of each individual annotation system and the ensemble system, add the CAS
Consumer file (ACCCA_UIMA\desc\cc\summary\comparison\StrictSpanIdenticalMentionAnnotationComparator.xml)
to the annotation pipeline (i2b2_pipeline_cpe.xml).
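The idea behind an exact-match comparison can be sketched as follows. This is a simplified stand-in, not the comparator shipped with ACCCA: the class ExactMatchEvalSketch, its records, and the offsets are hypothetical, and the sketch assumes Java 16 or later. A predicted annotation counts as correct only if the gold standard contains an annotation with the same start, end, and concept type.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative exact-match evaluation against a gold standard. */
final class ExactMatchEvalSketch {

    /** Hypothetical annotation type: character offsets plus concept type. */
    record Span(int start, int end, String type) {}

    /** Precision, recall, and balanced F-score, each in [0, 1]. */
    record Scores(double precision, double recall, double fScore) {}

    static Scores evaluate(Set<Span> predicted, Set<Span> gold) {
        // True positives are the predictions that also occur in the gold standard.
        Set<Span> truePositives = new HashSet<>(predicted);
        truePositives.retainAll(gold);
        double tp = truePositives.size();
        double precision = predicted.isEmpty() ? 0.0 : tp / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : tp / gold.size();
        double fScore = (precision + recall == 0.0)
                ? 0.0
                : 2.0 * precision * recall / (precision + recall);
        return new Scores(precision, recall, fScore);
    }

    public static void main(String[] args) {
        // Made-up gold standard and system output for one record.
        Set<Span> gold = Set.of(
                new Span(19, 39, "problem"),
                new Span(134, 147, "treatment"),
                new Span(154, 157, "problem"));
        Set<Span> predicted = Set.of(
                new Span(19, 39, "problem"),
                new Span(134, 147, "treatment"),
                new Span(13, 39, "problem"),   // boundary mismatch, counted as wrong
                new Span(154, 157, "test"));   // type mismatch, counted as wrong
        System.out.println(evaluate(predicted, gold));
    }
}
```

An inexact-match evaluation would relax the boundary condition (for example, by accepting overlapping spans of the same type), which is why inexact scores are higher than exact ones.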
Can I easily check the annotation of each annotation system and the ensemble system?
Yes, just add the ACCCA_UIMA\desc\XmiWriterCasConsumer.xml to the annotation pipeline (i2b2_pipeline_cpe.xml).
After the records have been processed, open the generated annotation xml files in the UIMA Document Analyzer.
I get an "unread block data" error when I run the annotation pipeline. How do I solve it?
If you get the following error:
13157 [main] DEBUG de.julielab.jules.ae.netagger.EntityAnnotator - setModel() - loading JNET model...
13612 [main] ERROR de.julielab.jules.ae.netagger.EntityAnnotator - setModel() - Could not load JNET model: unread block data
java.lang.IllegalStateException: unread block data
at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2377)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1361)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1947)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1871)
then a class version error from JNET has occurred. Try loading the
related jar files (a_mallet-deps.jar, a_mallet-optimized.jar, a_mallet-troveless-0.4.jar)
before the other jar files.
I deployed the war file to Tomcat, but I cannot see the log information. How do I solve this problem?
ACCCA uses log4j for logging. To make Tomcat support log4j, put log4j.jar (under I2B2_UIMA\I2B2_UIMA\WebRoot\WEB-INF\lib) into $CATALINA_HOME/lib, and then restart Tomcat.
What's the advantage of using an ensemble system instead of a single NER system?
The ensemble-based annotation system has higher precision, recall, and F-score values
than any of the six individual systems currently available in ACCCA. In terms of F-score,
the ensemble system outperformed the best single system, JNET, by 4.6 percentage
points. Besides improved performance, our combination approach offers the possibility of varying the
performance of the combined system across a range of precision and recall values by varying the voting
threshold. Thus, the performance of an ensemble system can easily be tuned to best meet
specific requirements.
How do I cite ACCCA?
If you have used ACCCA in your study, please cite:
Ning Kang, Zubair Afzal, Bharat Singh, Erik M. van Mulligen, and Jan A. Kors.
Using an ensemble system to improve concept extraction from clinical records.
Journal of Biomedical Informatics, 2012.
Contact
If you have questions or remarks about ACCCA, please send an e-mail to
.