J Am Med Inform Assoc 16:211-219 doi:10.1197/jamia.M2933
  • Original Investigation
  • Research Paper

Structured Product Labeling Improves Detection of Drug-intolerance Issues

  1. Gunther Schadow
  1. Regenstrief Institute and Indiana University School of Informatics, Indianapolis, Indiana
  1. Correspondence: Gunther Schadow, MD, PhD, 410 West 10th Street, Suite 2000, Indianapolis, IN 46202; e-mail: <gschadow{at}>
  • Received 17 July 2008
  • Accepted 22 October 2008


Objectives This study sought to assess the value of the Health Level 7/U.S. Food and Drug Administration Structured Product Labeling (SPL) drug knowledge representation standard and its associated terminology sources for drug-intolerance (allergy) decision support in computerized provider order entry (CPOE) systems.

Design The Regenstrief Institute CPOE drug-intolerance issue detection system and its knowledge base were compared with a method based on existing SPL label content enriched with knowledge sources used with SPL (NDF-RT/MeSH). Both methods were applied to a large set of drug-intolerance (allergy) records, drug orders, and medication dispensing records covering >50,000 patients over 30 years.

Measurements The number of drug-intolerance issues detected by both methods was counted, as well as the number of patients with issues, number of distinct drugs, and number of distinct intolerances. The difference between drug-intolerance issues detected or missed by either method was qualitatively analyzed.

Results Although <70% of terms were mapped to SPL, the new approach detected four times as many drug-intolerance issues on twice as many patients.

Conclusion The SPL-based approach is more sensitive, which suggests that mapping local dictionaries to SPL and enhancing the depth and breadth of coverage of SPL content are worth accelerating. The study also highlights specificity problems known to trouble drug-intolerance decision support and suggests how terminology and methods of recording drug intolerances could be improved.


Computerized provider order entry (CPOE) is known to improve the safety and quality of drug prescribing if it offers decision support functions that actively help physicians prevent errors.1 These functions depend on knowledge bases that can be costly to purchase and maintain.

Traditionally, those who maintain large electronic health record and CPOE systems rely on three kinds of sources for drug knowledge: (1) commercial database products such as First Data Bank (FDB), Micromedex, and Cerner/Multum; (2) public sources, such as the Veterans Administration's (VA) NDF-RT2 and the National Library of Medicine's (NLM) RxNorm3; and (3) in-house created knowledge such as the Regenstrief Medical Gopher CPOE system.4

All of these knowledge sources involve laborious processes to excerpt, encode, compile, and reconcile data from various primary sources.5 They represent these data in a variety of proprietary formats and conceptual models, using different terminologies, and are therefore not interoperable and are difficult to integrate. Quality issues have been identified in such knowledge bases regardless of whether they are self-maintained or purchased.6

In the United States, the pharmaceutical industry and the U.S. Food and Drug Administration (FDA) together are the authoritative source for drug information. The Physicians' Desk Reference (PDR), the reference book most frequently consulted by American physicians,7 simply compiles all the labels of currently marketed drugs approved by the FDA. The FDA's drug listing service registers marketed drugs and administers the National Drug Code (NDC), which, although not a systematic terminology, is nevertheless the most widely used drug nomenclature in the United States.

The FDA has embarked on a comprehensive initiative to improve drug knowledge appropriate for use by both human readers and clinical information systems. The FDA initiative included the electronic labeling guidance,8 which first required drug labels be submitted electronically, and the Physicians Labeling Rule (PLR),9 which mandates more user-friendly labels. In 2007 a proposed drug listing rule10 was preempted by a congressional mandate to comprehensively move regulatory activity to electronic systems, including the formerly paper-form-based drug listing process.11

The FDA implements these initiatives using the Health Level 7 (HL7) version 3 standards, particularly the Structured Product Labeling (SPL) standard.12 Structured product labeling can represent human readable label documents and package them with increasing amounts of computer-processable drug knowledge. Since 2006, pharmaceutical manufacturers have produced SPL labels for their products and the FDA has released an increasing number of these labels through NLM's DailyMed website. In 2007 we found that the nearly 3,000 available SPL labels covered only about 80% of the occurrences of NDCs in outpatient medication dispensing records and only one third of the distinct NDCs. Although these numbers have increased steadily for original innovator prescription drugs, coverage of generics is less comprehensive, and repackaged and over-the-counter (OTC) drugs have not yet been included. However, with the new law calling for electronic listing for every drug, this gap will rapidly close in the coming year.

It is therefore appropriate to turn from analyzing the breadth of coverage to investigating the depth of knowledge contained in SPL and to explore its value for clinical decision support. SPL uses the HL7 Reference Information Model (RIM)13 to represent drug knowledge from package inserts, which includes a comprehensive set of basic pharmacological data on ingredients and packaging, all with exact computable quantities. All problems with inner packages (e.g., vials) have been resolved. Physical properties, including shape, color, imprint, scoring, and image, can be described and are being made required for new submissions. Since 2005 a comprehensive set of structures for the knowledge required for clinical decision support has been provided, including substance classifications, indication, dosage, adverse effects, contraindications, and interactions.

Although SPL can encode rich clinical knowledge, actually producing the encoded knowledge has proven difficult. Organizational and legal concerns slowed down the creation of this important content, which was originally intended to be created by the pharmaceutical industry. The FDA embarked upon an SPL indexing initiative14 to fast-track production of the most safety-critical content internally, but has yet to publish any. However, the terminology sources selected for this purpose, including the VA's National Drug File Reference Terminology (NDF-RT),2 already contain some of this knowledge. We are using these knowledge sources to enrich SPL labels and thus are able to simulate and study their application and value proposition before their issuance by the FDA.

In this article we show that the emerging SPL knowledge content can improve decision support functions in existing CPOE systems, such as the Regenstrief Institute (RI) Gopher.15 We focus here on drug-intolerance issue detection, for which we produced enriched SPL knowledge content and used it to screen a large set of drug-intolerance and medication data. We compare the performance of the new SPL knowledge-based detection with our current Gopher system.

Drug intolerances are called allergies in common jargon, although many of them are not true allergies or of any immunologic nature.16 17 For example, direct stimulation of histamine receptors by morphine derivatives, or hemolysis by sulfonamides in glucose-6-phosphate-DH deficient patients, are frequently put on allergy lists. To model this information correctly, HL7 version 3 uses the term “intolerance” instead of “allergy”. HL7 also uses the term “issue” instead of “alert” or “reminder”. A reminder in the sense of, for example, a typical Arden Syntax application18 is a rule that triggers a communication action. However, an issue in HL7 version 3 is an object that persists and can be communicated multiple times, and is documented and disposed of in an orderly and persistent manner, like a sticky note on a chart. Repeated visibility of an issue attached to the offending medication list entry could be a more effective communication vehicle, because it might prevent alert fatigue and would allow the health care provider to resolve the issue at a later time. Figure 1 shows the information model subset of SPL that deals with intolerance issues and Figure 2 shows an example network describing the detection of an Issue with ampicillin administration for a patient that has a penicillin allergy record.

Figure 1

Structured Product Labeling (SPL) knowledge structures serve as a hub connecting different key terminologies: UNII for ingredient Substances and active moieties, NDF-RT Ingredient/Chemical or MeSH Chemicals for chemical structure classes, NDC for Products with specific dose form and strength (Ingredient.quantity), and SNOMED-CT for observations (of type medical problem or intolerance identified by LOINC). SNOMED-CT has the “causative agent” relationship, which exists as a Participation type in the Health Level 7 (HL7) Reference Information Model (RIM) and represents the same linkage of the SNOMED-CT allergy concept to the NDF-RT/MeSH chemical structure class. A drug-intolerance Issue finally connects an IntoleranceObservation and a conflicting SubstanceAdministration.

Figure 2

Example of how the Health Level 7 (HL7) Reference Information Model (RIM) model works for detecting a penicillin allergy issue for a particular patient. The Patient is subject of a Penicillin allergy observation, coded using a SNOMED allergy concept. SNOMED describes allergies with the causative agent role, which is represented directly in the HL7 RIM participation of type causative agent connecting to the Penicillin Entity kind coded using the NDF-RT/MeSH concept. The patient is also the subject to a SubstanceAdministration, with the consumable being a drug coded with an NDC. Through SPL the NDC code is described as a product with INGRedient ampicillin (coded with a UNII) and of chemical class Ampicillin (NDF-RT/MeSH). The NDF-RT/MeSH terminology defines the IS-A relationships which transitively classify Ampicillin as a Penicillin. For efficient reasoning using a small fixed number of joins the transitive closure IS-A* is materialized. This approach may be extended by materializing even higher order semantic links such as the INGR*IS-A link (HAS-A*).


The objective of this study is to compare the performance of the drug-intolerance issues detection by the RI Gopher CPOE system with a new method using SPL and its public knowledge sources. Both methods were emulated on a relational database using only SQL, after importing all required knowledge sources and a large dataset of clinical intolerance and medication records.

Gopher Knowledge Sources

The Gopher system has an allergy list, onto which prescribers may put dictionary terms of type drug and drug set. A drug set is a collection of drug terms or other drug set terms. For example, “ampicillin oral solid” is considered a drug, but “penicillins” is considered a drug set. The Gopher system detects drug-intolerance issues against the patient's allergy list by testing whether an ordered drug is on the allergy list or whether that drug term is an element of a drug set that is on the allergy list. Of course, both drugs and drug sets are in some way abstractions of medicines (sometimes called “clinical drugs”). Terminologically, both drugs and drug sets are classes or concepts. Formal concept languages define a concept using a so-called interpretation function, which maps the concept to its extension set of particular drugs. Hence, instead of drug (element) and drug set, we should speak of a drug concept D and its generalization (or “class”) C such that D is-a C, sometimes written as D ⊑ C, stating that the concept is-a relationship is a subset relationship of the extension sets of the classes D and C. The is-a relationship is asymmetric (D ⊑ C ⇏ C ⊑ D), transitive (A ⊑ B ∧ B ⊑ C ⇒ A ⊑ C), and reflexive (A ⊑ A).19

The Gopher decision algorithm was emulated on a relational database system by: (1) exporting the Gopher dictionary as an XML file from the legacy database, (2) transforming that XML file into relational database load scripts, and finally (3) loading it into a simple schema of a concept table and a concept-generalization relationship table. The full expansion of the drug classes into all drug terms was created as the materialized transitive and reflexive closure of the is-a relationships using a form of Warshall's algorithm20 in SQL (Figure 3). The materialized transitive closure is a relation that redundantly links each concept to all its generalizations. For example, hydroxyl-aminopenicillin is-a aminopenicillin and aminopenicillin is-a penicillin, so hydroxyl-aminopenicillin is-a penicillin. The reflexive closure redundantly links every concept also to itself, penicillin is-a penicillin, simplifying many subsequent reasoning queries, such as the one shown in Figure 4.

Figure 3

Warshall's algorithm in SQL to compute the materialized transitive closure for IS-A hierarchies. The first statement CREATEs the transitive closure table and primes it with the direct relationships. The second statement INSERTs transitive relationships by joining the relationship table with itself to turn a IS-A b and b IS-A c into a IS-A c. This algorithm spans the depth of the IS-A hierarchy exponentially, usually converging after about 2 to 5 iterations. The final statement INSERTs the reflexive relationships (a IS-A a).
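Because the figure itself is not reproduced here, the following is only a minimal sketch of the three steps the caption describes, written in Python with the standard sqlite3 module. The table names (is_a, is_a_star), the column names, and the function name build_closure are assumptions made for illustration and are not taken from the study's schema.

```python
import sqlite3

def build_closure(direct_links):
    """Materialize the transitive and reflexive closure of IS-A links.

    direct_links: iterable of (child, parent) pairs (hypothetical layout).
    Returns an in-memory SQLite connection holding the table is_a_star.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE is_a (child TEXT, parent TEXT, PRIMARY KEY (child, parent))")
    db.executemany("INSERT OR IGNORE INTO is_a VALUES (?, ?)", direct_links)

    # Step 1: create the closure table and prime it with the direct links.
    db.execute("CREATE TABLE is_a_star (child TEXT, parent TEXT, PRIMARY KEY (child, parent))")
    db.execute("INSERT INTO is_a_star SELECT child, parent FROM is_a")

    def size():
        return db.execute("SELECT COUNT(*) FROM is_a_star").fetchone()[0]

    # Step 2 (repeated): self-join the closure table so that a IS-A b and
    # b IS-A c yield a IS-A c. Joining the closure with itself doubles the
    # reachable path length per pass; stop when a pass adds nothing new.
    while True:
        before = size()
        db.execute(
            """INSERT OR IGNORE INTO is_a_star
               SELECT x.child, y.parent
               FROM is_a_star x JOIN is_a_star y ON x.parent = y.child"""
        )
        if size() == before:
            break

    # Step 3: add the reflexive links (a IS-A a) for every known concept.
    db.execute(
        """INSERT OR IGNORE INTO is_a_star
           SELECT c, c FROM (SELECT child AS c FROM is_a UNION SELECT parent FROM is_a)"""
    )
    return db
```

Called with the two direct links from the penicillin example in the text, the resulting table would also contain the derived link from hydroxyl-aminopenicillin to penicillin and one reflexive row per concept.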

Figure 4

Drug-intolerance issue detection is as easy as joining a patient's drug with the causative agent of a patient's intolerance through the transitive and reflexive closure table.

With this preparation, drug-intolerance issues are detected as easily as joining the causative agent of the patient's intolerance and the intolerance classes of the patient's drug through the transitive and reflexive is-a* relation as shown in Figure 4. Also see Figure 2 for a visualization of the data structure and relationships that are the subject of this database operation.
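As an illustration of this join, the sketch below uses Python's sqlite3 module with a tiny hand-written closure table. The table and column names, the patient identifiers, and the simplified class names are all hypothetical; the query is one plausible form of the join described, not the query shown in Figure 4.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE is_a_star (child TEXT, parent TEXT);     -- materialized closure
CREATE TABLE intolerance (patient TEXT, agent TEXT);  -- causative agent class
CREATE TABLE medication (patient TEXT, drug TEXT);    -- ordered drug concept
""")
# Hypothetical closure rows, including the reflexive links.
db.executemany("INSERT INTO is_a_star VALUES (?, ?)", [
    ("ampicillin", "ampicillin"), ("ampicillin", "penicillins"),
    ("ampicillin", "beta-lactams"),
    ("cephalexin", "cephalexin"), ("cephalexin", "cephalosporins"),
    ("cephalexin", "beta-lactams"),
])
db.executemany("INSERT INTO intolerance VALUES (?, ?)",
               [("p1", "penicillins"), ("p2", "penicillins")])
db.executemany("INSERT INTO medication VALUES (?, ?)",
               [("p1", "ampicillin"), ("p2", "cephalexin")])

# An issue exists when the patient's drug is linked, through the closure,
# to the causative agent class of one of the same patient's intolerances.
issues = db.execute("""
    SELECT m.patient, m.drug, i.agent
    FROM medication m
    JOIN is_a_star s ON s.child = m.drug
    JOIN intolerance i ON i.patient = m.patient AND i.agent = s.parent
    ORDER BY m.patient, m.drug
""").fetchall()
print(issues)
```

In this toy data only the ampicillin order for patient p1 is reported; the cephalexin order for p2 is not, because cephalexin is not classified under penicillins in the assumed hierarchy.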

SPL Terminology and Knowledge Sources

The SPL knowledge representation is a hub connecting multiple key terminologies in the Federal Medication Terminology system. The first steps of the FDA SPL indexing initiative will use the NDF-RT terminology to annotate ingredients with mechanism of action (MoA), physiologic effect (PE), and chemical structure. A typical use of MoA and PE includes drug-interaction checking, and a typical use of the chemical structures includes coding allergen concepts. This is based on the assumption that allergens are determined often by rather accidental molecular features that may not be the determinants for the MoA.

The NDF-RT is distributed in an XML format, which was transformed into a simple relational database schema. Preliminary analysis of this database in comparison to MeSH confirmed that the chemical structure classes are homomorphic with MeSH, i.e., every chemical structure concept in NDF-RT is also in MeSH and all hierarchy relationships are preserved. Therefore, instead of NDF-RT, we use MeSH directly both to encode intolerance records and to annotate drug ingredients. This is done for two reasons. Firstly, we prefer to rely on original terminology sources (here MeSH) where other available derived sources (here NDF-RT) do not offer significant improvement. Secondly, MeSH allows us to extend the value space beyond the part of MeSH included in NDF-RT, thereby covering non-drug allergen concepts, such as foods and living matter. Figure 5 shows the high-level headings we selected to include all their descendants in our MeSH allergen concept subset.

Figure 5

Top-level MeSH classes useful for coding allergies. The bold-faced classes are included; the classes shown in italics are specifically not included because they did not seem to contain items that would be used for coding allergies.

All descendant MeSH concepts below these headings and the concepts from the MeSH chemical supplements that are mapped to these headings were extracted using the MeSH concept ids. Initially we extracted these from UMLS, but later moved to the original MeSH distribution because the representation of relationships changed significantly in version 2008AA of UMLS. The result was a table of MeSH-Chemical concepts and a table of structure-class links. Again, the complete materialized transitive and reflexive closure over the structure-class links was computed as described above for the Gopher sets (Figure 3), and issue detection could proceed accordingly (Figure 4).

All SPL labels were downloaded from DailyMed (3,704 as of March 7, 2007). SPL refers to ingredients by the UNII codes of the federal Substance Registration System (SRS). When we conducted this study in 2007, the UNII codes were not completely merged into the UMLS, hence only a few UNII concepts were mapped to UMLS Concept Unique Identifiers (CUIs). However, we could map 2,472 (76%) of the UNII concepts to 770 MeSH classes using the many synonym names in the UMLS Metathesaurus simply by exact name match. Later versions of the UMLS contain increasing numbers of UNII concepts with CUIs that are also mapped to MeSH.
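The name-matching step can be pictured with the following minimal Python sketch. The function name, the dictionary layout, and the placeholder identifiers are hypothetical, and the decision to ignore letter case and surrounding whitespace is an assumption; the text says only that names were matched exactly.

```python
def map_unii_to_mesh(unii_names, mesh_synonyms):
    """Map substance codes to MeSH classes by exact synonym match.

    unii_names:    {unii_code: [names]}           (hypothetical layout)
    mesh_synonyms: {mesh_id: [synonym names]}     (hypothetical layout)
    Returns {unii_code: set of mesh_ids} for the codes that matched.
    """
    # Index every synonym once; case folding is an assumption of this sketch.
    index = {}
    for mesh_id, names in mesh_synonyms.items():
        for name in names:
            index.setdefault(name.strip().casefold(), set()).add(mesh_id)

    mapping = {}
    for unii, names in unii_names.items():
        hits = set()
        for name in names:
            hits |= index.get(name.strip().casefold(), set())
        if hits:  # unmatched codes are left out, i.e. remain unmapped
            mapping[unii] = hits
    return mapping
```

A code may match more than one class, and several codes may match the same class, which is consistent with more UNII concepts than MeSH classes appearing in the reported mapping.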

This link from UNII to MeSH was used to enrich the SPL labels, adding the appropriate XML structures for MeSH classes into the SPL files to create files that could have been created by the industry and would be created by the FDA indexing initiative. These enriched SPL files were loaded into an HL7 RIM-based database-system. With the SQL queries from our earlier work, we created a clean NDC code index to the SPL data.21 The NDC index was later used to perform intolerance issue detection on medication dispensing records, which contain only NDC codes.

Clinical Data: Intolerance and Medication Records

A complete set of patient-specific medication intolerance records originating in the allergy lists of the RI Gopher system was extracted from the Indiana Network for Patient Care (INPC) data system, excluding statements such as “none” or “no allergies”. Allergy records in the INPC are observations with observation code “drug allergy” and observation value being an RI/Gopher dictionary term which is often a drug set term.

For all patients who had any such intolerance record, medication orders and medication dispensing records were also extracted across INPC member institutions (benefiting from the results of the routine record linking performed in the INPC). Medication order records originate from the Gopher CPOE system for which patient id, timestamp, and the RI/Gopher drug term was available in the INPC. Medication dispensing records originate from claims data processes and consist of patient id, timestamp, and the NDC code.

All data were de-identified before export into a research database for the experiment. The de-identification procedure replaced internal patient ids with randomly generated pseudo-ids, and for each patient, a random time offset of ±180 days was generated and added to all dates of birth, intolerance, and medication record timestamps. A fixed offset was added when necessary to hide ages beyond 90 years. This approach was approved by the IRB for exempt research (EX0801-37).
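A minimal Python sketch of such a per-patient shift is given below. The function name, the record layout, and the pseudo-id format are hypothetical, and the additional fixed offset for ages beyond 90 years is deliberately omitted. The sketch illustrates the idea only and is not the procedure actually used.

```python
import random
from datetime import date, timedelta

def deidentify(records, seed=None):
    """Replace patient ids and shift dates by one random offset per patient.

    records: iterable of (patient_id, date) pairs (hypothetical layout).
    """
    rng = random.Random(seed)
    pseudo_ids, offsets = {}, {}
    out = []
    for patient_id, day in records:
        if patient_id not in pseudo_ids:
            # One pseudo-id and one offset per patient, reused for all of
            # that patient's records so intervals between events survive.
            pseudo_ids[patient_id] = f"P{rng.getrandbits(64):016x}"
            offsets[patient_id] = timedelta(days=rng.randint(-180, 180))
        out.append((pseudo_ids[patient_id], day + offsets[patient_id]))
    return out
```

Because every date of a given patient moves by the same amount, the spacing between an intolerance record and a medication record is unchanged, which matters for the 90-day matching window used later.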

The final research dataset included 1,005,187 intolerance records for 84,030 patients born between 1917 and 2008 and covering a time range between 1977 and 2008. Gender distribution was 2:1 females to males, which is common for intolerance data16 22; 51,143 of these patients had any medication orders, 33,703 had any medication dispensing records, and the intersection, 22,897, had both.

Mapping Intolerance Substances (Allergens) to MeSH

The patient-specific drug-intolerance records referred to 1,348 distinct classes of chemical substances not tolerated (allergens) using RI/Gopher dictionary concepts. Of these concepts, 957 were of type drug and 391 of type drug set. These concepts were mapped to MeSH classes beginning with an existing mapping of RI dictionary terms to RxNorm “semantic clinical drug forms” (SCDF). The RxNorm ingredients were then mapped through the UMLS Metathesaurus (via CUI) to MeSH concepts. For the 391 drug sets, the correct MeSH concept was semi-automatically inferred using an SQL query developed to compute the most specific common generalization of the ingredients which all the set-elements have in common. The candidate mappings were reviewed on a spreadsheet in the order of descending frequency of occurrence. Manually validated mappings for 191 of them were loaded back into the database. Some important drug sets used in intolerance records were not reducible to a chemical structure concept (e.g., ACE inhibitor, NSAI, and antibiotics) and were left unmapped for this study. The combined end result was 1,058 mappings for 946 original terms to 581 MeSH concepts. The mapping to fewer MeSH concepts than original terms is due to the higher level of abstraction of the MeSH concepts. Thus only 70% of the distinct drug and drug set concepts were mapped in priority of descending frequency, sufficient for the purpose of this study to validate the feasibility of intolerance issue detection using SPL.
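The idea of a most specific common generalization can be sketched as follows. The study used an SQL query for this step; the version below instead uses plain Python sets for brevity, and the function name and the shape of the ancestors mapping are hypothetical.

```python
def most_specific_common_generalizations(elements, ancestors):
    """Return the most specific classes shared by all set elements.

    elements:  concepts that make up one drug set.
    ancestors: {concept: set of its generalizations}, assumed to be the
               transitive and reflexive closure of the class hierarchy.
    """
    elements = list(elements)
    if not elements:
        return set()
    # Classes that every element of the drug set belongs to.
    common = set.intersection(*(set(ancestors[e]) for e in elements))
    # Keep a class only if no other common class lies beneath it.
    return {
        c for c in common
        if not any(d != c and c in ancestors[d] for d in common)
    }
```

More than one candidate can be returned when the hierarchy is not a tree, which is one reason a manual review of the candidate mappings, as described above, remains necessary.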

Mapping Medication Concepts to MeSH

Patient-specific orders used 7,159 distinct RI/Gopher dictionary terms. Many of them were not medication orders but included tests, durable equipment, and other services. Of these orders, 1,885 used a term that was mapped to medications (using the above mentioned map to RxNorm SCDF). Only these orders for identifiable drugs were included in the clinical dataset. For 1,438 of these drug terms, at least one ingredient (as per RxNorm) could be mapped to a MeSH-Chemical.

The medication dispensing records that originate in insurance claims processing data refer to drugs using the NDC. There were 3,731,830 medication dispensing records for 33,703 distinct patients and 23,307 distinct NDC codes covering a time range from April 1994 to March 2008. For these NDC codes, we could look up SPL descriptions for only 26% of the distinct NDC codes and 60% of the instances, for reasons we have discussed earlier.21 This defective coverage of NDC codes by SPL could significantly penalize our SPL-based drug-intolerance issue detection; however, this is still sufficient for demonstrating the value of the SPL knowledge where it is available.

Drug-intolerance Issue Detection

Four methods for detecting drug-intolerance issues were tested:

  • (1) Medication orders against drug allergies using

    • (a) Gopher sets vs.

    • (b) MeSH classes; and

  • (2) Medication dispensing records (NDC) using

    • (a) Gopher sets vs.

    • (b) SPL and its MeSH classes for ingredient UNII codes.

The Gopher method (a) involved the transitive and reflexive closure of the Gopher set-subset-element relationship, and similarly the MeSH method (b) involved a transitive and reflexive closure of the MeSH hierarchy. A drug intolerance was detected if a drug was ordered or dispensed that had any ingredient that was a derivative of (or the same as) the chemical structure not tolerated by the patient. To prevent excessive counting of replicate and outdated information, the intolerance and drug records had to be within 90 days of each other. This did detect some drug-intolerance issues in hindsight, but that is acceptable for this study, which aims to test the performance of the detection method, not the incidence of prescribing errors.
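The ingredient-level match combined with the 90-day window might look like the sketch below, again in Python with sqlite3. All table, column, and function names are hypothetical, products are identified by a descriptive label rather than a real NDC, and treating the window as inclusive of day 90 is an assumption.

```python
import sqlite3

def detect_dispensing_issues(dispensings, intolerances, ingredients,
                             closure, window_days=90):
    """Detect issues by ingredient class within a time window.

    dispensings:  (patient, iso_date, product)
    intolerances: (patient, iso_date, agent_class)
    ingredients:  (product, substance_class)
    closure:      (child, parent) rows of the reflexive transitive closure
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE dispensing (patient TEXT, day TEXT, product TEXT);
        CREATE TABLE intolerance (patient TEXT, day TEXT, agent TEXT);
        CREATE TABLE ingredient (product TEXT, substance_class TEXT);
        CREATE TABLE is_a_star (child TEXT, parent TEXT);
    """)
    db.executemany("INSERT INTO dispensing VALUES (?, ?, ?)", dispensings)
    db.executemany("INSERT INTO intolerance VALUES (?, ?, ?)", intolerances)
    db.executemany("INSERT INTO ingredient VALUES (?, ?)", ingredients)
    db.executemany("INSERT INTO is_a_star VALUES (?, ?)", closure)
    # Any ingredient of the product that falls under the intolerance class
    # raises an issue, provided both records lie within the window.
    return db.execute("""
        SELECT DISTINCT d.patient, d.product, i.agent
        FROM dispensing d
        JOIN ingredient g ON g.product = d.product
        JOIN is_a_star s ON s.child = g.substance_class
        JOIN intolerance i ON i.patient = d.patient AND i.agent = s.parent
        WHERE ABS(julianday(d.day) - julianday(i.day)) <= ?
        ORDER BY d.patient, d.product, i.agent
    """, (window_days,)).fetchall()
```

Describing a product by its ingredients is what lets a combination product such as acetaminophen/codeine match a codeine intolerance without being listed explicitly in a drug set.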

To compare the performance of the issue detection on a per-class basis, and to judge the nature of the discrepancies between the two methods, a full outer join table of the set of issues detected by Gopher vs. SPL/MeSH was created using the drug and intolerance term as the join key. This table was reviewed in the order of descending patient count.
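The side-by-side comparison can be emulated outside the database as well. The Python sketch below mimics a full outer join on the (drug, intolerance) key and orders the rows by descending patient count; the function name and the record layout are hypothetical.

```python
from collections import defaultdict

def compare_issue_sets(gopher_issues, mesh_issues):
    """Emulate a full outer join of two issue sets on (drug, intolerance).

    Each input is an iterable of (patient, drug, intolerance) triples.
    Returns rows of (pair, gopher_patient_count, mesh_patient_count).
    """
    def patients_per_pair(issues):
        pairs = defaultdict(set)
        for patient, drug, intolerance in issues:
            pairs[(drug, intolerance)].add(patient)
        return pairs

    g = patients_per_pair(gopher_issues)
    m = patients_per_pair(mesh_issues)
    # Union of keys: pairs found by either method appear exactly once,
    # with a count of 0 on the side that missed them.
    rows = [(pair, len(g.get(pair, ())), len(m.get(pair, ())))
            for pair in g.keys() | m.keys()]
    rows.sort(key=lambda r: (-max(r[1], r[2]), r[0]))
    return rows
```

Rows with a zero on one side are the discrepancies of interest: they show which drug and intolerance pairs were detected by only one of the two methods.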


Intolerance Issues Detection on Ordered Drugs

As summarized in Table 1, of 51,143 patients with both intolerance records and medication orders, any intolerance issues were found for 4,368 (9%) patients with the Gopher method and 8,832 (17%) with the MeSH method. Of 1,469 ordered drugs, 400 (27%) were found to be a subject of any drug-intolerance issue using the Gopher method and 420 (29%) using the MeSH method. Of 1,623 allergen codes, 270 (17%) were found to be the subject of any drug-intolerance issues using the Gopher method, and 375 (23%) using the MeSH method. Of 2,734,787 drug orders, 10,239 (0.4%) drug-intolerance issues were found using the Gopher method and 45,129 (1.7%) using the MeSH method. With both methods, 446 distinct drug-intolerance pairs were detected; 336 only with Gopher, and 491 only with MeSH.

Table 1

Number of Issues Detected in the Data Counted by Different Entities (Orders, Supplies, Patients, Drugs, and Allergens)

Intolerance Issue Detection on Dispensed Drugs

Of 33,703 patients with both intolerance and medication dispensing records, any intolerance issues were found for 1,223 (4%) patients with the Gopher method and 2,188 (6%) with SPL. Of 23,307 dispensed products by NDC, 659 (2.8%) were found to be a subject of any drug-intolerance issue using the Gopher method and 464 (2.0%) using the SPL method. Of the 1,623 allergen codes, 94 (5.8%) were found to be the subject of any drug-intolerance issues using the Gopher method, and 112 (6.9%) using SPL. Of 3,682,926 medication dispensing events, 3,337 (0.1%) drug-intolerance issues were found using the Gopher method and 13,749 (0.4%) using SPL. With both methods, 138 distinct NDC-intolerance pairs were detected; 717 only with Gopher, and 586 only with SPL.

The full outer join table that displays hits and misses side by side showed that most of the issues detected by SPL/MeSH were missed by Gopher due to the ad-hoc nature of its drug sets. Table 2 shows the top 24 MeSH classes referred to as intolerances, ranked by the count of distinct patients. In all these classes, the Gopher sets are incomplete, missing elements that should be included even regardless of the MeSH hierarchy. Multi-ingredient drugs in particular are missing from the sets of their ingredients; for instance, not all acetaminophen/codeine drugs are elements of the set named codeines. A systematic query for all the sets that are mapped to MeSH and the actual elements of the Gopher sets found 820 such missing elements.

Table 2

Most Frequently Reported Allergen Classes


The SPL/MeSH method detects twice as many patients with drug-intolerance issues and four times as many issues as the current Gopher method, which uses manually maintained drug sets. This effect is substantial, outweighing the incomplete mappings from RI/Gopher terms to MeSH (70%), UNII to MeSH (70%), and surprisingly even the NDC to SPL mapping (26%). Although these mappings were all limited, recall that most of them (except NDC) were completed by descending instance count. The improved issue detection was of course attributable mostly to these frequent drugs and intolerances. Although the mapping tables developed for this study are not complete enough to drive actual patient care yet, the study shows that where SPL coverage does exist it appears to provide value for decision support.
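The two- and four-fold figures can be checked against the counts reported in the Results with a few lines of Python. The dictionary keys are labels chosen for this sketch; the numbers are those given in the text for the Gopher and SPL/MeSH methods.

```python
# (Gopher count, SPL/MeSH count) as reported in the Results text.
reported = {
    "patients with issues, orders": (4_368, 8_832),
    "issues, orders": (10_239, 45_129),
    "patients with issues, dispensing": (1_223, 2_188),
    "issues, dispensing": (3_337, 13_749),
}

# Ratio of the SPL/MeSH count to the Gopher count for each measure.
ratios = {name: new / old for name, (old, new) in reported.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}-fold")
```

The ratios for patients fall close to two and those for issues slightly above four, for both medication orders and dispensing records.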

Although a 2- and 4-fold increase in the number of detected issues seems impressive and certainly demonstrates the value of the approach, using Gopher prescription data and patients with allergies noted in the Gopher system may have introduced bias. Medication orders written at the time an allergy was present on a patient's allergy list would have created a warning and the prescriber would have had the chance to select a different drug. Hence in the comparison of decision support performance we may see fewer cases than if we had selected a sample that had been entered in a CPOE system without decision support. However, the inclusion of drug dispensing data, the 90-day window that allows drugs to match drug-intolerance records added even retrospectively, and the fact that the better performance is consistent even on a class-by-class level all confirm that the SPL method indeed outperforms the Gopher method.

There are two main reasons why the SPL method outperforms the Gopher method. Firstly, the SPL method is more systematic and complete, capable of resolving a substance class to all drugs that include any ingredient of that class. The two main improvements of the SPL design over the Gopher method are the use of a systematic chemical class terminology (MeSH) and the use of a medication knowledge model (SPL) that describes the medicines by ingredients rather than merely classifying them in hierarchies. Therefore the SPL method outperformed the Gopher method specifically for multi-ingredient products that were not always mentioned in all of the Gopher drug sets where they belonged.

The second important reason why the SPL method outperforms the Gopher method is the deeper structure in which the MeSH hierarchy organizes different chemical classes, such as hydrocodone under morphine, and azithromycin under erythromycin. However, one should ask whether pointing out more remote members of the classes implicated in the intolerance is always beneficial. At the RI it is believed that the intolerance warnings do not cause as much consternation with our users as some overly zealous drug-interaction alerts; however, there is evidence that false alarms cause users to habitually ignore even potentially serious issues.23 24

For example, in Figure 6 we can see for penicillin allergy how the MeSH hierarchy works very well. Although the structure of MeSH headings is not strictly a taxonomy, we can (as NDF-RT does) interpret it as a taxonomy for the case of chemical structures, if we carefully disambiguate the concepts. It is not correct to say that “amoxicillin is-a penicillin-G” because it is not true that amoxicillin inherits all properties from penicillin-G. However, it is correct to say that amoxicillin is-a penicillin-G-derivative.

Figure 6

MeSH headings under the β-lactam family are an example of how MeSH concepts form a taxonomy only if we understand the MeSH chemical structures as derivatives. It is not true that the molecule amoxicillin is-a penicillin-G molecule, but we can say that amoxicillin is a penicillin-G derivative.

We propose that more precise terms for the chemical structures should be formed by distinguishing derivatives [D] from the specific complete and unaltered molecule [M]. That is, a terminology design pattern analogous to the structure-entirety-part (SEP) triplets should be used. The SEP triplets were first described by Schulz et al.25 and are now used in SNOMED-CT for modeling anatomic terminology. For example, an anatomic name, such as “kidney”, can be understood as kidney-structure (what looks like kidney tissue under a microscope), entire kidney (complete organ), or kidney-part (e.g., a calyx). For chemical structures, one should at least distinguish derivative (structurally similar) relationships from the entire molecule, and one might also consider parts, moieties, as well to complete the triple. Thus, although it is not true that amoxicillin molecule [M] is-a penicillin-G molecule [M], it is true that amoxicillin molecule [M] is-a penicillin-G derivative [D].

The default names for the MeSH headings for drugs are often unfortunate. For example, instead of adding [D] to the MeSH headings “Penicillin G”, “Ampicillin”, and “Amoxicillin”, the synonyms “benzylpenicillin”, “amino-benzylpenicillin”, and “hydroxyamino-benzylpenicillin” would be more clearly recognizable as chemical derivative class names rather than entire molecule names. However, in the context of SPL, we may interpret all of the MeSH concepts as derivative structure classes because in SPL it is the UNII codes that are used for specific ingredient molecules, and MeSH is only used for chemical structure generalizations.

In general, anyone who uses the MeSH structure classes for drug-intolerance issue detection must rely on the assumption that every molecule that is a derivative of the molecule that is not being tolerated will induce a similar adverse effect (immunologic or otherwise). This assumption might not always be true, and when it is not, it may lead to false-positive issue detection.

The assumption is generally true for the case of penicillin allergy. We wish to clarify a confusion found in some informatics articles published previously on the subject: a certain commercial knowledge base reportedly warns routinely about cephalosporin in the presence of allergy against penicillin, which was perceived to produce excess allergy alerting.22 According to the MeSH hierarchy, this alert should not be generated, because cephalosporin is not a derivative of penicillin. Only if an allergy against β-lactam [D] was asserted should cephalosporin derivatives be detected as issues. However, a β-lactam allergy is seldom asserted because the β-lactam ring is not the major determinant for the penicillin allergy.26 Thus, the MeSH taxonomy works correctly for cases like this.
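The penicillin and cephalosporin case can be illustrated with a minimal sketch of subsumption-based issue detection. The edge list is a simplified stand-in for the hierarchy in Figure 6, not actual MeSH content. Cephalexin is used only as an example cephalosporin, and the single-parent dictionary is an assumption made for brevity.

```python
# Toy is-a edges (child -> parent); every target is read as a derivative class [D].
# Assumption: simplified illustration, not the real MeSH tree.
PARENT = {
    "amoxicillin": "ampicillin",
    "ampicillin": "penicillin G",
    "penicillin G": "penicillins",
    "penicillins": "beta-lactams",
    "cephalexin": "cephalosporins",
    "cephalosporins": "beta-lactams",
}


def ancestors_or_self(concept):
    """Yield the concept itself and every class that subsumes it."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def is_issue(ordered_drug, intolerance_class):
    """An order is an issue if the recorded class subsumes the ordered drug."""
    return intolerance_class in ancestors_or_self(ordered_drug)


print(is_issue("amoxicillin", "penicillin G"))  # True
print(is_issue("cephalexin", "penicillin G"))   # False
print(is_issue("cephalexin", "beta-lactams"))   # True
```

With the intolerance recorded against penicillin G [D], the cephalosporin order raises no issue. It is flagged only when the broader β-lactam class is recorded.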

The reasoning based on derivative class becomes problematic, however, when we turn to the sulfonamide allergy, the second most common antibiotic allergy. In common jargon, sulfonamide allergy is abbreviated to “sulfa-allergy”, leading to the misconception that this is an allergy against the element sulfur or “sulfur containing drugs”. This is incorrect. Sulfonamide is a specific R-(SO2NH)-R structure shown in Figure 7. The different residues of the different sulfonamides account both for their many clinical uses as antibacterial, antimalarial, antidiabetic, diuretic, and β-antagonist agents, and for the immune responses against them. The question arises whether, for instance, a thiazide or furosemide given to a patient with allergy to sulfamethoxazole is a drug-allergy issue. Although most drug labels remain conservative, it is now known that the sulfonamide structure is not a determinant for the immune response, but that instead the aryl amine moiety plays an important role, and this moiety is absent in thiazides and furosemide. The very concept of sulfonamide allergy has therefore been questioned, and sulfonylarylamine allergy has been proposed instead.27 Although this is not reflected in the MeSH hierarchy today, the chemical class taxonomy could be easily modified to accommodate this change in paradigm and to increase precision.

Figure 7

MeSH hierarchy for sulfonamides. Both allergenic sulfonamide antibiotics and nonallergenic compounds are under the same heading Sulfanilamides. The term Sulfonyl-aryl-amine is used in the literature to describe the allergen class more appropriately. Thus, when the amino group is substituted to an amide, as in furosemide, the compound apparently loses its allergenicity. This suggests that allergenicity is not inherited along all derivative relationships in the MeSH hierarchy, but that in some cases the taxonomy must be restructured to separate those compounds with allergen properties from others without such property.
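The effect of such a restructuring can be sketched by running the same subsumption check against two toy hierarchies. Both hierarchies are illustrative assumptions rather than MeSH content. The class name “sulfonylarylamines” follows the term proposed in the literature, and hydrochlorothiazide stands in for “a thiazide”.

```python
def subsumers(concept, parents):
    """Return the concept plus all classes reachable via is-a edges."""
    seen, stack = set(), [concept]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def flagged(orders, recorded_class, parents):
    """List the ordered drugs that the recorded intolerance class subsumes."""
    return sorted(d for d in orders if recorded_class in subsumers(d, parents))


# Assumption: one broad class, as when only "sulfonamide allergy" can be recorded.
BROAD_HIERARCHY = {
    "sulfamethoxazole": ["sulfonamides"],
    "furosemide": ["sulfonamides"],
    "hydrochlorothiazide": ["sulfonamides"],
}

# Assumption: the allergenic compounds are separated into their own subclass.
RESTRUCTURED_HIERARCHY = {
    "sulfamethoxazole": ["sulfonylarylamines"],
    "sulfonylarylamines": ["sulfonamides"],
    "furosemide": ["sulfonamides"],
    "hydrochlorothiazide": ["sulfonamides"],
}

orders = ["sulfamethoxazole", "furosemide", "hydrochlorothiazide"]

print(flagged(orders, "sulfonamides", BROAD_HIERARCHY))
# ['furosemide', 'hydrochlorothiazide', 'sulfamethoxazole']
print(flagged(orders, "sulfonylarylamines", RESTRUCTURED_HIERARCHY))
# ['sulfamethoxazole']
```

The detection logic is unchanged between the two runs. Only the taxonomy and the class chosen for the intolerance record differ, which is where the gain in precision would come from.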

Codeine and morphine are frequently implicated in “opioid allergies”, but rarely are these true allergies. In most cases they are gastrointestinal side effects such as vomiting, and sometimes a specific direct histaminergic action that can lead to severe adverse events clinically presenting as anaphylaxis. Therefore recording this intolerance is justified. However, the effect is not shared by all opioids: it occurs with the morphine-codeine group but not, for example, with pethidine. These viable alternatives are in separate MeSH classes, suggesting again that if the correct MeSH class is chosen, then it is correct to increase the number of detected issues for morphine, codeine, hydrocodone, and oxycodone. Unfortunately the vast majority of reported “opioid allergies” are not of this kind; hence the opioid intolerance issue generates the most frequently overridden warning.22,24

The use of chemical structure classes from NDF-RT/MeSH does seem to lead to appropriate drug-intolerance issue detection. Where that is not the case now, it should be modified to allow more precise coding and detection of the intolerance issues. Indeed the most important remedy for spurious allergy warnings in the presence of a terminologically correct reasoning system seems to be a greater accuracy of the intolerance records themselves. More even than the documentation of the clinical features and severity of the last observed effect, it seems that an evidence-based guidance for the user to a better choice of chemical structure class is the most important intervention to reduce false positives and improve effective prevention of true adverse effects—a decision support intervention of a new kind.

Of course, given the very preliminary mapping of terminology, the SPL/MeSH method has also missed many issues that the Gopher method found. Most of them were due to the intolerance concept not having been mapped to MeSH. The high-frequency terms not mapped were not true chemical classes but a mechanism of action (e.g., “beta-blocker”) or therapeutic class (e.g., “antibiotics”). Although these could have been mapped to an eclectic ad-hoc set of MeSH terms, it would have introduced the same maintenance problem we see so clearly with the ad-hoc nature of the Gopher sets. Also, if false-negative reminders are a concern, it would be better not to offer such broad concepts as “opioid” or “NSAI” for intolerance records. Without further explanation, these intolerance statements are not actionable and in many cases not accurate.

One limitation of this study is, of course, that we only compared the SPL method against one particular in-house maintained CPOE decision support system and its knowledge base. Hence the study shows only an exemplary value of the SPL method for knowledge management. Today only a few leading institutions use locally maintained drug lists for intolerance checking. Most large institutions use one of the commercial drug knowledge base products (First Data Bank, Medispan, etc.). However, many smaller institutions or outpatient practices may not be able to afford them. Even for larger institutions, using locally maintained knowledge bases that combine public and commercially derived content may be advantageous to better handle intolerance classes, because, as Hsieh et al.22 reported, commercial intolerance checking systems are not necessarily superior in all aspects to public ones. Naturally, any actual content distributed publicly through SPL is open for use by commercial vendors, and we should validate our study using a commercial knowledge base. That said, it is not possible to conduct the same study with all existing CPOE systems, and the findings would be quickly outdated by improvements made to the systems studied.

The important point of this article is to show that the SPL initiative is delivering value to our field in the form of a product model that works well for decision support functions and an increasing amount of public computer actionable knowledge content. Both are available to health care providers, information system vendors, and commercial knowledge vendors. The specific but generalizable practical approaches offered in this article may assist others in using both the product model and the knowledge content to improve decision support in their systems.


Despite being severely disadvantaged by incomplete terminology mapping and coverage, a drug-intolerance detection method using SPL—as it will exist soon—and its public terminology sources—as they exist today—can be implemented in a standard relational database and finds several times more issues than an in-house maintained dictionary of ad-hoc drug sets. Although most of the additionally detected issues are logically justified, the increased sensitivity highlights the importance of both well-maintained chemical structure terminology and more accurate intolerance records to increase the overall effectiveness of this safety feature.
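The relational formulation can be illustrated with a minimal sketch using Python's built-in `sqlite3` module and a recursive common table expression. The table names, column names, and sample rows are hypothetical and do not reproduce the schema or data used in this study.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE is_a (child TEXT, parent TEXT);
    CREATE TABLE intolerance (patient INTEGER, concept TEXT);
    CREATE TABLE drug_order (patient INTEGER, drug TEXT);

    -- Toy derivative-class edges (assumption, not actual MeSH content).
    INSERT INTO is_a VALUES
        ('amoxicillin', 'ampicillin'),
        ('ampicillin', 'penicillin G'),
        ('penicillin G', 'beta-lactams'),
        ('cephalexin', 'cephalosporins'),
        ('cephalosporins', 'beta-lactams');
    INSERT INTO intolerance VALUES (1, 'penicillin G'), (2, 'penicillin G');
    INSERT INTO drug_order VALUES (1, 'amoxicillin'), (2, 'cephalexin');
""")

ISSUES_SQL = """
    WITH RECURSIVE subsumes(drug, concept) AS (
        -- Every ordered drug is subsumed by itself ...
        SELECT drug, drug FROM drug_order
        UNION
        -- ... and by every class reachable through is-a edges.
        SELECT s.drug, i.parent
        FROM subsumes AS s JOIN is_a AS i ON i.child = s.concept
    )
    SELECT o.patient, o.drug, t.concept
    FROM drug_order AS o
    JOIN subsumes AS s ON s.drug = o.drug
    JOIN intolerance AS t ON t.patient = o.patient AND t.concept = s.concept
    ORDER BY o.patient, o.drug
"""

issues = conn.execute(ISSUES_SQL).fetchall()
print(issues)  # [(1, 'amoxicillin', 'penicillin G')]
```

Patient 2's cephalexin order is not reported because the recorded class, penicillin G, does not subsume cephalexin in this toy hierarchy.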


The author thanks Linas Simonaitis, Paul Dexter, and Marc Overhage. Without Randy Levin's genial leadership this work would not have been possible at all.


  • This article was presented at the AMIA 2008 Fall Symposium and was selected for extended publication.

  • This work was performed at the Regenstrief Institute and is funded in part by the Agency for Healthcare Research and Quality (AHRQ) grant R01 HS15377 and the U.S. Food and Drug Administration (FDA).

