A System for Automated Lexical Mapping
- Correspondence and reprints: Jennifer Y. Sun, MD, MS, 57 Blossomcrest Road, Lexington, MA 02421-7103; e-mail: jennifer.sun@childrens.harvard.edu
- Received 8 March 2005
- Accepted 1 February 2006
Abstract
Objective: To automate the mapping of disparate databases to standardized medical vocabularies.
Background: Merging of clinical systems and medical databases, or aggregation of information from disparate databases, frequently requires a process whereby vocabularies are compared and similar concepts are mapped.
Design: Using a normalization phase followed by a novel alignment stage inspired by DNA sequence alignment methods, automated lexical mapping can map terms from various databases to standard vocabularies such as the UMLS (Unified Medical Language System) and LOINC (Logical Observation Identifier Names and Codes).
Measurements: This automated lexical mapping was evaluated using three real-world laboratory databases from different health care institutions. The authors report the sensitivity, specificity, percentage correct (true positives plus true negatives divided by total number of terms), and true positive and true negative rates as measures of system performance.
Results: The alignment algorithm was able to map 57% to 78% (average of 63% over all runs and databases) of equivalent concepts through lexical mapping alone. True positive rates ranged from 18% to 70%; true negative rates ranged from 5% to 52%.
Conclusion: Lexical mapping can facilitate the integration of data from diverse sources and decrease the time and cost required for manual mapping and integration of clinical systems and medical databases.
Access to medical information is hindered by the variation that is inherent in the lexicon of medical terminology. As the medical field moves toward electronic health records, portability of patient information, and sharing of information across institutions, the need for a method to computationally normalize and map nonstandard terms and concepts to a standard vocabulary becomes more important.
Several algorithms have previously been proposed to automate translation between medical vocabularies including the use of frames, semantic definitions, diagrams, and a combination of lexical, logical, and morphological methods.1 2 3 4 5 6 7 The Unified Medical Language System (UMLS), a product of the National Library of Medicine,8 exists to help in the development of systems to “understand” the language of biomedicine and health.9 10 The current version of the UMLS Metathesaurus contains information about over one million biomedical concepts and five million concept names from more than 100 controlled vocabularies and classifications (some in multiple languages). It includes vocabularies and coding systems designated as U.S. standards for the exchange of administrative and clinical data, including SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) and LOINC (Logical Observation Identifiers Names and Codes).
Tools developed for the UMLS include lexical variant generation, tools for customizing a subset of the Metathesaurus (MetamorphoSys), and extracting UMLS concepts from text (MetaMap).11 12 LOINC is a clinical terminology important for laboratory test orders and results, identified by the Health Level 7 (HL7) Standards Development Organization as a preferred code set for laboratory test names. The LOINC database also has a mapping program called RELMA, the Regenstrief LOINC Mapping Assistant developed by the Regenstrief Institute.13 None of these systems, however, have been evaluated as a fully automated mapping system on production databases from multiple health care institutions.
The Lexical INtegrator of Concepts (LINC) system performs completely automated lexical mapping of medical vocabularies. The following sections illustrate the stages involved in implementation of the LINC system and discuss key issues and trade-offs in performance.
Background
The development of tools to manage variation among the many medical vocabularies throughout clinical and research domains is a daunting but necessary task. As the medical field develops increasingly complex information systems to store and exchange electronic information, the task of reconciling variations in terminology becomes increasingly urgent. Many different approaches have been created to help solve this problem.
The University of California San Francisco UMLS group began the undertaking of intervocabulary mapping in 1988 using lexical matching. Sherertz et al.14 used filters and rules to transform input for mapping. They attempted to map disease names and disease attributes to MeSH terms with mapping in 47% to 48% of phrases in both cases.
The SPECIALIST lexicon, a UMLS knowledge source, is an English-language lexicon that contains biomedical terms as well as common English terms. Lexical records contain base form, spelling variants, acronyms, and abbreviations. In addition, there is a database of neoclassical compound forms, single words that consist of several Greek or Latin morphemes that are very common in medical terminology.15 Language-processing tools have been developed to mediate between existing vocabularies and the lexicon. These programs consist of modules for normalization and lexical variant generation, including lowercasing, removing punctuation, removing stop words, sorting words in a term alphabetically, generating inflectional variants, reducing words to their base forms, and generating derivational variants for words.16 17 18 However, these tools do not automate the task of mapping existing vocabularies to the UMLS Metathesaurus.
A major issue in natural language processing is a lack of a truly comprehensive clinical vocabulary. The Large Scale Vocabulary Test19 20 showed that a combination of existing terminologies can represent a majority of concepts needed to describe patient conditions in a range of health care settings and information systems. This finding is significant in that it can serve as a strategy for improving current vocabularies and perhaps in establishing a standard national vocabulary.
Structured or coded data in electronic medical records is not widely available. Instead, the main source of clinical information is in unstructured-text reports and unconstrained data. Many natural language processing systems have been developed for the purpose of extraction of clinical medical information.19 20 21 22 23 24 25 These systems all include text parsers and part-of-speech tagging of text, some type of normalization method or variant expansion, and mapping of the individual words/terms found. One of the core issues with these systems is the mapping of the normalized words to a standard vocabulary. Another issue is the development of specific lexicons for each individual system that are not necessarily generalizable. Our work differs from related work because our objective was not to extract text or attempt to parse it, but to develop a method focusing on the mapping of noun phrases to any standardized vocabulary, not a specific lexicon.
Sager et al.19 developed the Linguistic String Project, which is one of the first comprehensive natural language processing systems for general English and was later adapted to medical text. The MedLEE system (Medical Language Extraction and Encoding system) was developed as part of the clinical information system at Columbia by Friedman et al.20 for use in actual patient care and was shown to improve care. The MedLEE system was also applied to map clinical information to coded forms.21 The MedsynDikate system developed at Freiburg University is used for extracting information from pathology findings reports. Their novel approach involved researching interdependencies between sentences while creating a conceptual graph of sentences parsed.22
The MetaMap Program, by Aronson et al.23 24 is a configurable program that maps biomedical text to concepts in the UMLS Metathesaurus. The algorithm uses variant generation using knowledge from the SPECIALIST lexicon, and then maps these variants to the Metathesaurus.
Lau et al.25 propose a method for automated mapping of laboratory results to LOINC codes. Their method uses a series of rules to parse and map the laboratory tests to the LOINC descriptors and then to the LOINC codes. This method is very specific in its mapping process and would be difficult to generalize to other medical vocabulary domains.
Systems have also been developed to map abbreviations to full forms such as AbbRE (abbreviation recognition and extraction) from Yu et al.26 and systems for scoring abbreviation expansions such as that by Chang et al.27 AbbRE uses a set of pattern-matching rules to map an abbreviation within free text (biomedical articles) to its full form. The system identifies abbreviations that are defined within the article, which in the case of biomedical articles, only occurred 25% of the time. The remaining undefined abbreviations could only be mapped to abbreviation databases in 68% of cases.
Methods
Databases Used
Database dictionaries of laboratory studies were obtained listing all available names and definitions of laboratory tests. These databases, representing the query terms, were obtained from Children's Hospital in Boston, MA (TCH), Intermountain Health Care in Utah and Idaho (IMH), and the Dana Farber Cancer Institute in Boston, MA (DFCI).
The TCH laboratory database contained 5,985 terms, of which 5,566 were unique terms; the IMH database had 4,440 terms, of which 3,364 were unique; and the DFCI database had 326 unique terms.
Two dictionary sources were used: UMLS and LOINC. The UMLS dictionary has 11,033 terms and the LOINC dictionary has 34,840 terms. The UMLS dictionary terms are from the Semantic Network branch of the Metathesaurus that lists laboratory tests. The LOINC database was downloaded from version 2.12 (http://www.regenstrief.org/loinc/) and includes all concepts, not just laboratory concepts. Examples of full terms are shown in Table 1A.
Examples from the Laboratory Terms Databases
Examples of normalized terms from each database, excluded words highlighted in bold
Overview
The processes used in LINC are summarized as a flowchart in Figure 1. The following sections present the individual steps in the overall method: preprocessing the query and dictionary terms, the alignment process, and postalignment sorting of the candidate mappings. The alignment of query terms occurs in several phases; each later phase is run only if the match threshold was not reached in the preceding phases, as discussed below.
Preprocessing for Normalization
To account for variations in the query vocabulary compared to the dictionary vocabulary, normalization of each term was necessary. The normalization process involves converting the term to lower case, splitting the term into tokens (its constituent words), removing punctuation, removing duplicate tokens, and sorting the tokens by frequency. The frequency-sorting step uses a database created for each vocabulary used, consisting of the number of occurrences of each token within that particular vocabulary. For the query vocabulary, the 1% most frequent words are removed for the initial mapping (see details in the “Exclusion Phase” section below).
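The normalization steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the LINC implementation: the function names are hypothetical, punctuation is simply deleted from each token, and tokens are ordered from least to most frequent (an assumption based on the frequency-ordering example given in the Discussion), with alphabetical order as a tie-breaker.

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
_PUNCT = str.maketrans("", "", string.punctuation)

def tokenize(term):
    """Lowercase a term, split it on whitespace, and strip punctuation."""
    tokens = (tok.translate(_PUNCT) for tok in term.lower().split())
    return [tok for tok in tokens if tok]

def build_frequency_table(vocabulary):
    """Count how often each token occurs within one vocabulary."""
    counts = Counter()
    for term in vocabulary:
        counts.update(tokenize(term))
    return counts

def normalize(term, freq):
    """Return the unique tokens of a term, least frequent first (assumed order)."""
    unique = list(dict.fromkeys(tokenize(term)))  # drop duplicates, keep order
    return sorted(unique, key=lambda tok: (freq[tok], tok))

# Hypothetical four-term vocabulary, for illustration only.
vocabulary = ["Glucose, whole blood", "Blood culture",
              "Blood gas, whole blood", "Glucose, urine"]
freq = build_frequency_table(vocabulary)
print(normalize("Blood gas, whole blood", freq))  # ['gas', 'whole', 'blood']
```

Each vocabulary gets its own frequency table, so the same words can be ordered differently in the query and dictionary vocabularies.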
After normalization, two approaches to string alignment were employed. In the first method, the query terms are passed to the alignment method as separate tokens, scored separately, and then recombined to give an overall mapping score. In the second method, the normalized tokens are concatenated back into a string and passed to the alignment method and scored as a whole.
Alignment
The inspiration for this algorithm comes from DNA sequence alignment algorithms such as BLAST (Basic Local Alignment Search Tool).28 29 30 31 32 LINC uses a matrix structure to find the best alignment between two strings: the query term and the dictionary term (Fig. 2). Every query term is mapped to every dictionary term to determine the dictionary terms with the best alignment. The matrix is initialized with zeros, and then for every cell where a character of the query term matches a character of the dictionary term (not including spaces), the numeral “1” is placed as a marker in the cell. To fully explore all possible alignments, the algorithm iteratively performs depth first searches to find the “best” alignment. Using principles of dynamic programming, the larger problem of finding the optimal alignment is solved by finding the solution to a series of smaller problems, namely, finding the optimal alignment for substrings. The smaller problems do not require recalculation because their results are saved, thus improving computational efficiency.
Figure 2. Matrix representation of alignment. The query term is “alk ptase” (on the vertical axis) and the dictionary term is “alkaline phosphatase” (on the horizontal axis). Highlighted cells represent positions where a character of the query term matches a character from the dictionary term within the optimal alignment.
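As a small illustration of the matrix described above, the following Python sketch builds the zero-initialized matrix and marks every character match with a 1, skipping spaces. The function name and the list-of-lists layout (rows for the query term, columns for the dictionary term) are choices made for this example only.

```python
def match_matrix(query, dictionary):
    """Build a 0/1 matrix: rows follow the query term, columns the dictionary term."""
    matrix = [[0] * len(dictionary) for _ in query]
    for i, q in enumerate(query):
        if q == " ":
            continue  # spaces are never treated as matches
        for j, d in enumerate(dictionary):
            if q == d:
                matrix[i][j] = 1
    return matrix

matrix = match_matrix("alk ptase", "alkaline phosphatase")
for ch, row in zip("alk ptase", matrix):
    print(ch, "".join(str(cell) for cell in row))
```

Printing the rows gives a text version of the figure: each line shows one query character followed by its row of markers.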
The details of the alignment process are as follows. Each matrix cell containing a “1” (a character match) is represented by a node. The nodes are then linked together into a chain starting at the lower left corner of the matrix and proceeding to the right (Fig. 3). Starting with the first node in the chain, a score is calculated as the score for that node plus the score for the next node in the alignment, and the calculation continues recursively. By aligning every character of the query term to every matching character in the dictionary term, we are performing a depth-first search. Each node has an optimal score associated with it, representing the highest possible score that can be attained from that node forward (see “Scoring Algorithm” section). The algorithm tracks the scores for nodes that have already been traversed. Finding the node with the maximum score and following the path from that node through the matrix retrieves the best overall alignment.
Figure 3. A graphical representation of how the chain is linked together. The dark solid arrows of the matrix show the order in which the cells are read: each column is read from bottom to top, starting at the lower left corner with the first column (column 0 in Figure 2) and proceeding from the leftmost column to the right. The lower image shows the chain representation of the matrix, built from the cells marked “1” in that reading order. Coordinates represent the position of the cell in the matrix.
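The memoized depth-first search described above can be sketched as follows. This is a simplified stand-in and not LINC's actual scoring: each matched node is worth 1, and a step earns a bonus of 1 when the next match is adjacent in both terms. The function names and the bonus rule are assumptions made for illustration; the point is that each node's best forward score is computed once and then reused.

```python
from functools import lru_cache

def best_alignment(query, dictionary):
    """Return (score, path) for the best chain of matched cells.

    Simplified scoring (an assumption of this sketch): 1 per matched
    character, plus 1 for each step that is contiguous in both terms.
    """
    # Collect match nodes column by column, ignoring spaces.
    nodes = [(i, j)
             for j, d in enumerate(dictionary)
             for i, q in enumerate(query)
             if q == d and q != " "]

    @lru_cache(maxsize=None)  # saved results: each node is solved only once
    def best_from(node):
        i, j = node
        best_score, best_path = 1, (node,)
        for nxt in nodes:
            if nxt[0] > i and nxt[1] > j:  # must move forward in both terms
                bonus = 1 if (nxt[0] - i, nxt[1] - j) == (1, 1) else 0
                tail_score, tail_path = best_from(nxt)
                score = 1 + bonus + tail_score
                if score > best_score:
                    best_score, best_path = score, (node,) + tail_path
        return best_score, best_path

    if not nodes:
        return 0, ()
    return max((best_from(n) for n in nodes), key=lambda result: result[0])

score, path = best_alignment("alk ptase", "alkaline phosphatase")
print(score, path)
```

Following the returned path through the matrix gives the aligned character pairs, analogous to the highlighted cells in the matrix figure.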
Scoring Algorithm
The scoring method is a combination of multiple factors that contribute to an optimal alignment. Each node (matched character) is given an initial score of 1. To penalize gaps between matching characters, the initial score is divided by the squared distance between the current node and the next mapped node (i.e., the gap).
Continuity between mapped characters is also tracked to reward longer continuously matched node chains. A proximity score is calculated based on the Euclidean distance between any two nodes. If two matched characters are continuous within the original query term and dictionary term, then the proximity score equals 1. The continuity score is the sum of the proximity scores from all the nodes within the chain. But when two or more nodes have proximity scores equal to 1, the continuity score for that portion of the chain is squared prior to adding it to the overall continuity score. For example, a chain of four continuous nodes would have proximity scores of 1 + 1 + 1 = 3, and this would then be squared to add a continuity score of 9 to the overall score.
Overall, the scoring scheme is skewed toward greatly rewarding longer chains (greater continuity) of matched nodes.
Specific examples of scoring follow.
Example 1
- Query term = AST
- Dictionary term = AST
- Node score (A) = 1
- Node score (S) = 1
- Node score (T) = 1
- Proximity scores = 2 (A-S is continuous, and S-T is continuous)
- Continuity score = 4 (2²)
- Overall score = 1 + 1 + 1 + 4 = 7

Example 2
- Query term = AST
- Dictionary term = ALT
- Node score (A) = 1
- Node score (S) = 0
- Node score (T) = 1
- Proximity score = 0.5 (A-T is not continuous)
- Continuity score = 0.25
- Overall score = 1 + 0 + 1 + 0.25 = 2.25

Example 3
- Query term = AST
- Dictionary term = ASB
- Node score (A) = 1
- Node score (S) = 1
- Node score (T) = 0
- Proximity score = 1 (A-S is continuous, and S-T is not continuous)
- Continuity score = 1
- Overall score = 1 + 1 + 0 + 1 = 3
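The arithmetic in these examples can be checked with a short Python sketch. Several details are assumptions, because the text does not give an explicit proximity formula: proximity is taken as √2 divided by the Euclidean distance between two matched cells (1 for adjacent diagonal cells, 0.5 for the A-T pair in Example 2), runs of continuous links are summed and then squared, and an isolated non-continuous link is squared on its own, as Example 2 implies. The gap penalty on node scores is left out, since the examples list unpenalized node scores. Summing the listed node and continuity scores gives 7, 2.25, and 3 for the three examples.

```python
import math

def proximity(a, b):
    """Assumed proximity: 1 for adjacent diagonal cells, smaller as the gap grows."""
    di, dj = b[0] - a[0], b[1] - a[1]
    if di == 1 and dj == 1:
        return 1.0
    return math.sqrt(2) / math.hypot(di, dj)

def continuity_score(path):
    """Sum proximity over runs of continuous links, squaring each run."""
    total, run = 0.0, 0.0
    for a, b in zip(path, path[1:]):
        p = proximity(a, b)
        if p == 1.0:
            run += p                      # extend the current continuous run
        else:
            total += run ** 2 + p ** 2    # close the run; square the lone link
            run = 0.0
    return total + run ** 2

def overall_score(path):
    """Each matched node scores 1; the continuity score is added on top."""
    return len(path) + continuity_score(path)

# Paths are (query index, dictionary index) pairs of matched characters.
print(overall_score([(0, 0), (1, 1), (2, 2)]))    # Example 1 (AST vs AST)
print(round(overall_score([(0, 0), (2, 2)]), 2))  # Example 2 (AST vs ALT)
print(overall_score([(0, 0), (1, 1)]))            # Example 3 (AST vs ASB)
```

The same functions reproduce the four-node case from the text: three continuous links sum to 3 and contribute a continuity score of 9.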
Exclusion Phase
In the preprocessing of the query vocabulary, the 1% most frequent words in that particular vocabulary are removed. Because these words are so prevalent within their respective vocabularies, they are often nonspecific and contribute an excessive amount to the alignment mapping between terms. Examples of these words are shown in Table 1.
The initial alignment is done without these words. Then a second alignment (“second pass” in the flow chart) is done using the candidate mappings from the initial alignment, with the excluded words added back to the overall query term. By adding back the excluded words, higher scores could be assigned for the most specific mappings.
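A minimal Python sketch of the exclusion step is shown below. The function names are hypothetical, and "the 1% most frequent words" is interpreted here as the top 1% of distinct words in the vocabulary, with at least one word always excluded so that small examples behave sensibly; both choices are assumptions.

```python
from collections import Counter

def exclusion_words(vocabulary, fraction=0.01):
    """Return the most frequent words in a vocabulary (top `fraction` of distinct words)."""
    counts = Counter(word for term in vocabulary for word in term.lower().split())
    n = max(1, round(len(counts) * fraction))  # assumption: exclude at least one word
    return {word for word, _ in counts.most_common(n)}

def split_term(term, excluded):
    """Return (first-pass query, second-pass query) for one term."""
    words = term.lower().split()
    kept = [w for w in words if w not in excluded]
    # First pass omits the excluded words; second pass adds them back.
    return " ".join(kept), " ".join(words)

# Hypothetical query vocabulary, for illustration only.
query_vocabulary = ["glucose urine", "protein urine", "sodium urine", "glucose blood"]
excluded = exclusion_words(query_vocabulary, fraction=0.2)
print(excluded)
print(split_term("Protein urine", excluded))
```

The first element of the returned pair would be aligned against the whole dictionary, and the second only against the candidates found in the first pass.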
Abbreviation Expansion Process
Within the overall LINC process, the abbreviation expansion steps occur in the third through fifth passes of the algorithm. If a query term is not mapped in the first two passes (see flow diagram), it then falls into the abbreviation expansion process.
Using a compilation of abbreviation dictionaries from various online resources, an overall abbreviation table was created. Online sources for abbreviations included Wikipedia, Eugene Free Community Network, JD.MD, and Börm Bruckmeier Publishing.33 Tables were created expanding each query and dictionary term if an available abbreviation was found within the term. All combinations of the abbreviations were also created. For example, if a term consisted of two words such as alk phos, with one possible expansion for alk (alkaline) and two possible expansions for phos (phosphorus and phosphatase), there would be five possible combinations for expansion (alk phosphorus, alk phosphatase, alkaline phos, alkaline phosphorus, alkaline phosphatase). If there were more than 16 combinations, the term was deemed to have no relevant expansions because the expansions were then nonspecific and ambiguous. In the TCH database, there were 5,566 terms, of which 1,359 had more than 16 abbreviation expansion combinations. The IMH database had 3,364 terms, of which 692 had more than 16 expansions, and the DFCI database had 326 terms, of which 23 had more than 16 expansions.
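The combination step can be illustrated with a short Python sketch. The abbreviation table below contains only the alk phos example from the text, and the function name is hypothetical. Whether the limit of 16 counts the unexpanded term is not stated; this sketch counts only the expanded forms.

```python
import math
from itertools import product

MAX_COMBINATIONS = 16  # terms with more combinations are treated as ambiguous

def expand_abbreviations(term, abbreviations):
    """Return every expanded form of a term, or [] if there are too many."""
    words = term.split()
    # Each word may stay as written or be replaced by any of its expansions.
    options = [[word] + abbreviations.get(word, []) for word in words]
    n_expanded = math.prod(len(opts) for opts in options) - 1  # minus the original
    if n_expanded > MAX_COMBINATIONS:
        return []
    original = " ".join(words)
    combos = (" ".join(combo) for combo in product(*options))
    return [c for c in combos if c != original]

abbreviations = {"alk": ["alkaline"], "phos": ["phosphorus", "phosphatase"]}
for expansion in expand_abbreviations("alk phos", abbreviations):
    print(expansion)
```

For alk phos this yields the five combinations listed in the text, in the same order.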
The algorithm uses three different passes to expand and map abbreviated terms. Alignment is attempted between the expanded query term and the unexpanded dictionary, then with the unexpanded query term and expanded dictionary, and last with the expanded query term to the expanded dictionary.
Match Threshold
As a part of the scoring, a threshold was set to determine whether two terms were appropriate mappings based on the scoring algorithm described previously. The threshold is a percentage of a “perfect match,” a perfect match being the case where the query term and the dictionary term are identical (normalized) character strings. Query terms that had no candidate mappings scored above the threshold are defined as “unmapped.”
To find the optimum threshold, the mapping algorithm was run with scoring thresholds ranging between 50% and 100%, in 5% increments. After plotting a receiver operating characteristic (ROC) curve, we further investigated thresholds between 80% and 90% (in increments of 1%), finally choosing a threshold of 85% as the best balance of sensitivity and specificity.
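As a minimal sketch of how such a threshold might be applied, the Python below keeps only the candidates whose score reaches a given fraction of the perfect-match score, and lists the coarse and fine threshold grids described above. The function name and the candidate scores are illustrative assumptions; a score equal to the threshold is treated as passing so that an identical term still maps at 100%.

```python
def mapped_candidates(candidates, perfect_score, threshold=0.85):
    """Keep (term, score) pairs scoring at least `threshold` of a perfect match."""
    return [(term, score) for term, score in candidates
            if score / perfect_score >= threshold]

# Threshold grids: 50% to 100% in 5% steps, then 80% to 90% in 1% steps.
COARSE_THRESHOLDS = [t / 100 for t in range(50, 101, 5)]
FINE_THRESHOLDS = [t / 100 for t in range(80, 91)]

# Illustrative candidate scores for the query term AST (perfect match = 7).
candidates = [("AST", 7.0), ("ASB", 3.0), ("ALT", 2.25)]
for threshold in (0.40, 0.85):
    print(threshold, mapped_candidates(candidates, 7.0, threshold))
```

A query term whose filtered list is empty at the chosen threshold would be reported as unmapped.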
Post-processing
Several different techniques were used to reorder the candidate mappings so that the “best” mapping would be sorted to the top of the list. These methods include
- Sort by dictionary term length
- Sort by position: summing the x coordinates of the matched characters within the dictionary term
- Sort by position method 2: summing the first and last x coordinates of the matched characters of the dictionary term
- Sort by score and position: a new score that is a combination of the score and position of mapped terms, based on the following quantities:
  - Score: the score of the current query term
  - Max score: the highest scoring mapping for the current query term
  - Position sum: the sum of all the x coordinates for the mapped characters of the dictionary term
  - Max position sum: the maximum position sum of all the dictionary terms that map to the current query term
- Sort by percentage mapped: a new score that is a combination of the score and the percentage of the dictionary term mapped
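Two of these orderings can be sketched in Python as follows. The exact way LINC combines the score with the percentage mapped is not specified here, so this sketch simply sorts by percentage mapped first and uses the score and then the dictionary term length as tie-breakers. The candidate records, field names, and dictionary terms are hypothetical.

```python
def position_sum(columns):
    """Sum of the x coordinates (dictionary positions) of the matched characters."""
    return sum(columns)

def percent_mapped(term, columns):
    """Fraction of the dictionary term's non-space characters that were matched."""
    n_chars = len(term.replace(" ", ""))
    return len(set(columns)) / n_chars if n_chars else 0.0

def sort_candidates(candidates):
    """Order candidates by percentage mapped, then score, then shorter term."""
    return sorted(
        candidates,
        key=lambda c: (-percent_mapped(c["term"], c["columns"]),
                       -c["score"],
                       len(c["term"])),
    )

# Hypothetical candidates for the query term "alk ptase".
matched = [0, 1, 2, 13, 16, 17, 18, 19]
candidates = [
    {"term": "alkaline phosphatase isoenzymes", "score": 13, "columns": matched},
    {"term": "alkaline phosphatase", "score": 13, "columns": matched},
]
for c in sort_candidates(candidates):
    print(c["term"], round(percent_mapped(c["term"], c["columns"]), 2))
```

With equal scores, the dictionary term with fewer unmatched characters sorts to the top.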

Evaluation
Using a random number generator, 200 query terms were randomly chosen from each query term database (TCH, IMH, and DFCI) for evaluation. LINC mappings were generated for each of these terms using various combinations of the algorithms discussed previously. For each LINC mapping run, a 2 × 2 truth table was then constructed to tally true positives (the algorithm found the correct mapping in the dictionary), false positives (the mapping found was incorrect, but scored above the threshold), false negative (the mapping scored below the threshold, but the term was present in the dictionary), and true negatives (the mapping was below the threshold and the term was not in the dictionary). The gold standard for the truth table was based on manual evaluation of the mappings by the investigators.
A mapping was labeled true positive as long as an appropriate term from the dictionary scored above the threshold. In some cases, there were multiple mappings above the threshold, of which some would be correct mappings and some would be incorrect mappings; these were still labeled as true positives in that at least one correct mapping was identified. If the query term was ambiguous or undecipherable (as was the case with many abbreviated terms), those mappings were considered true negatives or false positives depending on the mapping results since there was no means by which to confirm a mapping. Additionally, for all terms for which the system did not generate a mapping scoring above the threshold, the investigator manually searched the dictionary vocabulary for the query term to determine whether the mapping should be labeled as a false negative.
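The measures derived from the 2 × 2 truth table can be written out as a short Python sketch. The outcome labels and the counts are hypothetical and are not taken from the study. The true positive and true negative rates are computed here as fractions of all evaluated terms, an interpretation chosen because the two rates reported in the Results add up to the percentage correct.

```python
from collections import Counter

def summarize(outcomes):
    """Compute summary measures from a list of 'TP', 'FP', 'FN', 'TN' labels."""
    counts = Counter(outcomes)
    tp, fp, fn, tn = (counts[k] for k in ("TP", "FP", "FN", "TN"))
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no evaluated terms")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "percent_correct": (tp + tn) / total,
        "true_positive_rate": tp / total,  # fraction of all terms (assumed)
        "true_negative_rate": tn / total,  # fraction of all terms (assumed)
    }

# Hypothetical outcomes for a sample of 200 query terms.
outcomes = ["TP"] * 60 + ["TN"] * 90 + ["FP"] * 30 + ["FN"] * 20
for name, value in summarize(outcomes).items():
    print(f"{name}: {value:.2f}")
```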
Results
Results for 19 alignment runs are shown in Tables 2 through 5. In initial experiments with TCH, many different scoring algorithms and postprocess sorting methods were evaluated. The initial runs were with TCH versus the UMLS only. An overall “best” algorithm was chosen from the initial runs and applied to TCH, IMH, and DFCI versus LOINC only. The abbreviation phases were added last. Table 2 shows an increase in the true positive rate with the changes in scoring algorithms and postprocess sorting. The best mapping showed a true positive rate of 28% and a true negative rate of 50%, with an overall percentage correct of 78%. It was also noted that normalizing the query term using alphabetization or sorting the tokens by frequency did not seem to make much difference and created some new problems, so both of those methods were removed for the rest of the runs.
Table 2. Summary of Alignment Algorithm Results of TCH Laboratory Terms Versus the UMLS Standard Vocabulary
Table 3. Summary of Alignment Algorithm Results of TCH Laboratory Terms Versus the LOINC Standard Vocabulary
Table 4. Summary of Alignment Algorithm Results of IMH Laboratory Terms Versus the LOINC Standard Vocabulary
Table 5. Summary of Alignment Algorithm Results of DFCI Laboratory Terms Versus the LOINC Standard Vocabulary
Our initial mapping with TCH (5,985 terms) versus UMLS (11,033 terms) averaged 50 terms mapped per minute. Mapping to a larger dictionary like LOINC (34,840 terms) lowered the speed to six to ten terms mapped per minute.
Table 3 shows that mapping TCH to LOINC was more successful, except when the abbreviation algorithm was applied. Table 4 for IMH shows an overall true positive rate between 65% and 70%, with 99% of the mappings falling in the top three mappings produced and 95% in the top position. With the DFCI database, Table 5 shows little change in mapping with changes in the algorithm; mappings were between 62% and 63% correct, with true positive rates of 33% to 34%. Again, the mappings ranked in the top position in 82% of cases and in the top three positions in 91% of cases.
Additionally, the query terms and matching dictionary terms were evaluated to determine the rate of nonmatching words within either. In the TCH versus UMLS runs, there were nonmatching words in 85% to 95% of mappings. In TCH versus LOINC, nonmatching words occurred in 98% to 100% of cases; in IMH versus LOINC, in 100% of mappings; and in DFCI versus LOINC, in 97% to 100% of mappings.
As shown in Table 6, there were varied results from the abbreviation expansion. For the TCH database, the percentage of correct mappings actually decreased. Although the abbreviation expansion process found ten new mappings, it also incorrectly expanded 33 terms, resulting in a net decrease. For DFCI, the expansion produced no net change in percentage correct, although the method did find as many as 26 new mappings. In the case of the IMH database, there was a net increase resulting from 12 new mappings found and only four errors in expansion.
Table 6. Joint Results Table: TCH, IMH, and DFCI Versus LOINC, Run 2 Versus Run 6
The system is biased to identify more mappings, thereby producing more false positives. It was assumed that it would be easier to eliminate incorrect mappings than to manually search a large vocabulary for an appropriate mapping.
Discussion
A comparison of the LINC system against past vocabulary mapping investigations is highlighted by three key features. First, our system is flexible enough to allow mapping of different vocabularies, possibly from different domains. The system is not dependent on the creation of specialized lexicons to perform the mappings. Second, our system is fully automated, requiring no preformatting, manual input, or construction of rules or filters. Finally, the LINC system has been evaluated with multiple real-world vocabularies against both the UMLS and LOINC.
In experimenting with different algorithms for normalization and mapping within LINC, there were trade-offs for each method used, and no one optimal method was found. In mapping query terms to dictionary terms, it was noted that the order of words within the terms differed from vocabulary to vocabulary. For example, the TCH query term “glucose, whole blood” does not exist in that exact form in the UMLS Metathesaurus. Instead, the UMLS has the term “whole blood glucose tests.” The matching algorithm was run against a subset of the UMLS Metathesaurus, which was likely helpful in decreasing the number of false positives.
Three methods were employed to deal with these types of variation. The first method was alphabetizing the words within each term. Through the string processing method, the query term would then become “blood glucose whole” and the UMLS term would become “blood glucose tests whole.”
Another method was to order the words within each term by their frequency in the vocabulary. From each vocabulary, a frequency table was generated to tally the frequency of each word within its vocabulary. Using this frequency method, the TCH query term “glucose, whole blood” would become “whole glucose blood.” Because each term is ordered using frequencies within its own vocabulary, the UMLS term “whole blood glucose tests” would become “whole glucose tests blood.”
The third method was to break each term into tokens (words) and perform a separate alignment on each token, from which the scores were joined to obtain an overall score. In the third method, the order of words would then not matter because as long as the token from the query term was found within the dictionary term, it would be mapped.
Among the three vocabularies mapped, there was variation in the sensitivity and specificity. The vocabularies were quite different, as shown in Table 1, with the TCH vocabulary having many more nonmappable terms; thus, the specificity was higher because there were many more true negative terms. The IMH terms most closely resembled the LOINC terms; thus, that vocabulary had the highest sensitivity. The DFCI terms were abbreviated in many cases; thus, when expanded, many could be mapped to the LOINC vocabulary, increasing the sensitivity.
With the alphabetization method, although words will be ordered in a standard manner across vocabularies, the ordering may create gaps in the alignment due to words in the dictionary term that do not occur in the query term; as in the example shown previously, the term “tests” creates a gap in an otherwise continuous string. With the frequency method, the least frequent terms will be the most relevant, but the downside is that words may be reordered in such a way that the score is decreased due to poor alignment.
Another issue that hindered lexical mapping was the existence of less “discriminating” or specific words within the query term, such as “qualitative,” “quantitative,” “urine,” “blood,” and “csf” (cerebrospinal fluid). These terms are clinically relevant, but not the critical tokens that differentiate a query term. LINC addresses this issue by removing the most frequent terms in the query vocabulary (top 1%) from the query term for the initial alignment/mapping. After candidate mappings were identified during the initial alignment, a second alignment was run with the query term reincorporating these initially excluded tokens to obtain the most specific mapping possible.
LINC tackles the obstacle of abbreviations in both the query terms and the dictionary terms by running several phases of abbreviation expansion. Because of the limitations of the available abbreviation dictionaries, however, many appropriate expansions were unavailable and many inappropriate expansions were performed, thus increasing the number of false positives and decreasing the specificity of the match. For example, “ser” was only expanded to serine, not serum. There is a trade-off with customized abbreviation dictionaries; with more possible expansions, there may be more matches, but there also may be more false positives generated from the incorrect expansions. Additionally, the matching algorithm was run against a subset of the UMLS Metathesaurus, which was likely helpful in decreasing the number of false positives; therefore, the higher specificity in the matching runs with the UMLS may be confounded by that factor.
Another major consideration is how to choose the “optimal mapping.” A single query term may map to multiple terms in the dictionary vocabulary, but in an automated system, we only want the “best” mappings. Two methods were used to extract the best mappings from the list of candidate mappings generated. The first method was to use a mapping threshold. Using this threshold, the accuracy of the mapping can be controlled; an exact mapping would have a threshold of 100%, meaning that all words in the query term must appear in the dictionary term as exact string matches. As detailed in the “Methods” section, an ROC curve was used to optimize the threshold.
The second method to extract the optimal mapping was to sort the list of candidate mappings according to various scoring metrics. As described previously, several sorting methods were evaluated. Initial sorting methods were based on the length of the dictionary term and the sum of positions of the character mappings of the dictionary terms, under the assumption that a shorter dictionary term without other extraneous tokens would be more relevant. In the end, the best sorting method was a simple one that scored mappings by the percentage of characters in the dictionary term that matched.
Limitations and Future Goals
A limiting factor to the breadth of this investigation is the choice of standardized medical vocabularies used as well as native ambiguity in some of the legacy terms. Neither the UMLS Metathesaurus nor LOINC covered the entire domain of laboratory tests that is available at this time. In addition, the available abbreviation dictionaries for the latter part of our experimental runs were incomplete or often contained abbreviations irrelevant to our specific domain. Last, the system has only been tested on laboratory terms, which may not be representative of performance in other vocabularies such as diseases, medications, or genomics.
While the system is automated in its generation of potential mappings for a query term, currently, an expert/clinician still needs to confirm the correct mapping from the list of generated candidates. Thus, there still exists a trade-off between the speed of automation and manual accuracy. Although manual confirmation of the lexical mapping is more time intensive than allowing the system to function in a totally automated fashion, this still represents an improvement in efficiency compared to systems that require manual search or collation of vocabularies.
Currently, LINC is implemented as part of MEDIATE (Medical Information Acquisition and Transmission Enabler),34 an automated system developed to integrate data from multiple disparate sources using semantic networks to perform context-based mapping. LINC provides the underlying lexical mapping for the nodes of the semantic network and helps provide the ground structure to support context-based mapping. Further development of both the lexical and context-based mapping techniques should yield further improvements in automating electronic medical information exchange.
Future avenues of exploration might include the following experiments. A phase for synonym incorporation could be evaluated. Further tests against SNOMED would provide additional evidence of the generalizability of the system. In addition, testing on different real-world vocabularies within the clinical or research realm can help test the scope of the algorithm's application. Finally, the adaptability of LINC could be tested on other databases in the medical domain that are not limited to clinical databases. For example, the bioinformatics domain has many similar issues with standardization where automated lexical mapping will become more important.
Conclusion
In the drive to expedite and improve the efficiency of information exchange, the mapping of local clinical terminology to standardized vocabularies will always be necessary to accommodate the legacy systems of individual institutions. LINC uses novel methods to automate the lexical mapping of terms between medical terminologies. The evaluation of LINC on multiple real-world laboratory databases and two “standardized” medical vocabularies illustrates the continuing obstacles that confront data mapping efforts, and the performance of the system demonstrates promise to facilitate data exchange using automated lexical mapping.











