PolySearch2 Evaluations



To assess the performance of PolySearch2, we first ran a speed test comparing the original PolySearch with PolySearch2 on various queries with equivalent parameters. We then performed four evaluations to compare the accuracy of the two systems. Finally, three additional evaluations assessed the performance of PolySearch2 on several novel search tasks. Precision, recall, F-measure, and accuracy for the seven evaluations are presented in Table 1, which also lists the feature differences between PolySearch and PolySearch2.
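The four statistics named above can be sketched as follows. This is a minimal illustration using the standard definitions from true/false positive and negative counts, not the evaluation code used for PolySearch2; the `ConfusionCounts` class, the helper names, and the example counts are hypothetical. The final check only confirms that the precision, recall, and F-measure values reported for Evaluation #1 in Table 1 are consistent with the balanced F-measure, 2PR / (P + R).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the outcome counts of a binary evaluation."""
    tp: int  # true positives: real associations that were reported
    fp: int  # false positives: reported associations that are not real
    fn: int  # false negatives: real associations that were missed
    tn: int  # true negatives: non-associations correctly left unreported


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def accuracy(c: ConfusionCounts) -> float:
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn)


def f_measure(p: float, r: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    return _ratio(2 * p * r, p + r)


# Illustrative counts only; they are NOT taken from the PolySearch2 datasets.
example = ConfusionCounts(tp=8, fp=2, fn=2, tn=8)
print(precision(example), recall(example), accuracy(example))

# Consistency check against the Evaluation #1 row of Table 1.
print(round(f_measure(0.6533, 1.0000), 4))  # original PolySearch
print(round(f_measure(0.8708, 0.9091), 4))  # PolySearch2
```

With the Table 1 inputs, the two printed F-measure values agree with the tabulated 0.7903 and 0.8895 to four decimal places.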

Table 1 summarizes the performance evaluation and feature comparison of PolySearch 2.0 versus the original PolySearch. Evaluation #1 assesses PolySearch2’s ability to identify disease-gene associations. Evaluation #2 assesses its ability to identify drug-gene/protein associations. Evaluation #3 assesses its ability to identify protein-protein interactions. Evaluation #4 assesses its ability to identify metabolite-gene associations. Evaluation #5 assesses its ability to identify drugs with significant adverse effects, or ‘dangerous drugs’. Evaluation #6 assesses its ability to identify toxin-disease associations. Finally, Evaluation #7 assesses its ability to identify toxin-adverse effect associations. Analysis speed is calculated from multiple runs on a query with 10,000 relevant documents.

All evaluation datasets are available on the Downloads page.

Table 1: Performance evaluation and feature comparison of PolySearch 2.0 versus the original PolySearch

| Prediction Accuracy | PolySearch Precision | PolySearch Recall | PolySearch F-measure | PolySearch Accuracy | PolySearch2 Precision | PolySearch2 Recall | PolySearch2 F-measure | PolySearch2 Accuracy |
|---|---|---|---|---|---|---|---|---|
| #1 Disease/Gene | 0.6533 | 1.0000 | 0.7903 | 0.6533 | 0.8708 | 0.9091 | 0.8895 | 0.8525 |
| #2 Drug/Gene | 0.7490 | 1.0000 | 0.8565 | 0.7490 | 0.9701 | 0.8351 | 0.8975 | 0.8571 |
| #3 Protein/Protein | 0.8396 | 1.0000 | 0.9128 | 0.8396 | 0.9432 | 0.9326 | 0.9379 | 0.8962 |
| #4 Metabolite/Gene | 0.7834 | 1.0000 | 0.8785 | 0.7834 | 0.9579 | 0.8619 | 0.9074 | 0.8614 |
| #5 Drug/Adverse Effect | - | - | - | - | 0.9233 | 0.8022 | 0.8585 | 0.7737 |
| #6 Toxin/Disease | - | - | - | - | 0.9054 | 0.7864 | 0.8417 | 0.7810 |
| #7 Toxin/Adverse Effect | - | - | - | - | 0.8808 | 0.6822 | 0.7689 | 0.7854 |

| System Features | PolySearch | PolySearch2 |
|---|---|---|
| Thesaurus Size | 9 categories, 57,706 terms with 353,862 synonyms | 20 categories, 1,131,328 terms with 2,848,936 synonyms |
| Filter words | 701 | 129,718 |
| Database Numbers | 1 corpus and 6 databases | 6 corpora and 14 databases |
| Num. of Search Types | 66 query combinations | 273 query combinations |
| Analysis Speed | 6.5 documents per second | 165 documents per second |
| Mobile Friendly? | No | Yes |
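To put the analysis speeds in Table 1 in perspective, the short calculation below converts them into a speed-up factor and an estimated processing time for the 10,000-document query used in the speed test. It is a back-of-the-envelope sketch that assumes throughput stays constant over the whole query, which the source does not state; the constant and function names are illustrative.

```python
# Throughput figures reported in Table 1 (documents per second).
POLYSEARCH_DOCS_PER_SEC = 6.5
POLYSEARCH2_DOCS_PER_SEC = 165.0

# Size of the benchmark query described in the text.
QUERY_DOCS = 10_000


def seconds_to_process(n_docs: int, docs_per_sec: float) -> float:
    """Estimated wall-clock time, assuming a constant processing rate."""
    return n_docs / docs_per_sec


speedup = POLYSEARCH2_DOCS_PER_SEC / POLYSEARCH_DOCS_PER_SEC
t_original = seconds_to_process(QUERY_DOCS, POLYSEARCH_DOCS_PER_SEC)
t_polysearch2 = seconds_to_process(QUERY_DOCS, POLYSEARCH2_DOCS_PER_SEC)

print(f"speed-up: {speedup:.1f}x")
print(f"PolySearch:  {t_original / 60:.1f} min")
print(f"PolySearch2: {t_polysearch2 / 60:.1f} min")
```

Under that assumption, the reported rates correspond to roughly a 25-fold speed-up: about 26 minutes for the original PolySearch versus about one minute for PolySearch2 on the same query.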

PolySearch2 Evaluations - BioASQ dataset

To assess the flexibility of PolySearch2, we conducted an association test using the gold standard training dataset of BioASQ, a biomedical semantic question answering challenge (Task 3B Training Set, released March 2015). The test assessed PolySearch2's performance in finding associated disease concepts when presented with free-text sentences.

Table 2: Performance evaluation using the BioASQ Task 3B (biomedical semantic QA) gold standard training dataset. The search queries are question sentences from BioASQ, and PolySearch2's disease association results are compared with the tagged disease concepts in the BioASQ 3B gold standard training dataset.

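| Prediction Accuracy | PolySearch Precision | PolySearch Recall | PolySearch F-measure | PolySearch Accuracy | PolySearch2 Precision | PolySearch2 Recall | PolySearch2 F-measure | PolySearch2 Accuracy |
|---|---|---|---|---|---|---|---|---|
| #8 BioASQ Question / Disease | - | - | - | - | 0.7284 | 0.6052 | 0.6611 | 0.7212 |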


This project is supported by the Canadian Institutes of Health Research (award #111062), Alberta Innovates - Health Solutions, and by The Metabolomics Innovation Centre (TMIC), a nationally-funded research and core facility that supports a wide range of cutting-edge metabolomic studies. TMIC is funded by Genome Alberta, Genome British Columbia, and Genome Canada, a not-for-profit organization that is leading Canada's national genomics strategy with $900 million in funding from the federal government.