PhenoLines: Phenotype Comparison Visualizations for Disease Subtyping via Topic Models

Abstract

PhenoLines is a visual analysis tool for the interpretation of disease subtypes, derived from the application of topic models to clinical data. Topic models enable one to mine cross-sectional patient comorbidity data (e.g., electronic health records) and construct disease subtypes, each with its own temporally evolving prevalence and co-occurrence of phenotypes, without requiring aligned longitudinal phenotype data for all patients. However, the dimensionality of topic models makes interpretation challenging, and de facto analyses provide little intuition regarding phenotype relevance or phenotype interrelationships. PhenoLines enables one to compare phenotype prevalence within and across disease subtype topics, thus supporting subtype characterization, a task that involves identifying a proposed subtype's dominant phenotypes, ages of effect, and clinical validity. We contribute a data transformation workflow that employs the Human Phenotype Ontology to hierarchically organize phenotypes and aggregate the evolving probabilities produced by topic models. We introduce a novel measure of phenotype relevance that can be used to simplify the resulting topology. The design of PhenoLines was motivated by formative interviews with machine learning and clinical experts. We describe the collaborative design process, distill high-level tasks, and report on initial evaluations with machine learning experts and a medical domain expert. These results suggest that PhenoLines demonstrates promising approaches to support the characterization and optimization of topic models.
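The aggregation step mentioned in the abstract can be illustrated with a small sketch. The abstract does not specify how the workflow combines probabilities, so everything here is an assumption: the toy hierarchy and term names are hypothetical (not real Human Phenotype Ontology identifiers), each term is given a single parent (the real ontology allows several), and plain summation per age bin stands in for whatever rule the authors actually use.

```python
# Hypothetical toy hierarchy: child term -> parent term.
# These names are illustrative only, not real HPO identifiers.
PARENT = {
    "seizures": "nervous_system",
    "ataxia": "nervous_system",
    "nervous_system": "root",
    "heart_defect": "cardiovascular",
    "cardiovascular": "root",
}

# Hypothetical per-age-bin probabilities of each leaf phenotype in one topic.
LEAF_PROBS = {
    "seizures": [0.125, 0.25],
    "ataxia": [0.25, 0.125],
    "heart_defect": [0.5, 0.25],
}


def aggregate(leaf_probs, parent):
    """Add each term's per-age probabilities to itself and every ancestor.

    Assumes a tree (one parent per term) and summation as the roll-up rule.
    """
    n_ages = len(next(iter(leaf_probs.values())))
    totals = {}
    for term, probs in leaf_probs.items():
        node = term
        while node is not None:
            acc = totals.setdefault(node, [0.0] * n_ages)
            for i, p in enumerate(probs):
                acc[i] += p
            node = parent.get(node)  # None once we step past the root
    return totals


totals = aggregate(LEAF_PROBS, PARENT)
```

In this sketch, `totals["nervous_system"]` holds the combined curve for its two child phenotypes, which is the kind of higher-level summary a hierarchical view could display before drilling down to individual terms.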
