Jamie P. McCusker
Rensselaer Polytechnic Institute
Director of Data Operations
Biography
Jamie P. McCusker is the Director of Data Operations at the Tetherless World Constellation at Rensselaer Polytechnic Institute. They work with Deborah McGuinness on using knowledge graphs to further scientific research, especially in biomedical domains. They have applied semantics to numerous projects, including drug repurposing using systems biology, cancer genome resequencing, childhood health and environmental exposure, analysis of sea ice conditions, and materials science. They are the architect of Whyis, an open-source framework for knowledge graph development and management, which has been used across many of these domains.
Talks and Events
2022 Workshop: KGC 2022 Workshop: Healthcare and Life Sciences Symposium
We seek original contributions describing theoretical and practical methods and techniques for building and maintaining knowledge graphs for the healthcare and life sciences domain. The symposium will cover topics around data integration, data profiling, data curation, querying, knowledge discovery, ontology mapping, matching, reconciliation, machine learning approaches, and applications. We will have several invited speakers who are thought leaders in the healthcare and life sciences space. Furthermore, we plan to have a panel discussion comprising experts from industry, government, and academia. In summary, the primary objective of this symposium is to provide a platform to discuss:
- Characterisation of healthcare and life sciences knowledge graphs
- Opportunities for the application of knowledge graphs in healthcare and life sciences
- Challenges of creating and maintaining such knowledge graphs
- Opportunities for knowledge graph research in this space
2021 Workshop: Annotating Tabular Data using Semantic Data Dictionaries
It is common practice for data providers to include text descriptions for each column when publishing data sets in the form of data dictionaries. While these documents are useful in helping an end-user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility. We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. This work has been evaluated in comparison with traditional data dictionaries, mapping languages, and data integration tools.