03-06 MAY 2021
THE KNOWLEDGE GRAPH CONFERENCE
[ WORKSHOPS AND TUTORIALS ]
Included as part of the All-Access ticket, workshops and tutorials are stand-alone events taking place May 3 – 4.
Workshops enable attendees to hear from leaders in the knowledge tech space in a smaller setting than at a conference talk. They have a separate call for papers and their own organizing committee.
Tutorials are learning sessions that combine lecture-style and hands-on components. Each tutorial runs for half a day unless otherwise specified.
Stay tuned for the full conference schedule that will be posted in early April.
This list is not exhaustive and more will be added as finalized.
Workshops
Abstract
This workshop establishes a bridge for researchers from industry and academia who are working on positive social impact in accordance with priorities outlined in the Sustainable Development Goals (SDGs).
The workshop targets practitioners of knowledge graph methods, related domain SMEs, and anyone working to leverage knowledge graphs for social good.
We invite contributions applying knowledge graphs (and related methodologies) relating to any of the SDGs and related topics.
The work does not have to be novel, but it should demonstrate social impact.
Those wishing to present must submit an extended abstract (one page) for oral presentation and discussion with stakeholders from academia and industry.
To complete your submission, submit your abstract by the deadline of 16 April, 11:59 pm, PST.
View the website for more details.
Objectives
We are proposing an applied, research-based workshop that will show how businesses, social scientists, the public sector, and academia use knowledge graphs for positive social impact: the problems they are solving, the bottlenecks they face, and the insights they are gathering. The workshop can also be an excellent place to identify projects and researchers for future collaboration.
The workshop's focus is aligned with the priorities set out in the SDGs, but research on related topics is also welcome.
Topics
The topics include, but are not limited to:
- Efforts such as the [International Aid Transparency Initiative](https://iatistandard.org/en/), [Open Contracting](https://www.open-contracting.org/), and [OpenCorporates](https://opencorporates.com/) try to connect the dots between funds and registered entities.
- The SDGs, related datasets, and relevant news articles can be represented as a massive semantic graph. This is a strong use case for knowledge graphs: a semantic layer on top of existing data helps map relationships between different SDGs and clarifies how target concepts interact and overlap.
- Efforts to address healthcare crises such as COVID-19, and to identify social determinants of health, using knowledge graphs.
Workshop Organizers
- Lambert Hogenhout
- Vivek Khetan
- Bogdan Sacaleanu
- Rayid Ghani
Abstract
Research in artificial intelligence and data science is accelerating rapidly due to an unprecedented explosion in the amount of information on the web [1,2]. In parallel, research around knowledge graphs has grown immensely in both industry and academia. Most of this work has focused on knowledge graph construction; harnessing knowledge graphs and symbolic knowledge for end tasks, in particular by integrating information from knowledge graphs with learning techniques, remains much less explored. We call this approach “knowledge-infused learning” (KiL): encoding knowledge so that it can be used with deep learning. Knowledge graphs also play an important role in neuro-symbolic AI [3,5,8]. In this changing world, retrospective studies of state-of-the-art AI and data science systems have raised concerns about trust, traceability, and interactivity for prospective applications in healthcare, finance, and crisis response [1,4].
Objectives
We expect the KiL paradigm to account both for knowledge that accrues from domain expertise and for guidance from physical models. Further, it will allow the community to design new evaluation strategies that assess robustness, explainability, interpretability, and fairness across all comparable state-of-the-art algorithms [6,7]. The proposed forum aims to address the escalating need for a common substrate, attracting people who have machine learning skills for unstructured data together with a better handle on the conceptual underpinnings of inference over structured data. Thus, we envision that an audience within the purview of the Knowledge Graph Conference, with interests in human-centered reasoning, personalized knowledge graph creation, and sustainable computing, will benefit from the workshop.
Topics
Topics include:
- Shallow, semi-deep or deep infusion of Knowledge into Deep Learning
- Interpretability and explainability afforded by KiL
- Human-allied Probabilistic Learning
- Languages and models for knowledge graph representation
- Knowledge graph enhanced Natural Language Processing tasks such as Question Answering and Natural Language Inference
- Knowledge graph enabled Reinforcement Learning agents
- KiL-enhanced Virtual Assistants/Conversational Systems, Human-Computer Interaction
- Commonsense Reasoning using Knowledge Graphs
- Knowledge Graph enhanced optimization for real-world applications (e.g. Social Good, Finance, Education, Healthcare)
Workshop Organizers
- Amit Sheth
- Ying Ding
- Pavan Kapanipathi
- Manas Gaur
References
[1] Topol, Eric J. “High-performance medicine: the convergence of human and artificial intelligence.” Nature Medicine (2019).
[2] Kelly, Christopher J., et al. “Key challenges for delivering clinical impact with artificial intelligence.” BMC medicine (2019).
[3] A. Sheth, M. Gaur, U. Kursuncu and R. Wickramarachchi, “Shades of Knowledge-Infused Learning for Enhancing Deep Learning,” IEEE Internet Computing, (2019).
[4] Gottesman, Omer, et al. “Guidelines for reinforcement learning in healthcare.” Nature Medicine (2019).
[5] Marcus, Gary. “The next decade in ai: four steps towards robust artificial intelligence.” arXiv preprint arXiv:2002.06177 (2020).
[6] Kapanipathi, Pavan, Veronika Thost, Siva Sankalp Patel, Spencer Whitehead, Ibrahim Abdelaziz, Avinash Balakrishnan, Maria Chang et al. “Infusing Knowledge into the Textual Entailment Task Using Graph Convolutional Networks.” In AAAI, pp. 8074-8081. 2020.
[7] Gaur, Manas, Keyur Faldu, and Amit Sheth. “Semantics of the Black-Box: Can knowledge graphs help make deep learning systems more interpretable and explainable?.” In IEEE Internet Computing, 2020.
[8] Kursuncu, Ugur, Manas Gaur, and Amit Sheth. “Knowledge infused learning (K-IL): Towards deep incorporation of knowledge in deep learning.” In AAAI-MAKE Fall Symposium, 2019.
[9] Lecue, Freddy. “On the role of knowledge graphs in explainable AI.” Semantic Web Preprint, 2019.
Abstract
Electronic health records (EHRs) have become a popular source of observational health data for learning insights that could inform the treatment of acute medical conditions. However, their utility for learning insights for informing preventive care and management of chronic conditions has remained limited. For this reason, the addition of social determinants of health (SDoH) [1] and “observations of daily living” (ODL) [2] to the EHR have been proposed. This combination of medical, social, behavioral, and lifestyle information about the patient is essential for allowing medical events to be understood in the context of one’s life and, conversely, allowing lifestyle choices to be considered jointly with one’s medical context.
We propose that the personal health knowledge graph is a semantic representation of a patient’s combined medical records, SDoH, and ODLs that would be generated by both patients and their providers and be potentially useful to both for decision-making. While there are some initial efforts to clarify what personal knowledge graphs are [3] and how they may be made specific to health [4, 5], there is still much to be determined concerning how to operationalize and apply such a knowledge graph in life and clinical practice. There are challenges in collecting, managing, integrating, and analyzing the data required to populate the knowledge graph, as well as in maintaining, reasoning over, and sharing aspects of the knowledge graph.
Following on the success of our inaugural workshop held at the Knowledge Graph Conference in 2020, in this second installment of the workshop we aim to gather multiple stakeholders (health practitioners, health informaticists, knowledge engineers, and computer scientists) working on defining, building, consuming, and integrating personal health knowledge graphs. The workshop will address the challenges and opportunities in this nascent space, with a strong focus on existing personal health knowledge graph (PHKG) solutions.
Please submit your extended abstracts to https://easychair.org/conferences/?conf=phkg2021 no later than April 4, 2021, 11:59 AoE.
For more details, see https://phkg.github.io
Objectives
We seek original contributions describing theoretical and practical methods and techniques for building and maintaining personal health knowledge graphs. The workshop will cover topics around data integration, data profiling, data curation, querying, knowledge discovery, ontology mapping, matching, reconciliation, machine learning approaches, and applications of personal health knowledge graphs. We will have several invited speakers who are thought leaders in this space. Furthermore, we plan to have a panel discussion comprising experts from industry, government, and academia. In summary, the primary objectives of this workshop will be to provide a platform to discuss:
- Definitions of personal health knowledge graphs
- Opportunities for the application of personal health knowledge graphs
- Challenges of creating and maintaining a personalized health knowledge graph
- Opportunities for knowledge graph research in this space
Topics
Topics include, but are not limited to:
- Applications of personal health knowledge graphs
- Real-world use cases
- Models to encode the relevant information in a personal health knowledge graph
- Perspectives on personal health
- Interoperability aspects when integrating personal health data from disparate sources
- Reasoning and querying over a personal health knowledge graph
- Adaptively contextualizing personal health knowledge graphs
- Techniques for keeping personal health knowledge graphs current
- Ensuring privacy and security for a personal health knowledge graph
Workshop Organizers
- Ching-Hua Chen
- Amar Das
- Ying Ding
- Deborah McGuinness
References
[1] Diez Roux AV, Katz M, Crews DC, Ross D, and Adler N. Social and behavioral information in electronic health records: New opportunities for medicine and public health. American Journal of Preventive Medicine, 49:980–3, 2015.
[2] Uba Backonja, Katherine Kim, Gail R Casper, Timothy Patton, Edmond Ramly, and Patricia Flatley Brennan. Observations of daily living: putting the “personal” in personal health records. American Medical Informatics Association, 2012.
[3] Krisztian Balog and Tom Kenter. Personal knowledge graphs: A research agenda. In Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, pages 217–220, 2019.
[4] Amelie Gyrard, Manas Gaur, Saeedeh Shekarpour, Krishnaprasad Thirunarayan, and Amit Sheth. Personalized health knowledge graph. http://knoesis.org/sites/default/files/personalized-asthma-obesity 20(2814):29, 2018.
[5] Tania Bailoni, Mauro Dragoni, Claudio Eccher, Marco Guerini, and Rosa Maimone. Perkapp: A context-aware motivational system for healthier lifestyles. In 2016 IEEE International Smart Cities Conference (ISC2), pages 1–4. IEEE, 2016.
The Path from Knowledge Graph to Enterprise Knowledge Graph: A workshop sponsored by the Enterprise Knowledge Graph Foundation (EKGF)
Abstract
The Enterprise Knowledge Graph Foundation was established to define best practices and mature the marketplace for Enterprise Knowledge Graph (EKG) adoption. We are a member-supported non-profit (501(c)(6)) trade association established as a coordination body for the international enterprise knowledge graph community. EKGF members contribute open-source ideas, thought leadership, and working code. These contributions range from the EKGF infrastructure itself, used to deliver EKGF products and services, to the original EKG Principles, which provide guidelines for the development and deployment of an enterprise knowledge graph, and the EKG Maturity Model, which establishes standard criteria for measuring progress and sets out the practical questions that all involved stakeholders must ask to ensure trust, confidence, and flexible use of data.
The objective of this workshop is to help participants understand what makes a knowledge graph a genuine Enterprise Knowledge Graph, including compliance with the EKGF Principles and Maturity Model. The EKGF working model and founding members will also be introduced.
Topics
- The EKGF principles/pillars philosophies – the nucleus of the EKGF Manifesto
- Status of the Maturity Model (content)
- Introduction to the EKGF process
- EKGF Portals – including services to be provided
The workshop is intended to be interactive, with participants putting ideas on the table and seeking agreement.
Tutorials
Abstract:
In this tutorial we will address the challenge of semantically defining the elements of knowledge graphs (entities, relations, etc.), so that their meaning is explicit, accurate, and commonly understood by both humans and machines. We will use RDF/OWL and Protégé as our modeling tools and, through a running use case of a labor market knowledge graph, we will see how to tackle pitfalls and dilemmas that commonly appear in the modeling process.
Topics include:
- How to define entities and relations so that humans can understand them
- How to define entities and relations so that machines can correctly reason with them
- How to measure quality
- How to tackle expressivity and content dilemmas
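To make these topics concrete, here is a minimal sketch using Python and rdflib (rather than Protégé) of how a class, an individual, and a relation might be given both human-readable labels/definitions and machine-usable axioms. The namespace and all names below are illustrative assumptions, not the tutorial's actual labor market model.

```python
# A minimal sketch: defining a class, an individual, and a relation so that
# both humans (labels, definitions) and machines (domain/range axioms) can
# interpret them. All names and the namespace are illustrative assumptions.
from rdflib import Graph, Namespace, Literal, RDF, RDFS, OWL

EX = Namespace("https://example.org/labor-market#")
g = Graph()
g.bind("ex", EX)

# A class with a human-readable label and definition
g.add((EX.Occupation, RDF.type, OWL.Class))
g.add((EX.Occupation, RDFS.label, Literal("Occupation", lang="en")))
g.add((EX.Occupation, RDFS.comment,
       Literal("A kind of paid work characterized by a set of required skills.", lang="en")))

# An individual entity belonging to that class
g.add((EX.DataScientist, RDF.type, EX.Occupation))
g.add((EX.DataScientist, RDFS.label, Literal("Data Scientist", lang="en")))

# A relation with explicit domain and range so machines can reason with it
g.add((EX.requiresSkill, RDF.type, OWL.ObjectProperty))
g.add((EX.requiresSkill, RDFS.domain, EX.Occupation))
g.add((EX.requiresSkill, RDFS.range, EX.Skill))
g.add((EX.requiresSkill, RDFS.label, Literal("requires skill", lang="en")))

print(g.serialize(format="turtle"))
```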
Audience
Knowledge engineers, ontologists, taxonomists and data modelers who develop semantic models and knowledge graphs.
Outline
The tutorial will be delivered in five sessions with a total duration of 180 minutes, with one break in between. Each session will comprise a slide-based presentation of key topics and techniques, a hands-on demonstration of these techniques in Protégé (where applicable), and a Q&A. All sessions will use a running example: the development of a knowledge graph for the labor market domain.
- Introduction
- Session 1: Defining classes and individual entities
- Class vs individual – the pitfall and the dilemma
- Vague classes
- Good and bad names and definitions
- Lexicalization and synonyms
- Q&A
- Session 2: Defining hierarchies
- Good and bad subclasses
- To subclass or not to subclass?
- Transitivity traps
- Q&A
- Session 3: Defining non-hierarchical relations
- Relation vs attribute dilemma
- Vague relations
- Semantic Relatedness
- Q&A
- Session 4: Measuring knowledge graph quality
- Quality dimensions
- Quality trade-offs
- Good and bad quality metrics
- Q&A
- Session 5: Expressiveness and content dilemmas
- What lexicalizations to have?
- How granular to be?
- How negative to be?
- How many truths to handle?
- How interlinked to be?
- Q&A
- Wrapping up
- Take-Aways
- Additional resources
- Q&A
Presenter: Panos Alexopoulos
Abstract
Prior to the Shapes Constraint Language (SHACL), we had no W3C standard for validating our semantic knowledge graphs against a set of constraints. By constraining RDF using SHACL we gain exactly this: the ability to validate semantic knowledge graphs under a closed-world assumption! This masterclass will introduce you to the SHACL Core constraints, demonstrate how to constrain your data, and show what happens when the data does not conform (or when it does, for that matter!).
Key topics include:
- Terminology and concepts of semantic knowledge graphs
- Comparing SHACL and the Web Ontology Language (OWL)
- Introduction to SHACL
- Hands-on exercises:
- Building shapes for ABoxes
- Validation and validation errors
Audience
- Information architects
- Content creators
- Data modelers
- People who are interested in high data integrity
Prerequisites
- Some knowledge and/or experience with RDF is needed
- Otherwise, beginner/intermediate level, depending on your prior knowledge of SHACL
Key Takeaways
- Learning about the SHACL Core constraints
- Hands-on experience with constraining your RDF data using SHACL
- How to identify errors in validation reports
Outline
- Lectures including:
- basic terminology, concepts, and history
- introduction to SHACL Core constraints
- Hands-on demonstration:
- A brief look at some data examples
- Building constraints from scratch, using SHACL Core
- Using a validation engine (the SHACL Playground and a Java framework)
- Summary
- Discussion, Q&A, resources
The first part includes a traditional lecture, followed by a hands-on session that participants can follow using their editor of choice.
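As a rough preview of what the hands-on part covers, the sketch below validates a toy RDF graph against a toy SHACL shape using the pySHACL library. The data, shapes, and library choice are illustrative assumptions, not the masterclass's actual material.

```python
# A minimal sketch of SHACL Core validation with pySHACL. The data, shapes,
# and namespace below are toy assumptions.
from rdflib import Graph
from pyshacl import validate

data_ttl = """
@prefix ex: <https://example.org/> .
ex:alice a ex:Person ; ex:age "thirty" .
"""

shapes_ttl = """
@prefix ex: <https://example.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:age ;
        sh:datatype xsd:integer ;
        sh:minCount 1 ;
    ] .
"""

data_graph = Graph().parse(data=data_ttl, format="turtle")
shacl_graph = Graph().parse(data=shapes_ttl, format="turtle")

# validate() returns whether the data conforms, a results graph, and a text report
conforms, results_graph, results_text = validate(data_graph, shacl_graph=shacl_graph)
print(conforms)        # False: "thirty" is not an xsd:integer
print(results_text)    # human-readable validation report
```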
Instructors
Abstract
Python has excellent libraries for working with graphs, providing semantic technologies, graph queries, interactive visualizations, graph algorithms, probabilistic graph inference, and embeddings and other integrations with deep learning. However, almost none of these libraries have integration paths other than writing lots of custom code, and most do not share common file formats. Moreover, few of them integrate effectively with popular data science tools (e.g., pandas, scikit-learn, PyTorch, spaCy) or with popular infrastructure for scale-out (Apache Spark, Ray, RAPIDS, Apache Parquet, fsspec, etc.) on cloud computing.
This tutorial introduces kglab – an open source project that integrates RDFlib, OWL-RL, pySHACL, NetworkX, iGraph, pslpython, node2vec, PyVis, and more – to show how to use a wide range of graph-based approaches, blending smoothly into data science workflows and working efficiently with popular data engineering practices. The material emphasizes hands-on coding examples which you can reuse; best practices for integrating and leveraging other useful libraries; history and bibliography (e.g., links to primary sources); accessible, detailed API documentation; a detailed glossary of terminology; plus links to many helpful resources, such as online “playgrounds”. Throughout, we keep a practical focus on use cases.
Software used:
- https://github.com/DerwenAI/kglab
Audience
- Python developers who need to work with KGs
- Data Scientists, Data Engineers, Machine Learning Engineers
- Technical Leaders who want hands-on KG implementation experience
- Executives working on data strategy who need to learn about KG capabilities
- People interested in developing personal knowledge graphs
Prerequisites
- Some coding experience in Python (you can read a 20-line program)
- Interest in use cases that require *knowledge graph representation*
Additionally, having completed *Algebra 2* in secondary school and having some business experience working with data analytics can both come in handy.
We will use Jupyter notebooks, installed along with the other open-source software libraries. You will need to have Python 3.6 or later, and also Git installed.
Key Takeaways
- Hands-on experience with popular open source libraries in Python for building KGs, including rdflib, pyshacl, networkx, owlrl, pslpython, and more
- Coding examples that can be used as starting points for your own KG projects
- How to blend different graph-based approaches within a data science workflow to complement each other’s strengths: for data quality checks, inference, human-in-the-loop, etc.
- Integrating with popular data science tools, such as pandas, scikit-learn, matplotlib, etc.
- Graph-based practices that fit well with Big Data tools such as Spark, Parquet, Ray, RAPIDS, and so on
Outline
- Sources for data and controlled vocabularies: using a progressive example based on a Kaggle dataset for food/recipes
- KG Construction in rdflib and Serialization in TTL, JSON-LD, Parquet, etc.
- Transformations between RDF graphs and algebraic objects
- Interactive Visualization with PyVis
- Querying with SPARQL, with results in pandas
- Graph-based validation with SHACL constraint rules
- A sampler of graph algorithms in networkx and igraph
- Inference based on semantic closures: RDFS, OWL-RL, SKOS
- Inference and data quality checks based on probabilistic soft logic
- Embedding (deep learning) for data preparation and KG construction
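As a rough preview of the workflow above, the following sketch uses kglab to load an RDF graph, run a SPARQL query with the results returned as a pandas DataFrame, and serialize the graph back out. The file names, namespace, and query are illustrative assumptions, not the tutorial's actual notebook content.

```python
# A minimal kglab sketch: load RDF, query with SPARQL (results as pandas),
# and serialize. File names, namespace, and query are illustrative assumptions.
import kglab

kg = kglab.KnowledgeGraph(
    name="recipe KG sketch",
    namespaces={"ex": "https://example.org/recipes#"},
)

# Load an RDF graph (e.g., one built from a Kaggle recipes dataset)
kg.load_rdf("recipes.ttl")          # hypothetical input file

# Query with SPARQL and get the results back as a pandas DataFrame
sparql = """
SELECT ?recipe ?ingredient
WHERE { ?recipe <https://example.org/recipes#hasIngredient> ?ingredient }
LIMIT 10
"""
df = kg.query_as_df(sparql)
print(df.head())

# Serialize to Turtle (kglab also supports JSON-LD and other formats)
kg.save_rdf("recipes_out.ttl")
```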
Instructors
- Paco Nathan
- Daniel Vila Suero
- Gaurav Jaglan
Abstract
Over the last decade, DBpedia has become one of the most widely used knowledge graphs and one of the central interlinking hubs in the LOD cloud. The ultimate goal of the DBpedia community project is to a) build a large-scale, multilingual knowledge graph by providing structured information extracted from Wikipedia, and to b) integrate and complement this knowledge with knowledge from other sources.
Over the last few years, the DBpedia core team has significantly consolidated the knowledge and technology around DBpedia and introduced several novel technologies and concepts. These efforts have positively impacted the community around DBpedia, which continues to grow. According to the bibliographic database Google Scholar, there are over 33,500 articles citing DBpedia, using DBpedia, or developing technology for DBpedia. The goal of the tutorial is to provide an overview of the latest advancements in the DBpedia technology stack and the DBpedia KG lifecycle, together with detailed discussions of the regular DBpedia releases, the DBpedia infrastructure and services, and concrete usage scenarios of the DBpedia knowledge graph and the technology behind it.
The aim of this tutorial is to explain, from a practical perspective, the process of replicating the DBpedia infrastructure, exploiting DBpedia in third-party applications, and contributing to and improving the DBpedia knowledge graph and services. Hands-on sessions will be organized to internalize the DBpedia technology stack through practice.
Audience
This tutorial targets existing and potential new users of DBpedia, developers who wish to learn how to replicate the DBpedia infrastructure, service providers interested in exploiting the DBpedia KG, data providers interested in integrating data assets with the DBpedia KG, and data scientists (e.g. linguists) focused on extracting relevant information (e.g. linguistic) from or based on the DBpedia KG. The tutorial is also aimed at people from the public and private sectors who are interested in implementing knowledge graph technologies, and in particular DBpedia.
Prerequisites
The tutorial will be shaped in a way that requires no specific prerequisites. However, participants would benefit from some background knowledge of Semantic Web concepts and technologies (RDF, OWL, SPARQL), a general overview of the Web architecture (HTTP, URI, JSON, etc.), and basic programming skills (bash, Java, JavaScript).
Key Takeaways
During the course of the tutorial the participants will gain knowledge about:
- the complete DBpedia Knowledge Graph lifecycle, i.e. from extraction and modelling to publishing and maintenance of DBpedia
- how to find information, access, query and work with the DBpedia KG
- the DBpedia infrastructure
- the Databus platform and services (Spotlight, Archivo, etc.)
- how to replicate the DBpedia knowledge graph and infrastructure
- how to use DBpedia in third-party applications
- how to contribute to and improve the DBpedia knowledge graph.
Outline
- DBpedia Knowledge Graph in a nutshell
- DBpedia Technology Stack
- DBpedia Databus platform
- Replication of the DBpedia infrastructure
- Consumption of the DBpedia Knowledge Graph
- Contribution and improvement of DBpedia
- Integration of datasets with DBpedia
- Use cases and applications of DBpedia
All required software and tools will be provided and they will be freely available for the participants. Instructions for the hands-on sessions will be provided in advance.
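As an illustration of what "access, query and work with the DBpedia KG" can look like in practice, the sketch below queries the public DBpedia SPARQL endpoint with SPARQLWrapper; it is not part of the tutorial's provided material.

```python
# A minimal sketch (not part of the tutorial's materials) of querying the
# public DBpedia SPARQL endpoint with SPARQLWrapper.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    SELECT ?label
    WHERE { <http://dbpedia.org/resource/Knowledge_graph> rdfs:label ?label }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["label"]["value"])
```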
Instructors
Abstract
It is common practice for data providers to include text descriptions for each column when publishing data sets in the form of data dictionaries. While these documents are useful in helping an end-user properly interpret the meaning of a column in a data set, existing data dictionaries typically are not machine-readable and do not follow a common specification standard. We introduce the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data, enabling standardization and harmonization across diverse data sets. The rendition of data in this form helps promote improved discovery, interoperability, reuse, traceability, and reproducibility.
We present the associated research and describe how the Semantic Data Dictionary can help address existing limitations in the related literature. We discuss our approach, present an example by annotating portions of the publicly available National Health and Nutrition Examination Survey data set, present modeling challenges, and describe the use of this approach in sponsored research, including our work on a large National Institutes of Health (NIH)-funded exposure and health data portal and in the RPI-IBM collaborative Health Empowerment by Analytics, Learning, and Semantics project. This work has been evaluated in comparison with traditional data dictionaries, mapping languages, and data integration tools.
Software used:
- https://github.com/tetherless-world/SemanticDataDictionary
- https://github.com/tetherless-world/whyis
Key Takeaways
In this tutorial, we begin by introducing the Semantic Data Dictionary, a specification that formalizes the assignment of a semantic representation of data. We also introduce Whyis, a knowledge graph publishing, management and analysis framework that allows for the curation of knowledge from many different sources. We begin with a walkthrough of the Whyis installation process. We continue by showing how to register a user and load RDF data into Whyis. We next demonstrate a workflow for annotating tabular data by creating a Semantic Data Dictionary, which we process into knowledge graph fragments both in a standalone manner and within Whyis. We conclude by demonstrating how to query the knowledge graph for the annotated triples and how to visualize the resulting data.
Outline
- Introduction to Semantic Data Dictionaries
- Introduction to Whyis
- Installation of Whyis
- Registering a user in Whyis
- Loading RDF data into Whyis
- Demonstration of annotating data using a Semantic Data Dictionary
- Standalone processing of the Semantic Data Dictionary
- Processing of the Semantic Data Dictionary within Whyis
- Querying the knowledge graph for the annotated data
- Visualizing the annotated data in Whyis
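As a generic illustration of the "querying the knowledge graph for the annotated data" step, the sketch below loads a knowledge graph fragment with rdflib and runs a SPARQL query over it. The file name is hypothetical, and the tutorial itself performs this step with Whyis and the Semantic Data Dictionary tooling rather than with this standalone script.

```python
# A minimal, generic sketch of querying annotated triples with rdflib.
# The file name is hypothetical; the tutorial uses Whyis and SDD tooling.
from rdflib import Graph

g = Graph()
g.parse("sdd_output.ttl", format="turtle")   # hypothetical SDD-generated fragment

# List a sample of the triples produced by the annotation process
q = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
"""
for s, p, o in g.query(q):
    print(s, p, o)
```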
Instructors
Abstract
Recently, knowledge graphs have been successfully deployed to overcome the typical difficulties in accessing and integrating data stored in different kinds of sources. In particular, in the Virtual Knowledge Graph (VKG) approach, data is presented in the form of a graph, which is mapped to the underlying data sources, and query answering is based on advanced query transformation techniques. In this tutorial we illustrate the principles underlying the VKG approach, providing an overview of well-established algorithms and techniques, and we discuss relevant use-cases using VKGs. We also provide a step-by-step hands-on session, where participants can experiment on their own with the discussed technologies using the open-source VKG system Ontop and other open-source tools.
Software used:
- https://ontop-vkg.org/
The following open-source software is required to follow the hands-on session of the tutorial on your own computer:
- Latest version of Protégé desktop (with Java 8 embedded)
- Docker
Ontop will be downloaded both as a plugin of Protégé and as a Docker image.
Prerequisites
- We assume that participants are familiar with basic Semantic Web standards, including RDF and SPARQL.
- Basic knowledge of RDFS and OWL 2 is a plus, but is not required.
Outline
1. Introduction to Virtual Knowledge Graphs
In this part we first briefly recall the main Semantic Web technologies that underlie the Virtual Knowledge Graphs (VKG) approach, as they have been standardized by the W3C. Specifically, the RDF data model, the ontology language OWL 2 QL, and the query language SPARQL. We then provide an introduction to the principles of VKGs, introducing the general VKG framework and a language for mapping relational data to RDF. We present the ideas behind query processing, based on query rewriting with respect to the ontology, and transformation to SQL using mappings. We discuss the architecture of a typical VKG system, and the external software with which such a system interacts (i.e., the VKG ecosystem). We also discuss strengths and current limitations of the virtual approach to knowledge graphs, and illustrate significant use-cases in which it has been successfully deployed.
2. VKGs in Action
In this part, we first discuss how to define the components of a VKG system over a relational database, concentrating on the mappings, which critically affect the behaviour of the resulting system. We also provide some insights on how queries are processed exploiting reasoning over the ontology, and the mappings. We illustrate the various aspects on a running example. We then show how to extend the VKG approach to several data sources, including CSV files in data lakes.
3. Hands-on Session
In the hands-on session, we will demonstrate step by step the overall VKG approach introduced before, making use of the Ontop system. Interested participants will be able to follow the steps on their own computer. For the mapping and ontology design part we rely on the Protégé ontology editor, together with its plugin for Ontop. We will also practice how to deploy a VKG system as a SPARQL endpoint, i.e., a system providing SPARQL query answering functionality via a standard HTTP protocol.
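As an illustration of what querying such a SPARQL endpoint over HTTP might look like, here is a minimal sketch using the standard SPARQL protocol via Python's requests library. The endpoint URL is an assumption and will depend on how Ontop is deployed during the session.

```python
# A minimal sketch of querying a deployed VKG SPARQL endpoint over the
# standard SPARQL HTTP protocol. The endpoint URL and query are assumptions.
import requests

ENDPOINT = "http://localhost:8080/sparql"   # hypothetical local Ontop endpoint

query = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()

for binding in response.json()["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```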
Instructors
- Diego Calvanese
- Benjamin Cogrel
- Guohui Xiao
Abstract
Knowledge graphs have emerged as a compelling abstraction for organizing the world’s knowledge over the internet, capturing relationships among key entities of interest to enterprises, and a way to integrate information extracted from multiple data sources. Construction and maintenance of large-scale knowledge graphs requires leveraging knowledge representation, machine learning, and natural language processing. In addition, for applications that use machine learning, natural language processing and computer vision, knowledge graphs are being increasingly used as a knowledge representation for capturing and tracking the learned knowledge. Most major AI/NLP/Vision conferences attract a significant number of research papers that leverage knowledge graphs. While several knowledge graphs have been built and deployed in the industry today, their creation and successful maintenance requires addressing numerous challenges.
Audience
- Newcomers to the field
- Technology executives
Prerequisites
- Undergraduate education in computer science
Key Takeaways
This course aims to convey basic theoretical concepts of knowledge graphs in an easy-to-understand manner. The course does not include any hands-on material and provides a solid background for a follow-up course on software engineering using knowledge graphs.
Outline
This tutorial is organized around a set of lectures focused on the following topics:
- What is a Knowledge Graph?
- What are some Knowledge Graph Data Models?
- How to Create a Knowledge Graph?
- How to Create a Knowledge Graph from Structured Data?
- How to Create a Knowledge Graph from Text?
- What are some Knowledge Graph Inference Algorithms?
- How do users interact with Knowledge Graphs?
- How to Evolve a Knowledge Graph?
- What are some High-Value Use Cases of Knowledge Graphs?
- How do Knowledge Graphs Relate to AI?
Instructors
- Vinay Chaudhri
Abstract
Ontologies form the semantic framework for knowledge graphs, but to serve the purpose of linking data, ontologies need to be based on taxonomies or other controlled vocabularies, whose concepts are linked to data and tagged to content. There has also been a trend of greater integration of taxonomies and ontologies: ontologies are being adopted for wider business use, and taxonomies have been included in the W3C standards with widespread adoption of SKOS (Simple Knowledge Organization System).
While taxonomies are easier to design and create than ontologies, too often they are created without any skill or training, and poorly designed taxonomies yield poor results. This tutorial will cover the basics and best practices in taxonomy design, including: types of controlled vocabularies, standards, sources for topical concepts, wording of labels, alternative labels, hierarchical and associative relationships, and governance. This tutorial also explains the approach of semantically enriching an existing taxonomy to become an ontology by adding a semantic layer of an ontology or custom scheme.
Audience
- Ontologists or knowledge engineers who are not experienced in creating taxonomies
- Those who have a basic understanding of taxonomies or ontologies, but would like to know more
- Managers of data, information, content, or knowledge
Prerequisites
Basic familiarity and understanding of ontologies and taxonomies, but prior experience creating them is not required.
Key Takeaways
- Understand the diversity of knowledge organization systems and which are better suited for which situations
- What resources to use in developing a taxonomy
- How to develop concepts and labels that best serve the users
- How to construct taxonomy/thesaurus relationships according to standards and best practices
- How to extend a taxonomy into an ontology and how to design an ontology that leverages an existing taxonomy
Outline
- Introduction to taxonomies and ontologies: background, uses, purposes, and approaches
- Taxonomies and other types of controlled vocabularies: characteristics and comparisons
- Standards and models for controlled vocabularies: ANSI/NISO, ISO, SKOS
- Sources for concepts: manual content analysis, term extraction, user interviews, brainstorming
- Wording of labels: conventions and style
- Alternative labels: purpose, types, number of labels
- Hierarchical relationships: purposes, types and best practices
- Associative relationships: purposes and types
- Governance: documentation and processes, such as for adding new concepts
- Semantically enriching a taxonomy to extend it to become an ontology [will include a PoolParty demo]
Interactive activities include:
- Identifying the most suitable controlled vocabulary type
- Suggesting alternative labels
- Suggesting broader and narrower concepts
- Suggesting related concepts
- Semantic model design
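As a small illustration of several of these constructs, the sketch below uses rdflib and the SKOS vocabulary to define concepts with preferred and alternative labels, a hierarchical (broader) relationship, and an associative (related) relationship. The concept scheme and concepts are illustrative assumptions, not tutorial material.

```python
# A minimal SKOS sketch with rdflib: preferred/alternative labels plus
# hierarchical (broader) and associative (related) relationships.
# The concepts and namespace are illustrative assumptions.
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import SKOS

EX = Namespace("https://example.org/taxonomy/")
g = Graph()
g.bind("skos", SKOS)
g.bind("ex", EX)

g.add((EX.Beverages, RDF.type, SKOS.Concept))
g.add((EX.Beverages, SKOS.prefLabel, Literal("Beverages", lang="en")))
g.add((EX.Beverages, SKOS.altLabel, Literal("Drinks", lang="en")))

g.add((EX.Coffee, RDF.type, SKOS.Concept))
g.add((EX.Coffee, SKOS.prefLabel, Literal("Coffee", lang="en")))
g.add((EX.Coffee, SKOS.broader, EX.Beverages))     # hierarchical relationship
g.add((EX.Coffee, SKOS.related, EX.CoffeeMakers))  # associative relationship

print(g.serialize(format="turtle"))
```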
Instructors
Abstract
Graphs are characterized by their ability to model relationships between entities. Graph models allow for relationship traversals up to 3-5 orders of magnitude faster than traditional relational models. This performance advantage allows graphs to include more data and different types of data than were possible in the past. Enterprise Knowledge Graph (EKG) architects start asking questions:
- What other data can we include?
- What benefits could we gain if we began to think of our organization as a more holistic and integrated system?
- How do the diverse datasets interact to give us deeper insights into how to optimize our operations?
This workshop will introduce the fundamental concepts around EKGs and Systems Thinking and how they fit together. We will then walk the group through a series of short exercises demonstrating, with examples, how the two concepts are related.
Objectives include:
- Define the characteristics of an Enterprise Knowledge Graph (EKG).
- Allow participants to understand how EKG data modeling processes determine what is stored in an enterprise knowledge graph.
Key Takeaways
- Learn the fundamentals of Systems Thinking and how large graph models help us with Systems Thinking.
- See the role of time in data models (temporal modeling).
- How to align the EKG data model with enterprise strategy (lower costs, increase revenue, increase agility)
- Learn how to predict the value of insights as you connect more systems together.
Outline
- What is an enterprise knowledge graph?
- What is Systems Thinking?
- What are predictive graph models?
- What are feedback loops?
- What are externalities?
- How do we look for unintended consequences?
- How do we decide what new data to model?
- How do large, diverse connected datasets change the way we think?
- What new insights can we discover?
- How do we decide what datasets can provide the most value to our system?
Sample workshop “labs” and breakout rooms include:
- E-commerce: How do we find our best customers?
- Social networks: How do we find the top influencers?
- Strategy graphs: How do we design a graph that can predict whether an organization is following a strategy? Assume every e-mail message and meeting title can be processed by an NLP classifier model.
- Human resources graphs: How would you predict whether an employee will be happy, innovative, or an ambassador?
- Call centers: How do we decide how much training to give our call center agents?
- Manufacturing: How do we trace a product defect to the root cause?
- Process mining: How do we harvest process steps from log files to look for workflow bottlenecks?
Instructors