Defining the Knowledge Graph for the Social Sciences & Humanities
by Gerard Coen
Introduction & Discussion
The knowledge graph (KG) is a term trending among both scholars and practitioners in various scientific disciplines.
This trend is indicated by the rapid growth in peer-reviewed articles on the KG: over 120 articles were published in 2017, compared with only about 60 in 2015 (Web of Science, 2018). The term has also started to appear in large-scale science projects for the Social Sciences & Humanities (SSH), such as the CLARIAH-PLUS ERIC (European Research Infrastructure Consortium) and the Trans-Atlantic Platform “Digging into Data” Challenge (Hessen, 2018; Digging Into Data, 2018).
The concept of the KG is of great interest to the digital humanities community because of its potential for formalizing and connecting findings and insights derived from the analysis of library collections and datasets (Haslhofer et al., 2018). For example, in the CLARIAH-PLUS project, the KG opens up opportunities to move past the analysis of texts purely on the basis of language, towards an analysis based on content, such as literature, history, philosophy and theology (Hessen, 2018).
Despite the fact that the term KG is gaining momentum, ambiguity around its exact definition and proper usage remains a challenge. The need for a definition has been raised by scholars such as Ehrlinger & Wöß (2016) who highlight the fact that the large variety of interpretations of the KG has hampered the development of a common understanding of the topic. Furthermore, this creates an entry barrier for people who are unfamiliar with KGs and wish to explore the topic or have the ambition to build or use a KG for their own research (Ehrlinger & Wöß, 2016).
Ambiguity around the exact meaning of a term or concept is not unusual. Online we can find a variety of definitions for terms such as Blockchain, Artificial Intelligence or the Circular Economy that are either unclear or even contradictory. This is due in part to the rapid pace of innovation in emerging fields, as well as to commercial influences. The influence of Google creates a particular problem for a shared understanding of the KG: in 2012 the company announced the launch of the Google ‘Knowledge Graph’, which it describes as a graph “that understands real-world entities and their relationships to one another: things, not strings” (Singhal, 2012).
The Google ‘Knowledge Graph’ is effectively a tool for rapid information discovery based on search data. It is an attempt to build the “perfect search engine”, capable of understanding exactly what the user means and providing the exact answer they are looking for. Information is presented alongside search results in an attempt to answer the user’s next query before they type it; results are provided on the basis of what other people have searched for before (Singhal, 2012).
Given the size and influence of Google, its commercial use of the term KG blurs a broader understanding and obscures other potential applications of the KG that are not related to accidental or rapid discovery. Furthermore, for those interested in learning more about the KG, relevant information becomes hidden in plain sight, crowded out by the abundance of material oriented towards the Google KG. One such example is the Wikipedia page on the topic, which contains only information on Google’s version of the KG (Knowledge Graph, 2018).
Tracing the provenance of the term, Ehrlinger & Wöß (2016, p.2) argue that the first use of KG came from researchers at the University of Groningen and the University of Twente in the Netherlands who used it in the 1980s to “formally describe their knowledge-based system that integrates knowledge from different sources for representing natural language.” In their work entitled Towards a Definition of Knowledge Graphs, they define the KG stating that:
“A knowledge graph acquires and integrates information into an ontology and applies a reasoner to derive new knowledge.”
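For readers outside computer science, the three steps named in this definition (acquire, integrate, reason) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the method of Ehrlinger & Wöß: the example facts, the relation names (born_in, located_in, author_of), the function infer, and the single transitivity rule standing in for an ontology and a reasoner are all hypothetical choices made here.

```python
# Illustrative only: the facts, relation names, and the single inference rule
# below are assumptions made for this sketch, not taken from any cited source.

# "Acquire": statements from two hypothetical sources,
# each written as a (subject, relation, object) triple.
source_a = {
    ("Erasmus", "born_in", "Rotterdam"),
    ("Rotterdam", "located_in", "Netherlands"),
}
source_b = {
    ("Erasmus", "author_of", "In Praise of Folly"),
    ("Netherlands", "located_in", "Europe"),
}

# "Integrate": merge both sources into one graph (a set drops duplicates).
graph = source_a | source_b

# A deliberately tiny stand-in for an ontology: which relations are transitive.
TRANSITIVE_RELATIONS = {"located_in"}


def infer(triples):
    """Apply the transitivity rule repeatedly until no new triples appear."""
    known = set(triples)
    while True:
        new = {
            (a, rel, d)
            for (a, rel, b) in known
            for (c, rel2, d) in known
            if rel == rel2 and rel in TRANSITIVE_RELATIONS and b == c
        }
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new


# "Reason": derive statements that neither source contained.
enriched = infer(graph)
derived = enriched - graph
print(sorted(derived))
```

Running the sketch prints one derived statement, that Rotterdam is located in Europe, which appears in neither source on its own. Real knowledge graphs rely on far richer ontologies and reasoners; the sketch only shows how integrating sources can yield knowledge that no single source states.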
Although Ehrlinger & Wöß (2016) set out to provide a universal definition of the KG, the narrow focus of their research on the semantic web means that it fails to account for the cognitive distance of scholars from fields outside Information and Computer Sciences. As a result, the appreciation and application of their definition is limited.
To establish a common understanding and a proper scientific basis for the KG, the term should be explored relative to the SSH. To that end, we provide here further definitions of the KG from both academic writing and grey literature, along with sources for understanding how the KG works and literature dealing with the potential application of the KG for the SSH. An open discussion of the term KG necessitates open methods and open, available sources; for that reason, only Open Access material has been used in developing the content of this paper. Through discussion, it is hoped that a common definition and understanding can emerge that is relevant to, and shared by, both scholars and practitioners.
References
Digging into the Knowledge Graph | Digging Into Data. (2018). Retrieved from https://diggingintodata.org/awards/2016/project/digging-knowledge-graph
Ehrlinger, L., & Wöß, W. (2016). Towards a definition of knowledge graphs. CEUR Workshop Proceedings, 1695. http://ceur-ws.org/Vol-1695/paper4.pdf
Haslhofer, B., Isaac, A., & Simon, R. (2018). Knowledge Graphs in the Libraries and Digital Humanities Domain, 1–11. Retrieved from http://arxiv.org/abs/1803.03198
Hessen, A. (2018). CLARIAH-PLUS granted. Retrieved May 31, 2018, from https://www.clariah.nl/en/new/news/clariah-plus-granted#video-message-of-gertjan-filarski-2
Knowledge Graph. (2018). Retrieved from https://en.wikipedia.org/wiki/Knowledge_Graph
Singhal, A. (2012). Introducing the Knowledge Graph: things, not strings. Retrieved from https://www.blog.google/products/search/introducing-knowledge-graph-things-not/
Open Access Resources
Knowledge graphs represent concepts (e.g., people, places, events) and their semantic relationships. As a data structure, they underpin a digital information system, support users in resource discovery and retrieval, and are useful for navigation and visualization purposes. Within the libraries and humanities domain, knowledge graphs are typically rooted in knowledge organization systems, which have a century-old tradition and have undergone their digital transformation with the advent of the Web and Linked Data. Being exposed to the Web, metadata and concept definitions are now forming an interconnected and decentralized global knowledge network that can be curated and enriched by community-driven editorial processes. In the future, knowledge graphs could be vehicles for formalizing and connecting findings and insights derived from the analysis of possibly large-scale corpora in the libraries and digital humanities domain.
A knowledge graph is a kind of semantic network representing some scientific theory. The article describes the present state of this field and addresses a number of problems that have not yet been solved. These problems are implicit relations, strength of (causal) relations, and exclusiveness. Concepts might be too broad or complex to be used properly, so directions for solving these problems are explored. The solutions are applied to a knowledge graph in the field of labour markets.
Recently, the term knowledge graph has been used frequently in research and business, usually in close association with Semantic Web technologies, linked data, large-scale data analytics and cloud computing. Its popularity is clearly influenced by the introduction of Google’s Knowledge Graph in 2012, and since then the term has been widely used without a definition. A large variety of interpretations has hampered the evolution of a common understanding of knowledge graphs. Numerous research papers refer to Google’s Knowledge Graph, although no official documentation about the used methods exists. The prerequisite for widespread academic and commercial adoption of a concept or technology is a common understanding, based ideally on a definition that is free from ambiguity. We tackle this issue by discussing and defining the term knowledge graph, considering its history and diversity in interpretations and use. Our goal is to propose a definition of knowledge graphs that serves as basis for discussions on this topic and contributes to a common vision.