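The term Linked Data refers to a set of best practices for publishing and interlinking structured data on the Web. These best practices were introduced by Tim Berners-Lee in his Web architecture note Linked Data and have become known as the Linked Data principles. These principles are:
1. Use URIs as names for things.
2. Use HTTP URIs, so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs, so that they can discover more things.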
To publish data on the Web, the items in a domain of interest must first be identified. These are the things whose properties and relationships will be described in the data, and may include Web documents as well as real-world entities and abstract concepts. As Linked Data builds directly on Web architecture, the Web architecture term resource is used to refer to these things of interest, which are, in turn, identified by HTTP URIs.
To enable a wide range of different applications to process Web content, it is important to agree on standardized content formats. When publishing Linked Data on the Web, data is represented using the Resource Description Framework (RDF). RDF provides a data model that is extremely simple, yet strictly tailored towards Web architecture. To be published on the Web, RDF data can be serialized in different formats. The two RDF serialization formats most commonly used to publish Linked Data on the Web are RDF/XML and RDFa.
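As a rough illustration of how a client might ask a server for one of these serializations, the following Python sketch prepares an HTTP request whose Accept header prefers RDF/XML. The URI http://example.org/person/alice, the function name build_rdf_request, and the choice of fallback formats (Turtle and N-Triples) are assumptions made for this example. RDFa is left out because it is embedded in HTML pages rather than served under its own media type. The request is only constructed, not sent.

```python
from urllib.request import Request

# Media types of some RDF serializations. RDFa is omitted on purpose:
# it is embedded in HTML documents rather than served as its own type.
RDF_MEDIA_TYPES = {
    "RDF/XML": "application/rdf+xml",
    "Turtle": "text/turtle",
    "N-Triples": "application/n-triples",
}


def build_rdf_request(uri: str, preferred: str = "RDF/XML") -> Request:
    """Prepare (but do not send) a GET request that asks for RDF data."""
    # The preferred format comes first; the others are offered as
    # lower-priority fallbacks via the q parameter.
    parts = [RDF_MEDIA_TYPES[preferred]]
    parts += [
        f"{media_type};q=0.5"
        for name, media_type in RDF_MEDIA_TYPES.items()
        if name != preferred
    ]
    return Request(uri, headers={"Accept": ", ".join(parts)})


# Hypothetical resource URI, used for illustration only.
request = build_rdf_request("http://example.org/person/alice")
accept_header = request.get_header("Accept")
print(accept_header)
```

Whether a given server honours the Accept header, and which formats it offers, depends on how that server is configured.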
The RDF data model represents information as node-and-arc-labeled directed graphs. The data model is designed for the integrated representation of information that originates from multiple sources, is heterogeneously structured, and is represented using different schemata. RDF aims to serve as a lingua franca, capable of mediating between the other data models that are used on the Web. The RDF data model is described in detail in the W3C RDF Primer. In RDF, a description of a resource is represented as a number of triples. The three parts of each triple are called its subject, predicate, and object. A triple mirrors the basic structure of a simple sentence, such as:
Subject     | Predicate | Object
Mark Carter | has a     | website
Expressed as an RDF triple, a statement giving my name could look like:
http://macarter.org/person/macarter http://xmlns.com/foaf/0.1/name "Mark Carter"
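A triple like this can be modelled as a small data structure. The Python sketch below is one possible representation, not a standard API: the class names URI, Literal, and Triple and the helper to_ntriples are invented for this example, and the escaping covers only backslashes and double quotes. It writes each triple as one line in the N-Triples style, with URIs in angle brackets and literals in quotes. The website URI http://macarter.org/ is an assumption. foaf:homepage is taken from the same FOAF vocabulary as foaf:name.

```python
from typing import NamedTuple, Union


class URI(str):
    """Identifier of a resource (illustrative wrapper around str)."""


class Literal(str):
    """Plain string literal (illustrative wrapper around str)."""


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]  # either another resource or a literal value


def to_ntriples(triple: Triple) -> str:
    """Render one triple as a single N-Triples-style line."""

    def term(value: str) -> str:
        if isinstance(value, URI):
            return f"<{value}>"
        # Simplified escaping: only backslash and double quote are handled.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'

    return f"{term(triple.subject)} {term(triple.predicate)} {term(triple.obj)} ."


me = URI("http://macarter.org/person/macarter")

# Object is a literal: the name as a string.
name_triple = Triple(me, URI("http://xmlns.com/foaf/0.1/name"), Literal("Mark Carter"))

# Object is another resource: a hypothetical website URI.
site_triple = Triple(me, URI("http://xmlns.com/foaf/0.1/homepage"), URI("http://macarter.org/"))

name_line = to_ntriples(name_triple)
site_line = to_ntriples(site_triple)
print(name_line)
print(site_line)
```

The second triple corresponds to the sentence "Mark Carter has a website": its object is a URI rather than a literal, so it links one resource to another.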
The subject of a triple is the URI identifying the described resource. The object can either be a simple literal value, like a string, number, or date; or the URI of another resource that is somehow related to the subject. The predicate, in the middle, indicates what kind of relation exists between subject and object. The predicate is also identified by a URI. These predicate URIs come from vocabularies, collections of URIs that can be used to represent information about a certain domain.
One way to think of a set of RDF triples is as an RDF graph. The URIs occurring as subject and object are the nodes in the graph, and each triple is a directed arc that connects the subject and the object. As Linked Data URIs are globally unique and can be dereferenced into sets of RDF triples, it is possible to imagine all Linked Data as one giant global graph. Linked Data applications operate on top of this giant global graph and retrieve parts of it by dereferencing URIs as required.
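The graph view can be sketched in a few lines of Python. In this simplified model, triples are plain tuples of strings, literals are written with surrounding double quotes, and two sets of triples stand in for data obtained from two different sources. The resource http://example.org/person/alice, both source sets, and the function names merge and outgoing_arcs are assumptions made for this example. Because URIs are globally unique, combining the sources is just a set union, and a statement that appears in both is stored once.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
MARK = "http://macarter.org/person/macarter"
ALICE = "http://example.org/person/alice"  # hypothetical second resource

# Two hypothetical sources; literals are written with surrounding quotes.
source_a = {
    (MARK, FOAF + "name", '"Mark Carter"'),
    (MARK, FOAF + "knows", ALICE),
}
source_b = {
    (ALICE, FOAF + "name", '"Alice"'),
    (MARK, FOAF + "knows", ALICE),  # same statement as in source_a
}


def merge(*sources):
    """Combine triple sets; identical triples collapse into one."""
    merged = set()
    for triples in sources:
        merged |= triples
    return merged


def outgoing_arcs(triples):
    """Index the graph by subject: node -> list of (predicate, object) arcs."""
    index = defaultdict(list)
    for subject, predicate, obj in sorted(triples):
        index[subject].append((predicate, obj))
    return index


graph = merge(source_a, source_b)
arcs = outgoing_arcs(graph)

for predicate, obj in arcs[MARK]:
    print(MARK, predicate, obj)
```

Looking up arcs[MARK] plays the role that dereferencing a URI plays on the Web: it returns the part of the graph that describes that one resource, and any URI among the objects can be followed in turn.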