Welcome to Defra Data Services Forum

    Glossary of Linked Data

    API API is short for Application Programming Interface.
    Application Programming Interface An application programming interface is a set of rules and tools that makes it easier for developers to access data held in databases. The maintainer of a database will create APIs so that an external developer does not need to understand the full structure of the database, or the data contained within, in order to build applications that make use of the data.
    Data cube A data cube is a common structure for representing statistical data. The different ways of categorising the data (by location, time, or whatever other variables) are referred to as dimensions. If you had three such dimensions you could imagine representing the data as a cube, with one statistical dimension matched against each side of the cube, though this is just an analogy: in practice a 'data cube' can have any number of dimensions. Different two-dimensional 'slices' through the cube can be presented as tables. Filtering and other such affordances help users to get to the data that they want.
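    The idea can be sketched in a few lines of Python: each observation is keyed by its dimension values, and fixing one dimension produces a two-dimensional slice. The dimension names and figures below are made up for illustration.

    ```python
    # A tiny data cube: each observation is keyed by its dimensions
    # (year, region, species). All values here are illustrative.
    cube = {
        ("2019", "North", "trout"): 120,
        ("2019", "South", "trout"): 95,
        ("2020", "North", "trout"): 130,
        ("2020", "North", "salmon"): 40,
    }

    # A two-dimensional 'slice': fix the species dimension to "trout",
    # leaving a (year, region) table that could be presented to users.
    trout_slice = {dims[:2]: count for dims, count in cube.items()
                   if dims[2] == "trout"}
    ```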
    Data Services Platform The Data Services Platform (DSP) has been set up by the Defra group to make environmental data available to a wide range of users. The platform provides access to datasets, data services, and other information, in ways that are easy to use, both by Defra and by other people and organisations.
    Dimension A dimension is one of the component parts of a statistical dataset, and helps to describe an individual observation. Dimensions can be thought of as columns in a spreadsheet, and are often things like Time Periods, Geographical Areas, and Measurement Types.
    DSP DSP is short for the Data Services Platform.
    Graph database A graph database is a type of database where there is no hierarchy - all data is stored as a series of nodes and edges (links between nodes). Typically each node of the graph represents a thing or the value of a property, and each edge represents a property - a particular type of relationship between the two nodes that it joins. This makes it easier to query the database based on relationships and it makes for a very flexible data structure that is easy to alter or extend. Graph databases are very useful for storing datasets that are complex with lots of connections.
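    The node-and-edge model can be sketched in plain Python: each edge is a (subject, relationship, object) triple, and querying by relationship is a simple pattern match. All the names below are made up for illustration.

    ```python
    # A minimal sketch of a graph: each edge is a
    # (subject, relationship, object) triple. Names are illustrative.
    edges = [
        ("river_severn", "flows_through", "shrewsbury"),
        ("river_severn", "flows_through", "worcester"),
        ("shrewsbury", "located_in", "shropshire"),
    ]

    def objects(subject, relationship):
        """Follow edges of one type outward from a node."""
        return [o for s, p, o in edges if s == subject and p == relationship]

    print(objects("river_severn", "flows_through"))  # → ['shrewsbury', 'worcester']
    ```

    Adding a new kind of relationship is just a matter of appending more triples, which is why the structure is easy to alter or extend.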
    Identifier An identifier is a string of characters which can help an application or person uniquely identify a resource in a dataset. This is often abbreviated to ID. It is helpful to use an identifier rather than a name when communicating information about a thing (such as a town): there may be more than one town named (for example) Newport, but each will have a different identifier.
    Linked Data Linked data is data you can link to. Linked data is built on the mechanisms of the web, so each piece of data gets a URL that is used both as an identifier for that piece of data and as an address you can look up to find information about it. This principle makes it easier for computers to access the data programmatically.
    Machine-readable If a dataset is machine-readable, then this means that a developer can write a computer script or program to get, process, analyse and/or visualise data. This could be done from R (a popular statistical programming language), Python (a popular programming language), or some other way. The benefits of this are that once the script is created, it can be scheduled to run every day (for example). A good example of this is the floods service, where data is used by the BBC to provide access to up-to-date flood information on their website, something that wouldn't be possible if a human had to update the data manually.
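    As a minimal sketch of what "machine-readable" enables, the short Python script below parses a small CSV dataset (the station names and levels are made up) and filters it, with no human copying values by hand. Pointed at a published dataset, the same script could be scheduled to run every day.

    ```python
    import csv
    import io

    # An inline, machine-readable CSV dataset (illustrative values).
    data = "station,level_m\nShrewsbury,2.1\nWorcester,1.8\n"

    # Parse the rows and pick out stations above 2.0 metres.
    rows = list(csv.DictReader(io.StringIO(data)))
    high = [r["station"] for r in rows if float(r["level_m"]) > 2.0]
    print(high)  # → ['Shrewsbury']
    ```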
    Metadata Metadata is data that helps to explain a dataset. This may be the publisher of the data, the date it was published, or any licences associated with the data. In linked data, the metadata is published as part of the dataset, ensuring people who are using the data have access to the information about the data.
    PMD PMD is short for PublishMyData.
    PublishMyData PublishMyData is a software application in use across the DSP Linked Data services and developed by Swirrl. It consists of a triple store, a user interface for browsing, searching, filtering, and downloading data, and a collection of APIs for accessing data programmatically. (NB: the Linked Data services are designed to be portable and there is no tie-in to a service provider's software.)
    RDF RDF is short for Resource Description Framework. This is the data model that underpins linked data. RDF was adopted as a World Wide Web Consortium recommendation in 1999. The term is often used when talking about linked data: "let's have a look at the RDF" means to examine the raw linked data in a database.
    Reference data Reference data is data that helps to describe other data. A dataset such as Local Authorities is a reference dataset that is used to locate resources in a particular area. Because these local authorities should be consistent across different datasets, using a single, controlled list ensures interoperability between datasets, and a single point where changes can be made (for example to add new local authorities). Other types of reference data include codelists, such as exemption types in the registers dataset, or chemicals.
    Slice A slice is a particular filtered view of a datacube, where one or more dimensions are fixed. For example, limiting the dataset to show only data from 2019, and/or a particular river basin, and/or a particular species of fish.
    SPARQL SPARQL is short for SPARQL Protocol and RDF Query Language. It is used for querying data in RDF format, in a similar way that SQL is used to query relational databases. SPARQL is a standard created and maintained by the World Wide Web Consortium. SPARQL is useful for getting data out of linked databases as an alternative to a more specific API.
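    A hypothetical query gives a flavour of the language; the class URI and endpoint are made up for illustration and are not real DSP identifiers.

    ```sparql
    # Find up to ten water bodies and their labels (illustrative class URI).
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT ?waterBody ?label
    WHERE {
      ?waterBody a <http://example.org/def/WaterBody> ;
                 rdfs:label ?label .
    }
    LIMIT 10
    ```

    As with SQL, the SELECT clause names the values to return and the WHERE clause describes the patterns to match, but here the patterns are triples rather than table rows.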
    Swirrl Swirrl is the service provider currently providing Linked Data expertise to the DSP, maintaining the pre-existing linked data applications, and developing new ones. Swirrl are the creators of the PublishMyData platform.
    Tidy data Tidy data is the principle of representing a statistical dataset in a one-row-per-observation form. This means that each value in a dataset exists in its own row, with other columns being the dimensions that describe that data item. An example of this would be a dataset of river sensor data, where each row is one measurement, with other columns being the date/time of the measurement, the location, and the measurement type, such as height, or flow. Tidy data makes it very easy for developers and analysts to use the data, by filtering down to the slice of data required.
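    A small sketch of the river-sensor example in Python, with made-up column names and values: each dict is one observation, and getting a slice is a one-line row filter.

    ```python
    # Tidy data: one row per observation. All values are illustrative.
    rows = [
        {"datetime": "2021-06-01T09:00", "site": "Bewdley",
         "measure": "level", "value": 1.2},
        {"datetime": "2021-06-01T09:00", "site": "Bewdley",
         "measure": "flow", "value": 55.0},
        {"datetime": "2021-06-01T09:15", "site": "Diglis",
         "measure": "level", "value": 0.9},
    ]

    # Filtering down to the slice required is a simple row filter.
    levels = [r for r in rows if r["measure"] == "level"]
    print([r["site"] for r in levels])  # → ['Bewdley', 'Diglis']
    ```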
    Triple Store A triple store is a type of database used to store linked data. It is a particular kind of Graph Database, designed for use with data in RDF format. There are two triple store products in use across the DSP - Stardog, and Fuseki.
    Turtle Turtle is one of several standardised text-based formats for representing RDF data. Files in this format are usually given the extension .ttl.
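    A short, hypothetical Turtle fragment shows the shape of the format; the example.org URIs below are placeholders, not real DSP identifiers.

    ```turtle
    # A resource described in Turtle (illustrative URIs).
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix ex:   <http://example.org/id/> .

    ex:captains-pond a ex:WaterBody ;
        rdfs:label "Captain's Pond" .
    ```

    Each statement is a subject–predicate–object triple; the semicolon lets several statements share the same subject.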
    URI URI is short for Uniform Resource Identifier. This is a series of characters that can be used to refer to an abstract or physical resource. The characters must follow specific rules, defined by the Internet Engineering Task Force. HTTP URLs, as used for website addresses, are one kind of URI. In Linked Data, each thing ('resource') being described is identified using a URI, which can also serve as the address of a webpage that provides more information about that resource. An example of this is the water body Captain's Pond, which has this URI: https://environment.data.gov.uk/catchment-planning/WaterBody/GB30535397. Clicking the link provides more information about the water body, including a map.
    URL URL is short for Uniform Resource Locator. Usually (though not always) this takes the form of http(s), and is more commonly known as a web address, which is a link to a webpage.
    User Research User research is an important part of the development and review of applications or datasets. Conducting user research ensures that requirements are based on user needs, and helps with setting out the roadmap of a project. User research often takes the form of a combination of interviews, group workshops and observed sessions. Outputs are often recommendations, roadmaps, user journeys and wireframes.