Linked data is a good choice for managing a large and varied collection of data, because of the way it helps make connections between separate but related data.
In most organisations data tends to be in 'silos', each based around a topic or business area, and it is a common challenge to combine and compare data from different places.
Linked Data uses web-style URIs as identifiers for the things of interest in the data: places, species of fish, measurement sites, one particular observation of rainfall, a classification of flood defence assets, and so on. Using URIs makes each identifier globally unique. If two different systems hold information about the same thing, they can use the same identifier for it, without risk of clashes or misinterpretation.
For example, the Environment Agency's Asset Management linked data application uses identifiers like https://environment.data.gov.uk/asset-management/id/asset/24285 for assets. If you saw just '24285' in isolation, you wouldn't know what it meant without some extra explanation of the context. Two different applications might use '24285' to refer to different things, but the full URI should only ever refer to one specific thing.
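To make the clash concrete, here is a minimal Python sketch. Only the asset URI comes from the text above; the second system and its `example.org` namespace are hypothetical. Merging the two systems on the bare local key silently loses a record, while keying on full URIs keeps both.

```python
# Two systems that both happen to use the local key "24285".
# The asset namespace is from the Asset Management service; the second
# system and its example.org namespace are hypothetical.
ASSET_NS = "https://environment.data.gov.uk/asset-management/id/asset/"
SURVEY_NS = "https://example.org/fish-survey/id/site/"  # hypothetical

assets = {"24285": "embankment"}
survey_sites = {"24285": "fish survey site"}

# Merging on bare local keys: the second dict silently overwrites the first.
merged_by_local_id = {**assets, **survey_sites}

# Merging on full URIs: each key is qualified by its namespace, so both survive.
merged_by_uri = {
    **{ASSET_NS + key: value for key, value in assets.items()},
    **{SURVEY_NS + key: value for key, value in survey_sites.items()},
}

print(merged_by_local_id)  # {'24285': 'fish survey site'}
print(len(merged_by_uri))  # 2
```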
A key part of the Linked Data approach is that you can directly look up the identifier on the web to find out more about it. So if you put https://environment.data.gov.uk/asset-management/id/asset/24285 in your web browser, you get a web page with information to explain what it is. You can then see that it refers to an embankment, and find various bits of information about it. An obligation of a linked data service provider is to support looking up identifiers in this way.
If you create an identifier in a web domain that you control, then you are in charge of the official definition of what it means. Others can use it if they want, but you get to define what it refers to.
Another aspect of the Linked Data approach is to make use of a standard feature of the web called 'content negotiation', which means that you can provide information about a thing in several formats and users of the data can choose the one best suited to them. One of the measuring stations used by the Floods API has the identifier https://environment.data.gov.uk/flood-monitoring/id/stations/1491TH. By default you get a JSON description of it when you look it up, but that JSON response also lists some other formats you can have instead, for example https://environment.data.gov.uk/flood-monitoring/id/stations/1491TH.html to get a formatted web page about it. A programmer can also specify the format they want in the HTTP Accept header of their web request.
You can choose what information you want to supply in the different descriptions of a thing of interest - in this case the web page includes a map that shows where the station is, as well as the most recent water level reading and other relevant details.
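Both ways of choosing a format can be sketched in a few lines of Python using only the standard library. This is a minimal sketch: the station URI and the `.html` suffix come from the text above, while the helper names are hypothetical and which media types the Floods API honours via the Accept header is an assumption to check against its documentation.

```python
from urllib.request import Request, urlopen

STATION = "https://environment.data.gov.uk/flood-monitoring/id/stations/1491TH"


def negotiated_request(uri: str, media_type: str) -> Request:
    """Build a GET request that asks for one format via the Accept header."""
    return Request(uri, headers={"Accept": media_type})


def suffixed_uri(uri: str, extension: str) -> str:
    """Alternative style: select a format by appending an extension such as 'html'."""
    return f"{uri}.{extension.lstrip('.')}"


def fetch(uri: str, media_type: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Look the URI up and return the Content-Type the server chose, plus the body."""
    with urlopen(negotiated_request(uri, media_type), timeout=timeout) as response:
        return response.headers.get("Content-Type", ""), response.read()
```

For example, `fetch(STATION, "text/html")` asks for the web page rather than JSON. The server decides what it actually returns, so it is worth checking the returned Content-Type rather than assuming the request was honoured.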
Directly looking up an identifier is not the only way to provide information about it: you might have other custom applications that provide different views. For flood assets, there is a map-based application that lets you explore all of the flood defence assets. In that application you can view data about our example embankment at https://environment.data.gov.uk/asset-management/index.html?element=http%3A%2F%2Fenvironment.data.gov.uk%2Fasset-management%2Fid%2Fasset%2F24285. There you get a map showing where it is, alongside other nearby assets, plus a summary of its properties. The PublishMyData platform provides a range of views of data, in addition to supporting direct lookups of an identifier.
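The map application link carries the asset's identifier, percent-encoded, in an `element` query parameter. Assuming that pattern holds generally (it is inferred from the example link, not from any documented API), a short Python sketch can build and decode such links. Note that the example link encodes the `http://` form of the identifier; the function names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

MAP_APP = "https://environment.data.gov.uk/asset-management/index.html"


def map_view_url(asset_uri: str) -> str:
    """Percent-encode the asset identifier into the 'element' query parameter."""
    return f"{MAP_APP}?{urlencode({'element': asset_uri})}"


def asset_from_map_view(url: str) -> str:
    """Recover the identifier from a map-view link (parse_qs decodes it)."""
    return parse_qs(urlsplit(url).query)["element"][0]


# Reproduces the example link for the embankment.
link = map_view_url("http://environment.data.gov.uk/asset-management/id/asset/24285")
print(link)
```

Keeping the identifier itself in the link, rather than an application-specific key, means any tool that knows the URI can produce a map-view link for it.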
The ability to look up the definition of a linked data identifier is a useful way to help users find metadata, for example definitions of terms used in representing data. Related information can be found just by following the links, as you would on a web page.
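Following links can also be done in code. As a minimal sketch, assume a JSON description in which linked things appear as URI strings. The JSON-LD-style `@id` keys and the document below are hypothetical, not the actual response of any Environment Agency service. The function walks a parsed document and collects every URI it mentions, including the thing's own identifier, so that each one could then be looked up in turn.

```python
from typing import Any, Iterator


def linked_uris(node: Any) -> Iterator[str]:
    """Yield every string value in a parsed JSON document that looks like a web URI."""
    if isinstance(node, dict):
        for value in node.values():
            yield from linked_uris(value)
    elif isinstance(node, list):
        for item in node:
            yield from linked_uris(item)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield node


# A hypothetical, simplified description of the example asset; real responses differ.
description = {
    "@id": "https://environment.data.gov.uk/asset-management/id/asset/24285",
    "label": "Embankment",
    "constituency": {"@id": "http://data.ordnancesurvey.co.uk/doc/7000000000024900"},
}

for uri in linked_uris(description):
    print(uri)
```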
Linked data is particularly useful for managing and disseminating reference data: lists of definitions of common items of interest (whether real-world things or ways of classifying data) that other datasets often refer to. Geographical data is a common example: local authority districts and parliamentary constituencies appear as properties of things in many different Environment Agency datasets. For example, our example embankment is in the constituency with identifier http://data.ordnancesurvey.co.uk/doc/7000000000024900. If you look that up, the Ordnance Survey provide information about what the constituency is called and where it is.
One aspect of reference data is re-using authoritative data managed by other people, whether that's an external organisation like the Ordnance Survey, or another part of the Environment Agency. This reduces effort by building on work already done by others and also reduces the risk of data being out of date.
Systematic use of authoritative identifiers in reference data also helps with combining data across data silos. For example, if two different data collections both include that constituency identifier as a property, then it becomes easy to combine different types of data about that constituency (or catchment, or stretch of river network).
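Because the shared identifier is just a value both collections agree on, combining them becomes an ordinary join. The Python sketch below uses the asset and constituency URIs from the text; the second collection, its `example.org` site URIs and the "other constituency" URI are hypothetical, as are the helper names.

```python
from collections import defaultdict

CONSTITUENCY = "http://data.ordnancesurvey.co.uk/doc/7000000000024900"

# Record from the asset data (constituency taken from the example above).
assets = [
    {"asset": "https://environment.data.gov.uk/asset-management/id/asset/24285",
     "constituency": CONSTITUENCY},
]

# Hypothetical records from a second, unrelated collection.
monitoring_sites = [
    {"site": "https://example.org/monitoring/id/site/A", "constituency": CONSTITUENCY},
    {"site": "https://example.org/monitoring/id/site/B",
     "constituency": "https://example.org/id/other-constituency"},
]


def index_by(records, key):
    """Group records by the value of a shared identifier property."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return groups


def combine(left, right, key="constituency"):
    """Pair up records from two collections that share the same identifier."""
    right_index = index_by(right, key)
    return {
        shared_id: (left_records, right_index.get(shared_id, []))
        for shared_id, left_records in index_by(left, key).items()
    }


combined = combine(assets, monitoring_sites)
print(len(combined[CONSTITUENCY][1]))  # 1
```

The join only works because both collections use the same authoritative URI; had each used its own local code for the constituency, a lookup table mapping one to the other would be needed first.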
Finally, the linked data approach (in common with other kinds of databases) enables data to be delivered in standard machine-readable formats that are easy to process in software, helping users to analyse, process or visualise the data. It is relatively easy to extract various subsets of the data, transform them to JSON, CSV, GML or whatever format is most convenient for a user, and deliver them through an API or as a download.
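As a minimal sketch of the extract-and-transform step, the Python below selects a subset of records and serialises it as CSV and JSON using only the standard library. The records and field names are hypothetical (only the embankment URI comes from the text), and GML output would need a dedicated geospatial library, so it is not shown.

```python
import csv
import io
import json

# Hypothetical records; the second asset and its URI are illustrative only.
records = [
    {"id": "https://environment.data.gov.uk/asset-management/id/asset/24285", "type": "embankment"},
    {"id": "https://example.org/asset/1", "type": "flood wall"},
]


def select(rows, **criteria):
    """Extract the subset of rows whose fields match every given value."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]


def to_csv(rows, fields):
    """Serialise rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialise rows as indented JSON text."""
    return json.dumps(rows, indent=2)


embankments = select(records, type="embankment")
print(to_csv(embankments, ["id", "type"]))
print(to_json(embankments))
```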
To explore Defra's linked data, go to https://environment.data.gov.uk/. For enquiries about the data, or to talk to the team about your data, please submit feedback or report an issue.