Welcome to Defra Data Services Forum

    Bitesize: What Linked Data looks like technically

    What it looks like technically

     

    Rather than being stored in a relational database as a collection of connected tables, rows and columns, linked data is stored in a graph database. Conceptually, a graph database has a much simpler structure: it can be thought of as one big table with three columns and as many rows as are needed (often hundreds of millions).

     

    Each piece of information is stored as a triple - a statement that consists of an identifier (the subject), a property (the predicate), and a value (the object). As an analogy, it works a bit like grammar: the cat (subject) sat on (predicate) the mat (object). It's easier to show with a real-world example.

     

    Take Catchment Data Explorer as an example: it holds a lot of data about water bodies. One of these data items is the overall 2019 classification. In the graph database, the triple for one water body looks like this:

     

    subject           predicate            object
    
    Captain's Pond    2019classification   Moderate
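
As a rough illustration, a triple can be sketched in Python as a three-field record. The `Triple` type and its field names are assumptions made for this example, not part of any Defra system; a real graph database stores triples in its own format.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in the graph: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


# The single row from the table above, written as a triple.
classification = Triple("Captain's Pond", "2019classification", "Moderate")

print(classification.subject)    # Captain's Pond
print(classification.predicate)  # 2019classification
print(classification.object)     # Moderate
```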
    

     

    For two water bodies, it looks like:

     

    subject           predicate            object
    
    Captain's Pond    2019classification   Moderate
    Decoy Broad       2019classification   Poor
    

     

    This principle can be extended, so more water bodies just mean more rows (or triples). If a new water body is discovered or created in the real world, it's simply a case of adding more triples.
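
To make the "more rows" idea concrete, here is a minimal sketch that holds the triples in a plain Python list. The list stands in for the graph database purely for illustration; the variable name `triples` is an assumption of this example.

```python
# The whole "database" is just a collection of (subject, predicate, object) rows.
triples = [
    ("Captain's Pond", "2019classification", "Moderate"),
]

# Recording another water body needs no new table or column, only another row.
triples.append(("Decoy Broad", "2019classification", "Poor"))

for subject, predicate, obj in triples:
    print(f"{subject:<18}{predicate:<21}{obj}")
```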

     

    As well as making it easy to add more water bodies, this way of storing data makes the data model flexible. To add 2020 classification data to the database, we would simply add more triples - there is no need to change the structure of any tables.

     

    subject           predicate            object
    
    Captain's Pond    2019classification   Moderate
    Captain's Pond    2020classification   Good
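
The flexibility can be sketched in the same way. In this illustrative example (a plain Python list standing in for the graph database), a new kind of fact arrives as a new predicate value, and the three-part shape of every row stays the same.

```python
triples = [
    ("Captain's Pond", "2019classification", "Moderate"),
]

# A new kind of fact is just a new predicate value in a new row.
triples.append(("Captain's Pond", "2020classification", "Good"))

# Every row still has exactly three parts, so the structure has not changed.
assert all(len(row) == 3 for row in triples)

predicates = sorted({predicate for _, predicate, _ in triples})
print(predicates)  # ['2019classification', '2020classification']
```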
    

     

    We have a lot more information about these water bodies, and it can all be represented by more triples:

     

    subject           predicate               object
    
    Captain's Pond    2019classification      Moderate
    Captain's Pond    meanDepth               1m
    Captain's Pond    altitude                26m
    Captain's Pond    waterBodyType           Lake
    Captain's Pond    parentOpCatchment       Bure
    ...
    

     

    With thousands of water bodies in England, it's easy to see how the number of triples can get into the millions.
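
Because everything about a water body shares the same subject, gathering what is known about it is a simple filter. The sketch below uses the rows shown above; the `describe` helper is hypothetical and assumes at most one value per predicate for each subject.

```python
triples = [
    ("Captain's Pond", "2019classification", "Moderate"),
    ("Captain's Pond", "meanDepth", "1m"),
    ("Captain's Pond", "altitude", "26m"),
    ("Captain's Pond", "waterBodyType", "Lake"),
    ("Captain's Pond", "parentOpCatchment", "Bure"),
    ("Decoy Broad", "2019classification", "Poor"),
]


def describe(subject, rows):
    """Collect every predicate/object pair recorded for one subject.

    Assumes each predicate appears at most once per subject.
    """
    return {p: o for s, p, o in rows if s == subject}


print(describe("Captain's Pond", triples))
```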

     

    In the examples so far, the 'linked' part of linked data hasn't surfaced. In each example, the subject is the name of the water body - Captain's Pond. With thousands of water bodies, there's a good chance that two of them share a name, which would be confusing and would cause problems with this data model. To avoid this, instead of using the name of the water body as the subject, we use an identifier (a URI). These identifiers are unique within the database and are consistent - i.e. wherever we want to attach some data to Captain's Pond, we use its identifier, which is GB30535397.

     

    subject           predicate               object
    
    GB30535397        label                   Captain's Pond
    GB30535397        2019classification      Moderate
    GB30535397        meanDepth               1m
    GB30535397        altitude                26m
    GB30535397        waterBodyType           Lake
    GB30535397        parentOpCatchment       Bure
    ...
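
The following sketch shows why identifiers help when names clash. The identifier `EXAMPLE-ID-2` and its data are made up for illustration; only GB30535397 comes from the real dataset, and the `classification_2019` helper is hypothetical.

```python
triples = [
    ("GB30535397", "label", "Captain's Pond"),
    ("GB30535397", "2019classification", "Moderate"),
    # Hypothetical second water body that happens to share the same name.
    ("EXAMPLE-ID-2", "label", "Captain's Pond"),
    ("EXAMPLE-ID-2", "2019classification", "Poor"),
]


def classification_2019(identifier, rows):
    """Look up the 2019 classification recorded for one identifier."""
    for subject, predicate, obj in rows:
        if subject == identifier and predicate == "2019classification":
            return obj
    return None


# Same label, different identifiers, so the two records never collide.
print(classification_2019("GB30535397", triples))    # Moderate
print(classification_2019("EXAMPLE-ID-2", triples))  # Poor
```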
    

     

    But this still isn't linked, because in practice we don't use the identifier on its own - we use a URL as the identifier. In the case of Captain's Pond on Catchment Data Explorer, this is https://environment.data.gov.uk/catchment-planning/WaterBody/GB30535397. This has several benefits:

    • the identifier is globally unique - we know exactly what we are talking about when we use that identifier.
    • we can create a page on the internet that holds the information that we know about this water body (you can click the link to see this in practice).
    • other people and applications can link to this water body. This could be from a report produced within the Environment Agency, an article about the water body on a local news website, or a separate dataset in another service, such as Flood Plan Explorer.
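
As a small sketch of how a local identifier becomes a URL, the example below joins a base address to the identifier. The base is taken from the Captain's Pond URL above; that every water body follows the same pattern is an assumption of this example, and `water_body_url` is a hypothetical helper.

```python
# Base address taken from the Captain's Pond URL; assumed to apply to other water bodies.
BASE = "https://environment.data.gov.uk/catchment-planning/WaterBody/"


def water_body_url(identifier: str) -> str:
    """Turn a local identifier into a URL that can be used as the subject."""
    return BASE + identifier


captains_pond = water_body_url("GB30535397")
print(captains_pond)
```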

    Wherever possible, data points within the dataset use URLs as their identifiers. In the example above, the classification, the water body type and the operational catchment that the water body is within would all be stored as URLs, each with its own page where you can find out more, such as what 'Moderate' actually means, the definition of a 'Lake', or which other water bodies are in the same operational catchment. This web of data has the potential to be extremely powerful, allowing people to explore, discover and use all the information in the dataset.
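
The linking can be sketched as follows: when an object is itself a URL, it can appear as the subject of other triples, so one record leads to another. Only the Captain's Pond URL is real; the `example.org` URLs and the helper functions are hypothetical, chosen purely for illustration.

```python
CAPTAINS_POND = "https://environment.data.gov.uk/catchment-planning/WaterBody/GB30535397"
# Hypothetical URLs: the real identifiers for these concepts are not given here.
MODERATE = "https://example.org/def/classification/moderate"
BURE = "https://example.org/def/operational-catchment/bure"

triples = [
    (CAPTAINS_POND, "label", "Captain's Pond"),
    (CAPTAINS_POND, "2019classification", MODERATE),
    (CAPTAINS_POND, "parentOpCatchment", BURE),
    (MODERATE, "label", "Moderate"),
    (BURE, "label", "Bure"),
]


def label_of(url, rows):
    """Return the human-readable label recorded for a URL, if any."""
    for subject, predicate, obj in rows:
        if subject == url and predicate == "label":
            return obj
    return None


def members_of(catchment_url, rows):
    """Find every subject whose parent operational catchment is this URL."""
    return [s for s, p, o in rows if p == "parentOpCatchment" and o == catchment_url]


# Follow the link from the water body to its classification, then read its label.
for subject, predicate, obj in triples:
    if subject == CAPTAINS_POND and predicate == "2019classification":
        print(label_of(obj, triples))  # Moderate

# Go the other way: which water bodies sit in the Bure operational catchment?
print(members_of(BURE, triples))
```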

     

    To explore Defra's linked data, go to https://environment.data.gov.uk/. For enquiries about the data, or to talk to the team about your data, please Submit Feedback/Report an issue.
