TILDE - for describing entities

TILDE is a new approach for describing entities.

Defining TILDE is a work-in-progress. If you wish to comment, please email me, Bill Oldroyd.
First version, 21 September 2019.

TILDE : a Trim, Indented Layout for Describing Entities. TILDE follows on from approaches such as JSON and YAML by providing a simple, minimal way to describe entities. Entities are things such as information resources, people, organisations, objects, events and so forth, and they can be either real or imaginary.

TILDE addresses the manner in which descriptions of entities are recorded and presented. The objective is to provide a simple text layout which can be easily understood, created and edited by humans using simple text editing software. At the same time, the text can be interpreted by software as a set of 'descriptors' stored in a hierarchical data model, with the hierarchy defined through indentation within the text. Because the layout is simple and clear, it is easy to see how descriptions are constructed, which encourages experimentation and development. A schema can be included in the description or loaded separately.

Within the examples shown here, the choice of descriptors is illustrative, not definitive. Comparisons with current schemes are made to explain the choices made in designing TILDE, and differences are highlighted to illustrate a different way to describe entities. Here is an example of a description using TILDE.

~language~ english
~updated~ 2019-08-23T09:55:07 

    title~ Yorkshire cotton
        sub-title~ The Yorkshire cotton industry, 1780-1835
        by~ George Ingle       
    is-a~ book; text, illustrations, photographs, sources, gazetteer
       by~ Carnegie Publishing Limited
       location~ Preston, United Kingdom
       date~ 1997
       form~ soft cover; print, on paper; 279p
       identifier~ ISBN 1-85936-0820-9
        location~ Yorkshire, United Kingdom
        subject~ history, cotton industry
        period~ 1780 - 1835
        keywords~ industrial revolution, surveys, water power, 
           steam power, machinery, mill construction, capital, labour supply

The description above begins by specifying the language of the description, which is 'english', and the date of the last update. 'IR~' stands for "information resource" and is equivalent to the entity being described, which in this case is a book. The descriptors used are fairly obvious for a book, except for 'is-a~', which describes the intellectual form of the entity, and 'about~', which introduces a description of the content. Bold and italic text is used only to aid comprehension.

Descriptors have 4 main components : an indentation reflecting position in the hierarchy, a label, a list of one to many values, and, for each value, a dictionary of zero to many subsidiary Descriptors. A description is a hierarchical arrangement of Descriptors. This model can, for example, be implemented in Python as a group of instances of a 'descriptor' class, but the prime source of the description is the original text stream.
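A minimal sketch of that model in Python follows. It assumes two dataclasses, Value and Descriptor (hypothetical names, not taken from the existing implementation). Each Value carries its own dictionary of subsidiary Descriptors, so the hierarchy hangs from values rather than directly from labels.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Value:
    """One value of a descriptor, with its own subsidiary descriptors."""
    text: str
    # Subsidiary descriptors keyed by label (zero to many).
    children: dict[str, Descriptor] = field(default_factory=dict)


@dataclass
class Descriptor:
    """An indentation, a label and a list of one or more values."""
    label: str
    indent: int
    values: list[Value] = field(default_factory=list)


def walk(descriptor: Descriptor, depth: int = 0):
    """Yield (depth, label, value text) for a descriptor and everything below it."""
    for value in descriptor.values:
        yield depth, descriptor.label, value.text
        for child in value.children.values():
            yield from walk(child, depth + 1)


# Hand-built equivalent of a description such as:
#   Person~ John Smith
#       birth~ 18 March 1824
#           place~ Leeds
place = Descriptor("place~", 8, [Value("Leeds")])
birth = Descriptor("birth~", 4, [Value("18 March 1824", {"place~": place})])
person = Descriptor("Person~", 0, [Value("John Smith", {"birth~": birth})])

for depth, label, text in walk(person):
    print("    " * depth + f"{label} {text}")
```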

Although there is a Python implementation that creates an appropriate data model for a description, there is no reason why the model could not be implemented by other methods at a lower level, for example RDF.
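As an illustration of mapping the model onto another representation, here is one possible flattening of a descriptor tree into subject-predicate-object triples, which is the shape of data that RDF uses. This is a sketch under assumptions. The blank-node-style names ("_:v1", ...), the use of each label as a predicate, and the use of rdf:value to carry the value text are choices made here, not a defined TILDE mapping. Plain Python tuples stand in for an RDF library.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count


@dataclass
class Value:
    text: str
    children: dict[str, Descriptor] = field(default_factory=dict)


@dataclass
class Descriptor:
    label: str
    values: list[Value] = field(default_factory=list)


def to_triples(descriptor, subject, ids=None):
    """Flatten a descriptor tree into (subject, predicate, object) triples.

    Each value gets a blank-node-style name ("_:v1", "_:v2", ...) so that
    its subsidiary descriptors have a subject to hang from.
    """
    ids = count(1) if ids is None else ids
    triples = []
    for value in descriptor.values:
        node = f"_:v{next(ids)}"
        triples.append((subject, descriptor.label, node))
        triples.append((node, "rdf:value", value.text))
        for child in value.children.values():
            triples.extend(to_triples(child, node, ids))
    return triples


birth = Descriptor("birth~", [Value("18 March 1824")])
person = Descriptor("Person~", [Value("John Smith", {"birth~": birth})])
for triple in to_triples(person, "_:description"):
    print(triple)
```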

The choice of Descriptors is application dependent. In the examples, TILDE is used to describe information resources and related entities, such as books, films, websites, services, people, organisations, events, categories, term lists and so forth.

TILDE uses a special character, the ~ (tilde, U+007E), to denote labels.

A Descriptor is one or more lines of text which have the following form :

Labels can be duplicated at any level in the hierarchy. However, when the same label is used more than once at the same level of the hierarchy, the effect is to add a further value to the initial descriptor instance rather than to create a new descriptor.

So for example :

~language~   english

Person~ John Smith
        birth~ 18 March 1824
            place~ Leeds
        death~ 9 September 1879
    description~ John Smith was the eldest of five children of Samuel Smith, 
                 a wealthy butcher and tanner from Meanwood in Leeds. In 
                 1847 John Smith purchased the Backhouse & Hartley brewery 
                 in Tadcaster with funding provided by his father. Smith's 
                 timing proved fortuitous; pale ales were displacing porter 
                 as the public's most popular style of beer, and 
                 Tadcaster's hard water proved to be well-suited for brewing 
                 the new style. The prosperity of the 1850s and 1860s, 
                 together with the arrival of the railways, realised greater 
                 opportunities for brewers, and by 1861 Smith employed eight 
                 men in his brewing and malting enterprise. 
        ~note~ Copied from Wikipedia
            url~ https://en.wikipedia.org/wiki/John_Smith_(brewer)            
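The rules described so far (a label ending in ~, indentation defining the hierarchy, continuation lines, and duplicate labels at the same level adding a further value) can be sketched as a small parser. This is not the existing Python implementation. It assumes that a line starts a descriptor when its first whitespace-delimited token ends in ~, that indentation uses spaces only, and that a line without a label continues the latest value. Special conventions such as ~~~ definitions are not handled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumed rule: the first whitespace-delimited token is a label if it ends in "~".
LABEL = re.compile(r"(\S*~)(?:\s+(.*))?$")


@dataclass
class Value:
    text: str
    children: dict[str, Descriptor] = field(default_factory=dict)


@dataclass
class Descriptor:
    label: str
    indent: int
    values: list[Value] = field(default_factory=list)


def parse(text: str) -> Descriptor:
    """Build a descriptor tree from a TILDE text stream (simplified sketch)."""
    root = Descriptor("", -1, [Value("")])  # stands in for the container
    stack = [root]
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip(" "))
        match = LABEL.match(stripped)
        if match is None:
            # No label: treat the line as a continuation of the latest value.
            if stack[-1] is not root:
                value = stack[-1].values[-1]
                value.text = f"{value.text} {stripped}".strip()
            continue
        label, text_value = match.group(1), (match.group(2) or "").strip()
        # Close any open descriptors at the same or a deeper indentation.
        while stack[-1].indent >= indent:
            stack.pop()
        siblings = stack[-1].values[-1].children
        if label in siblings:
            # Same label at the same level: add a further value.
            siblings[label].values.append(Value(text_value))
        else:
            siblings[label] = Descriptor(label, indent, [Value(text_value)])
        stack.append(siblings[label])
    return root


SAMPLE = """\
~language~ english

Person~ John Smith
        birth~ 18 March 1824
            place~ Leeds
        death~ 9 September 1879
    description~ John Smith was the eldest of five children
                 of Samuel Smith.
"""

person = parse(SAMPLE).values[0].children["Person~"]
print(sorted(person.values[0].children))  # ['birth~', 'death~', 'description~']
```

Note that in this sketch the top-level ~language~ property ends up as a sibling of Person~ under the container, which matches the idea that zero-indentation properties describe the container.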

Descriptors can be defined by the application; however, the convention used in this description is that labels are of 3 types : attributes, properties and clusters.

An attribute has a value which describes the entity. The description can be broken down in more detail using indentation.

A property, whose label starts and ends with the special character ~, describes aspects of a Descriptor such as its language, format, dates and notes of correction. Within the description, properties qualify their parent attribute, providing information about its value or about its creation or update. A value defined in a property also applies to all subsidiary Descriptors below it unless it is overwritten at a lower level in the hierarchy. Because the container is itself a Descriptor, properties such as the date/time of update or the language can be used at the top level to provide information that applies to the whole description. In the example above, the property ~language~ applies to the whole description, specifying that the description is written in English.
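The inheritance rule can be sketched as a recursive walk. To keep the example self-contained, each node here is a hypothetical (label, text, children) tuple instead of a descriptor object, and the item and its values are made up for illustration. Properties attached directly to a node overwrite any inherited property with the same label, and the merged result is passed down to that node's subsidiary descriptors.

```python
def is_property(label: str) -> bool:
    """By convention, a property label starts and ends with ~."""
    return len(label) > 2 and label.startswith("~") and label.endswith("~")


def resolve(node, inherited=None):
    """Yield (label, text, effective properties) for every non-property node."""
    label, text, children = node
    effective = dict(inherited or {})  # copy, so siblings are unaffected
    for child_label, child_text, _ in children:
        if is_property(child_label):
            effective[child_label] = child_text
    yield label, text, effective
    for child in children:
        if not is_property(child[0]):
            yield from resolve(child, effective)


# The container (label "") holds the top-level ~language~ property.
container = ("", "", [
    ("~language~", "english", []),
    ("Item~", "Example item", [
        ("title~", "An example title", []),
        ("description~", "Une description en français.", [
            ("~language~", "french", []),
        ]),
    ]),
])

for label, text, props in resolve(container):
    print(label or "(container)", props.get("~language~"))
```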

A cluster is a special type of attribute that is equivalent to the entity being described and provides a point where all the attributes are grouped together in a description. There may be many clusters in the text stream.

By convention attributes and properties use lower case letters and clusters begin with an upper case letter.
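These naming conventions are simple enough to check mechanically. The helper below is a hypothetical sketch. It treats any label that is neither a property nor a cluster as an attribute and does not enforce lower case.

```python
def label_kind(label: str) -> str:
    """Classify a label as 'property', 'cluster' or 'attribute' by convention."""
    if not label.endswith("~"):
        raise ValueError(f"not a label: {label!r}")
    if len(label) > 2 and label.startswith("~"):
        return "property"   # e.g. ~language~, ~note~
    if label[0].isupper():
        return "cluster"    # e.g. Person~, IR~
    return "attribute"      # e.g. title~, is-a~


for label in ["title~", "~language~", "Person~", "is-a~"]:
    print(label, label_kind(label))
```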

A TILDE description is a stream of text which exists in a 'container', for example a text document. The stream of text comprises a set of descriptors which describe the entity, but properties at the zero indentation level define characteristics of the container: for example an identifier or, as we have seen earlier, the date of an update and the language of the description. These properties apply to the whole description unless overwritten at a lower level in the description hierarchy.

Also by convention, the meaning of the additional value lines can be affected by the first character.

In addition there are a number of special properties as follows :

It is important to understand that although these conventions are easy to incorporate into the text stream, they are application driven and must be supported by the software that creates and processes the data model.

The potential use of TILDE is shown in this separate page of examples.

Descriptor definitions and schemas

TILDE provides a simple mechanism for defining the meaning of a descriptor. Definitions are not stored in the data model but in a separate model that supports verification of descriptors, relationships and values. The definitions will therefore not appear in the output of a data model but are of course still preserved in the text stream.

The special property ~~~ is used for this purpose. The format is :

~~~ label~ Text description

# For example, two useful properties can be defined as follows 

~~~ ~id~ The identifier used for a descriptor. When used at the container 
         level (that is, with no indent) the value must be unique across 
         the application. When used for a descriptor in the container it 
         must be unique within the container.
~~~ ~created~ The date and time this description was created. The value 
         must be in an ISO 8601 format.
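Definition lines like these can be collected into a simple label-to-text dictionary. The sketch below is an assumption about one way to do it: indented lines that follow a definition are treated as continuation text, and definitions are read in order so that a later definition of the same label replaces an earlier one. Properties or attributes that qualify a definition are simply folded into its text here.

```python
from __future__ import annotations

import re

# "~~~ label~ description": the label is the first token after ~~~.
DEFINITION = re.compile(r"~~~\s+(\S*~)\s*(.*)$")


def read_definitions(text: str) -> dict[str, str]:
    """Collect ~~~ definitions, treating indented lines as continuations."""
    definitions: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = DEFINITION.match(line)
        if match:
            current = match.group(1)
            # Read in order, so a repeated label replaces the earlier text.
            definitions[current] = match.group(2).strip()
        elif current and line.strip() and line[0] in " \t":
            definitions[current] += " " + line.strip()
        else:
            current = None
    return definitions


SCHEMA = """\
~~~ ~id~ The identifier used for a descriptor. When used at the container
         level (that is, with no indent) the value must be unique across
         the application.
~~~ ~created~ The date and time this description was created. The value
         must be in an ISO 8601 format.
~~~ ~id~ The identifier used for a descriptor.
"""

for label, meaning in read_definitions(SCHEMA).items():
    print(label, meaning)
```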

This approach allows a description to be supported by a set of descriptor definitions appearing at the beginning of the text stream. Definitions are read sequentially from the text stream, and a later definition overrides an earlier one. Definitions can be qualified by properties or attributes, though both must themselves be defined in a schema for defining schemas.

For example :
My Book Catalogue. A simple listing of books and other resources on my bookshelves.

~~~ ~id~ The identifier used for a descriptor. 
~~~ ~updated~ The date and time this description was first created or updated.

~~~ Item~ A specific item
~~~ title~ The title of the resource.
~~~ author~ The author of the resource.
~~~ identifier~ An identifier for the resource, for example an ISBN.
~~~ date~ The date of creation or publication.

~id~ mybookshelves
~updated~ 2019-09-21 12:27:58

    title~ Tadcaster's pubs, a brief history.
    author~ Ian Page
    date~ 2008
    identifier~ ISBN 0-9532249-4-5
    title~ Tadcaster history. Coaches, boats and trains.
    author~ Ian Page
    date~ 2010
    identifier~ None

# And so forth for other items.
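Because definitions support verification, one simple check is to list labels that are used but never defined. The sketch below assumes the same rule as before: a line's first token is a label when it ends in ~. It skips ~~~ and ~~~~ lines. The publisher~ label is a hypothetical undefined label added to show a failure.

```python
from __future__ import annotations

import re

# First token of a line, when it ends in "~", is taken to be a label.
LABEL = re.compile(r"\s*(\S*~)(?:\s|$)")


def undefined_labels(text: str, defined: set[str]) -> list[str]:
    """Return labels used in a text stream that have no definition, in order."""
    missing: list[str] = []
    for line in text.splitlines():
        if line.startswith("~~~"):
            continue  # definitions (~~~) and includes (~~~~) are not descriptors
        match = LABEL.match(line)
        if match and match.group(1) not in defined and match.group(1) not in missing:
            missing.append(match.group(1))
    return missing


DEFINED = {"~id~", "~updated~", "Item~", "title~", "author~", "identifier~", "date~"}
CATALOGUE = """\
~id~ mybookshelves
~updated~ 2019-09-21 12:27:58

    title~ Tadcaster's pubs, a brief history.
    author~ Ian Page
    publisher~ (hypothetical undefined label)
"""
print(undefined_labels(CATALOGUE, DEFINED))
```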

Whilst it might be convenient for definitions to be directly associated with a short description, in most applications the descriptor definitions are likely to be shared across many descriptions. A second special property, ~~~~, may be used for this purpose. It incorporates a new text stream into the current text stream at the point where the property occurs, and can therefore be used to incorporate a shared schema of definitions. This property is preserved in the data model. The ~~~~ property has the following format :

~~~~ url of the text stream to be loaded.

# For example :

~~~~ file:///local-filename
~~~~ schemaName.txt 
~~~~ https://oldieshome.org.uk/TILDE/schemas/example1.txt
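Include processing can be sketched as a recursive text substitution. In this sketch the ~~~~ line itself is kept, because the property is preserved in the data model, and the loaded text follows it. The `load` parameter is a hypothetical hook that returns the text for a location, and the check for circular includes is an addition made here.

```python
from __future__ import annotations

from typing import Callable


def expand_includes(text: str, load: Callable[[str], str],
                    seen: frozenset[str] = frozenset()) -> str:
    """Insert the text stream named by each '~~~~ location' line after that line."""
    out = []
    for line in text.splitlines():
        out.append(line)  # the ~~~~ property itself is preserved
        if line.startswith("~~~~ "):
            location = line[len("~~~~ "):].strip()
            if location in seen:
                raise ValueError(f"circular include: {location}")
            # Included streams may themselves contain ~~~~ lines.
            out.append(expand_includes(load(location), load, seen | {location}))
    return "\n".join(out)


sources = {"schemaName.txt": "~~~ title~ The title of the resource."}
print(expand_includes("~~~~ schemaName.txt\n~id~ mybookshelves", sources.__getitem__))
```

For real streams, `load` might be `lambda loc: urllib.request.urlopen(loc).read().decode("utf-8")` for file:// and https:// locations, or `pathlib.Path(loc).read_text()` for a local name such as schemaName.txt. How TILDE software actually resolves these locations is not specified here.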

Whilst applications can define schemas as required it would make sense to share the definitions for a general set of properties and for the descriptors used to define definitions. These might be combined in a single text stream. The following is an example :

~~~ ~usage~ A descriptive note giving examples of how a descriptor should 
         be used.

~~~ ~id~ The identifier for the description or part of a description. The 
         description identifier must be unique within at least the service 
         providing the description.

~~~ ~#~  An identifier for a specific descriptor. The identifier must be unique
         within the text stream.

~~~ ~service~ A partial url which, when it contains the identifier, provides
         access to the description. The position of the identifier is marked
         by {identifier}. A text stream may be provided by more than one
         service. An external identifier to a specific descriptor in the text
         stream is a combination of {~id~ value}#{~#~ value}.

~~~ ~created~ The value is the date the description was first created.

~~~ ~updated~ The value is the date the description was updated.

~~~ ~language~ The language used in the text stream. This is not the same as 
         the language appearing in the entities described in the text stream. 
         Use language~ for that purpose. The value may be either the code or 
         the language name specified in ????.

~~~ ~note~ A note about the act of creating or updating a descriptor and its 
~~~ ~by~ The person or other agent responsible for an action.
    ~usage~ For example associated with an update :
|               ~updated~ 2019-08-31T10:37:56
|                    ~by~ Bill Oldroyd
|                    ~note~ Correcting spelling mistakes

The current simple schema is found at TILDE property definitions
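The ~service~ and ~#~ definitions in the schema example describe two string constructions: a URL formed by placing the identifier at the {identifier} marker, and an external identifier formed from the ~id~ and ~#~ values. A small sketch of both follows. The function names, the example.org URL and the "item-2" identifier are hypothetical.

```python
def service_url(template: str, identifier: str) -> str:
    """Place a description identifier at the {identifier} marker of a ~service~ value."""
    # str.replace rather than str.format, so other braces in a URL are left alone.
    return template.replace("{identifier}", identifier)


def external_id(description_id: str, descriptor_id: str) -> str:
    """Combine an ~id~ value and a ~#~ value as {~id~ value}#{~#~ value}."""
    return f"{description_id}#{descriptor_id}"


print(service_url("https://example.org/tilde/{identifier}", "mybookshelves"))
print(external_id("mybookshelves", "item-2"))
```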