TILDE is a new approach for describing entities.
Defining TILDE is a work-in-progress. If you wish to comment, please email me, Bill Oldroyd.
First version, 21 September 2019.
TILDE : a Trim, Indented Layout for Describing Entities. TILDE follows on from approaches such as JSON and YAML by providing a simple, minimal approach to describe entities. Entities are such things as information resources, people, organisations, objects, events and so forth. Entities can be both real and imaginary.
TILDE addresses the manner in which descriptions of entities are recorded and presented. The objective is to provide a simple text layout which can be easily understood, created and edited by humans using simple text editing software. At the same time the text can be interpreted by software into a set of 'descriptors' stored in a hierarchical data model. The hierarchy is defined through indentation within the text. The simplicity and clarity of the layout assists understanding of the manner in which the descriptions are constructed, thus allowing experimentation and development. A schema can be included in the description or loaded separately.
Within the examples shown here the choice of descriptors is illustrative not definitive. Comparison with any current schemes is to provide a basis for the choices made in designing TILDE. Differences are highlighted to illustrate a different way to describe entities. Here is an example of a description using TILDE.
~language~ english
~updated~ 2019-08-23T09:55:07
IR~
    title~ Yorkshire cotton
    sub-title~ The Yorkshire cotton industry, 1780-1835
    written~
        by~ George Ingle
    is-a~ book; text, illustrations, photographs, sources, gazetteer
    published~
        by~ Carnegie Publishing Limited
        location~ Preston, United Kingdom
        date~ 1997
    form~ soft cover; print, on paper; 279p
    identifier~ ISBN 1-85936-0820-9
    about~
        location~ Yorkshire, United Kingdom
        subject~ history, cotton industry
        period~ 1780 - 1835
        keywords~ industrial revolution, surveys, water power,
                  steam power, machinery, mill construction, capital, labour supply
The description above first specifies the language of the description, which is 'english', and the date of the last update. 'IR~' stands for "information resource" and is equivalent to the entity being described, which in this case is a book. The descriptors used are fairly obvious for a book, except for 'is-a~', which describes the intellectual form of the entity, and 'about~', which introduces a description of the content. Bold and italic text is used only to aid comprehension.
Descriptors have 4 main components : an indentation reflecting position in the hierarchy, a label, a list of one to many values and, for each value, a dictionary of zero to many subsidiary Descriptors. A description is a hierarchical arrangement of Descriptors. This model can, for example, be implemented in Python as a group of instances of a 'descriptor' class, but the prime source of the description is the original text stream.
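As a rough illustration of this model, the four components might be held in Python as shown below. This is only a sketch: the class names `Descriptor` and `Value`, and their field names, are assumptions made for this example and are not taken from the existing implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Value:
    """One value of a descriptor together with its subsidiary descriptors."""
    text: str = ""
    # Subsidiary descriptors keyed by label (the "dictionary" in the text).
    children: dict[str, Descriptor] = field(default_factory=dict)


@dataclass
class Descriptor:
    """A label, its indentation and a list of one or more values."""
    label: str
    indent: int = 0
    values: list[Value] = field(default_factory=list)


# Build a tiny description by hand: IR~ with a single title~ beneath it.
book = Descriptor("IR~", indent=0, values=[Value()])
title = Descriptor("title~", indent=4, values=[Value("Yorkshire cotton")])
book.values[0].children[title.label] = title
```

Keeping the subsidiary descriptors on each value, rather than on the descriptor, mirrors the statement that every value has its own dictionary.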
Although there is a Python implementation to create an appropriate data model for a description there is no reason why the model cannot be implemented by other methods at a lower level, for example RDF.
The choice of Descriptors is application dependent. In the examples, TILDE is used to describe information resources and related entities, such as books, films, websites, services, people, organisations, events, categories, term lists and so forth.
TILDE uses a special character to denote labels : the ~ (tilde, U+007E).
A Descriptor is one or more lines of text which have the following form :
- Zero to many spaces which define the indentation.
- A label comprising the following characters a-zA-Z0-9_.~ , terminated by a tilde and one or more spaces.
- A value, which is any Unicode text and may be an empty string.
- Zero to many lines following the label line that are added to the value. Leading spaces in these lines are ignored to allow text to be in a helpful position to assist reading. This space is not preserved in the data model.
So for example :
~language~ english
Person~ John Smith
    life~
        birth~ 18 March 1824
            place~ Leeds
        death~ 9 September 1879
    description~ John Smith was the eldest of five children of Samuel Smith,
                 a wealthy butcher and tanner from Meanwood in Leeds. In
                 1847 John Smith purchased the Backhouse & Hartley brewery
                 in Tadcaster with funding provided by his father. Smith's
                 timing proved fortuitous; pale ales were displacing porter
                 as the public's most popular style of beer, and
                 Tadcaster's hard water proved to be well-suited for brewing
                 the new style. The prosperity of the 1850s and 1860s,
                 together with the arrival of the railways, realised greater
                 opportunities for brewers, and by 1861 Smith employed eight
                 men in his brewing and malting enterprise.
        ~note~ Copied from Wikipedia
            url~ https://en.wikipedia.org/wiki/John_Smith_(brewer)
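The line form described above could be read with something like the following Python sketch. It is not the existing implementation. It assumes that labels may also contain a hyphen (as in 'sub-title~'), that continuation lines are joined to the value with a single space, and that a descriptor nests under the nearest earlier descriptor with a smaller indent. The function names `parse_lines` and `build_tree` are invented for this example.

```python
import re

# Assumption: "-" is allowed in labels in addition to a-zA-Z0-9_.~
LABEL_LINE = re.compile(r"^( *)([A-Za-z0-9_.~-]*~)(?: +(.*))?$")


def parse_lines(text):
    """Return a flat list of {'indent', 'label', 'value'} records."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = LABEL_LINE.match(line)
        if match:
            indent, label, value = match.groups()
            records.append(
                {"indent": len(indent), "label": label, "value": value or ""}
            )
        elif records:
            # Continuation line: leading spaces are dropped and the text
            # is added to the value of the most recent descriptor.
            current = records[-1]
            current["value"] = (current["value"] + " " + line.strip()).strip()
    return records


def build_tree(records):
    """Nest the flat records by indentation under an artificial root."""
    root = {"label": None, "indent": -1, "value": "", "children": []}
    stack = [root]
    for record in records:
        node = dict(record, children=[])
        # Climb back up until the top of the stack is a true parent.
        while stack[-1]["indent"] >= node["indent"]:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


sample = (
    "Person~ John Smith\n"
    "  life~\n"
    "    birth~ 18 March 1824\n"
    "  description~ John Smith was the eldest\n"
    "      of five children.\n"
)
tree = build_tree(parse_lines(sample))
```

Because a line is treated as a descriptor only when it starts with a label, ordinary prose lines fall through to the continuation branch.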
Descriptors can be defined by the application; however, the convention used in this description is that labels may be of 3 types : an attribute, a property and a cluster.
An attribute has a value which describes the entity. The description can be broken down in more detail using indentation.
A property, which starts and ends with the special character ~ , describes aspects of a Descriptor such as language, format, dates and notes of correction. Within the description, properties qualify the parent attribute, providing information about the value or its creation or update. A value defined in a property also applies to all its subsidiary Descriptors unless it is overwritten at a lower level in the hierarchy. Bearing in mind that the container is a Descriptor, properties such as the date/time of update or language can be used at the top level to provide information that applies to the whole description. In the example above the property ~language~ applies to the whole description, specifying that the description uses the English language.
A cluster is a special type of attribute that is equivalent to the entity being described and provides a point where all the attributes are grouped together in a description. There may be many clusters in the text stream.
By convention attributes and properties use lower case letters and clusters begin with an upper case letter.
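The three label types can be told apart from the label text alone. The following Python sketch shows one way to do so; the function name `label_kind` is an assumption for this example, and any label that begins with a tilde (including the special properties such as ~~) is treated as a property.

```python
def label_kind(label: str) -> str:
    """Classify a label as 'property', 'cluster' or 'attribute'."""
    if label.startswith("~"):
        return "property"   # e.g. ~language~
    if label[:1].isupper():
        return "cluster"    # e.g. IR~ or Person~
    return "attribute"      # e.g. title~


kinds = {name: label_kind(name) for name in ("~language~", "IR~", "title~")}
```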
A TILDE description is a stream of text which exists in a 'container', for example a text document. The stream of text comprises a set of descriptors which describe the entity, but properties at the zero indentation level define characteristics of the container, for example an identifier or, as we have seen earlier, the date of an update and the language of the description. These properties apply to the whole description unless overwritten at a lower level in the description hierarchy.
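The way a property value carries down the hierarchy until it is overwritten can be sketched in Python as a lookup from the innermost level outwards. The function name `effective_property` and the representation of each level as a plain dictionary of its properties are assumptions for this example.

```python
def effective_property(path, name):
    """Return the value of a property for the last level in `path`.

    `path` lists the properties set at each level, from the container
    (first) down to the descriptor of interest (last).
    """
    for properties in reversed(path):
        if name in properties:
            return properties[name]
    return None


container = {"~language~": "english", "~updated~": "2019-08-23T09:55:07"}
description = {}                      # sets no properties of its own
quotation = {"~language~": "french"}  # overwrites the inherited language

inherited = effective_property([container, description], "~language~")
overwritten = effective_property([container, description, quotation], "~language~")
```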
Also by convention, the meaning of the additional value lines can be affected by the first character.
- Lines commencing with # are regarded as comments in the text stream and are not loaded into the data model.
- Lines commencing with | . A value generated from multiple value lines is run together as a single stream of text in the data model. If a value line begins with | , a newline character is inserted at the start of that value line. This can be used to create paragraphs within the text, to preserve the layout of a line of text or to prevent the interpretation of a line as a descriptor.
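The two conventions above could be applied when the value lines of a descriptor are combined, as in this Python sketch. It assumes that ordinary lines are joined with a single space and that the text following | is trimmed; the function name `join_value_lines` is invented for the example.

```python
def join_value_lines(lines):
    """Combine the value lines of one descriptor into a single string."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment: not loaded into the data model
        if line.startswith("|"):
            # "|" inserts a newline before the text of this line.
            parts.append("\n" + line[1:].strip())
        elif parts:
            parts.append(" " + line)
        else:
            parts.append(line)
    return "".join(parts)


value = join_value_lines([
    "First paragraph starts here",
    "    and continues.",
    "# an editorial comment",
    "| Second paragraph.",
])
```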
In addition there are a number of special properties as follows :
- ~ : A single tilde is used to create a list of subsidiary values with the label ~ . For example :
# This is a list of capitals of the countries that form the United Kingdom
|
UK_capitals~
    ~ London
    ~ Edinburgh
    ~ Cardiff
    ~ Belfast
- ~~ : This will replicate the previous label at the same level of indentation. This is purely syntactic sugar. For example :
title~ A Book
written~
    by~ Joan Collins
    ~~ Fred Dibnah
    ~~ Sam Allerdice
This label allows a data value to be broken up into a set of specific components. It could be used to denote an additional qualifying value to the parent value, for example language, or to break up the value into a set of components. This is illustrated by the following :
# The insertion of an image in a stream of text
description~ The Crank Mill was an early implementation of a Boulton and Watt steam engine in a Yorkshire cotton mill. The mill building can still be found near the bottom of Station Road in Morley.
    image~ url-for-an-image-of-the-Crank-Mill
~~ The power of the steam engine was transmitted into the mill through a shaft driven by a large, mostly uncovered, sun-and-planet crank built on the side of the mill; hence the name, Crank Mill.
- ~~~ : Introduces a specification of a label. The use of this is described in more detail later in the section on schemas.
~~~ ~updated~ This property defines when the description, or a part of it, was last updated.
- ~~~~ : Loads a text stream into the description from a value which contains an identifier or an address, for example a URL or a local filename.
~~~~ https://oldieshome.org.uk/TILDE/examples/load.txt
written~by~ Fred Bloggs
# is the same as
written~
    by~ Fred Bloggs
It is important to understand that although these conventions are easy to incorporate into the text stream, they are application driven and must be supported by the software that creates and processes the data model.
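As one illustration of such software support, the ~~ label and run-together labels such as 'written~by~' could be expanded before the data model is built. The Python sketch below works on (indent, label, value) tuples. The function names, the fixed indentation step of 4 spaces and the rule that only labels not starting with ~ are split are all assumptions for this example.

```python
def split_compound(label):
    """'written~by~' -> ['written~', 'by~']; other labels are unchanged."""
    if label.startswith("~") or label.count("~") < 2:
        return [label]
    return [part + "~" for part in label[:-1].split("~")]


def expand_sugar(records, step=4):
    """Replace ~~ with the previous label and unfold compound labels."""
    last_label = {}  # indent -> most recent label seen at that indent
    expanded = []
    for indent, label, value in records:
        if label == "~~":
            if indent not in last_label:
                raise ValueError("~~ used with no previous label at this level")
            label = last_label[indent]
        last_label[indent] = label
        parts = split_compound(label)
        for depth, part in enumerate(parts):
            is_last = depth == len(parts) - 1
            # Only the innermost label carries the value.
            expanded.append((indent + depth * step, part, value if is_last else ""))
    return expanded


records = [
    (0, "written~", ""),
    (4, "by~", "Joan Collins"),
    (4, "~~", "Fred Dibnah"),
    (0, "written~by~", "Fred Bloggs"),
]
result = expand_sugar(records)
```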
The potential use of TILDE is shown in this separate page of examples.
Descriptor definitions and schemas
TILDE provides a simple mechanism for defining the meaning of a descriptor. Definitions are not stored in the data model but in a separate model that supports verification of descriptors, relationships and values. The definitions will therefore not appear in the output of a data model but are of course still preserved in the text stream.
The special property ~~~ is used for this purpose. The format is :
~~~ label~ Text description
# For example, two useful properties can be defined as follows
~~~ ~id~ The identifier used for a descriptor. When used at the container
level (that is, with no indent) the value must be unique across
the application. When used for a descriptor in the container it
must be unique within the container.
~~~ ~created~ The date and time this description was created. The value
must be in an ISO 8601 format.
This approach allows a description to be supported by a set of descriptor definitions appearing at the beginning of the text stream. Definitions are read sequentially from the data stream and a later definition will override an earlier one. Definitions can be qualified by properties or attributes, though both must be defined in a schema for defining schemas.
For example :
My Book Catalogue. A simple listing of books and other resources on my bookshelves.
~~~ ~id~ The identifier used for a descriptor.
~~~ ~updated~ The date and time this description was first created or updated.
~~~ Item~ A specific item
~~~ title~ The title of the resource.
~~~ author~ The author of the resource.
~~~ identifier~ An identifier for the resource, for example an ISBN.
~~~ date~ The date of creation or publication.
~id~ mybookshelves
~updated~ 2019-09-21 12:27:58
Item~
    title~ Tadcaster's pubs, a brief history.
    author~ Ian Page
    date~ 2008
    identifier~ ISBN 0-9532249-4-5
Item~
    title~ Tadcaster history. Coaches, boats and trains.
    author~ Ian Page
    date~ 2010
    identifier~ None
# And so forth for other items.
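Definitions of this kind could be gathered into a separate model with a few lines of Python. The sketch below is not the existing implementation. It assumes that a definition continues over following lines until the next label line, comment or blank line, and it keeps only the text of each definition. The name `read_definitions` is invented for the example.

```python
import re

DEFINITION = re.compile(r"^~~~ +(\S+~) +(.*)$")
# Assumption: "-" and "#" may appear in labels, as in sub-title~ and ~#~.
LABEL_LINE = re.compile(r"^ *[A-Za-z0-9_.~#-]*~( |$)")


def read_definitions(text):
    """Return {label: definition text}; a later definition replaces an earlier one."""
    definitions = {}
    current = None
    for line in text.splitlines():
        match = DEFINITION.match(line)
        if match:
            current, description = match.groups()
            definitions[current] = description.strip()
        elif line.lstrip().startswith("#"):
            current = None  # a comment ends the definition
        elif LABEL_LINE.match(line) or not line.strip():
            current = None  # an ordinary descriptor or a blank line ends it
        elif current:
            definitions[current] += " " + line.strip()
    return definitions


stream = (
    "~~~ ~id~ The identifier used for a descriptor.\n"
    "~~~ title~ The title.\n"
    "~~~ title~ The title of the\n"
    "    resource.\n"
    "title~ A Book\n"
)
definitions = read_definitions(stream)
```

Storing the definitions in a dictionary keyed by label gives the "later definition overrides an earlier one" behaviour without extra code.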
Whilst it might be convenient for definitions to be directly associated with a short description, in most applications the descriptor definitions are likely to be shared across many descriptions. A second, special property may be used for this purpose. The ~~~~ property will incorporate a new text stream into the current text stream at the point where the property occurs. This can therefore be used to incorporate a shared schema of definitions. This property is preserved in the data model. The ~~~~ property has the following format :
~~~~ url of the text stream to be loaded.
# For example :
~~~~ file:///local-filename
~~~~ schemaName.txt
~~~~ https://oldieshome.org.uk/TILDE/schemas/example1.txt
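The effect of the ~~~~ property can be sketched in Python as a recursive expansion of the text stream. How an address is fetched is left to a caller-supplied `load` function, so the sketch makes no assumption about files or network access. The name `expand_includes`, the depth limit and the decision to keep the ~~~~ line in the output (since the property is preserved) are assumptions for this example.

```python
def expand_includes(text, load, depth=0, max_depth=10):
    """Return the lines of `text` with every ~~~~ stream inserted in place.

    `load` is any callable that takes the address given after ~~~~ and
    returns the text of that stream.
    """
    if depth > max_depth:
        raise RecursionError("too many nested ~~~~ properties")
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        lines.append(line)  # the ~~~~ property itself is kept
        if stripped.startswith("~~~~ "):
            address = stripped[5:].strip()
            lines.extend(expand_includes(load(address), load, depth + 1, max_depth))
    return lines


# A dictionary stands in for the file system or the web in this example.
streams = {"schemaName.txt": "~~~ title~ The title of the resource."}
expanded = expand_includes("~~~~ schemaName.txt\ntitle~ A Book", streams.__getitem__)
```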
Whilst applications can define schemas as required it would make sense to share the definitions for a general set of properties and for the descriptors used to define definitions. These might be combined in a single text stream. The following is an example :
~~~ ~usage~ A descriptive note giving examples of how a descriptor should
be used.
~~~ ~id~ The identifier for the description or part of a description. The
description identifier must be unique within at least the service
providing the description.
~~~ ~#~ An identifier for a specific descriptor. The identifier must be unique
within the text stream.
~~~ ~service~ A partial url which when containing the identifier provides
access to the description. The position of the identifier is marked
by {identifier}. A text stream may be provided by more than one
service.
An external identifier to a specific descriptor in the text stream
is a combination of {~id~ value}#{~#~ value}.
~~~ ~created~ The value is the date the description was first created.
~~~ ~updated~ The value is the date the description was updated.
~~~ ~language~ The language used in the text stream. This is not the same as
the language appearing in the entities described in the text stream.
Use language~ for that purpose. The value may be either the code or
the language name specified in ????.
~~~ ~note~ A note about the act of creating or updating a descriptor and its
value.
~~~ ~by~ The person or other agent responsible for an action.
~usage~ For example associated with an update :
| ~updated~ 2019-08-31T10:37:56
| ~by~ Bill Oldroyd
| ~note~ Correcting spelling mistakes
The current simple schema is found at TILDE property definitions
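The ~service~ and ~#~ definitions above imply two small pieces of string handling, sketched here in Python. The function names and the service address `https://example.org/tilde/{identifier}.txt` are hypothetical and are used only for illustration.

```python
def description_url(service, identifier):
    """Put the description identifier into a ~service~ partial url."""
    return service.replace("{identifier}", identifier)


def external_identifier(description_id, descriptor_id):
    """Combine the ~id~ and ~#~ values as {~id~ value}#{~#~ value}."""
    return f"{description_id}#{descriptor_id}"


# Hypothetical service address; "mybookshelves" is the ~id~ used earlier.
url = description_url("https://example.org/tilde/{identifier}.txt", "mybookshelves")
reference = external_identifier("mybookshelves", "item2")
```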