OASIS Mailing List Archives


What is Data?

Hi Folks,

Below is a definition of data, based on our recent discussions. I ask for your comments on these aspects:

    1. Is the definition factually correct?

    2. Is it general? Are there any hidden assumptions 
       that restrict the generality of the definition?

    3. Is it complete? Is there anything else you would 
       add to the definition?

    4. Is it clear and easy to understand?


        What is Data?

When you represent an entity, you've created data. 

When you represent an attribute of an entity, you've created data. 

When you represent a relationship of an entity, you've created data.


There is a fellow over there; here's some data about him:

John Smith
Six feet tall
Father of Mary

"John Smith" represents an entity (the fellow over there). Specifically, it represents the entity by his name.

"Six feet tall" represents an attribute of the entity. 

"Father of" represents a relationship between the entity and another entity (Mary).


Data represents entities, attributes, and relationships.
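
As a rough illustration, the three kinds of data in the John Smith example could be written down in Python as follows. The class and field names (Entity, Relationship, attributes, kind) are assumptions made for this sketch; they are not part of the definition itself.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # The entity is represented by its name.
    name: str
    # Attribute name -> attribute value (hypothetical layout for this sketch).
    attributes: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: Entity
    kind: str
    target: Entity


# "John Smith" represents the entity; "Six feet tall" is one of its attributes.
john = Entity("John Smith", {"height": "Six feet tall"})
mary = Entity("Mary")

# "Father of" relates the entity to another entity (Mary).
father_of = Relationship(john, "Father of", mary)
```

Each of the three lines of data about the fellow ends up in exactly one place: the name in Entity, the height in attributes, and "Father of" in Relationship.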

Every piece of data can be categorized as a representation of an entity, an attribute, or a relationship.

Entities, attributes, and relationships are intrinsic (innate, inseparable) parts of data, just as iron and carbon are intrinsic parts of steel.

A representation of a relationship is not more important than a representation of an entity or attribute, i.e. links are not more important than the entities they relate. 

Relationships are not different in nature from entities or attributes. 

Relationships, entities, and attributes have equal weight.

        What's Not Data?

The following description of a book is not data, although it contains data: 

    In this groundbreaking book, evolutionary
    biologist Jared Diamond stunningly dismantles
    racially biased theories of human history by
    revealing the environmental factors actually
    responsible for history's broadest patterns.

Here is some of the data:

There is an entity:
    -	book

It has an attribute:
    -	innovativeness: groundbreaking

There is an entity:
    -	evolutionary biologist

It has an attribute:
    -	name: Jared Diamond

It has a relationship:
    -	this entity is the author of the book entity

And so forth.
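
The data listed above can be sketched as plain Python structures. This is only one possible layout, and the variable names (entities, relationships) are assumptions for illustration.

```python
# Entities found in the description, each with the attributes mined for it.
entities = {
    "book": {"innovativeness": "groundbreaking"},
    "evolutionary biologist": {"name": "Jared Diamond"},
}

# Relationships as (source entity, relationship, target entity).
relationships = [
    ("evolutionary biologist", "is the author of", "book"),
]

# Sanity check: a relationship may only link entities that were recorded.
for source, _, target in relationships:
    assert source in entities and target in entities
```

The prose description itself is not stored anywhere here; only the entities, attributes, and relationship extracted from it are.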

This example shows that text can be mined for data. 


This is not data and it contains no data:

    Run really fast.

The sentence contains a verb followed by two adverbs ("really" modifies "fast", and "fast" modifies "Run"). Verbs, adverbs, and adjectives are not data.

Data are nouns.


Recent research suggests that there may be just two categories of data:
    1. Entities
    2. Relationships

An attribute is merely a special case of a relationship.


Above we stated that these represent an entity, attribute, and relationship, respectively: 

John Smith
Six feet tall
Father of

Rather than considering "Six feet tall" as an attribute of the entity "John Smith", we can consider "Six" to be an entity that is related to "John Smith" by a relationship (has a height of):

John Smith has a height of Six

Thus, in this example there are two entities ("John Smith" and "Six") and two relationships ("has a height of" and "Father of").
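
One way to picture the two-category view is as a list of (entity, relationship, entity) statements with no separate slot for attributes. The sketch below is an assumption about how that might look in Python; it also includes Mary as the target of "Father of", so it records one more entity than the count above, which leaves Mary out.

```python
# Each statement is (entity, relationship, entity).
# The height is expressed as a relationship to the entity "Six".
triples = [
    ("John Smith", "has a height of", "Six"),
    ("John Smith", "Father of", "Mary"),
]

# Collect the distinct entities and relationships that appear in the statements.
entities = {e for subj, _, obj in triples for e in (subj, obj)}
relationships = {rel for _, rel, _ in triples}
```

Here the former attribute and the former relationship have exactly the same shape, which is the sense in which an attribute is a special case of a relationship.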

        Data and Datum

Data is the plural of datum, a singular item. In practice, however, people use data as both the singular and plural form of the word.


Copyright 1993-2007 XML.org. This site is hosted by OASIS