Linked Statistical Models Vocabulary (LIMO)

A Vocabulary for Incorporating Predictive Models into the Linked Data Web

Unofficial Draft 15 October 2013

This version:
http://www.purl.org/limo-ontology/limo/2013/vocab-limo-20131015
Latest Published version:
http://www.purl.org/limo-ontology/limo
Previous version:
http://www.purl.org/limo-ontology/limo/2013/vocab-limo-20131015
Authors:
Evangelos Kalampokis, CERTH/ITI and University of Macedonia
Areti Karamanou, CERTH/ITI and University of Macedonia
Efthimios Tambouris, CERTH/ITI and University of Macedonia
Konstantinos Tarabanis, CERTH/ITI and University of Macedonia

This document is licensed under a Creative Commons Attribution License. This copyright applies to the Limo Specification and accompanying documentation in RDF.


Abstract

Predictive modeling is the process of using data and statistical or data mining methods to predict new observations. The predictive models that result from this process could be reused in different applications, in the same sense that open data is reused. To this end, a few standards have been proposed to enable the transfer of predictive models across platforms and applications. In this paper we argue that predictive models should be incorporated into the Linked Data Web, and we propose an RDF Schema vocabulary that enables the creation of predictive model descriptions adhering to the Linked Data principles. Incorporating these descriptions into the Linked Data Web could open up possibilities beyond cross-platform model reuse. In particular, it will enable (a) easy discovery and reuse of appropriate models at web scale and (b) the creation of more accurate models by exploiting connections from models to other models, datasets and other resources on the Web.

Status of This Document

This document is a public working draft of the Limo vocabulary specification.

This document was published by the Information Systems Lab (ISLab). If you wish to comment on this document, please contact Evangelos Kalampokis or Areti Karamanou. All comments are welcome.

Table of Contents

  1. Outline of Limo
    1. Vocabulary Index
  2. Introduction
  3. Namespaces
  4. Conformance
  5. Overview
    1. A Basic Example
  6. Limo Specification
  A. Acknowledgments
  B. Change History
  C. References
    1. Normative References
    2. Informative References

1. Outline of Limo

Limo is an RDF vocabulary for describing statistical and data mining models. Limo defines five main classes: File, Method, Model, Power and Variable.

UML-style block diagram of the terms in this vocabulary
Fig. 1 Pictorial summary of Limo key terms and their relationship

1.1 Vocabulary Index

An a-z index of Limo terms, by class (categories or types) and by property.

Classes: | File | Method | Model | Power | Variable |

Properties: | accessURL | baseline | creator | data | description | evaluationData | evaluationMethod | file | issued | method | modelType | outcome | power | publishedIn | rawData | spatial | temporal | theme | title | trainingData | usageType | validationData | variable | variableType |

Main Limo terms, grouped in broad categories.

2. Introduction

Predictive modeling is the process of using data and statistical or data mining methods to predict new observations. The predictive models that result from this process could be reused in different applications, in the same sense that open data is reused. To this end, a few standards such as PMML [PMML] have been proposed to enable the transfer of predictive models across platforms and applications. Limo is an RDF Schema vocabulary that enables the creation of predictive model descriptions adhering to the Linked Data principles. Incorporating these descriptions into the Linked Data Web could open up possibilities beyond cross-platform model reuse.

You can find the Turtle source of Limo here.

3. Namespaces

The following table presents a set of namespaces and prefixes used in this document.

| Prefix | Namespace | Reference |
| limo | http://purl.org/limo-ontology/limo# | This document, [LIMO] |
| qb | http://purl.org/linked-data/cube# | [RDF Data Cube Vocabulary] |
| skos | http://www.w3.org/2004/02/skos/core# | [SKOS] |
| foaf | http://xmlns.com/foaf/0.1/ | [FOAF] |
| dct | http://purl.org/dc/terms/ | [DCMI] |
| dctype | http://purl.org/dc/dcmitype/ | [DCMITYPE] |
| rdf | http://www.w3.org/1999/02/22-rdf-syntax-ns# | [RDF-CONCEPTS] |
| rdfs | http://www.w3.org/2000/01/rdf-schema# | [RDF-SCHEMA] |
| eg | http://example.org/ns# | (Non-normative, used for examples only) |

4. Conformance

[TODO]

5. Overview

5.1 A Basic Example

This section provides the Limo description of the predictive model developed by Ginsberg et al. and presented in [GINSBERG_2009]. The model aims at predicting influenza-like illness (ILI) physician visits from ILI-related queries.

We first describe the model itself. The model employs a linear regression method and uses data from Google and the US Centers for Disease Control and Prevention. The data covers nine regions of the United States between 2003 and 2008.

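	@prefix eg:      <http://example.org/ns#> .
	@prefix limo:    <http://purl.org/limo-ontology/limo#> .
	@prefix dct:     <http://purl.org/dc/terms/> .
	@prefix dctype:  <http://purl.org/dc/dcmitype/> .
	@prefix dbpedia: <http://dbpedia.org/resource/> .
	@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
	@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .

	# Description of the model itself
	eg:CDCILImodel a limo:Model;
		dct:title "CDC-ILI model"@en;
		limo:spatial [rdf:type dbpedia:United_States];
		limo:temporal
			[a dct:PeriodOfTime;
			limo:startDate "2003-09-28"^^xsd:date;
			limo:endDate "2008-05-11"^^xsd:date];
		limo:modelType eg:regression;
		limo:variable eg:resp;
		limo:variable eg:pred;
		limo:method eg:linearregression;
		limo:power eg:CDCILIpower;
		limo:file eg:CDCILIfile;
		limo:rawData eg:CDCILIdataset;
		limo:evaluationData eg:CDCILIevaluationdata;
		limo:validationData eg:CDCILIvalidationdata;
		limo:trainingData eg:CDCILItrainingdata;
		dct:creator eg:ginsberg, eg:mohebbi, eg:patel, eg:brammer,
			eg:smolinski, eg:brilliant.
	# Raw dataset the model was built from
	eg:CDCILIdataset a dctype:Dataset;
		dct:resource <http://www.cdc.gov/flu/weekly>.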
	

We then define the variables of the model. Our example defines two variables: one of them, the predictor, reflects the probability that a random search query submitted from a region is ILI-related while the other, the response, is the percentage of physician visits in which a patient presents with influenza-like symptoms in a region.

	eg:resp a limo:Variable;
		limo:variableType eg:continuous;
		dct:description "Percentage of physician visits in which a patient presents with influenza-like symptoms in a region"@en;
		limo:usageType eg:response;
		limo:theme eg:ILIphysvisits.
	eg:pred a limo:Variable;
		limo:variableType eg:continuous;
		dct:description "Probability that a random search query submitted from a region is ILI-related"@en;
		limo:usageType eg:predictor;
		limo:theme eg:ILIrandquery.
	

The predictive power of the model can be described as follows. The model was assessed using cross-validation, which yielded a mean correlation of 0.97.

	eg:CDCILIpower a limo:Power;
		limo:evaluationMethod eg:crossvalidation;
		limo:outcome 0.97.
  

In addition, Limo enables queries across distributed descriptions of predictive models. For example, the following query retrieves the variables that are used as predictors of ILI physician visits in models built from data about the United States:

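	PREFIX limo:    <http://purl.org/limo-ontology/limo#>
	PREFIX eg:      <http://example.org/ns#>
	PREFIX rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
	PREFIX dbpedia: <http://dbpedia.org/resource/>

	# ?variable1 is the predictor, ?variable2 the response
	SELECT ?variable1
	WHERE {
			?model limo:variable ?variable1;
			       limo:variable ?variable2;
			       limo:spatial ?sp1.
			?variable1 limo:usageType eg:predictor.
			?variable2 limo:usageType eg:response;
					   limo:theme eg:ILIphysvisits.
			?sp1 rdf:type dbpedia:United_States.
	}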
  

6. Limo Specification

Classes and Properties (full detail)

Classes

Class: File

limo:File - Describes a file that can be imported into a particular platform, such as R or SAS, in order to execute the model. This could also be a PMML XML file.
Properties include: accessURL
Used with: file
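
As a minimal sketch of how such a file description might look, the snippet below attaches a PMML file to a model. The resources eg:exampleModel and eg:exampleFile and the access URL are hypothetical placeholders, and giving the value of limo:accessURL as an IRI is an assumption, since this specification does not state a range for that property.

```turtle
@prefix limo: <http://purl.org/limo-ontology/limo#> .
@prefix eg:   <http://example.org/ns#> .

# Hypothetical file description; the URL is a placeholder.
eg:exampleFile a limo:File ;
	limo:accessURL <http://example.org/models/example-model.pmml> .

# The model points to its file through limo:file.
eg:exampleModel a limo:Model ;
	limo:file eg:exampleFile .
```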



Class: Method

limo:Method - Describes a statistical or data mining method used for creating a model. We assume that this class uses a set of predefined concepts such as linear regression, logistic regression, Markov models, support vector machines, random forests, neural networks, etc.
Used with: method
Sub class of: skos:Concept
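
Because limo:Method is a sub class of skos:Concept, the predefined methods could be published as a SKOS concept scheme. The following is only a sketch under that assumption: the scheme eg:methods and the labels are hypothetical and are not defined by this specification.

```turtle
@prefix limo: <http://purl.org/limo-ontology/limo#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix eg:   <http://example.org/ns#> .

# Hypothetical concept scheme that groups the available methods.
eg:methods a skos:ConceptScheme .

eg:linearregression a limo:Method ;
	skos:inScheme eg:methods ;
	skos:prefLabel "Linear regression"@en .

eg:logisticregression a limo:Method ;
	skos:inScheme eg:methods ;
	skos:prefLabel "Logistic regression"@en .
```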



Class: Model

limo:Model - The actual predictive model that is described by the vocabulary.
Properties include: title file method issued baseline rawData variable description power spatial temporal data modelType creator publishedIn
Used with: baseline



Class: Power

limo:Power - Describes the predictive power of a model.
Properties include: outcome evaluationMethod
Used with: power



Class: Variable

limo:Variable - Represents the variables that are included in the predictive model.
Properties include: usageType description title variableType theme
Used with: variable



Properties

Property: accessURL

limo:accessURL - The URL that the file can be accessed from.
Domain: limo:File



Property: baseline

limo:baseline - Explicitly denotes that the predictive power of a model has been evaluated against the power of another model.
Domain: limo:Model
Range: limo:Model
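
A possible use of limo:baseline is sketched below. Both models, their power descriptions and the outcome values are hypothetical and serve only to illustrate how a model and the model it was compared against could be linked.

```turtle
@prefix limo: <http://purl.org/limo-ontology/limo#> .
@prefix eg:   <http://example.org/ns#> .

# Hypothetical model that was evaluated against a reference model.
eg:newModel a limo:Model ;
	limo:power eg:newModelPower ;
	limo:baseline eg:referenceModel .

eg:referenceModel a limo:Model ;
	limo:power eg:referenceModelPower .

# Illustrative outcome values only, not taken from any real evaluation.
eg:newModelPower a limo:Power ;
	limo:evaluationMethod eg:crossvalidation ;
	limo:outcome 0.90 .

eg:referenceModelPower a limo:Power ;
	limo:evaluationMethod eg:crossvalidation ;
	limo:outcome 0.85 .
```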



Property: creator

dct:creator - The person or organization that actually builds the model.
Domain: limo:Model
Range: foaf:Agent



Property: data

limo:data - Connects a model to the dataset that contains the actual data that have been used for the development of the model.
Domain: limo:Model
Range: qb:DataSet
Has sub property limo:validationData limo:evaluationData limo:trainingData
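
The effect of the sub property relationships can be sketched as follows. The three rdfs:subPropertyOf statements restate what this specification lists; the resources eg:exampleModel and eg:exampleTrainingData are hypothetical.

```turtle
@prefix limo: <http://purl.org/limo-ontology/limo#> .
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix eg:   <http://example.org/ns#> .

# Sub property statements as listed in this specification.
limo:trainingData   rdfs:subPropertyOf limo:data .
limo:validationData rdfs:subPropertyOf limo:data .
limo:evaluationData rdfs:subPropertyOf limo:data .

# Hypothetical model that states only its training data.
eg:exampleModel a limo:Model ;
	limo:trainingData eg:exampleTrainingData .

eg:exampleTrainingData a qb:DataSet .

# Under RDFS entailment, the following triple can be inferred:
#   eg:exampleModel limo:data eg:exampleTrainingData .
```

A query for limo:data over an RDFS-aware store would therefore also return datasets that were attached with one of the more specific properties.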



Property: description

dct:description - A short text describing what the variable or model is about.
Domain: limo:Model limo:Variable
Range: rdfs:Literal



Property: evaluationData

limo:evaluationData - Connects a model to the dataset that represents its evaluation data.
Sub property of limo:data



Property: evaluationMethod

limo:evaluationMethod - The method used to estimate the predictive power of the model. Evaluation methods include out-of-sample evaluation with statistics such as the Predicted Residual Sum of Squares or the Root Mean Square Error, as well as cross-validation techniques.
Domain: limo:Power



Property: file

limo:file - Connects a model with a related file.
Domain: limo:Model
Range: limo:File



Property: issued

dct:issued - The date on which the model was created.
Domain: limo:Model



Property: method

limo:method - Connects a model with its statistical or data mining method.
Domain: limo:Model
Range: limo:Method



Property: modelType

limo:modelType - Describes the main category of the model, namely classification, regression, clustering or dimension reduction.
Domain: limo:Model



Property: outcome

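limo:outcome - The result obtained by applying the evaluation method, e.g. the mean correlation of 0.97 in the basic example.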
Domain: limo:Power



Property: power

limo:power - Connects a model to its predictive power.
Domain: limo:Model
Range: limo:Power



Property: publishedIn

limo:publishedIn - Connects a model with the bibliographic resource in which it is published.
Domain: limo:Model
Range: dct:BibliographicResource
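
As an illustration, a model could be linked to the article that introduced it as sketched below. The title is taken from [GINSBERG_2009]; the resource names eg:exampleModel and eg:ginsberg2009 and the use of xsd:gYear for the publication year are assumptions made for this sketch.

```turtle
@prefix limo: <http://purl.org/limo-ontology/limo#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix eg:   <http://example.org/ns#> .

# Hypothetical identifiers for the model and the publication.
eg:exampleModel a limo:Model ;
	limo:publishedIn eg:ginsberg2009 .

eg:ginsberg2009 a dct:BibliographicResource ;
	dct:title "Detecting influenza epidemics using search engine query data"@en ;
	dct:issued "2009"^^xsd:gYear .
```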



Property: rawData

limo:rawData - Connects a model with its raw dataset.
Domain: limo:Model
Range: dctype:Dataset



Property: spatial

limo:spatial - Describes the spatial dimension of the model. The spatial dimension is derived from the actual data that has been employed.
Domain: limo:Model
Range: dct:Location



Property: temporal

limo:temporal - Describes the time period that the model covers. The time period reflects the period that is described in the actual data that have been used for the development of the model.
Domain: limo:Model
Range: dct:PeriodOfTime



Property: theme

limo:theme - Connects a variable to the concept that represents its theme.
Domain: limo:Variable
Range: skos:Concept
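
A theme could be declared and used as in the following sketch. The identifiers eg:resp and eg:ILIphysvisits come from the basic example; the label and the definition text are assumptions added for illustration.

```turtle
@prefix limo: <http://purl.org/limo-ontology/limo#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix eg:   <http://example.org/ns#> .

# Theme concept; the label and definition are illustrative.
eg:ILIphysvisits a skos:Concept ;
	skos:prefLabel "ILI physician visits"@en ;
	skos:definition "Physician visits in which a patient presents with influenza-like symptoms"@en .

# Variables in different models can share this theme, which makes them discoverable together.
eg:resp a limo:Variable ;
	limo:theme eg:ILIphysvisits .
```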



Property: title

dct:title - A name given to the model or variable.
Domain: limo:Model limo:Variable
Range: rdfs:Literal



Property: trainingData

limo:trainingData - Connects a model to a qb:DataSet that represents data that was used for the training of the model.
Sub property of limo:data



Property: usageType

limo:usageType - Denotes whether the variable is the response of the model or one of the predictors.
Domain: limo:Variable



Property: validationData

limo:validationData - Connects a model to a qb:DataSet that represents data that was used for the validation of the model.
Sub property of limo:data



Property: variable

limo:variable - Connects a model to one of its variables.
Domain: limo:Model
Range: limo:Variable



Property: variableType

limo:variableType - Denotes whether the variable is continuous, categorical or ordinal.
Domain: limo:Variable
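
This specification does not state a range for limo:variableType. One option, shown here only as a sketch, is to model the three types as SKOS concepts; the scheme eg:variableTypes, the labels and the variable eg:exampleVariable are hypothetical.

```turtle
@prefix limo: <http://purl.org/limo-ontology/limo#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix eg:   <http://example.org/ns#> .

# Assumption: variable types are published as a small SKOS concept scheme.
eg:variableTypes a skos:ConceptScheme .

eg:continuous a skos:Concept ;
	skos:inScheme eg:variableTypes ;
	skos:prefLabel "Continuous"@en .

eg:categorical a skos:Concept ;
	skos:inScheme eg:variableTypes ;
	skos:prefLabel "Categorical"@en .

eg:ordinal a skos:Concept ;
	skos:inScheme eg:variableTypes ;
	skos:prefLabel "Ordinal"@en .

# Hypothetical variable that uses one of the types.
eg:exampleVariable a limo:Variable ;
	limo:variableType eg:ordinal .
```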



A. Acknowledgments

[TODO]

B. Change History

[TODO]

C. References

C.1 Normative References

[DCMI]
DCMI Metadata Terms (DCMI), URL: http://dublincore.org/documents/dcmi-terms/
[DCMITYPE]
DCMI Type Vocabulary (DCMITYPE), URL: http://dublincore.org/documents/2000/07/11/dcmi-type-vocabulary/
[FOAF]
Friend of a Friend (FOAF), URL: http://www.foaf-project.org/
[GINSBERG_2009]
Ginsberg, J., Mohebbi, M.H., Patel, R.S., Brammer, L., Smolinski, M.S., Brilliant, L. Detecting influenza epidemics using search engine query data. Nature 457(7232), 2009.
[LIMO]
Kalampokis, E., Karamanou, A., Tambouris, E., and Tarabanis, K. Towards a Vocabulary for Incorporating Predictive Models into the Linked Data Web. SemStats 2013 in conjunction with ISWC 2013.
[PMML]
Predictive Model Markup Language (PMML), URL: http://www.dmg.org/v4-1/GeneralStructure.html
[RDF-CONCEPTS]
Klyne, G. and Carroll, J.J. Resource Description Framework (RDF): Concepts and Abstract Syntax. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210
[RDF Data Cube Vocabulary]
RDF Data Cube Vocabulary. URL: http://www.w3.org/TR/vocab-data-cube/
[RDF-SCHEMA]
Brickley, D. and Guha, R.V. RDF Vocabulary Description Language 1.0: RDF Schema. 10 February 2004. W3C Recommendation. URL: http://www.w3.org/TR/2004/REC-rdf-schema-20040210
[SKOS]
Simple Knowledge Organization System (SKOS), URL: http://www.w3.org/2004/02/skos/

C.2 Informative References