Consuming Linked Data
Juan F. Sequeda
Department of Computer Science, University of Texas at Austin
SemTech 2010
How many people are familiar with:
- RDF
- SPARQL
- Linked Data
- Web Architecture (HTTP, etc.)
History
- Linked Data Design Issues by TimBL, July 2006
- Linked Open Data Project, WWW2007
- First LOD Cloud, May 2007
- 1st Linked Data on the Web Workshop, WWW2008
- 1st Triplification Challenge, 2008
- How to Publish Linked Data Tutorial, ISWC2008
- BBC publishes Linked Data, 2008
- 2nd Linked Data on the Web Workshop, WWW2009
- NY Times announcement, SemTech2009 - ISWC09
- 1st Linked Data-a-thon, ISWC2009
- 1st How to Consume Linked Data Tutorial, ISWC2009
- Data.gov.uk publishes Linked Data, 2010
- 2nd How to Consume Linked Data Tutorial, WWW2010
- 1st International Workshop on Consuming Linked Data, COLD2010
- …
(A sequence of LOD cloud diagrams shows its growth: May 2007, Oct 2007, Nov 2007, Nov 2007, Feb 2008, Mar 2008, Sept 2008, Mar 2009, Mar 2009, July 2009.)
June 2010. YOU GET THE PICTURE: IT'S BIG and getting BIGGER and BIGGER.
Now what can we do with this data?
Let’s consume it!
The Modigliani Test: show me the locations of all the original paintings of Modigliani. Daniel Koller (@dakoller) showed that you can find this with a SPARQL query on DBpedia. Thanks to Richard MacManus - ReadWriteWeb.
Results of the Modigliani Test: Atanas Kiryakov from Ontotext used LDSR (Linked Data Semantic Repository), covering DBpedia, Freebase, Geonames, UMBEL, and WordNet. Published April 26, 2010: http://www.readwriteweb.com/archives/the_modigliani_test_for_linked_data.php
SPARQL Query:

PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>

SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
  ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
     fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
     ot:preferredLabel ?painting_l .
  ?ow ot:preferredLabel ?owner_l .
  OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-prop:location ?loc .
             ?loc rdf:type umbel-sc:City ;
                  ot:preferredLabel ?city_db_loc }
  OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}
Let’s start by making sure that we understand what Linked Data is…
Do you SEARCH or do you FIND?
Search for: football players who went to the University of Texas at Austin and played for the Dallas Cowboys as cornerback.
Why can’t we just FIND it…
Guess how I FOUND out?
I’ll tell you how I did NOT find it
Current Web = internet + links + docs
So what is the problem? We aren't always interested in documents; we are interested in THINGS. These THINGS might be described in documents. We can read an HTML document rendered in a browser and find what we are searching for. This is hard for computers: they have to guess (even though they are pretty good at it).
What do we need to do? Make it easy for computers/software to find THINGS.
How can we do that? Besides publishing documents on the web, which computers can't understand easily, let's publish something that computers can understand.
RAW DATA!
But wait… don’t we do that already?
Current data on the web: relational databases, APIs, XML, CSV, XLS, … Can't computers and applications already consume that data on the web?
True! But it is all in different formats and data models!
This makes it hard to integrate data
The data in different data sources aren’t linked
For example, how do I know that the Juan Sequeda on Facebook is the same as the Juan Sequeda on Twitter?
Or if I create a mashup from different services, I have to learn different APIs and I get different formats of data back
Wouldn’t it be great if we had a standard way of publishing data on the Web?
We have a standardized way of publishing documents on the web, right? HTML.
Then why can’t we have a standard way of publishing data on the Web?
Good question! And the answer is YES. There is!
Resource Description Framework (RDF): a data model, i.e. a way to model data (e.g. relational databases use the relational data model). RDF is a triple data model: a labeled graph of Subject, Predicate, Object.
<Juan> <was born in> <California>
<California> <is part of> <the USA>
<Juan> <likes> <the Semantic Web>
RDF can be serialized in different ways: RDF/XML, RDFa (RDF in HTML), N3, Turtle, JSON (see the Turtle sketch below).
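To make the triple model concrete, here is a minimal sketch of the example statements above in Turtle syntax. The namespace and property names (ex:, ex:wasBornIn, etc.) are made up for illustration, not taken from any real vocabulary:

@prefix ex: <http://example.org/> .

ex:Juan        ex:wasBornIn ex:California ;
               ex:likes     ex:SemanticWeb .
ex:California  ex:isPartOf  ex:USA .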
So does that mean that I have to publish my data in RDF now?
You don’t have to… but we would like you to 
An example
Document on the Web
Databases back up documents. THINGS have PROPERTIES: a book has a title, an author, … This is a THING: a book titled "Programming the Semantic Web" by Toby Segaran, …
Let's represent the data in RDF. (Graph: a book node with title "Programming the Semantic Web", author "Toby Segaran", isbn "978-0-596-15381-6", and a publisher node whose name is "O'Reilly".)
Remember that we are on the web: everything on the web is identified by a URI.
And now let's link the data to other data. (Graph: the book node http://…/isbn978 with title "Programming the Semantic Web", author "Toby Segaran", isbn "978-0-596-15381-6", and publisher http://…/publisher1 whose name is "O'Reilly".)
And now consider the data from Revyu.com. (Graph: http://…/isbn978 hasReview http://…/review1, which has description "Awesome Book" and a reviewer http://…/reviewer whose name is "Juan Sequeda".)
Let's start to link data. (Graph: a sameAs link connects the Revyu data about http://…/isbn978 with the publisher's data, joining the review with description "Awesome Book" and reviewer "Juan Sequeda" to the book's title, author "Toby Segaran", isbn, and publisher http://…/publisher1, "O'Reilly".)
Juan Sequeda publishes data too. (Graph: http://juansequeda.com/id has name "Juan Sequeda" and livesIn http://dbpedia.org/Austin.)
Let's link more data. (Graph: a sameAs link connects the reviewer http://…/reviewer with http://juansequeda.com/id, so the review of http://…/isbn978 is now linked to a person who livesIn http://dbpedia.org/Austin.)
And more. (Graph: the book data, the Revyu review data, and Juan's personal data are all connected through sameAs links; see the Turtle sketch below.)
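Putting the pieces together, the combined graph might be written roughly as follows in Turtle. The full URIs and the property names (hasReview, livesIn, etc.) are illustrative placeholders for the abbreviated labels in the diagrams, not a real published vocabulary:

@prefix ex:  <http://example.org/vocab/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<http://publisher.example/isbn978>
    ex:title     "Programming the Semantic Web" ;
    ex:author    "Toby Segaran" ;
    ex:isbn      "978-0-596-15381-6" ;
    ex:publisher <http://publisher.example/publisher1> .

<http://revyu.example/isbn978>
    owl:sameAs   <http://publisher.example/isbn978> ;
    ex:hasReview <http://revyu.example/review1> .

<http://revyu.example/review1>
    ex:description "Awesome Book" ;
    ex:hasReviewer <http://revyu.example/reviewer> .

<http://revyu.example/reviewer>
    owl:sameAs <http://juansequeda.com/id> .

<http://juansequeda.com/id>
    ex:name    "Juan Sequeda" ;
    ex:livesIn <http://dbpedia.org/Austin> .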
Data on the Web that is in RDF and is linked to other RDF data is LINKED DATA
Linked Data Principles:
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up (dereference) those names.
3. When someone looks up a URI, provide useful information.
4. Include links to other URIs so that they can discover more things.
Dereferencing a URI in practice looks roughly like the request sketched below.
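As an illustration of principles 2 and 3, looking up an HTTP URI with content negotiation might go like this. The exact headers depend on the server; the 303 redirect to a separate data document is one common pattern, which DBpedia for example follows:

GET /resource/Austin HTTP/1.1
Host: dbpedia.org
Accept: application/rdf+xml

HTTP/1.1 303 See Other
Location: http://dbpedia.org/data/Austin

The client then retrieves http://dbpedia.org/data/Austin and receives an RDF description of the resource, including links to further URIs.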
Linked Data makes the web appear as ONE GIANT HUGE GLOBAL DATABASE!
I can query a database with SQL. Is there a way to query Linked Data with a query language?
Yes! There is actually a standardized language for that: SPARQL.
FIND all the reviews on the book “Programming the Semantic Web” by people who live in Austin
(Graph: the combined data from the previous slides, joining the book, its review, the reviewer, and where the reviewer lives through sameAs links. One way to express the query over it is sketched below.)
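A query over the example graph above might look like the following. It uses the illustrative property URIs from the Turtle sketch earlier rather than a real published vocabulary, so it is a sketch of the query's shape, not something to run against a live endpoint. Note that the owl:sameAs links are followed explicitly in the pattern, since a store without inference will not follow them for you:

PREFIX ex:  <http://example.org/vocab/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?review ?description
WHERE {
  ?book     ex:title "Programming the Semantic Web" .
  ?book2    owl:sameAs ?book ;
            ex:hasReview ?review .
  ?review   ex:description ?description ;
            ex:hasReviewer ?reviewer .
  ?reviewer owl:sameAs ?person .
  ?person   ex:livesIn <http://dbpedia.org/Austin> .
}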
This looks cool, but let’s be realistic. What is the incentive to publish Linked Data?
What was your incentive to publish an HTML page in 1990?
1) Share data in documents. 2) Because your neighbor was doing it.
So why should we publish Linked Data in 2010?
1) Share data as data. 2) Because your neighbor is doing it.
And guess who is starting to publish Linked Data now?
Linked Data publishers: UK Government, US Government, BBC, Open Calais (Thomson Reuters), Freebase, NY Times, Best Buy, CNET, DBpedia… Are you?
How can I publish Linked Data?
Publishing Linked Data:
- Legacy data in relational databases: D2R Server, Virtuoso, Triplify, Ultrawrap
- CMS: Drupal 7
- Native RDF stores (triple stores): AllegroGraph, Jena, Sesame, Virtuoso
- Talis Platform (Linked Data in the cloud)
- In HTML with RDFa
Consuming Linked Data by Humans
HTML Browsers
Links to other URIs
<span rel="foaf:interest">
  <a href="http://dbpedia.org/resource/Database" property="dcterms:title">Database</a>,
  <a href="http://dbpedia.org/resource/Data_integration" property="dcterms:title">Data Integration</a>,
  <a href="http://dbpedia.org/resource/Semantic_Web" property="dcterms:title">Semantic Web</a>,
  <a href="http://dbpedia.org/resource/Linked_Data" property="dcterms:title">Linked Data</a>,
  etc.
</span>
HTML browsers: RDF can be serialized as RDFa. Have you heard of Yahoo's SearchMonkey and Google Rich Snippets? They are consuming RDFa. But WHY?
Because there is life beyond ten blue links
Google and Yahoo are starting to crawl RDFa! The Semantic Web is a reality!
The reality: Yahoo is crawling data that is in RDFa and microformats using specific vocabularies (FOAF, GoodRelations, …). Google is crawling RDFa and microformats that use the Google vocabulary.
Linked Data Browsers
Linked Data browsers: not actually separate browsers; they run inside HTML browsers. They show the data returned after looking up a URI in tabular form. (IMO) the UI lacks usability.
Linked Data browsers:
- Tabulator: http://www.w3.org/2005/ajar/tab
- OpenLink: http://ode.openlinksw.com/
- Zitgist DataViewer: http://dataviewer.zitgist.com/
- Marbles: http://www5.wiwiss.fu-berlin.de/marbles/
- Explorator: http://www.tecweb.inf.puc-rio.br/explorator
Faceted Browsers
http://dbpedia.neofonie.de
http://dev.semsol.com/2010/semtech/
On-the-fly Mashups
http://sig.ma
What’s next?
Time to create new and innovative ways to interact with Linked Data
This may be one of the killer apps that we have all been waiting for. (Image: http://en.wikipedia.org/wiki/File:Mosaic_browser_plaque_ncsa.jpg)
It's time to partner with the HCI community. Semantic Web UIs don't have to be ugly.
Consume Linked Data with SPARQL
SPARQL endpoints: Linked Data sources usually provide a SPARQL endpoint for their dataset(s). A SPARQL endpoint is a SPARQL query processing service that supports the SPARQL protocol*. Send your SPARQL query, receive the result. (* http://www.w3.org/TR/rdf-sparql-protocol/)
Where can I find SPARQL endpoints?
- DBpedia: http://dbpedia.org/sparql
- MusicBrainz: http://dbtune.org/musicbrainz/sparql
- U.S. Census: http://www.rdfabout.com/sparql
- Semantic CrunchBase: http://cb.semsol.org/sparql
- More: http://esw.w3.org/topic/SparqlEndpoints
Accessing a SPARQL endpoint: SPARQL endpoints are RESTful web services. Issuing a SPARQL query to a remote endpoint is basically an HTTP GET request with the parameter query, a URL-encoded string containing the SPARQL query:

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Query result formats: SPARQL endpoints usually support different result formats:
- XML, JSON, plain text (for ASK and SELECT queries)
- RDF/XML, N-Triples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)
Query result formats. Example query:

PREFIX dbp: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT ?name ?bday
WHERE {
  ?p dbp:birthplace <http://dbpedia.org/resource/Berlin> .
  ?p dbpprop:dateOfBirth ?bday .
  ?p dbpprop:name ?name .
}
Query result formats: use the Accept header to request the preferred result format (a sample JSON response is sketched below):

GET /sparql?query=PREFIX+rd... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accept: application/sparql-results+json
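For orientation, a SELECT result in the application/sparql-results+json format has a head listing the variables and a results.bindings array with one entry per solution. An abbreviated response for the query above might look roughly like this (the values are illustrative):

{
  "head": { "vars": [ "name", "bday" ] },
  "results": {
    "bindings": [
      {
        "name": { "type": "literal", "value": "Marlene Dietrich" },
        "bday": { "type": "literal", "value": "1901-12-27" }
      }
    ]
  }
}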
Query result formats: as an alternative, some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter out:

GET /sparql?out=json&query=... HTTP/1.1
Host: dbpedia.org
User-agent: my-sparql-client/0.1
Accessing a SPARQL endpoint. More convenient: use a library.
- SPARQL JavaScript Library: http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html
- ARC for PHP: http://arc.semsol.org/
- RAP (RDF API for PHP): http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
Accessing a SPARQL endpoint:
- Jena / ARQ (Java): http://jena.sourceforge.net/
- Sesame (Java): http://www.openrdf.org/
- SPARQL Wrapper (Python): http://sparql-wrapper.sourceforge.net/
- PySPARQL (Python): http://code.google.com/p/pysparql/
Accessing a SPARQL endpoint. Example with Jena/ARQ:

import com.hp.hpl.jena.query.*;

String service = "...";      // address of the SPARQL endpoint
String query = "SELECT ..."; // your SPARQL query

QueryExecution e = QueryExecutionFactory.sparqlService(service, query);
ResultSet results = e.execSelect();
while ( results.hasNext() ) {
    QuerySolution s = results.nextSolution();
    // ...
}
e.close();
Querying a single dataset is quite boring compared to issuing SPARQL queries over multiple datasets. How can you do this?
- Issue follow-up queries to different endpoints
- Query a central collection of datasets
- Build a store with copies of relevant datasets
- Use a query federation system
Follow-up queries. Idea: issue follow-up queries over other datasets based on results from previous queries, substituting placeholders in query templates.
Find a list of companies filtered by some criteria, return their DBpedia URIs, and look up a comment for each:

String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";
String qTmpl = "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
String q1 = "SELECT ?s WHERE { ...";   // find companies, return DBpedia URIs

QueryExecution e1 = QueryExecutionFactory.sparqlService(s1, q1);
ResultSet results1 = e1.execSelect();
while ( results1.hasNext() ) {
    QuerySolution sol = results1.nextSolution();
    String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
    QueryExecution e2 = QueryExecutionFactory.sparqlService(s2, q2);
    ResultSet results2 = e2.execSelect();
    while ( results2.hasNext() ) {
        // ...
    }
    e2.close();
}
e1.close();
Follow-up queries.
Advantage: queried data is up-to-date.
Drawbacks: requires the existence of a SPARQL endpoint for each dataset; requires program logic; very inefficient.
Querying a collection of datasets. Idea: use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets. Examples: SPARQL endpoints over a majority of datasets from the LOD cloud at http://uberblic.org and http://lod.openlinksw.com/sparql
Querying a collection of datasets.
Advantage: no need for specific program logic.
Drawbacks: queried data might be out of date; not all relevant datasets may be in the collection.
Own store of dataset copies. Idea: build your own store with copies of relevant datasets and query it. Possible stores:
- Jena TDB: http://jena.hpl.hp.com/wiki/TDB
- Sesame: http://www.openrdf.org/
- OpenLink Virtuoso: http://virtuoso.openlinksw.com/
- 4store: http://4store.org/
- AllegroGraph: http://www.franz.com/agraph/
- etc.
Populating your store: get the RDF dumps provided for the datasets, or do (focused) crawling, e.g. with ldspider (http://code.google.com/p/ldspider/): a multithreaded API for focused crawling, crawling strategies (breadth-first, load-balancing), and flexible configuration with callbacks and hooks.
Own store of dataset copies.
Advantages: no need for specific program logic; can include all datasets; independent of the existence, availability, and efficiency of SPARQL endpoints.
Drawbacks: requires effort to set up and operate the store; ideally data sources provide RDF dumps, but what if they don't? How do you keep the copies in sync with the originals? Queried data might be out of date.
Federated query processing. Idea: query a mediator, which distributes sub-queries to the relevant sources and integrates the results. (A sketch of what such a query can look like follows.)
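As a rough illustration of the idea, the SERVICE keyword from the (then-draft) SPARQL 1.1 Federation extension lets one query send sub-patterns to different endpoints. This is not how DARQ or SemWIQ are configured; those systems select sources automatically from service descriptions and statistics. The first endpoint and class URI below are placeholders:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?comment
WHERE {
  SERVICE <http://example.org/sparql-A> {   # placeholder endpoint
    ?s a <http://example.org/Company> .     # placeholder class
  }
  SERVICE <http://dbpedia.org/sparql> {
    ?s rdfs:comment ?comment .
  }
}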
Federated query processing.
- Instance-based federation: each thing is described by only one data source; untypical for the Web of Data.
- Triple-based federation: no restrictions, but requires more distributed joins.
Statistics about the datasets are required in both cases.
Federated query processing systems:
- DARQ (Distributed ARQ): http://darq.sourceforge.net/ - a query engine for federated SPARQL queries, an extension of ARQ (the query engine for Jena); last update June 28, 2006.
- Semantic Web Integrator and Query Engine (SemWIQ): http://semwiq.sourceforge.net/ - actively maintained.
Federated query processing.
Advantages: no need for specific program logic; queried data is up to date.
Drawbacks: requires the existence of a SPARQL endpoint for each dataset; requires effort to set up and configure the mediator.
In any case, you have to know the relevant data sources:
- when developing the app using follow-up queries,
- when selecting an existing SPARQL endpoint over a collection of dataset copies,
- when setting up your own store with a collection of dataset copies,
- when configuring your query federation system.
You restrict yourself to the selected sources. But there is an alternative: remember, URIs link to data.
Automated link traversal. Idea: discover further data by looking up relevant URIs in your application. Can be combined with the previous approaches.
Link traversal based query execution applies the idea of automated link traversal to the execution of SPARQL queries. Idea: intertwine query evaluation with the traversal of RDF links, discovering data that might contribute to query results during query execution. Alternately: evaluate parts of the query, then look up URIs appearing in intermediate solutions. (A toy code sketch of this loop follows.)
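The following is a deliberately simplified sketch of that loop using Jena, not the actual algorithm implemented in SWClLib or SQUIN: it dereferences a seed URI into a local model, runs the query, dereferences every URI that shows up in the solutions, and repeats once over the grown model. Error handling, content negotiation, and termination criteria are glossed over, and the assumption that the seed document serves RDF/XML is noted in the comments.

import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.rdf.model.*;
import java.util.*;

public class LinkTraversalSketch {
    public static void main(String[] args) {
        String seedURI = "http://dbpedia.org/data/Austin";  // seed document (assumption: serves RDF/XML)
        String queryStr = "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Austin> ?p ?o }";

        Model model = ModelFactory.createDefaultModel();
        model.read(seedURI);                       // load the seed data into the local model

        for (int round = 0; round < 2; round++) {  // a fixed number of rounds keeps the toy example finite
            Set<String> discovered = new HashSet<String>();
            QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(queryStr), model);
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution sol = results.nextSolution();
                for (Iterator<String> it = sol.varNames(); it.hasNext(); ) {
                    RDFNode node = sol.get(it.next());
                    if (node.isURIResource()) {
                        discovered.add(node.asResource().getURI());   // URIs in intermediate solutions
                    }
                }
            }
            qe.close();
            for (String uri : discovered) {
                try { model.read(uri); }           // traverse the link: dereference and add the data
                catch (Exception ex) { /* ignore unreachable or non-RDF URIs */ }
            }
        }
        System.out.println("Model now contains " + model.size() + " triples.");
    }
}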
Link traversal based query execution.
Advantages: no need to know all data sources in advance; no need for specific programming logic; queried data is up to date; does not depend on the existence of SPARQL endpoints provided by the data sources.
Drawbacks: not as fast as a centralized collection of copies; unsuitable for some queries; results might be incomplete (do we care?).
Implementations:
- Semantic Web Client Library (SWClLib) for Java: http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/
- SWIC for Prolog: http://moustaki.org/swic/
Implementations: SQUIN (http://squin.org) provides SWClLib functionality as a web service, accessible like a SPARQL endpoint. Install package: unzip and start, in less than 5 minutes! Convenient access with the SQUIN PHP tools:

$s = 'http:// ...';   // address of the SQUIN service
$q = new SparqlQuerySock( $s, '... SELECT ...' );
$res = $q->getJsonResult();   // or getXmlResult()
Real World Example
Getting started: finding URIs, finding additional data, finding SPARQL endpoints.
What is a Linked Data application? A software system that makes use of data on the web from multiple datasets and that benefits from links between the datasets.
Characteristics of Linked Data applications: they consume data that is published on the web following the Linked Data principles; an application should be able to request, retrieve, and process the accessed data.
Discover further information by following the links between different data sources: the fourth principle enables this.
Combine the consumed Linked Data with data from other sources (not necessarily Linked Data).
Expose the combined data back to the web following the Linked Data principles

Offer value to end-users. Examples:
- http://data-gov.tw.rpi.edu/wiki
- http://dbrec.net/
- http://fanhu.bz/
- http://data.nytimes.com/schools/schools.html
- http://sig.ma
- http://visinav.deri.org/semtech2010/
Hot research topics: interlinking algorithms, provenance and trust, dataset dynamics, UI, distributed query evaluation. "You want a good thesis? IR is based on precision and recall. The minute you add semantics, it is a meaningless feature. Logic is based on soundness and completeness. We don't want soundness and completeness. We want a few good answers quickly." - Jim Hendler at WWW2009 during the LOD gathering. Thanks Michael Hausenblas.
THANKS! Juan Sequeda, www.juansequeda.com, @juansequeda, #cold, www.consuminglinkeddata.org. Acknowledgements: Olaf Hartig, Patrick Sinclair, Jamie Taylor. Slides for "Consuming Linked Data with SPARQL" by Olaf Hartig.