
The Design of a SQL Interface for a NoSQL Database
Mary Holstege, PhD, Principal Engineer
Nov 7, 2012
Slide 1

Copyright © 2012 MarkLogic® Corporation. All rights reserved.

@mathling
 


Topics

§  MarkLogic: Enterprise NoSQL Database
§  SQL over NoSQL, What’s The Point?
§  How Does It Work?
§  Technical Nitty Gritty
§  Q&A

Slide 2



MarkLogic
NoSQL Database
✓  Shared-nothing
✓  Clustered
✓  Non-relational
✓  Schema-free
✓  Scalable

[Figure: a cluster of hosts (Host 1 … Host k) holding data partitions (partition 1 … partition m)]

Slide 3



MarkLogic
Enterprise NoSQL Database
✓  ACID
✓  Real-time full-text search
✓  Automatic failover
✓  Replication
✓  Point-in-time recovery
✓  Government-grade security

Slide 4



Keep This In Mind

§  Non-relational data model
   §  Documents (XML, JSON, binary, text)

§  Rich “query” language (XQuery + extension functions)
   §  Really a complete language for application development

§  Search engine core
   §  Full-text
   §  Hierarchical structure
   §  Geospatial
   §  Values

Slide 5




What’s the Point?

Slide 6



MarkLogic and BI Tools

§  Use familiar relational tools with non-relational data
   §  Such as BI tools
   §  Standard connection: no code, no custom integration

§  All the benefits of a BI tool (data analysis, visualization) with
an operational Big Data database
Slide 7



Structured and Unstructured Data

[Figure: an example record with sections for Personal Info, Aliases, Phone numbers, Bank accounts, Credit cards, and Vehicles]

§  MarkLogic’s XML data model was designed to handle rich
structured and unstructured data

Slide 8



Structured and Unstructured Data

[Figure: the same example record (Personal Info, Aliases, Phone numbers, Bank accounts, Credit cards, Vehicles)]

§  Richness of unstructured content does not fit naturally into a
relational model

Slide 9



XML vs Tables and Views

§  Data is stored in MarkLogic as XML
   §  Rich, powerful way to represent complex data

§  BI tools expect to see relational tables and views
   §  Rows and columns, accessible via SQL

Slide 10



How Can This Possibly Work?

Slide 11



How Can This Possibly Work?

§  Focus on "structured" pieces of the data
§  Create an in-memory column index on each piece
§  Create a view over column indexes

Slide 12



<?xml version="1.0" encoding="UTF-8"?>
<message list="org.codehaus.grails.user" date="2010-05-17T01:00:21.627923-08:00">
<headers>
<from personal="Ian Roberts"></from>
<to personal="Grails User"></to>
<subject>How to inject a session-scoped service into another service
</subject>
</headers>
<body type="text/plain; charset=us-ascii">
<para><url>…</url> Covers the
same thing, but has a little bit more detail WRT testing etc.</para>
<para>Note rather than inject the application context you can also do
<function>myServiceProxy</function>(org.springframework.aop.scope.ScopedProxyFactoryBean){ targetBeanName = 'myService' proxyTargetClass = true}</para>
<para>Or see <url>…</url> for an
intro to proxies.</para>
<footer type="signature" depth="1" hash="1986520897999785197">-<name>Ian Roberts</name> | Department of Computer Science
<email></email>
<affiliation>University of Sheffield, UK</affiliation></footer>
</body>
</message>
Slide 13



[Slide repeats the message above, with callouts labelled List (org.codehaus.grails.user) and From]

Slide 14



DOC      List
/mm/21   org.codehaus.grails.user
/mm/57   org.codehaus.grails.user
/mm/99   org.ruby-lang.ruby-core

§  Every document has a unique identifier in the database
§  We can map the document identifier to the list name
§  We can do that for every list attribute in every document
§  … and we can do that for any piece of structure in the database
§  These look like columns!
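A "column" in this sense can be sketched in a few lines of Python. The toy documents reuse the URIs and list names from the table above; placing the list name in a `list` attribute of the root `<message>` element, and the `build_column` helper itself, are assumptions for illustration, not MarkLogic's implementation.

```python
# Sketch: one "column" maps each document URI to the values found for one
# piece of structure. Hypothetical assumption: the list name is a `list`
# attribute on the root <message> element; documents are simplified.
import xml.etree.ElementTree as ET

DOCS = {
    "/mm/21": '<message list="org.codehaus.grails.user"><headers/></message>',
    "/mm/57": '<message list="org.codehaus.grails.user"><headers/></message>',
    "/mm/99": '<message list="org.ruby-lang.ruby-core"><headers/></message>',
}


def build_column(docs, attr):
    """Map each document URI to the values of `attr` on its root element."""
    column = {}
    for uri, xml_text in docs.items():
        value = ET.fromstring(xml_text).get(attr)
        if value is not None:  # documents without the attribute contribute nothing
            column[uri] = [value]
    return column


list_column = build_column(DOCS, "list")
for uri, values in sorted(list_column.items()):
    print(uri, values)
```

Repeating this for every piece of structure of interest (sender, subject, affiliation, …) yields one such mapping per piece, each keyed by the same document identifiers.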

Slide 15



[Figure: the DOC → List column from the previous slide (/mm/21 and /mm/57: org.codehaus.grails.user; /mm/99: org.ruby-lang.ruby-core) next to a From column]

§  Now we can stitch the two "columns" together
   §  Using co-occurrence

Slide 16



[Figure: the stitched columns (DOC, List, From, Subject, URL, Name, Affiliation) laid out as a table for documents /mm/21, /mm/57 and /mm/99; sample cells include grails and ruby (List), i.roberts, j.doe and john.louis (From), "How to …", "Tips on …" and "New …" (Subject), Ian Roberts and John Doe (Name), University of Sheffield and University of Life (Affiliation)]


§  Now we can stitch the two "columns" together
   §  Using co-occurrence
§  When we do that for several structured pieces, it starts to look
like a table
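Stitching columns by co-occurrence amounts to a join on the document identifier. The sketch below assumes inner-join behaviour (a document missing a value for any requested column yields no row) and emits one row per combination when a document holds several values. The List values come from the slides; the From values (`sender-a`, `sender-b`) and the `co_occurrences` helper are hypothetical, not MarkLogic's co-occurrence API.

```python
# Sketch: values that occur in the same document form a row.
# Assumed inner-join semantics: every column must have a value for the URI.
from itertools import product

# Hypothetical column indexes: document URI -> values for one piece.
LIST = {
    "/mm/21": ["org.codehaus.grails.user"],
    "/mm/57": ["org.codehaus.grails.user"],
    "/mm/99": ["org.ruby-lang.ruby-core"],
}
FROM = {  # made-up senders; /mm/99 deliberately has none
    "/mm/21": ["sender-a"],
    "/mm/57": ["sender-b"],
}


def co_occurrences(*columns):
    """Yield (uri, v1, v2, ...) for every combination of values in a document."""
    shared = set(columns[0]).intersection(*columns[1:])  # URIs present in all columns
    for uri in sorted(shared):
        for combo in product(*(col[uri] for col in columns)):
            yield (uri, *combo)


rows = list(co_occurrences(LIST, FROM))
for row in rows:
    print(row)
```

Adding a third or fourth column is just another argument to `co_occurrences`, which is how several stitched pieces start to look like a table.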

Slide 17



MarkLogic as a Columnar Database

§  Combine existing capabilities:
   §  In-memory distributed indexes
   §  Co-occurrence (extended)

§  MarkLogic as an in-memory columnar database

Slide 18



How Does It Work?


Slide 19



Mapping Unstructured Data
Indexes have typed values, but text is text
§  Rich set of types
   §  Numbers (decimal, float, double, …)
   §  Strings (string, name, uri, …)
   §  Date/time (date, time, timestamp, …)
   §  Geospatial (points)

§  But maybe the value has the wrong type
   §  Not a real date: <date>2001/755</date> ⇒ (nothing)

§  Maybe there is an empty value

§  Maybe there is no value at all
   §  There is no function here. ⇒ (nothing)
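Casting raw text to an index type, where anything that fails to cast simply contributes no value, can be sketched as follows. Python's `datetime.date.fromisoformat` stands in for XML Schema date casting here (an assumption: the two differ in details such as time zones), and `typed_date` is a hypothetical helper.

```python
# Sketch: typed index values. The wrong type, an empty value, or a missing
# element all produce no value rather than an error.
import datetime
import xml.etree.ElementTree as ET


def typed_date(xml_text, tag="date"):
    """Return the values of <tag> elements that cast to a date."""
    root = ET.fromstring(xml_text)
    values = []
    for elem in root.iter(tag):
        text = (elem.text or "").strip()
        try:
            values.append(datetime.date.fromisoformat(text))
        except ValueError:
            pass  # not a real date, or empty: contributes nothing
    return values


print(typed_date("<doc><date>2010-05-17</date></doc>"))  # [datetime.date(2010, 5, 17)]
print(typed_date("<doc><date>2001/755</date></doc>"))    # []
print(typed_date("<doc><date/></doc>"))                  # []
print(typed_date("<para>No date element here.</para>"))  # []
```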

Slide 20



Mapping Unstructured Data
Indexes can select specific elements or attributes
§  Named element
   §  <function>myServiceProxy</function>
⇒ "myServiceProxy"

§  Named attribute of an element
   §  ⇒ "org.codehaus.grails.user"

§  Multiple values OK
   §  <function>myServiceProxy</function> and
<function>myServiceStub</function>
⇒ "myServiceProxy","myServiceStub"
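Selecting a named element or a named attribute, with several values allowed per document, maps directly onto the standard library's ElementTree. The message below is a simplified, hypothetical document; `element_values` and `attribute_values` are illustrative helpers, not MarkLogic functions.

```python
# Sketch: element and attribute value extraction with ElementTree.
import xml.etree.ElementTree as ET

DOC = """<message list="org.codehaus.grails.user">
  <body>
    <para>Use <function>myServiceProxy</function> or
          <function>myServiceStub</function>.</para>
  </body>
</message>"""


def element_values(root, tag):
    """Text of every element named `tag`, in document order."""
    return [e.text for e in root.iter(tag) if e.text is not None]


def attribute_values(root, tag, attr):
    """Value of attribute `attr` on every element named `tag` that has it."""
    return [e.get(attr) for e in root.iter(tag) if attr in e.attrib]


root = ET.fromstring(DOC)
print(element_values(root, "function"))           # ['myServiceProxy', 'myServiceStub']
print(attribute_values(root, "message", "list"))  # ['org.codehaus.grails.user']
```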

Slide 21



Mapping Unstructured Data
Indexes can select concatenated values
§  Concatenated values of specific elements
   §  <caption>A <b>good</b> example</caption>
⇒ "A good example"

§  Maybe some children should be skipped
   §  <caption>A good example<footnote>Please exclude
this text</footnote></caption>
⇒ "A good example"

§  Conditional inclusion/exclusion based on attribute values
   §  <p>…This one.…Not that one.…</p>
⇒ "This one."
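Concatenating an element's text while skipping some children, either by tag name or by an attribute test, can be sketched as a short recursive walk. The `keep` attribute in the third example is hypothetical (the original markup for that case did not survive), and `concat_text` is an illustrative helper.

```python
# Sketch: concatenated text with tag-based and attribute-based exclusion.
import xml.etree.ElementTree as ET


def concat_text(elem, skip_tags=frozenset(), skip_if=lambda e: False):
    """Join the text of `elem` and its descendants in document order.

    Children whose tag is in `skip_tags`, or for which `skip_if(child)` is
    true, are left out together with their descendants; the text after a
    skipped child (its tail) still belongs to the parent and is kept.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in skip_tags and not skip_if(child):
            parts.append(concat_text(child, skip_tags, skip_if))
        parts.append(child.tail or "")
    return "".join(parts)


bold = ET.fromstring("<caption>A <b>good</b> example</caption>")
note = ET.fromstring(
    "<caption>A good example<footnote>Please exclude this text</footnote></caption>"
)
choice = ET.fromstring(
    '<p><span keep="yes">This one.</span><span keep="no">Not that one.</span></p>'
)

print(concat_text(bold))                                             # A good example
print(concat_text(note, skip_tags={"footnote"}))                     # A good example
print(concat_text(choice, skip_if=lambda e: e.get("keep") == "no"))  # This one.
```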

Slide 22



Mapping Unstructured Data
Indexes can select complex paths
§  Selecting a targeted structural relationship: section/title
   §  <chapter><title>Chapter One</title><section><title>Section A</title>...
⇒ "Section A"

§  Selecting multiple related elements: /info/(person|org)
   §  <info><person>John Smith</person> is at <org>IBM</org></info>
⇒ "John Smith","IBM"

§  Selecting based on comparison predicates: person[@age>18]
   §  <person age="…">John Smith</person> and his son
<person age="…">Jack Smith</person>
⇒ "John Smith"
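The three path cases can be reproduced with ElementTree. Its ElementPath language is only a subset of XPath (no `|` unions, no numeric comparisons in predicates), so the union and the `@age>18` test are written as plain Python filters. The ages 45 and 12 are hypothetical, since the original values did not survive.

```python
# Sketch: path-based selection with ElementTree plus Python filters.
import xml.etree.ElementTree as ET

chapter = ET.fromstring(
    "<chapter><title>Chapter One</title>"
    "<section><title>Section A</title></section></chapter>"
)
# section/title: only <title> elements whose parent is a <section>.
section_titles = [t.text for t in chapter.findall(".//section/title")]

info = ET.fromstring("<info><person>John Smith</person> is at <org>IBM</org></info>")
# /info/(person|org): children of the root <info> that are <person> or <org>.
info_values = [e.text for e in info if e.tag in {"person", "org"}]

family = ET.fromstring(
    '<info><person age="45">John Smith</person> and his son '
    '<person age="12">Jack Smith</person></info>'
)
# person[@age>18]: numeric comparison; a missing age (NaN) never matches.
adults = [p.text for p in family.iter("person") if float(p.get("age", "nan")) > 18]

print(section_titles)  # ['Section A']
print(info_values)     # ['John Smith', 'IBM']
print(adults)          # ['John Smith']
```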

Slide 23



Mapping Unstructured Data

Indexes can select correlated parts (geospatial)
§  Paired attributes
   §  ⇒ point(0.1,24.24)

§  Paired elements
   §  <pos><long>-110.44</long><lat>39.2</lat></pos>
⇒ point(39.2,-110.44)
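Pairing two correlated parts into a point, stored latitude first as in the `point(39.2,-110.44)` result above, might look like this. The element form follows the slide; the `lat`/`long` attribute names on a `<place>` element are hypothetical, since the paired-attribute example did not survive.

```python
# Sketch: building a (latitude, longitude) point from paired elements or
# paired attributes; missing or non-numeric parts yield no point.
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Point(NamedTuple):
    lat: float
    long: float


def point_from_elements(parent, lat_tag="lat", long_tag="long") -> Optional[Point]:
    """Pair child elements; findtext returns None for a missing child."""
    try:
        return Point(float(parent.findtext(lat_tag)), float(parent.findtext(long_tag)))
    except (TypeError, ValueError):
        return None


def point_from_attributes(elem, lat_attr="lat", long_attr="long") -> Optional[Point]:
    """Pair two attributes of one element."""
    try:
        return Point(float(elem.get(lat_attr)), float(elem.get(long_attr)))
    except (TypeError, ValueError):
        return None


pos = ET.fromstring("<pos><long>-110.44</long><lat>39.2</lat></pos>")
place = ET.fromstring('<place lat="0.1" long="24.24"/>')
print(point_from_elements(pos))      # Point(lat=39.2, long=-110.44)
print(point_from_attributes(place))  # Point(lat=0.1, long=24.24)
```

Note that document order does not matter: `<long>` appears before `<lat>` in the source, but the point is always built latitude first.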

Slide 24



Indexes
Map document ids to values, and values to document ids
in a compact distributed in-memory representation

DOC ID   YEAR        YEAR   DOC ID
1        2009        2002   3
3        2002        2003   10
4        2007        2004   5
5        2004        2004   11
8        2011        2007   2
10       2003        2007   4
11       2004        2007   17
17       2007        2011   8
...                  ...

Slide 25
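The two directions of the index can be sketched with a dictionary (document id to value) and a value-sorted array (value to document ids) searched with `bisect`. The data is the doc id to year pairs from the left-hand table above, one value per document for brevity; this plain in-memory structure and the `ValueIndex` name are illustrative assumptions, not MarkLogic's compact distributed representation.

```python
# Sketch: a two-way value index over doc id -> year pairs.
from bisect import bisect_left, bisect_right

DOC_TO_YEAR = {1: 2009, 3: 2002, 4: 2007, 5: 2004, 8: 2011, 10: 2003, 11: 2004, 17: 2007}


class ValueIndex:
    def __init__(self, doc_to_value):
        # Forward direction: doc id -> value (used when stitching columns).
        self.forward = dict(doc_to_value)
        # Reverse direction: (value, doc id) pairs sorted by value.
        pairs = sorted((v, d) for d, v in self.forward.items())
        self.values = [v for v, _ in pairs]
        self.docs = [d for _, d in pairs]

    def value_of(self, doc_id):
        """Value stored for a document, or None if it has none."""
        return self.forward.get(doc_id)

    def docs_between(self, lo, hi):
        """Doc ids whose value v satisfies lo <= v <= hi, via binary search."""
        return self.docs[bisect_left(self.values, lo):bisect_right(self.values, hi)]


index = ValueIndex(DOC_TO_YEAR)
print(index.value_of(10))              # 2003
print(index.docs_between(2004, 2004))  # [5, 11]
print(index.docs_between(2003, 2004))  # [10, 5, 11]
```

Keeping the reverse direction sorted is what makes range queries ("all documents from 2003 to 2004") a pair of binary searches instead of a scan.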




