
Entity information life cycle for big data master data management and information integration


www.allitebooks.com




Entity Information Life Cycle for Big Data
Master Data Management and Information Integration
John R. Talburt
Yinle Zhou



Table of Contents
Cover image
Title page
Copyright
Foreword
Preface
Acknowledgements
Chapter 1. The Value Proposition for MDM and Big Data
Definition and Components of MDM
The Business Case for MDM
Dimensions of MDM
The Challenge of Big Data
MDM and Big Data – The N-Squared Problem
Concluding Remarks

Chapter 2. Entity Identity Information and the CSRUD Life Cycle Model
Entities and Entity References
Managing Entity Identity Information
Entity Identity Information Life Cycle Management Models
Concluding Remarks

Chapter 3. A Deep Dive into the Capture Phase
An Overview of the Capture Phase
Building the Foundation
Understanding the Data
Data Preparation
Selecting Identity Attributes
Assessing ER Results
Data Matching Strategies
Concluding Remarks

Chapter 4. Store and Share – Entity Identity Structures
Entity Identity Information Management Strategies
Dedicated MDM Systems
The Identity Knowledge Base
MDM Architectures
Concluding Remarks

Chapter 5. Update and Dispose Phases – Ongoing Data Stewardship
The Automated Update Process
The Manual Update Process
Asserted Resolution
EIS Visualization Tools
Managing Entity Identifiers
Concluding Remarks

Chapter 6. Resolve and Retrieve Phase – Identity Resolution
Identity Resolution
Identity Resolution Access Modes
Confidence Scores
Concluding Remarks

Chapter 7. Theoretical Foundations
The Fellegi-Sunter Theory of Record Linkage
The Stanford Entity Resolution Framework
Entity Identity Information Management
Concluding Remarks

Chapter 8. The Nuts and Bolts of Entity Resolution
The ER Checklist
Cluster-to-Cluster Classification
Selecting an Appropriate Algorithm
Concluding Remarks

Chapter 9. Blocking
Blocking
Blocking by Match Key
Dynamic Blocking versus Preresolution Blocking
Blocking Precision and Recall
Match Key Blocking for Boolean Rules
Match Key Blocking for Scoring Rules
Concluding Remarks

Chapter 10. CSRUD for Big Data
Large-Scale ER for MDM
The Transitive Closure Problem
Distributed, Multiple-Index, Record-Based Resolution
An Iterative, Nonrecursive Algorithm for Transitive Closure
Iteration Phase: Successive Closure by Reference Identifier
Deduplication Phase: Final Output of Components
ER Using the Null Rule
The Capture Phase and IKB
The Identity Update Problem
The Large Component and Big Entity Problems
Identity Capture and Update for Attribute-Based Resolution
Concluding Remarks

Chapter 11. ISO Data Quality Standards for Master Data
Background
Goals and Scope of the ISO 8000-110 Standard
Four Major Components of the ISO 8000-110 Standard
Simple and Strong Compliance with ISO 8000-110
ISO 22745 Industrial Systems and Integration
Beyond ISO 8000-110
Concluding Remarks

Appendix A. Some Commonly Used ER Comparators
References
Index



Copyright
Acquiring Editor: Steve Elliot
Editorial Project Manager: Amy Invernizzi
Project Manager: Priya Kumaraguruparan
Cover Designer: Matthew Limbert
Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
Copyright © 2015 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage
and retrieval system, without permission in writing from the publisher. Details on how to
seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the
Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by
the Publisher (other than as may be noted herein).


Notices

Knowledge and best practice in this field are constantly changing. As new research and
experience broaden our understanding, changes in research methods, professional
practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge
in evaluating and using any information, methods, compounds, or experiments described
herein. In using such information or methods they should be mindful of their own safety
and the safety of others, including parties for whom they have a professional
responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or
editors, assume any liability for any injury and/or damage to persons or property as a
matter of products liability, negligence or otherwise, or from any use or operation of any
methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-800537-8
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
A catalog record for this book is available from the Library of Congress

For information on all MK publications visit our website at www.mkp.com




Foreword
In July of 2015 the Massachusetts Institute of Technology (MIT) will celebrate the 20th
anniversary of the International Conference on Information Quality. My journey to
information and data quality has had many twists and turns, but I have always found it
interesting and rewarding. For me the most rewarding part of the journey has been the
chance to meet and work with others who share my passion for this topic. I first met John
Talburt in 2002 when he was working in the Data Products Division of Acxiom
Corporation, a data management company with global operations. John had been tasked
by leadership to answer the question, “What is our data quality?” Looking for help on the
Internet he found the MIT Information Quality Program and contacted me. My book
Quality Information and Knowledge (Huang, Lee, & Wang, 1999) had recently been
published. John invited me to Acxiom headquarters, at that time in Conway, Arkansas, to
give a one-day workshop on information quality to the Acxiom Leadership team.
This was the beginning of John’s journey to data quality, and we have been traveling
together on that journey ever since. After I helped him lead Acxiom’s effort to implement
a Total Data Quality Management program, he in turn helped me to realize one of my
long-time goals of seeing a U.S. university start a degree program in information quality.
Through the largess of Acxiom Corporation, led at that time by Charles Morgan, and the
academic entrepreneurship of Dr. Mary Good, Founding Dean of the Engineering and
Information Technology College at the University of Arkansas at Little Rock, the world’s
first graduate degree program in information quality was established in 2006. John has
been leading this program at UALR ever since. Initially created around a Master of
Science in Information Quality (MSIQ) degree (Lee et al., 2007), it has since expanded to
include a Graduate Certificate in IQ and an IQ PhD degree. As of this writing the program
has graduated more than 100 students.
The second part of this story began in 2008. In that year, Yinle Zhou, an e-commerce
graduate from Nanjing University in China, came to the U.S. and was admitted to the
UALR MSIQ program. After finishing her MS degree, she entered the IQ PhD program
with John as her research advisor. Together they developed a model for entity identity
information management (EIIM) that extends entity resolution in support of master data
management (MDM), the primary focus of this book. Dr. Zhou is now a Software
Engineer and Data Scientist for IBM InfoSphere MDM Development in Austin, Texas,
and an Adjunct Assistant Professor of Electrical and Computer Engineering at the
University of Texas at Austin. And so the torch was passed and another journey began.
I have also been fascinated to see how the landscape of information technology has
changed over the past 20 years. During that time IT has experienced a dramatic shift in
focus. Inexpensive, large-scale storage and processors have changed the face of IT.
Organizations are exploiting cloud computing, software-as-a-service, and open source
software as alternatives to building and maintaining their own data centers and
developing custom solutions. All of these trends are contributing to the commoditization
of technology. At the same time, more and more data are being produced and retained, from
structured operational data to unstructured, user-generated data from social media.
Together these factors are producing many new challenges for data management, and
especially for master data management.
The complexity of the new data-driven environment can be overwhelming. Organizations
must deal with data governance and policy, data privacy and security, data quality, MDM, RDM,
information risk management, regulatory compliance, and the list goes on. Just as John
and Yinle started their journeys as individuals, now we see that entire organizations are
embarking on journeys to data and information quality. The difference is that an
organization needs a leader to set the course, and I strongly believe this leader should be
the Chief Data Officer (CDO).
The CDO is a growing role in modern organizations: the executive who leads the company’s
journey to use data strategically for regulatory compliance, performance optimization, and
competitive advantage. The MIT CDO Forum recognizes the emerging criticality of the
CDO’s role and has developed a series of events where leaders come for bidirectional
sharing and collaboration to accelerate identification and establishment of best practices in
strategic data management.
I and others have been conducting the MIT Longitudinal Study on the Chief Data
Officer and hosting events for senior executives to advance CDO research and practice.
We have published research results in leading academic journals, as well as the
proceedings of the MIT CDO Forum, MIT CDOIQ Symposium, and the International
Conference on Information Quality (ICIQ). For example, we have developed a
three-dimensional cubic framework to describe the emerging role of the Chief Data Officer in
the context of Big Data (Lee et al., 2014).
I believe that CDOs, MDM architects and administrators, and anyone involved with
data governance and information quality will find this book useful. MDM is now
considered an integral component of a data governance program. The material presented
here clearly lays out the business case for MDM and a plan to improve the quality and
performance of MDM systems through effective entity information life cycle
management. It not only explains the technical aspects of the life cycle, it also provides
guidance on the often overlooked tasks of MDM quality metrics and analytics and MDM
stewardship.
Richard Wang, MIT Chief Data Officer and Information Quality Program



Preface
The Changing Landscape of Information Quality
Since the publication of Entity Resolution and Information Quality (Morgan Kaufmann,
2011), a lot has been happening in the field of information and data quality. One of the
most important developments is how organizations are beginning to understand that the
data they hold are among their most important assets and should be managed accordingly.
As many of us know, this is by no means a new message, only that it is just now being
heeded. Leading experts in information and data quality such as Rich Wang, Yang Lee,
Tom Redman, Larry English, Danette McGilvray, David Loshin, Laura Sebastian-Coleman,
Rajesh Jugulum, Sunil Soares, Arkady Maydanchik, and many others have been
advocating this principle for many years.
Evidence of this new understanding can be found in the dramatic surge of the adoption
of data governance (DG) programs by organizations of all types and sizes. Conferences,
workshops, and webinars on this topic are overflowing with attendees. The primary reason
is that DG provides organizations with an answer to the question, “If information is really
an important organizational asset, then how can it be managed at the enterprise level?”
One of the primary benefits of a DG program is that it provides a framework for
implementing a central point of communication and control over all of an organization’s
data and information.
As DG has grown and matured, its essential components have become more clearly defined.
These components generally include central repositories for data definitions, business
rules, metadata, data-related issue tracking, regulations and compliance, and data quality
rules. Two other key components of DG are master data management (MDM) and
reference data management (RDM). Consequently, the increasing adoption of DG
programs has brought a commensurate increase in focus on the importance of MDM.
Certainly this is not the first book on MDM. Several excellent books include Master
Data Management and Data Governance by Alex Berson and Larry Dubov (2011),
Master Data Management in Practice by Dalton Cervo and Mark Allen (2011), Master
Data Management by David Loshin (2009), Enterprise Master Data Management by
Allen Dreibelbis, Eberhard Hechler, Ivan Milman, Martin Oberhofer, Paul van Run, and
Dan Wolfson (2008), and Customer Data Integration by Jill Dyché and Evan Levy (2006).
However, MDM is an extensive and evolving topic. No single book can explore every
aspect of MDM at every level.


Motivation for This Book
Numerous things have motivated us to contribute yet another book. However, the primary
reason is this. Based on our experience in both academia and industry, we believe that
many of the problems that organizations experience with MDM implementation and
operation are rooted in the failure to understand and address certain critical aspects of
entity identity information management (EIIM). EIIM is an extension of entity resolution
(ER) with the goal of achieving and maintaining the highest level of accuracy in the MDM
system. Two key terms are “achieving” and “maintaining.”
Having a goal and defined requirements is the starting point for every information and
data quality methodology from the MIT TDQM (Total Data Quality Management) to the
Six-Sigma DMAIC (Define, Measure, Analyze, Improve, and Control). Unfortunately,
when it comes to MDM, many organizations have not defined any goals. Consequently
these organizations don’t have a way to know if they have achieved their goal. They leave
many questions unanswered. What is our accuracy? Now that a proposed programming or
procedure change has been implemented, is the system performing better or worse than before?
Few MDM administrators can provide accurate estimates of even the most basic metrics
such as false positive and false negative rates or the overall accuracy of their system. In
this book we have emphasized the importance of objective and systematic measurement
and provided practical guidance on how these measurements can be made.
To help organizations better address the maintaining of high levels of accuracy through
EIIM, the majority of the material in the book is devoted to explaining the CSRUD
five-phase entity information life cycle model. CSRUD is an acronym for capture, store and
share, resolve and retrieve, update, and dispose. We believe that following this model can
help any organization improve MDM accuracy and performance.
Finally, no modern-day IT book can be complete without talking about Big Data.
Seemingly rising up overnight, Big Data has captured everyone’s attention, not just in IT,
but even the man on the street. Just as DG seems to be getting up a good head of steam, it
now has to deal with the Big Data phenomenon. The immediate question is whether Big
Data simply fits right into the current DG model, or whether the DG model needs to be
revised to account for Big Data.
Regardless of one’s opinion on this topic, one thing is clear – Big Data is bad news for
MDM. The reason is a simple mathematical fact: MDM relies on entity resolution, and
entity resolution primarily relies on pair-wise record matching, and the number of pairs of
records to match increases as the square of the number of records. For this reason,
ordinary data (millions of records) is already a challenge for MDM, so Big Data (billions
of records) seems almost insurmountable. Fortunately, Big Data is not just a matter of more
data; it is also ushering in a new paradigm for managing and processing large amounts of
data. Big Data is bringing with it new tools and techniques. Perhaps the most important
technique is how to exploit distributed processing. However, it is easier to talk about Big
Data than to do something about it. We wanted to avoid that and include in our book some
practical strategies and designs for using distributed processing to solve some of these
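
The growth in pair-wise comparisons described in this paragraph can be illustrated with a minimal sketch. It assumes the simplest case, in which every record is compared with every other record exactly once and no blocking is applied, so that n records yield n(n - 1)/2 pairs. The function name `candidate_pairs` is hypothetical and chosen only for this illustration; it does not come from the book.

```python
def candidate_pairs(n: int) -> int:
    """Count the distinct unordered pairs among n records.

    Assumption: exhaustive matching with no blocking, so every record
    is compared with every other record exactly once.
    """
    return n * (n - 1) // 2


# Thousands, millions ("ordinary data"), and billions ("Big Data") of records.
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} records -> {candidate_pairs(n):,} pairs")
```

Under this assumption, doubling the number of records roughly quadruples the number of pairs, which is why moving from millions to billions of records multiplies the matching work by about a million rather than a thousand.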




Audience
It is our hope that both IT professionals and business professionals interested in MDM and
Big Data issues will find this book helpful. Most of the material focuses on issues of
design and architecture, making it a resource for anyone evaluating an installed system,
comparing proposed third-party systems, or for an organization contemplating building its
own system. We also believe that it is written at a level appropriate for a university
textbook.


Organization of the Material
Chapters 1 and 2 provide the background and context of the book. Chapter 1 provides a
definition and overview of MDM. It includes the business case, dimensions, and
challenges facing MDM and also starts the discussion of Big Data and its impact on
MDM. Chapter 2 defines and explains the two primary technologies that support MDM –
ER and EIIM. In addition, Chapter 2 introduces the CSRUD Life Cycle for entity identity
information. This sets the stage for the next four chapters.
Chapters 3, 4, 5, and 6 are devoted to an in-depth discussion of the CSRUD life cycle
model. Chapter 3 is an in-depth look at the Capture Phase of CSRUD. As part of the
discussion, it also covers the techniques of truth set building, benchmarking, and problem
sets as tools for assessing entity resolution and MDM outcomes. In addition, it discusses
some of the pros and cons of the two most commonly used data matching techniques –
deterministic matching and probabilistic matching.
Chapter 4 explains the Store and Share Phase of CSRUD. This chapter introduces the
concept of an entity identity structure (EIS) that forms the building blocks of the identity
knowledge base (IKB). In addition to discussing different styles of EIS designs, it also
includes a discussion of the different types of MDM architectures.
Chapter 5 covers two closely related CSRUD phases, the Update Phase and the Dispose
Phase. The Update Phase discussion covers both automated and manual update processes
and the critical roles played by clerical review indicators, correction assertions, and
confirmation assertions. Chapter 5 also presents an example of an identity visualization
system that assists MDM data stewards with the review and assertion process.
Chapter 6 covers the Resolve and Retrieve Phase of CSRUD. It also discusses some
design considerations for accessing identity information, and a simple model for a
retrieved identifier confidence score.
Chapter 7 introduces two of the most important theoretical models for ER, the
Fellegi-Sunter Theory of Record Linkage and the Stanford Entity Resolution Framework or SERF
Model. Chapter 7 is inserted here because some of the concepts introduced in the SERF
Model are used in Chapter 8, “The Nuts and Bolts of ER.” The chapter concludes with a
discussion of how EIIM relates to each of these models.
Chapter 8 describes a deeper level of design considerations for ER and EIIM systems. It
discusses in detail the three levels of matching in an EIIM system: attribute-level,
reference-level, and cluster-level matching.
Chapter 9 covers the technique of blocking as a way to increase the performance of ER
and MDM systems. It focuses on match key blocking, the definition of
match-key-to-match-rule alignment, and the precision and recall of match keys. Preresolution blocking
and transitive closure of match keys are discussed as a prelude to Chapter 10.
Chapter 10 discusses the problems in implementing the CSRUD Life Cycle for Big
Data. It gives examples of how the Hadoop Map/Reduce framework can be used to
address many of these problems using a distributed computing environment.


standard is not well understood outside of a few industry verticals, but it has potential
implications for all industries. This chapter covers the basic requirements of the standard
and how organizations can become ISO 8000 compliant, and perhaps more importantly,
why organizations would want to be compliant.
Finally, to keep the ER discussions in Chapters 3 and 8 shorter, Appendix A goes into more
detail on some of the more common data comparison algorithms.
This book also includes a website with exercises, tips, and free downloads of
demonstrations that use a trial version of the HiPER EIM system for hands-on learning.
The website includes control scripts and synthetic input data to illustrate how the system
handles various aspects of the CSRUD life cycle such as identity capture, identity update,
and assertions. You can access the website here:
/>



Acknowledgements
This book would not have been possible without the help of many people and
organizations. First of all, Yinle and I would like to thank Dr. Rich Wang, Director of the
MIT Information Quality Program, for starting us on our journey to data quality and for
writing the foreword for our book, and Dr. Scott Schumacher, Distinguished Engineer at
IBM, for his support of our research and collaboration. We would also like to thank our
employers, IBM Corporation, University of Arkansas at Little Rock, and Black Oak
Analytics, Inc., for their support and encouragement during its writing.
It has been a privilege to be a part of the UALR Information Quality Program and to
work with so many talented students and gifted faculty members. I would especially like
to acknowledge several of my current students for their contributions to this work. These
include Fumiko Kobayashi, identity resolution models and confidence scores in Chapter 6;
Cheng Chen, EIS visualization tools and confirmation assertions in Chapter 5 and Hadoop
map/reduce in Chapter 10; Daniel Pullen, clerical review indicators in Chapter 5 and
Hadoop map/reduce in Chapter 10; Pei Wang, blocking for scoring rules in Chapter 9,
Hadoop map/reduce in Chapter 10, and the demonstration data, scripts, and exercises on
the book’s website; Debanjan Mahata, EIIM for unstructured data in Chapter 1; Melody
Penning, entity-based data integration in Chapter 1; and Reed Petty, IKB structure for
HDFS in Chapter 10. In addition I would like to thank my former student Dr. Eric Nelson
for introducing the null rule concept and for sharing his expertise in Hadoop map/reduce
in Chapter 10. Special thanks go to Dr. Laura Sebastian-Coleman, Data Quality Leader at
Cigna, and Joshua Johnson, UALR Technical Writing Program, for their help in editing
and proofreading. Finally I want to thank my teaching assistants, Fumiko Kobayashi,
Khizer Syed, Michael Greer, Pei Wang, and Daniel Pullen, and my administrative
assistant, Nihal Erian, for giving me the extra time I needed to complete this work.
I would also like to take this opportunity to acknowledge several organizations that
have supported my work for many years. Acxiom Corporation under Charles Morgan was
one of the founders of the UALR IQ program and continues to support the program under
Scott Howe, the current CEO, and Allison Nicholas, Director of College Recruiting and
University Relations. I am grateful for my experience at Acxiom and the opportunity to
learn about Big Data entity resolution in a distributed computing environment from Dr.
Terry Talley and the many other world-class data experts who work there.
The Arkansas Research Center under the direction of Dr. Neal Gibson and Dr. Greg
Holland was the first to support my work on the OYSTER open source entity resolution
system. The Arkansas Department of Education – in particular former Assistant
Commissioner Jim Boardman and his successor, Dr. Cody Decker, along with Arijit
Sarkar in the IT Services Division – gave me the opportunity to build a student MDM
system that implements the full CSRUD life cycle as described in this book.
The Translational Research Institute (TRI) at the University of Arkansas for Medical
Sciences has given me and several of my students the opportunity for hands-on experience


Hogan, the former Director of TRI, for teaching me about referent tracking, and also Dr.
Umit Topaloglu, the current Director of Informatics at TRI, who along with Dr. Mathias
Brochhausen continues this collaboration.
Last but not least are my business partners at Black Oak Analytics. Our CEO, Rick
McGraw, has been a trusted friend and business advisor for many years. Because of Rick
and our COO, Jonathan Askins, what was only a vision has become a reality.
John R. Talburt and Yinle Zhou


CHAPTER 1

The Value Proposition for MDM and Big Data

