www.allitebooks.com
Entity Information Life Cycle for Big Data
Master Data Management and Information Integration
John R. Talburt
Yinle Zhou
Table of Contents
Cover image
Title page
Copyright
Foreword
Preface
Acknowledgements
Chapter 1. The Value Proposition for MDM and Big Data
Definition and Components of MDM
The Business Case for MDM
Dimensions of MDM
The Challenge of Big Data
MDM and Big Data – The N-Squared Problem
Concluding Remarks
Chapter 2. Entity Identity Information and the CSRUD Life Cycle Model
Entities and Entity References
Managing Entity Identity Information
Entity Identity Information Life Cycle Management Models
Concluding Remarks
Chapter 3. A Deep Dive into the Capture Phase
An Overview of the Capture Phase
Building the Foundation
Understanding the Data
Data Preparation
Selecting Identity Attributes
Assessing ER Results
Data Matching Strategies
Concluding Remarks
Chapter 4. Store and Share – Entity Identity Structures
Entity Identity Information Management Strategies
Dedicated MDM Systems
The Identity Knowledge Base
MDM Architectures
Concluding Remarks
Chapter 5. Update and Dispose Phases – Ongoing Data Stewardship
The Automated Update Process
The Manual Update Process
Asserted Resolution
EIS Visualization Tools
Managing Entity Identifiers
Concluding Remarks
Chapter 6. Resolve and Retrieve Phase – Identity Resolution
Identity Resolution
Identity Resolution Access Modes
Confidence Scores
Concluding Remarks
Chapter 7. Theoretical Foundations
The Fellegi-Sunter Theory of Record Linkage
The Stanford Entity Resolution Framework
Entity Identity Information Management
Concluding Remarks
Chapter 8. The Nuts and Bolts of Entity Resolution
The ER Checklist
Cluster-to-Cluster Classification
Selecting an Appropriate Algorithm
Concluding Remarks
Chapter 9. Blocking
Blocking
Blocking by Match Key
Dynamic Blocking versus Preresolution Blocking
Blocking Precision and Recall
Match Key Blocking for Boolean Rules
Match Key Blocking for Scoring Rules
Concluding Remarks
Chapter 10. CSRUD for Big Data
Large-Scale ER for MDM
The Transitive Closure Problem
Distributed, Multiple-Index, Record-Based Resolution
An Iterative, Nonrecursive Algorithm for Transitive Closure
Iteration Phase: Successive Closure by Reference Identifier
Deduplication Phase: Final Output of Components
ER Using the Null Rule
The Capture Phase and IKB
The Identity Update Problem
The Large Component and Big Entity Problems
Identity Capture and Update for Attribute-Based Resolution
Concluding Remarks
Chapter 11. ISO Data Quality Standards for Master Data
Background
Goals and Scope of the ISO 8000-110 Standard
Four Major Components of the ISO 8000-110 Standard
Simple and Strong Compliance with ISO 8000-110
ISO 22745 Industrial Systems and Integration
Beyond ISO 8000-110
Concluding Remarks
Appendix A. Some Commonly Used ER Comparators
References
Index
Copyright
Acquiring Editor: Steve Elliot
Editorial Project Manager: Amy Invernizzi
Project Manager: Priya Kumaraguruparan
Cover Designer: Matthew Limbert
Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
Copyright © 2015 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage
and retrieval system, without permission in writing from the publisher. Details on how to
seek permission, further information about the Publisher’s permissions policies and our
arrangements with organizations such as the Copyright Clearance Center and the
Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by
the Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and
experience broaden our understanding, changes in research methods, professional
practices, or medical treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge
in evaluating and using any information, methods, compounds, or experiments described
herein. In using such information or methods they should be mindful of their own safety
and the safety of others, including parties for whom they have a professional
responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or
editors, assume any liability for any injury and/or damage to persons or property as a
matter of products liability, negligence or otherwise, or from any use or operation of any
methods, products, instructions, or ideas contained in the material herein.
ISBN: 978-0-12-800537-8
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
A catalog record for this book is available from the Library of Congress
For information on all MK publications visit our website at www.mkp.com
Foreword
In July of 2015 the Massachusetts Institute of Technology (MIT) will celebrate the 20th
anniversary of the International Conference on Information Quality. My journey to
information and data quality has had many twists and turns, but I have always found it
interesting and rewarding. For me the most rewarding part of the journey has been the
chance to meet and work with others who share my passion for this topic. I first met John
Talburt in 2002 when he was working in the Data Products Division of Acxiom
Corporation, a data management company with global operations. John had been tasked
by leadership to answer the question, “What is our data quality?” Looking for help on the
Internet he found the MIT Information Quality Program and contacted me. My book
Quality Information and Knowledge (Huang, Lee, & Wang, 1999) had recently been
published. John invited me to Acxiom headquarters, at that time in Conway, Arkansas, to
give a one-day workshop on information quality to the Acxiom Leadership team.
This was the beginning of John’s journey to data quality, and we have been traveling
together on that journey ever since. After I helped him lead Acxiom’s effort to implement
a Total Data Quality Management program, he in turn helped me to realize one of my
long-time goals of seeing a U.S. university start a degree program in information quality.
Through the largess of Acxiom Corporation, led at that time by Charles Morgan, and the
academic entrepreneurship of Dr. Mary Good, Founding Dean of the Engineering and
Information Technology College at the University of Arkansas at Little Rock, the world’s
first graduate degree program in information quality was established in 2006. John has
been leading this program at UALR ever since. Initially created around a Master of
Science in Information Quality (MSIQ) degree (Lee et al., 2007), it has since expanded to
include a Graduate Certificate in IQ and an IQ PhD degree. As of this writing the program
has graduated more than 100 students.
The second part of this story began in 2008. In that year, Yinle Zhou, an e-commerce
graduate from Nanjing University in China, came to the U.S. and was admitted to the
UALR MSIQ program. After finishing her MS degree, she entered the IQ PhD program
with John as her research advisor. Together they developed a model for entity identity
information management (EIIM) that extends entity resolution in support of master data
management (MDM), the primary focus of this book. Dr. Zhou is now a Software
Engineer and Data Scientist for IBM InfoSphere MDM Development in Austin, Texas,
and an Adjunct Assistant Professor of Electrical and Computer Engineering at the
University of Texas at Austin. And so the torch was passed and another journey began.
I have also been fascinated to see how the landscape of information technology has
changed over the past 20 years. During that time IT has experienced a dramatic shift in
focus. Inexpensive, large-scale storage and processors have changed the face of IT.
Organizations are exploiting cloud computing, software-as-a-service, and open source
software as alternatives to building and maintaining their own data centers and
developing custom solutions. All of these trends are contributing to the commoditization
of technology. At the same time, more and more data are being produced and retained, from
structured operational data to unstructured, user-generated data from social media.
Together these factors are producing many new challenges for data management, and
especially for master data management.
The complexity of the new data-driven environment can be overwhelming. Organizations
must deal with data governance and policy, data privacy and security, data quality, MDM,
RDM, information risk management, regulatory compliance, and the list goes on. Just as
John and Yinle started their journeys as individuals, now we see that entire organizations
are embarking on journeys to data and information quality. The difference is that an
organization needs a leader to set the course, and I strongly believe this leader should be
the Chief Data Officer (CDO).
The CDO is a growing role in modern organizations to lead their company’s journey to
strategically use data for regulatory compliance, performance optimization, and
competitive advantage. The MIT CDO Forum recognizes the emerging criticality of the
CDO’s role and has developed a series of events where leaders come for bidirectional
sharing and collaboration to accelerate identification and establishment of best practices in
strategic data management.
I and others have been conducting the MIT Longitudinal Study on the Chief Data
Officer and hosting events for senior executives to advance CDO research and practice.
We have published research results in leading academic journals, as well as the
proceedings of the MIT CDO Forum, MIT CDOIQ Symposium, and the International
Conference on Information Quality (ICIQ). For example, we have developed a
three-dimensional cubic framework to describe the emerging role of the Chief Data Officer in
the context of Big Data (Lee et al., 2014).
I believe that CDOs, MDM architects and administrators, and anyone involved with
data governance and information quality will find this book useful. MDM is now
considered an integral component of a data governance program. The material presented
here clearly lays out the business case for MDM and a plan to improve the quality and
performance of MDM systems through effective entity information life cycle
management. It not only explains the technical aspects of the life cycle, it also provides
guidance on the often overlooked tasks of MDM quality metrics and analytics and MDM
stewardship.
Richard Wang, MIT Chief Data Officer and Information Quality Program
Preface
The Changing Landscape of Information Quality
Since the publication of Entity Resolution and Information Quality (Morgan Kaufmann,
2011), a lot has been happening in the field of information and data quality. One of the
most important developments is how organizations are beginning to understand that the
data they hold are among their most important assets and should be managed accordingly.
As many of us know, this is by no means a new message, only that it is just now being
heeded. Leading experts in information and data quality such as Rich Wang, Yang Lee,
Tom Redman, Larry English, Danette McGilvray, David Loshin, Laura Sebastian-Coleman,
Rajesh Jugulum, Sunil Soares, Arkady Maydanchik, and many others have been
advocating this principle for many years.
Evidence of this new understanding can be found in the dramatic surge of the adoption
of data governance (DG) programs by organizations of all types and sizes. Conferences,
workshops, and webinars on this topic are overflowing with attendees. The primary reason
is that DG provides organizations with an answer to the question, “If information is really
an important organizational asset, then how can it be managed at the enterprise level?”
One of the primary benefits of a DG program is that it provides a framework for
implementing a central point of communication and control over all of an organization’s
data and information.
As DG has grown and matured, its essential components have become more clearly defined.
These components generally include central repositories for data definitions, business
rules, metadata, data-related issue tracking, regulations and compliance, and data quality
rules. Two other key components of DG are master data management (MDM) and
reference data management (RDM). Consequently, the increasing adoption of DG
programs has brought a commensurate increase in focus on the importance of MDM.
Certainly this is not the first book on MDM. Several excellent books include Master
Data Management and Data Governance by Alex Berson and Larry Dubov (2011),
Master Data Management in Practice by Dalton Cervo and Mark Allen (2011), Master
Data Management by David Loshin (2009), Enterprise Master Data Management by
Allen Dreibelbis, Eberhard Hechler, Ivan Milman, Martin Oberhofer, Paul van Run, and
Dan Wolfson (2008), and Customer Data Integration by Jill Dyché and Evan Levy (2006).
However, MDM is an extensive and evolving topic. No single book can explore every
aspect of MDM at every level.
Motivation for This Book
Numerous things have motivated us to contribute yet another book. However, the primary
reason is this. Based on our experience in both academia and industry, we believe that
many of the problems that organizations experience with MDM implementation and
operation are rooted in the failure to understand and address certain critical aspects of
entity identity information management (EIIM). EIIM is an extension of entity resolution
(ER) with the goal of achieving and maintaining the highest level of accuracy in the MDM
system. Two key terms are “achieving” and “maintaining.”
Having a goal and defined requirements is the starting point for every information and
data quality methodology from the MIT TDQM (Total Data Quality Management) to the
Six-Sigma DMAIC (Define, Measure, Analyze, Improve, and Control). Unfortunately,
when it comes to MDM, many organizations have not defined any goals. Consequently
these organizations don’t have a way to know if they have achieved their goal. They leave
many questions unanswered. What is our accuracy? Now that a proposed programming or
procedural change has been implemented, is the system performing better or worse than before?
Few MDM administrators can provide accurate estimates of even the most basic metrics
such as false positive and false negative rates or the overall accuracy of their system. In
this book we have emphasized the importance of objective and systematic measurement
and provided practical guidance on how these measurements can be made.
To help organizations better address the maintaining of high levels of accuracy through
EIIM, the majority of the material in the book is devoted to explaining the CSRUD
five-phase entity information life cycle model. CSRUD is an acronym for capture, store and
share, resolve and retrieve, update, and dispose. We believe that following this model can
help any organization improve MDM accuracy and performance.
Finally, no modern-day IT book can be complete without talking about Big Data.
Seemingly rising up overnight, Big Data has captured everyone’s attention, not just in IT,
but even the man on the street. Just as DG seems to be getting up a good head of steam, it
now has to deal with the Big Data phenomenon. The immediate question is whether Big
Data simply fits right into the current DG model, or whether the DG model needs to be
revised to account for Big Data.
Regardless of one’s opinion on this topic, one thing is clear – Big Data is bad news for
MDM. The reason is a simple mathematical fact: MDM relies on entity resolution, and
entity resolution primarily relies on pair-wise record matching, and the number of pairs of
records to match increases as the square of the number of records. For this reason,
ordinary data (millions of records) is already a challenge for MDM, so Big Data (billions
of records) seems almost insurmountable. Fortunately, Big Data is not just a matter of more
data; it is also ushering in a new paradigm for managing and processing large amounts of
data. Big Data is bringing with it new tools and techniques. Perhaps the most important
technique is how to exploit distributed processing. However, it is easier to talk about Big
Data than to do something about it. We wanted to avoid that and include in our book some
practical strategies and designs for using distributed processing to solve some of these
problems.
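To make the pair-count arithmetic above concrete, here is a minimal Python sketch. It is an illustration only, not something taken from the book, and the function name pair_count is a hypothetical choice. It assumes exhaustive pair-wise comparison with no blocking, in which n records yield n(n-1)/2 distinct pairs.

```python
def pair_count(n: int) -> int:
    """Return the number of distinct unordered pairs among n records: n(n-1)/2."""
    return n * (n - 1) // 2


# Compare "ordinary" data (millions of records) with Big Data (billions).
for label, n in [("one million", 10**6), ("one billion", 10**9)]:
    print(f"{label} records -> {pair_count(n):,} candidate pairs")
```

Under this assumption, a thousandfold increase in records (from a million to a billion) raises the number of candidate pairs by a factor of roughly a million, from about 5 × 10^11 to about 5 × 10^17.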
Audience
It is our hope that both IT professionals and business professionals interested in MDM and
Big Data issues will find this book helpful. Most of the material focuses on issues of
design and architecture, making it a resource for anyone evaluating an installed system,
comparing proposed third-party systems, or for an organization contemplating building its
own system. We also believe that it is written at a level appropriate for a university
textbook.
Organization of the Material
Chapters 1 and 2 provide the background and context of the book. Chapter 1 provides a
definition and overview of MDM. It includes the business case, dimensions, and
challenges facing MDM and also starts the discussion of Big Data and its impact on
MDM. Chapter 2 defines and explains the two primary technologies that support MDM –
ER and EIIM. In addition, Chapter 2 introduces the CSRUD Life Cycle for entity identity
information. This sets the stage for the next four chapters.
Chapters 3, 4, 5, and 6 are devoted to an in-depth discussion of the CSRUD life cycle
model. Chapter 3 is an in-depth look at the Capture Phase of CSRUD. As part of the
discussion, it also covers the techniques of truth set building, benchmarking, and problem
sets as tools for assessing entity resolution and MDM outcomes. In addition, it discusses
some of the pros and cons of the two most commonly used data matching techniques –
deterministic matching and probabilistic matching.
Chapter 4 explains the Store and Share Phase of CSRUD. This chapter introduces the
concept of an entity identity structure (EIS) that forms the building blocks of the identity
knowledge base (IKB). In addition to discussing different styles of EIS designs, it also
includes a discussion of the different types of MDM architectures.
Chapter 5 covers two closely related CSRUD phases, the Update Phase and the Dispose
Phase. The Update Phase discussion covers both automated and manual update processes
and the critical roles played by clerical review indicators, correction assertions, and
confirmation assertions. Chapter 5 also presents an example of an identity visualization
system that assists MDM data stewards with the review and assertion process.
Chapter 6 covers the Resolve and Retrieve Phase of CSRUD. It also discusses some
design considerations for accessing identity information, and a simple model for a
retrieved identifier confidence score.
Chapter 7 introduces two of the most important theoretical models for ER, the
Fellegi-Sunter Theory of Record Linkage and the Stanford Entity Resolution Framework or SERF
Model. Chapter 7 is inserted here because some of the concepts introduced in the SERF
Model are used in Chapter 8, “The Nuts and Bolts of ER.” The chapter concludes with a
discussion of how EIIM relates to each of these models.
Chapter 8 describes a deeper level of design considerations for ER and EIIM systems. It
discusses in detail the three levels of matching in an EIIM system: attribute-level,
reference-level, and cluster-level matching.
Chapter 9 covers the technique of blocking as a way to increase the performance of ER
and MDM systems. It focuses on match key blocking, the definition of
match-key-to-match-rule alignment, and the precision and recall of match keys. Preresolution blocking
and transitive closure of match keys are discussed as a prelude to Chapter 10.
Chapter 10 discusses the problems in implementing the CSRUD Life Cycle for Big
Data. It gives examples of how the Hadoop Map/Reduce framework can be used to
address many of these problems using a distributed computing environment.
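Chapter 11 covers the ISO data quality standards for master data, in particular ISO
8000-110. This standard is not well understood outside of a few industry verticals, but it has potential
implications for all industries. This chapter covers the basic requirements of the standard
and how organizations can become ISO 8000 compliant, and perhaps more importantly,
why organizations would want to be compliant.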
Finally, to shorten the ER discussions in Chapters 3 and 8, Appendix A goes into more
detail on some of the more common data comparison algorithms.
This book also includes a website with exercises, tips, and free downloads of
demonstrations that use a trial version of the HiPER EIM system for hands-on learning.
The website includes control scripts and synthetic input data to illustrate how the system
handles various aspects of the CSRUD life cycle such as identity capture, identity update,
and assertions. You can access the website here:
Acknowledgements
This book would not have been possible without the help of many people and
organizations. First of all, Yinle and I would like to thank Dr. Rich Wang, Director of the
MIT Information Quality Program, for starting us on our journey to data quality and for
writing the foreword for our book, and Dr. Scott Schumacher, Distinguished Engineer at
IBM, for his support of our research and collaboration. We would also like to thank our
employers, IBM Corporation, University of Arkansas at Little Rock, and Black Oak
Analytics, Inc., for their support and encouragement during its writing.
It has been a privilege to be a part of the UALR Information Quality Program and to
work with so many talented students and gifted faculty members. I would especially like
to acknowledge several of my current students for their contributions to this work. These
include Fumiko Kobayashi, identity resolution models and confidence scores in Chapter 6;
Cheng Chen, EIS visualization tools and confirmation assertions in Chapter 5 and Hadoop
map/reduce in Chapter 10; Daniel Pullen, clerical review indicators in Chapter 5 and
Hadoop map/reduce in Chapter 10; Pei Wang, blocking for scoring rules in Chapter 9,
Hadoop map/reduce in Chapter 10, and the demonstration data, scripts, and exercises on
the book’s website; Debanjan Mahata, EIIM for unstructured data in Chapter 1; Melody
Penning, entity-based data integration in Chapter 1; and Reed Petty, IKB structure for
HDFS in Chapter 10. In addition I would like to thank my former student Dr. Eric Nelson
for introducing the null rule concept and for sharing his expertise in Hadoop map/reduce
in Chapter 10. Special thanks go to Dr. Laura Sebastian-Coleman, Data Quality Leader at
Cigna, and Joshua Johnson, UALR Technical Writing Program, for their help in editing
and proofreading. Finally I want to thank my teaching assistants, Fumiko Kobayashi,
Khizer Syed, Michael Greer, Pei Wang, and Daniel Pullen, and my administrative
assistant, Nihal Erian, for giving me the extra time I needed to complete this work.
I would also like to take this opportunity to acknowledge several organizations that
have supported my work for many years. Acxiom Corporation under Charles Morgan was
one of the founders of the UALR IQ program and continues to support the program under
Scott Howe, the current CEO, and Allison Nicholas, Director of College Recruiting and
University Relations. I am grateful for my experience at Acxiom and the opportunity to
learn about Big Data entity resolution in a distributed computing environment from Dr.
Terry Talley and the many other world-class data experts who work there.
The Arkansas Research Center, under the direction of Dr. Neal Gibson and Dr. Greg
Holland, was the first to support my work on the OYSTER open source entity resolution
system. The Arkansas Department of Education – in particular former Assistant
Commissioner Jim Boardman and his successor, Dr. Cody Decker, along with Arijit
Sarkar in the IT Services Division – gave me the opportunity to build a student MDM
system that implements the full CSRUD life cycle as described in this book.
The Translational Research Institute (TRI) at the University of Arkansas for Medical
Sciences has given me and several of my students the opportunity for hands-on experience
Hogan, the former Director of TRI, for teaching me about referent tracking, and also Dr.
Umit Topaloglu, the current Director of Informatics at TRI, who along with Dr. Mathias
Brochhausen continues this collaboration.
Last but not least are my business partners at Black Oak Analytics. Our CEO, Rick
McGraw, has been a trusted friend and business advisor for many years. Because of Rick
and our COO, Jonathan Askins, what was only a vision has become a reality.
John R. Talburt and Yinle Zhou
CHAPTER 1
The Value Proposition for MDM and Big Data