Unleashing the power of Apache Atlas with Apache Ranger
Virtual Data Connector Project
NIGEL JONES, JONESN@UK.IBM.COM
DATAWORKS, MUNICH, APRIL 2017
Apache®, Apache Atlas, Apache Ranger & other Apache project names referenced are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
About Me – Nigel Jones
• https://www.linkedin.com/in/nigelljones/
• jonesn@uk.ibm.com (Anyone still use email?)
• @planetf1 – noisy, f1, electric vehicles, food & drink…. A split of work/life accounts didn’t work for me!
• And of course the Apache Atlas & Ranger mailing lists & JIRA!
• Science fan at school/uni. It was cloud chambers back then… now just the cloud :-)
• IBM Hursley, UK since 1990
• Last 3 years focus on Data Lake, Information Governance, Open Metadata
The Problem…..
WHY ARE WE HERE…..
Data?
• What data do I have?
• What does it mean?
• Where is it?
• Who has access to it?
• Who owns it?
• What quality is it?
• How does it relate to other data?
• How do I control, audit & understand access?
Regulatory needs
• Adhere to regulations like BCBS-239 and GDPR
• Need to know meaning, value of the data
• Demonstrate processes in place to govern access
• Audit
• Significant fines if rules breached
• Whilst ensuring easy, ready access to appropriate data for data professionals to support an agile business
So what do we need to address this?
Metadata..
• Metadata enables data to be used outside of the application that created it.
  • Analytics and decision making
  • New business applications
  • Reporting and compliance
• Metadata describes the format and content of data, allowing people to judge which data set to use for a new project
  • Structure
  • Meaning
  • Origin
  • Valid values and quality
  • Usage and ownership
  • Regulations and classifications that apply
Which can support…
• An enterprise data catalogue that lists all data including where it is, what it is, who owns it, its meaning, quality, where it came from, and can fully describe its business context & how the data should be governed….
• Subject Matter experts searching, collaborating, feeding back about their data needs and use
• Automated governance actions to protect and manage including auditing, monitoring, quality control, rights management
But easily…
• Open frameworks & APIs
• Automatic collection & discovery of metadata in a dynamic heterogeneous environment
• Using predefined standards for glossaries, schemas, rules, regulations to reduce cost
• Cheap to integrate new tools
• No proprietary lock-in & assumptions that all tools are from one suite or vendor
• Avoiding silos
• Distributed and Open
The vision
Open and Unified Metadata
Virtualization Data Connector project
Data virtualization project
• Collaboration – IBM, several banks & open community
• A Data Lake environment
  • Not just Hadoop, but other sources too
  • Business Terms, Classifications, Metadata rich
• Offer virtualized views. Expose relational data with business terms
• Manage Access to resources – permit, deny, log, filter/mask…. THROUGH METADATA
• Open, pluggable
• Working through use cases, design, initial MVP (this year)
• Critique, feedback is welcomed. We’re looking for guidance and support from the Atlas & Ranger communities as well as contributing our ideas
• Proposed changes all go through mailing list and JIRA for feedback
Apache Atlas
• “Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allows integration with the whole enterprise data ecosystem.” …. http://www.apache.org
• Open Community -- Apache Incubator since May 2015
• Type agnostic metadata store
• REST API & UI
• Supports many Hadoop components including HBase, Hive, Sqoop, Storm & others
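The REST API listed above can be exercised with any HTTP client. A minimal sketch, assuming an Atlas server at the hypothetical address `http://localhost:21000` and the v2 basic search endpoint; it only builds the request URL and never contacts a server. The helper name `basic_search_url` is illustrative and not part of Atlas, and the parameter names should be checked against the REST documentation for the Atlas version in use.

```python
from urllib.parse import urlencode

# Assumption: a local Atlas server; replace with the address of a real deployment.
ATLAS_BASE = "http://localhost:21000"

def basic_search_url(type_name, classification=None, limit=25, base=ATLAS_BASE):
    """Build a URL that searches for entities of one type, optionally by classification."""
    params = {"typeName": type_name, "limit": limit}
    if classification:
        params["classification"] = classification
    return f"{base}/api/atlas/v2/search/basic?{urlencode(params)}"

# Example: all Hive columns carrying the (hypothetical) SPI classification.
url = basic_search_url("hive_column", classification="SPI")
print(url)
```

The resulting URL could then be fetched with credentials appropriate to the cluster; authentication is left out of this sketch.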
Apache Ranger
• Centralized security administration to manage all security related tasks in a central UI or using REST APIs.
• Fine grained authorization to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool
• Standardize authorization method across all Hadoop components.
• Enhanced support for different authorization methods - Role based access control, attribute based access control etc.
• Centralize auditing of user access and administrative actions (security related) within all the components of Hadoop.
• … from http://ranger.apache.org
Project Interactions
[Diagram. Components: Search/Report, GaianDB, Apache Atlas, Apache Ranger, Apache Solr, and the underlying data (sql, hive, HDFS, Oracle, Netezza etc; RDBMS and Hadoop).]
[Diagram labels:]
• Search for list of assets by metadata
• Search for data
• Reporting tool obtains data to draw report
• Manages logical views
• Deploys rules, pushes classifications, source for user roles (not users)
• + ranger plugin to permit/deny, mask etc
• Pulls rules, classifications
Why Atlas and Ranger?
• Open Source essential to forming an active ecosystem
• Vision, active community & evolving – ability to contribute & work with others to provide the best solution
• Already have good core capabilities
  • Atlas type system is very flexible
  • Ranger offers a range of policy types and provides a pluggable framework
• Already cross project integration
  • Use of tag based policies in Ranger sourced from Atlas
• Can be used independently of full Hadoop stack
Refined virtual connector scope
[Diagram. Components: GaianDB with Ranger Plugin; Virtualizer; Atlas with OMAS, OMRS and Titan (Graph DB, Metadata Repository); Ranger Server with Ranger Config (eg Policies, Audit log location), tag-sync, rule-sync, LDAP and Audit Log; IGC; Mapper; Navigator; Datameer; Data Lake Repositories (Oracle, Netezza, Hive Tables) with their Meta Data; Data Lake Virtualization.]
[Diagram labels: Poll Policies; Pre, Post; Create View Metadata; Extract physical metadata; Manage Logical Tables; Retrieve metadata; Push metadata; Push and query metadata; Search for data/reporting.]
GaianDB & Virtualizer
• GaianDB
  • Open Source
  • Federated, self learning, dynamic configuration
  • Based on Apache Derby
  • Already had “policy” support – we’re plugging in Ranger for this project
• Virtualizer
  • Listens to event notifications on assets etc
  • Creates view definitions in GaianDB, and uses new Atlas APIs to store metadata. Could use different virtual engine..
  • Designed to be open to other virtualization technologies.
[Diagram: LT1, LT2 over DS1, DS2, DS3, with Policy Plugin (ranger), Virtualizer and Atlas. GaianDB supports federation – not used for MVP.]
Atlas – glossary enhancements
• Get Atlas closer to parity with commercial offerings
• Business Terms – categories, category hierarchies
  • Has-a, is-a, type-of, synonym, antonym, arbitrary relationships
  • Assets mapped to Business Terms
• Classifications
  • Hierarchy
  • Navigable mappings to retain ability to flatten tags to ranger
  • Instead of hive column EMP_SALARY -> SPI, now can be EMP_SALARY -> SALARY -> SPI…
  • Used to drive governance
• ATLAS-1410
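The flattening idea (EMP_SALARY -> SALARY -> SPI collapsing to the tags that Ranger sees) can be illustrated with a small graph walk. A minimal sketch: the dictionaries, the extra `EMP_BONUS` and `ANNUAL_BONUS` entries and the function name `flatten_tags` are made up for illustration and do not reflect the ATLAS-1410 data model or API.

```python
# Hypothetical mappings. Keys point at the terms or classifications they are linked to.
ASSET_TO_TERMS = {"EMP_SALARY": ["SALARY"], "EMP_BONUS": ["ANNUAL_BONUS"]}
TERM_LINKS = {"SALARY": ["SPI"], "ANNUAL_BONUS": ["SALARY"]}
CLASSIFICATIONS = {"SPI"}

def flatten_tags(asset):
    """Return the sorted classifications reachable from an asset via its business terms."""
    tags, seen = set(), set()
    stack = list(ASSET_TO_TERMS.get(asset, []))
    while stack:
        node = stack.pop()
        if node in seen:  # guard against cycles such as synonym loops
            continue
        seen.add(node)
        if node in CLASSIFICATIONS:
            tags.add(node)
        stack.extend(TERM_LINKS.get(node, []))
    return sorted(tags)

print(flatten_tags("EMP_SALARY"))
print(flatten_tags("EMP_BONUS"))
```

Both columns end up with the same flat tag even though the second one reaches it through two business terms, which is the property a tag-based enforcement engine relies on.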
Atlas – other enhancements
• Consumer Centric APIs
  • Open Metadata Access Services (OMAS)
  • REST & more Kafka notifications
  • Asset, Catalog, Connector, Glossary, Governance Action, Governance Definitions, Information View, Roles and Access
• Repository level APIs
  • Open Metadata Repository Services (OMRS)
  • REST & more Kafka notifications
  • Pluggability through an Open Connector Framework to other metadata repositories – distributed and Open
• Standard data model/core
  • Enhancement to core model – versioning, external linkage etc
  • More standard types, ie for all relational databases, to ease sharing
Ranger areas being looked at
• Building a plugin for GaianDB
  • Access control, simple masking. More later
• User synchronization (large # users, role of Atlas)
• Changes to tag sync process for new glossary proposal
• As more metadata goes into Atlas, it becomes source for generation of some kinds of policies. Where is the master?
  • Generating ranger rules from governance definitions
• How about control of access to Atlas itself?
• Aside: Interfaces used by enforcement engines (such as to get classification data) need to be efficient – these should work for projects like Apache Sentry as well as Atlas
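The access control and simple masking planned for the GaianDB plugin could look roughly like the decision function below. A sketch under assumptions: the rule table, the role names and the `decide` and `apply_decision` helpers are hypothetical and are not the Ranger plugin API.

```python
# Hypothetical rules: (tag, role) -> action. Anything unlisted for a tagged column is denied.
RULES = {
    ("SPI", "hr_manager"): "permit",
    ("SPI", "analyst"): "mask",
}

def decide(column_tags, user_roles):
    """Return 'permit', 'mask' or 'deny' for one column; the most restrictive tag wins."""
    if not column_tags:
        return "permit"  # untagged data is not governed in this sketch
    order = {"permit": 0, "mask": 1, "deny": 2}
    result = "permit"
    for tag in column_tags:
        # Best outcome that any of the user's roles gives for this tag.
        best = min((RULES.get((tag, role), "deny") for role in user_roles),
                   key=order.get, default="deny")
        if order[best] > order[result]:
            result = best
    return result

def apply_decision(value, decision):
    """Apply the decision to a single value; denial is signalled with an exception."""
    if decision == "deny":
        raise PermissionError("access denied")
    return "****" if decision == "mask" else value

print(apply_decision(52000, decide({"SPI"}, ["analyst"])))
print(apply_decision(52000, decide({"SPI"}, ["hr_manager", "analyst"])))
```

Defaulting to deny for tagged data with no matching rule is a design choice of this sketch, not something the talk specifies.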
Beyond the MVP
• Open Discovery Framework
• Consider other security enforcement engines – such as Apache Sentry & driving more capability around rules & governance actions from Atlas metadata
• Work on standard models to support different domains
• Lineage
  • From high level design lineage through to operational detail. Logs vs graph….
• API metadata
• Infrastructure – JanusGraph…
  • Abstraction added by IBM in last few months for titan1
The vision
• An enterprise data catalog that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality
  • Spanning systems both on premise and cloud providers
  • Hosted locally to your data platforms but integrated to provide the enterprise view
• New data tools (from any vendor) connect to your data catalog out of the box
  • No vendor lock-in; nor expensive population of yet another proprietary siloed metadata repository
• Metadata is added automatically to the catalog as new data is created
  • Extensible discovery processes characterise and classify the data
  • Interested parties and processes are notified
• Subject matter experts collaborating around the data
  • Locate the data they need, quickly and efficiently
  • Feed back their knowledge about the data and the uses they have made of it to help others and support economic evaluation of data
• Automated governance processes protect and manage your data
  • Metadata-driven access control
Summary
• Atlas can help us have an industry wide common metadata platform around which a vibrant ecosystem can evolve
  • Not only in Hadoop but more broadly
• Metadata driven governance can be scalable & enable us to manage our data better, and be compliant with regulations
• The ideas presented here resonate with many people we’ve spoken to
• Get involved! I’d love to hear the feedback on this approach!
  • Comment on the JIRAs, ask questions, contribute, disagree… ;-)
  • Look at JIRA Tag “VirtualDataConnector” or start at ATLAS-1689
  • Atlas wiki
• “Innovation happens best not in isolation but in collaboration” (keynote)
• THANKS!
Questions?
After this talk
jonesn@uk.ibm.com
17:50 Room 4 – Security & Governance BOF
Backup charts
Architecture
[Diagram. Components: Search/Explore UI; Search; Catalog OMAS; Virtual Asset OMAS; GAF OMAS; GAF Pre; GAF Post; OMRS; Connector Framework; ATLAS with Atlas graph DB; IGC with IGC REST API; Virtualizer; “gaiandb” with P-JDBC connections to Oracle Data, HDFS Data and Netezza Data.]
[Legend: Atlas boundaries; Developed in POC; May not be in POC initially; * May be hardcoded at first.]
Metadata areas and types
[Diagram: metadata areas, grouped into regions numbered 1 to 7, with the roles that use them.]
Metadata areas:
• Policy Metadata (Principles, Regulations, Standards, Approaches, Rule Specifications, Roles and Metrics)
• Governance Actions and Processes
• Business Objects and Relationships, Taxonomies and Ontologies
• Business Attributes
• Teaming Metadata (people profiles, communities, projects, notebooks, …)
• Models and Schemas
• Physical Asset Descriptions (Data stores, APIs, models and components)
• Asset Collections (Sets, Typed Sets, Type Organized Sets)
• Information Views
• Rights Management
• Reference Data
• Feedback Metadata (tags, comments, ratings, …)
• Classification Schemes
• Subject Area Definition
• Campaigns and Projects
• Infrastructure and systems
• Discovery Metadata (profile data, technical classification, data classification, data quality assessment, …)
• Information Process Instrumentation (design lineage)
• Connector Directories
Roles: Information Auditor, Integration Developer, Business Analyst, Data Scientist, Information Worker, Information Owner, Information Governor, Information Steward, Data Quality Analyst, Information Curator
Other labels: Augmentation, Mapping, Implementation, Access, Organization, Classification, Strategy, Rollout, Instrument, Association
User & Group/Role synchronization
• Corporate LDAP has a huge number of users/groups
• Ranger currently needs to sync all
• In future perhaps we establish group/role membership during authentication
• Capability for alternative source could be merged into base UserSync
[Diagram: UserSync2 between Apache Atlas, LDAP and Apache Ranger.]
• LDAP holds role-membership (LDAP groups) – could also be Active Directory
• ATLAS manages definitive list of roles <that are used for atlas managed sources>
• Governance Action OMAS - getRoles
• LDAP lookup -> group:member
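Resolving group/role membership at authentication time, rather than syncing every LDAP user, could be sketched like this. Assumptions: `ldap_groups_for` stands in for a real LDAP group:member lookup and `atlas_roles` for the result of the proposed getRoles call; both are stubbed with in-memory data, and all names are hypothetical.

```python
# Stub for the corporate directory: user -> LDAP groups (hypothetical data).
LDAP_GROUPS = {
    "alice": ["hr_manager", "cafeteria", "building-7"],
    "bob": ["analyst", "cycling-club"],
}

def ldap_groups_for(user):
    """Stand-in for an LDAP lookup of group:member; returns [] for unknown users."""
    return LDAP_GROUPS.get(user, [])

def roles_at_login(user, atlas_roles):
    """Keep only the groups that Atlas lists as roles for the sources it manages."""
    wanted = set(atlas_roles)
    return [group for group in ldap_groups_for(user) if group in wanted]

atlas_roles = ["hr_manager", "analyst", "data_steward"]  # would come from getRoles
print(roles_at_login("alice", atlas_roles))
print(roles_at_login("bob", atlas_roles))
```

Only the roles that matter to Atlas-managed sources survive the filter, so the enforcement side never needs the full corporate directory.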
Atlas Glossary v2: Tag Sync to Ranger
• Atlas Glossary v2 proposed in ATLAS-1410 (David Radley)
• Sync builds on existing tag sync approach
• New API in Atlas will flatten classification structure
• No changes to ranger – but exposing richer classification could be area of future work
[Diagram: TagSync2 between Apache Atlas (Governance Action OMAS) and Apache Ranger. ATLAS glossary manages a sophisticated enterprise glossary structure. In Atlas: Hive Column emp_renum, Business Term Salary, Business Term Confidential. In Ranger: Hive Column emp_renum, Tag Confidential.]
Policy (Rule) synchronization
• Generate policies in Ranger based off entities in Atlas
• Currently designing how this works
• Scoped by policy service so existing Ranger UI approach still works
[Diagram: RuleSync between Apache Atlas (Governance Action OMAS - getRules) and Apache Ranger. Role, Classifications, Asset and Action feed a Ranger Rule.]
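Rule synchronization, as outlined above, maps a role, a classification and an action held in Atlas onto a Ranger rule. The sketch below builds a dictionary loosely modelled on the shape of a Ranger tag-based policy. The input rule format, the service name `vdc_tags` and the function name are assumptions, and the exact JSON expected by a given Ranger version should be checked against its REST documentation.

```python
import json

def ranger_policy_from_rule(rule, service="vdc_tags"):
    """Translate a hypothetical governance rule into a Ranger-style tag policy dict."""
    return {
        "service": service,
        "name": f'{rule["classification"]}-{rule["role"]}-{rule["action"]}',
        "resources": {"tag": {"values": [rule["classification"]]}},
        "policyItems": [
            {
                "groups": [rule["role"]],
                "accesses": [{"type": rule["action"], "isAllowed": True}],
            }
        ],
    }

# Hypothetical rule as it might be returned by a getRules call.
rule = {"role": "hr_manager", "classification": "SPI", "action": "select"}
policy = ranger_policy_from_rule(rule)
print(json.dumps(policy, indent=2))
```

Deriving the policy name from the rule's own fields keeps regeneration idempotent: the same Atlas rule always yields the same named policy, which helps when deciding where the master copy lives.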
Virtual Data Connector JIRAs 20170402
• RANGER-1488
• RANGER-1487
• RANGER-1486
• RANGER-1485
• RANGER-1464
• RANGER-1454
• RANGER-1234
• RANGER-
• Create Ranger plugin for gaiandb
• generate rules from Governance definitions in Atlas
• New usersync alternative for Atlas (vdc)
• Ranger support for Virtual Data Connector Project (ATLAS)
• Support Atlas v2 glossary in Atlas plugin (for access control to terms etc)
• Support of Atlas v2 glossary API proposal for tag source
• Post-evaluation phase user extensions
• Ranger Source: eclipse
• Add data masking for tag based policies
• Governance Action Framework OMAS
• Sample assets to support Virtual Connector Project
• OMAS Interfaces for Atlas
• Build ATLAS using Docker
References
• Apache Atlas - http://atlas.apache.org/
• Top level JIRA for this activity - https://issues.apache.org/jira/browse/ATLAS-1689
• Apache Ranger - http://ranger.apache.org/
• GaianDB
  • https://github.com/gaiandb/gaiandb
  • https://developer.ibm.com/open/openprojects/gaian-database/
• The case for open metadata – A. M. Chessell
  • http://www.ibmbigdatahub.com/blog/case-open-metadata
