SlideShare a Scribd company logo
Open Source BI
Deep Dive




      Ravi Samji
      21/04/2011
Agenda
Business Intelligence – Why, What & Who?
Open Source BI – Introduction, Tech Stack
OLAP Engine – Mondrian
UI Layer – JPivot
Performance & Scalability
Constraints
About Yodlee
Business Intelligence – Why?
Data is the biggest asset
   Structured and Unstructured format
Most of our assets are buried
Helps us understand customer behavior
Helps us deliver better business value
Measure performance
Business Intelligence – What?
Reporting
Analytics
Data/Text Mining
ETL
Predictive Analytics
Business Intelligence – Who?
Open Source BI – Introduction
Mondrian – OLAP Engine
   Initially Independent Open Source Initiative
   Now Part of Pentaho Open Source BI Suite
100% Pure Java
Supports MDX and XML/A
Bundled With Other Open Source Packages
Open Source BI – Tech Stack

    JFreeChart              WCF




                                  log4j
                                  log4j
                  JPivot

                 Mondrian




                  RDBMS
OLAP Engine – Mondrian
Cube Definition – schema.xml
MDX – Query language to access multi dimensional data
Operates on normalized relational database
Mondrian – schema.xml
Logical model of a multi dimensional database
Cube, VirtualCube
Dimensions, Hierarchies, Levels
Measure, CalculatedMember
Logical Model – Multi Dimensional
<Schema>
                  Database
 <Cube name="Sales">
  <Table name="sales_fact_1997"/>
  <Dimension name="Gender" foreignKey="customer_id">
   <Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
    <Table name="customer"/>
    <Level name="Gender" column="gender" uniqueMembers="true"/>
   </Hierarchy>
  </Dimension>
  <Dimension name="Time" foreignKey="time_id">
   <Hierarchy hasAll="false" primaryKey="time_id">
    <Table name="time_by_day"/>
    <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
    <Level name="Quarter" column="quarter" uniqueMembers="false"/>
    <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
   </Hierarchy>
  </Dimension>
  <Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
  <Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
  <Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00"/>
  <CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales] - [Measures].[Store Cost]">
   <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
  </CalculatedMember>
 </Cube>
</Schema>
Dimensions & Shared Dimensions
 <Schema>

 <Dimension name="Time">
  <Hierarchy hasAll="false" primaryKey="time_id">
   <Table name="time_by_day"/>
   <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
   <Level name="Quarter" column="quarter" uniqueMembers="false"/>
   <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
  </Hierarchy>
 </Dimension>

  <Cube name="Sales">
   <Table name="sales_fact_1997"/>

  <DimensionUsage name=“Time" source=“Time" foreignKey="time_id”/>

   <Measure …/>
   <CalculatedMember …/>
  </Cube>

  <Cube name=“Warehouse">
   <Table name="sales_fact_1997"/>

  <DimensionUsage name=“Time" source=“Time" foreignKey="time_id”/>

  <Measure …/>
   <CalculatedMember …/>
  </Cube>

 </Schema>
Hierarchies
<Schema>

<Dimension name="Time">
 <Hierarchy hasAll="false" primaryKey="time_id">
  <Table name="time_by_day"/>
  <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
  <Level name="Quarter" column="quarter" uniqueMembers="false"/>
  <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
 </Hierarchy>

 <Hierarchy name=“Fiscal Calendar” hasAll="false" primaryKey="time_id">
  <Table name="time_by_day"/>
  <Level name="Year" column=“fiscal_year" type="Numeric" uniqueMembers="true"/>
  <Level name="Quarter" column=“fiscal_quarter" uniqueMembers="false"/>
  <Level name="Month" column=“fiscal_month_of_year" type="Numeric" uniqueMembers="false"/>
 </Hierarchy>
</Dimension>

 <Cube name="Sales">
  <Table name="sales_fact_1997"/>

  <DimensionUsage name=“Time" source=“Time" foreignKey="time_id”/>

 <Measure …/>
  <CalculatedMember …/>
 </Cube>
</Schema>
Schema.xml – Extensions
Plug-in classes
In-line tables
Views
User defined functions
Extensions – Plug-in Classes
Member Reader
Member Formatter
Cell Reader
Cell Formatter
Property Formatter
Extensions – In-line Tables
<Dimension name="Severity">
 <Hierarchy hasAll="true" primaryKey="severity_id">

 <InlineTable alias="severity">
  <ColumnDefs>
   <ColumnDef name="id" type="Numeric"/>
   <ColumnDef name="desc" type="String"/>
  </ColumnDefs>
  <Rows>
   <Row>
    <Value column="id">1</Value>
    <Value column="desc">High</Value>
   </Row>
   <Row>
    <Value column="id">2</Value>
    <Value column="desc">Medium</Value>
   </Row>
   <Row>
    <Value column="id">3</Value>
    <Value column="desc">Low</Value>
   </Row>
  </Rows>
 </InlineTable>

  <Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/>
 </Hierarchy>
</Dimension>
Extensions – Views
<Cube name="Operations">

 <View alias="StateCountyCity">
  <SQL dialect="generic">
   <![CDATA[
SELECT s.state_name, c.county_name, t.city_name, s.state_id, c.county_id, t.city_id
FROM state s
LEFT JOIN county c ON (c.state_id = s.state_id)
LEFT JOIN city t ON (c.county_id = t.county_id)
  ]]>
  </SQL>
 </View>

</Cube>
Extensions – User Defined Functions
  Must implement mondrian.spi.UserDefinedFunction
  Implementation must be available in classpath
  UDF Definition in schema.xml
<Schema>
 ...
 <UserDefinedFunction name="PlusOne" className=“my.udf.PlusOne" />
</Schema>

  MDX Usage
WITH MEMBER [Measures].[Unit Sales Plus One]
    AS 'PlusOne([Measures].[Unit Sales])'
SELECT
    {[Measures].[Unit Sales]} ON COLUMNS,
    {[Gender].MEMBERS} ON ROWS
FROM [Sales]
MDX / JDBC Parallels
Mondrian                                JDBC
Connection – mondrian.olap.Connection   Connection – java.sql.Connection
Query – mondrian.olap.Query             Statement – java.sql.Statement
Result – mondrian.olap.Result           ResultSet – java.sql.ResultSet
Access Axis & Cell from Result          Access Rows & Columns from ResultSet
UI Layer – JPivot
Performance & Scalability
Enable SQL statement logging to analyze
mondrian generated SQL statements
Index on foreign/join keys
Use Aggregate Tables & Materialized Views
Query results in session
Constraints
Composite key joins are not supported
Uniqueness within a level is not based on id
Have had issues re-using same table with a
different alias
Make mondrian happy schema – must be
normalized
Requires dedicated Time dimension table
Summary
100% Pure Java BI tool
Not too difficult to work with
Extensible for different front-end layers
Scalable
Viable alternative to proprietary tools
    No vendor lock-in – Open Source
    Less TCO
    Quicker Time To Market

More Related Content

PPTX
ADF Bindings & Data Controls
PDF
“What did I do?” - T-SQL Worst Practices
PPTX
Large scale sql server best practices
PDF
Multidimensional Data Analysis with Ruby (sample)
PDF
TechDays 2013 Jari Kallonen: What's New WebForms 4.5
PDF
Odoo Experience 2018 - Develop an App with the Odoo Framework
PPT
Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...
PPT
OLAP Cubes in Datawarehousing
ADF Bindings & Data Controls
“What did I do?” - T-SQL Worst Practices
Large scale sql server best practices
Multidimensional Data Analysis with Ruby (sample)
TechDays 2013 Jari Kallonen: What's New WebForms 4.5
Odoo Experience 2018 - Develop an App with the Odoo Framework
Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...
OLAP Cubes in Datawarehousing

Similar to Learning Open Source Business Intelligence (20)

PPTX
Diplomado Técnico SQL Server 2012 - Sesión 5/8
PPTX
Analysis Services en SQL Server 2008
PDF
Mstr meetup
PPT
Mondrian update (Pentaho community meetup 2012, Amsterdam)
PPTX
Data Query Using Structured Query Language - WITH NOTES.pptx
PPT
Oracle bi ee architecture
PPT
BI 2008 Simple
PDF
Microsoft SQL Server Analysis Services (SSAS) - A Practical Introduction
PPTX
SQL for Web APIs - Simplifying Data Access for API Consumers
PPTX
Take Your XPages Development to the Next Level
PDF
PLAT-15 Forms Config, Customization, and Extension
PPTX
(Updated) SharePoint & jQuery Guide
PDF
Staying railsy - while scaling complexity or Ruby on Rails in Enterprise Soft...
PDF
Accessible Salesforce
PPTX
Oracle Application Express & jQuery Mobile - OGh Apex Dag 2012
PDF
Profiling Mondrian MDX Requests in a Production Environment
PDF
WebNet Conference 2012 - Designing complex applications using html5 and knock...
PDF
AngularJS
PPTX
Developing Next-Gen Enterprise Web Application
PDF
Global Scale ESB with Mule
Diplomado Técnico SQL Server 2012 - Sesión 5/8
Analysis Services en SQL Server 2008
Mstr meetup
Mondrian update (Pentaho community meetup 2012, Amsterdam)
Data Query Using Structured Query Language - WITH NOTES.pptx
Oracle bi ee architecture
BI 2008 Simple
Microsoft SQL Server Analysis Services (SSAS) - A Practical Introduction
SQL for Web APIs - Simplifying Data Access for API Consumers
Take Your XPages Development to the Next Level
PLAT-15 Forms Config, Customization, and Extension
(Updated) SharePoint & jQuery Guide
Staying railsy - while scaling complexity or Ruby on Rails in Enterprise Soft...
Accessible Salesforce
Oracle Application Express & jQuery Mobile - OGh Apex Dag 2012
Profiling Mondrian MDX Requests in a Production Environment
WebNet Conference 2012 - Designing complex applications using html5 and knock...
AngularJS
Developing Next-Gen Enterprise Web Application
Global Scale ESB with Mule
Ad

More from Saltmarch Media (17)

PDF
Concocting an MVC, Data Services and Entity Framework solution for Azure
PDF
Caring about Code Quality
PDF
Java EE 7: the Voyage of the Cloud Treader
PDF
Is NoSQL The Future of Data Storage?
PDF
Introduction to WCF RIA Services for Silverlight 4 Developers
PDF
Integrated Services for Web Applications
PDF
Gaelyk - Web Apps In Practically No Time
PDF
CDI and Seam 3: an Exciting New Landscape for Java EE Development
PDF
JBoss at Work: Using JBoss AS 6
PDF
WF and WCF with AppFabric – Application Infrastructure for OnPremise Services
PDF
Building RESTful Services with WCF 4.0
PDF
Building Facebook Applications on Windows Azure
PDF
Architecting Smarter Apps with Entity Framework
PDF
Agile Estimation
PDF
Alternate JVM Languages
PDF
A Cocktail of Guice and Seam, the missing ingredients for Java EE 6
PDF
A Bit of Design Thinking for Developers
Concocting an MVC, Data Services and Entity Framework solution for Azure
Caring about Code Quality
Java EE 7: the Voyage of the Cloud Treader
Is NoSQL The Future of Data Storage?
Introduction to WCF RIA Services for Silverlight 4 Developers
Integrated Services for Web Applications
Gaelyk - Web Apps In Practically No Time
CDI and Seam 3: an Exciting New Landscape for Java EE Development
JBoss at Work: Using JBoss AS 6
WF and WCF with AppFabric – Application Infrastructure for OnPremise Services
Building RESTful Services with WCF 4.0
Building Facebook Applications on Windows Azure
Architecting Smarter Apps with Entity Framework
Agile Estimation
Alternate JVM Languages
A Cocktail of Guice and Seam, the missing ingredients for Java EE 6
A Bit of Design Thinking for Developers
Ad

Recently uploaded (20)

PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PPT
Teaching material agriculture food technology
PDF
Encapsulation theory and applications.pdf
PDF
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
PDF
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
PDF
KodekX | Application Modernization Development
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Approach and Philosophy of On baking technology
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
“AI and Expert System Decision Support & Business Intelligence Systems”
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Chapter 3 Spatial Domain Image Processing.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
Dropbox Q2 2025 Financial Results & Investor Presentation
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Understanding_Digital_Forensics_Presentation.pptx
Teaching material agriculture food technology
Encapsulation theory and applications.pdf
Architecting across the Boundaries of two Complex Domains - Healthcare & Tech...
Shreyas Phanse Resume: Experienced Backend Engineer | Java • Spring Boot • Ka...
KodekX | Application Modernization Development
The Rise and Fall of 3GPP – Time for a Sabbatical?
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Spectral efficient network and resource selection model in 5G networks
Review of recent advances in non-invasive hemoglobin estimation
Approach and Philosophy of On baking technology
Network Security Unit 5.pdf for BCA BBA.
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows

Learning Open Source Business Intelligence

  • 1. Open Source BI Deep Dive Ravi Samji 21/04/2011
  • 2. Agenda Business Intelligence – Why, What & Who? Open Source BI – Introduction, Tech Stack OLAP Engine – Mondrian UI Layer – JPivot Performance & Scalability Constraints
  • 4. Business Intelligence – Why? Data is the biggest asset Structured and Unstructured format Most of our assets are buried Helps us understand customer behavior Helps us deliver better business value Measure performance
  • 5. Business Intelligence – What? Reporting Analytics Data/Text Mining ETL Predictive Analytics
  • 7. Open Source BI – Introduction Mondrian – OLAP Engine Initially Independent Open Source Initiative Now Part of Pentaho Open Source BI Suite 100% Pure Java Supports MDX and XML/A Bundled With Other Open Source Packages
  • 8. Open Source BI – Tech Stack JFreeChart WCF log4j log4j JPivot Mondrian RDBMS
  • 9. OLAP Engine – Mondrian Cube Definition – schema.xml MDX – Query language to access multi dimensional data Operates on normalized relational database
  • 10. Mondrian – schema.xml Logical model of a multi dimensional database Cube, VirtualCube Dimensions, Hierarchies, Levels Measure, CalculatedMember
  • 11. Logical Model – Multi Dimensional <Schema> Database <Cube name="Sales"> <Table name="sales_fact_1997"/> <Dimension name="Gender" foreignKey="customer_id"> <Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id"> <Table name="customer"/> <Level name="Gender" column="gender" uniqueMembers="true"/> </Hierarchy> </Dimension> <Dimension name="Time" foreignKey="time_id"> <Hierarchy hasAll="false" primaryKey="time_id"> <Table name="time_by_day"/> <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/> <Level name="Quarter" column="quarter" uniqueMembers="false"/> <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/> </Hierarchy> </Dimension> <Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/> <Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/> <Measure name="Store Cost" column="store_cost" aggregator="sum" formatString="#,###.00"/> <CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales] - [Measures].[Store Cost]"> <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/> </CalculatedMember> </Cube> </Schema>
  • 12. Dimensions & Shared Dimensions <Schema> <Dimension name="Time"> <Hierarchy hasAll="false" primaryKey="time_id"> <Table name="time_by_day"/> <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/> <Level name="Quarter" column="quarter" uniqueMembers="false"/> <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/> </Hierarchy> </Dimension> <Cube name="Sales"> <Table name="sales_fact_1997"/> <DimensionUsage name=“Time" source=“Time" foreignKey="time_id”/> <Measure …/> <CalculatedMember …/> </Cube> <Cube name=“Warehouse"> <Table name="sales_fact_1997"/> <DimensionUsage name=“Time" source=“Time" foreignKey="time_id”/> <Measure …/> <CalculatedMember …/> </Cube> </Schema>
  • 13. Hierarchies <Schema> <Dimension name="Time"> <Hierarchy hasAll="false" primaryKey="time_id"> <Table name="time_by_day"/> <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/> <Level name="Quarter" column="quarter" uniqueMembers="false"/> <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/> </Hierarchy> <Hierarchy name=“Fiscal Calendar” hasAll="false" primaryKey="time_id"> <Table name="time_by_day"/> <Level name="Year" column=“fiscal_year" type="Numeric" uniqueMembers="true"/> <Level name="Quarter" column=“fiscal_quarter" uniqueMembers="false"/> <Level name="Month" column=“fiscal_month_of_year" type="Numeric" uniqueMembers="false"/> </Hierarchy> </Dimension> <Cube name="Sales"> <Table name="sales_fact_1997"/> <DimensionUsage name=“Time" source=“Time" foreignKey="time_id”/> <Measure …/> <CalculatedMember …/> </Cube> </Schema>
  • 14. Schema.xml – Extensions Plug-in classes In-line tables Views User defined functions
  • 15. Extensions – Plug-in Classes Member Reader Member Formatter Cell Reader Cell Formatter Property Formatter
  • 16. Extensions – In-line Tables <Dimension name="Severity"> <Hierarchy hasAll="true" primaryKey="severity_id"> <InlineTable alias="severity"> <ColumnDefs> <ColumnDef name="id" type="Numeric"/> <ColumnDef name="desc" type="String"/> </ColumnDefs> <Rows> <Row> <Value column="id">1</Value> <Value column="desc">High</Value> </Row> <Row> <Value column="id">2</Value> <Value column="desc">Medium</Value> </Row> <Row> <Value column="id">3</Value> <Value column="desc">Low</Value> </Row> </Rows> </InlineTable> <Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/> </Hierarchy> </Dimension>
  • 17. Extensions – Views <Cube name="Operations"> <View alias="StateCountyCity"> <SQL dialect="generic"> <![CDATA[ SELECT s.state_name, c.county_name, t.city_name, s.state_id, c.county_id, t.city_id FROM state s LEFT JOIN county c ON (c.state_id = s.state_id) LEFT JOIN city t ON (c.county_id = t.county_id) ]]> </SQL> </View> </Cube>
  • 18. Extensions – User Defined Functions Must implement mondrian.spi.UserDefinedFunction Implementation must be available in classpath UDF Definition in schema.xml <Schema> ... <UserDefinedFunction name="PlusOne" className=“my.udf.PlusOne" /> </Schema> MDX Usage WITH MEMBER [Measures].[Unit Sales Plus One] AS 'PlusOne([Measures].[Unit Sales])' SELECT {[Measures].[Unit Sales]} ON COLUMNS, {[Gender].MEMBERS} ON ROWS FROM [Sales]
  • 19. MDX / JDBC Parallels Mondrian JDBC Connection – mondrian.olap.Connection Connection – java.sql.Connection Query – mondrian.olap.Query Statement – java.sql.Statement Result – mondrian.olap.Result ResultSet – java.sql.ResultSet Access Axis & Cell from Result Access Rows & Columns from ResultSet
  • 20. UI Layer – JPivot
  • 21. Performance & Scalability Enable SQL statement logging to analyze mondrian generated SQL statements Index on foreign/join keys Use Aggregate Tables & Materialized Views Query results in session
  • 22. Constraints Composite key joins are not supported Uniqueness within a level is not based on id Have had issues re-using same table with a different alias Make mondrian happy schema – must be normalized Requires dedicated Time dimension table
  • 23. Summary 100% Pure Java BI tool Not too difficult to work with Extensible for different front-end layers Scalable Viable alternative to proprietary tools No vendor lock-in – Open Source Less TCO Quicker Time To Market