Data Integration
Organizations face an increasing challenge to manage and extract
value from a growing variety and volume of data across their
edge-to-cloud infrastructure. With Pentaho Data Integration (PDI),
organizations can access data from complex and heterogeneous
sources and blend it with existing relational data to produce
high-quality, ready-to-analyze information — all without writing a
line of code. A rich graphical user interface paired with a powerful
multithreaded transformation engine offers high-performance ETL
(extract, transform and load) capabilities that cover all data integration needs, including big data ingestion and processing.
Pentaho Data Integration Features
Intuitive drag-and-drop interface to simplify the creation of analytic
data pipelines (see Figure 2).
Broad connectivity to virtually any data source, either on premises or in the cloud, including flat files, relational database management systems (RDBMS), APIs and more.
Integration with transactional databases, including Oracle, IBM® DB2®, PostgreSQL, MySQL and others.
Access to data in enterprise applications, including SAP,
Salesforce.com, Google Analytics and more.
Rich library of prebuilt components to access, prepare, blend and
cleanse data.
Direct access to complete analytics, including charts, visualizations and reporting from any step of PDI.
Robust orchestration capabilities to coordinate complex workflows, including scheduling and alerts.
Integration of advanced analytic models from R, Python, Scala and Weka, incorporating libraries such as scikit-learn, Spark MLlib, TensorFlow and Keras, into the data flow.
Enterprise-grade administration, scalability, load balancing, containerization* and security capabilities.
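The extract, transform and load flow that a PDI transformation expresses visually can be sketched in plain Python. This is a conceptual illustration only, not a PDI API: the source data, table names and cleansing rules here are invented, and in PDI each step below would be a drag-and-drop component rather than code.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source with inconsistent formatting.
raw_csv = """order_id,customer_id,amount
1001,C1, 250.00
1002,C2,99.5
1003,C1,  10
"""

# Extract: read rows from the flat-file source.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Blend: an illustrative relational lookup table (here an in-memory SQLite DB).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [("C1", "Acme Corp"), ("C2", "Globex")])

# Transform and load: cleanse values, join with the lookup, write the target.
db.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
for r in rows:
    name = db.execute("SELECT name FROM customers WHERE id = ?",
                      (r["customer_id"].strip(),)).fetchone()[0]
    db.execute("INSERT INTO orders VALUES (?, ?, ?)",
               (int(r["order_id"]), name, float(r["amount"].strip())))

# The target table now holds blended, ready-to-analyze records.
result = db.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(result)  # [('Acme Corp', 260.0), ('Globex', 99.5)]
```

In PDI, the same pipeline would be built from prebuilt input, lookup, cleansing and output steps, with no code written at any stage.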
Big Data
The Pentaho platform enables companies to realize business value
from large volumes of diverse data by dramatically reducing the
time and complexity required to design, develop and deploy big
data analytics. Pentaho covers the entire big data life cycle, from data extraction and preparation, to scalable processing on Spark and Hadoop, through to end-to-end analytics solutions.
Pentaho Is the Leading Solution for Big Data Integration and
Analytics
Visual design environment for blending multiple big data sources
(see Figure 3) and processing data at scale.
Integration with leading Hadoop distributions, object stores,
NoSQL stores and analytic databases, as well as log file data and
JSON/XML formats.
Code-free data transformation design that enables up to 15 times faster development than hand coding, and executes Spark or Hadoop jobs in clusters for high performance.
Operationalization through Spark stream and batch job execution, SQL on Spark connectivity, Kafka access and more.
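The stream and batch execution models mentioned above can be illustrated with a minimal micro-batching sketch in plain Python. The event source, keys and batch size are invented for illustration; in practice PDI wires Kafka and Spark steps together visually rather than in code.

```python
from collections import defaultdict

# Hypothetical stream of (sensor_id, reading) events.
events = [
    ("sensor-a", 3.0), ("sensor-b", 1.5), ("sensor-a", 2.0),
    ("sensor-b", 0.5), ("sensor-a", 1.0), ("sensor-b", 2.5),
]

def micro_batches(stream, size):
    """Yield consecutive fixed-size batches from an event stream,
    the way a micro-batching stream engine groups events per window."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush any trailing partial batch
        yield batch

# Aggregate each micro-batch by key, one result set per window.
totals = []
for batch in micro_batches(events, size=3):
    agg = defaultdict(float)
    for key, value in batch:
        agg[key] += value
    totals.append(dict(agg))

print(totals)
```

Batch execution is the degenerate case where the window spans the whole data set; a streaming engine such as Spark applies the same per-window aggregation continuously as events arrive.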
Figure 2. Drag-and-Drop Data Transformation in Pentaho Data Integration
Figure 3. Variety of Big Data Sources Supported by Pentaho
“Using Pentaho, we are now helping clients blend a 360-
degree view of all equipment data sources to enable early
prediction of potential machinery failure.”
– Caterpillar Marine Asset Intelligence