Getting Started with the HANA Predictive Analysis Library (PAL)

The HANA Predictive Analysis Library (PAL) is a set of predictive algorithms in the HANA Application Function Library (AFL).

PAL contains classic and universal predictive analysis algorithms that run directly against the data in HANA. They cover the following data-mining categories: clustering, classification, regression, association, time series, preprocessing, statistics, and social network analysis.

Once PAL is installed, you can call its algorithms from SQLScript, or you can build a predictive analysis model with the Flowgraph Model tool.

Univariate Statistics

This tutorial uses the Univariate Statistics function. It calculates several basic univariate statistics, including mean, median, variance, standard deviation, skewness, and kurtosis. The function treats each column as a separate dataset and calculates the statistics for each column independently. You can find details in the SAP HANA Predictive Analysis Library (PAL) documentation.
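To make the "one column, one dataset" idea concrete, here is a minimal Python sketch of the same kind of calculation outside HANA. It is not PAL's implementation: the function names, the sample table, the use of sample variance (dividing by n - 1), and the moment-based, non-excess definitions of skewness and kurtosis are all assumptions made for illustration, so PAL's results may differ depending on its own definitions and parameters.

```python
import math
import statistics


def univariate_stats(values):
    """Return basic univariate statistics for one numeric column.

    Assumes at least two values that are not all identical.
    """
    values = [float(v) for v in values]
    n = len(values)
    mean = statistics.fmean(values)
    # Sample variance (divides by n - 1). This is an assumption;
    # PAL may use a different definition.
    variance = statistics.variance(values)
    # Population central moments, used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "skewness": m3 / m2 ** 1.5,   # moment-based skewness
        "kurtosis": m4 / m2 ** 2,     # non-excess kurtosis
    }


def column_stats(table):
    """Treat each column as its own dataset, as the tutorial describes."""
    return {name: univariate_stats(col) for name, col in table.items()}


# Hypothetical input with two numeric columns named as in the tutorial.
table = {"X1": [1, 2, 3, 4, 5], "X2": [2.0, 4.0, 4.0, 5.0, 10.0]}
stats = column_stats(table)

for name, result in stats.items():
    print(name, result)
```

Each column is processed on its own, so adding a third column to `table` would simply add a third entry to `stats`.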

Step-by-step guide

1. In Eclipse / HANA Studio, open the SAP HANA Development perspective.

If it is not shown, click Open Perspective:

and select SAP HANA Development.

2. Add a workspace to your repository. When the Repository tab is open, click Create Repository Workspace (you only need to do this once).

Specify a name and click Finish.

3. Create a new Flowgraph Model:

a. Select the package where you want to create your flowgraph. Right-click the package and select New.

b. Select Flowgraph Model (under SAP HANA – Database Development).

c. Enter a name for your flowgraph and click Finish. You should see an empty flowgraph, with the available objects listed on the right side.

4. Add the data for which you want to calculate basic univariate statistics to your flowgraph:

a. Open Systems – Catalog.

b. Drag and drop your table into the flowgraph window.

Note: the table used in this example has a very simple structure, but the source can have more columns, as long as they are of integer or double data type.

c. When you drop the table onto the flowgraph, choose Data Source when the prompt appears.

Note: columns in the input table must be integer or double. If you use other data types, you will not be able to activate your flowgraph. More details can be found in the Univariate Statistics documentation.

5. Add Univariate Statistics to the flowgraph:

a. Locate Univariate Statistics under Predictive Analysis Library – Statistics.

b. Drag and drop it into the flowgraph.

Note: the Univariate Statistics block has two inputs, one for Data and one for Parameters. It has a single output (Result), which is a table with the calculated statistics.

c. Connect the data source to the Data input of Univariate Statistics.

6. Add a table to store the results:

a. Drag and drop Data Sink (Template Table) into the flowgraph. Data Sink (Template Table) creates the table when the flowgraph is activated.

b. Connect it to the output of Univariate Statistics.

c. Go to the Data Sink properties.

d. Specify the Authoring Schema and Catalog Object for the result table. This table is created when the flowgraph is activated, and data is stored in it after each run of the flowgraph.

7. Specify the Authoring Schema for the whole flowgraph:

a. Right-click an empty area of your flowgraph.

b. Specify the target schema.

8. Run the flowgraph:

a. Save and activate the flowgraph.

b. Check the results in the result table defined in the Data Sink object (using data preview).

The results show the basic univariate statistics calculated for your data (X1 and X2 in the example).

Note: when you activate your flowgraph, a HANA stored procedure is created in the schema specified in step 7. You can open the procedure and inspect the SQL code inside.
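Step 4 points out that input columns must be integer or double, otherwise the flowgraph cannot be activated. If you want to catch that before building the flowgraph, a quick pre-check of the column types might look like the Python sketch below. Everything in it is hypothetical: the `SUPPORTED_TYPES` set only mirrors the two types named in this tutorial, and the column metadata is typed in by hand rather than read from HANA.

```python
# Types this tutorial names as accepted for input columns (an assumption
# taken from the notes above, not a complete list from the PAL docs).
SUPPORTED_TYPES = {"INTEGER", "DOUBLE"}


def unsupported_columns(columns):
    """Return the names of columns whose type is not INTEGER or DOUBLE."""
    return [
        name
        for name, type_name in columns.items()
        if type_name.upper() not in SUPPORTED_TYPES
    ]


# Hypothetical column metadata, e.g. copied by hand from the catalog.
columns = {"X1": "INTEGER", "X2": "DOUBLE", "LABEL": "NVARCHAR"}
bad = unsupported_columns(columns)

if bad:
    print("Columns to convert or drop before activation:", bad)
else:
    print("All columns are integer or double.")
```

Any column reported here would need to be converted or left out of the data source before the flowgraph is activated.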

More details

To find out more about Univariate Statistics and the other SAP HANA Predictive Analysis Library (PAL) functions, see the SAP documentation.
