Main Content

Analyze Large Data in Database Using Tall Arrays

This example determines the minimum arrival delay of a large set of flight data that is stored in a database. You can access large data sets and create a tall array using a DatabaseDatastore object with Database Toolbox™. Once a tall array exists, you can visualize data in the tall array. Alternatively, you can write a MapReduce algorithm that defines the chunking and reduction of the data.

The DatabaseDatastore object does not support using a parallel pool with Parallel Computing Toolbox™ installed. To analyze data using tall arrays or run MapReduce algorithms, set the global execution environment to be the local MATLAB® session.

This example uses a preconfigured JDBC data source to create the database connection. For more information, see the configureJDBCDataSource function.

Create DatabaseDatastore Object

Set the global execution environment to be the local MATLAB® session.

mapreducer(0);

The file airlinesmall.csv contains the large set of flight data. Load this file into the Microsoft® SQL Server® database table airlinesmall. This table contains 123,523 records.

Create a database connection to the JDBC data source MSSQLServerJDBCAuth. This data source configures a JDBC driver to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password.

datasource = "MSSQLServerJDBCAuth";
username = "";
password = "";
conn = database(datasource,username,password);

Create a DatabaseDatastore object using the database connection and an SQL query. This SQL query retrieves arrival-delay data from the airlinesmall table. databaseDatastore executes the SQL query.

sqlquery = 'select ArrDelay from airlinesmall';

dbds = databaseDatastore(conn,sqlquery,'ReadSize',50000);

Find Minimum Arrival Delay Using Tall Array

Because the DatabaseDatastore object returns a table, create a tall table.

tt = tall(dbds);

Find the minimum arrival delay.

minArrDelay = min(tt.ArrDelay);

minArrDelay contains the unevaluated minimum arrival delay. To return the output value, use gather. For details, see Lazy Evaluation of Tall Arrays.

minArrDelayValue = gather(minArrDelay)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 1.6 sec
Evaluation completed in 1.9 sec

minArrDelayValue =

   -64

In addition to determining a minimum, tall arrays support many other functions. For details, see Supporting Functions.

Close DatabaseDatastore Object and Database Connection

close(dbds)

See Also

| | | | | |

Related Examples

More About

External Websites