HDP 3.1.1 and MATLAB R2019a Compatibility

I have a cluster of Linux machines running HDP 3.1.1. All the HDFS documentation I have seen says that only Hadoop versions 1.x and 2.x are supported. Will I have compatibility issues because I am using a 3.x version of Hadoop? When I try to read HDFS files on the cluster from a Windows edge machine, I consistently get the following error when using datastore:
Error using datastore (line 120)
Cannot determine the Hadoop version. Verify that HADOOP_PREFIX, HADOOP_HOME, or the MATLAB_HADOOP_INSTALL environment variable is set to the root of your Hadoop installation folder.
The Windows machine I am accessing the cluster from does not have Hadoop installed, so I point the required environment variable(s) to the root of the Hadoop installation on the namenode of the cluster. I have spent considerable time attempting every fix I could find on the assumption that Hadoop 3.1.1 is compatible with MATLAB, but I now suspect that the Hadoop version may be the root of the problem.
-Brian

Accepted Answer

Onomitra Ghosh on 1 Oct 2019
Brian, can you provide some more details on the problem you run into when running the mapreduce command? A code example showing how you set the environment variable, create the datastore, and run mapreduce would help. Also, is the error message exactly the same as before? Is this the complete error stack?
  2 Comments
Brian Coghlan on 1 Oct 2019
Onomitra Ghosh,
I was setting the environment as instructed here. I'm unsure whether what I posted constitutes the entire error stack, but it was the complete error output from MATLAB. As for the code, here is the simplest sample that produced the error:
% data.csv is a small, well-formatted CSV file used for testing,
% chosen to rule out problems with the data itself
dd = datastore('hdfs://NamenodeServer.local:8020/path/to/data.csv');
dd.SelectedVariableNames = 'columnOne';
tt = tall(dd)
s = size(tt)
gather(s) % the error occurs here, a few percent into the evaluation
I ended up downgrading the entire cluster to Hadoop 2.7. However, I ran into the same problem after the downgrade, and then realized that when using Hortonworks Data Platform and Ambari, each machine on the cluster must also have an HDFS client installed, not simply the base install of Hadoop. I am unsure whether this would have fixed the issue on Hadoop 3.x, but it is what fixed it on Hadoop 2.x. As per the MathWorks documentation, using Hortonworks meant that none of the environment variables needed to be set.
I'll be happy to answer any other questions you have, though, thankfully, my cluster is now at a point where I cannot recreate the error.
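For anyone hitting the same message, a minimal diagnostic sketch along these lines may help narrow down whether a machine can see a Hadoop or HDFS client at all. It assumes the three environment variable names from the error message and assumes that an installed client exposes the `hadoop` command on the system PATH; it is not an official MathWorks check.

```matlab
% Hypothetical diagnostic: report which Hadoop-related variables MATLAB can see.
vars = {'MATLAB_HADOOP_INSTALL', 'HADOOP_PREFIX', 'HADOOP_HOME'};
for k = 1:numel(vars)
    value = getenv(vars{k});
    if isempty(value)
        fprintf('%s is not set\n', vars{k});
    else
        fprintf('%s = %s (folder exists: %d)\n', vars{k}, value, isfolder(value));
    end
end

% Ask the operating system whether a Hadoop client command can be run.
[status, output] = system('hadoop version');
if status == 0
    disp(output);
else
    disp('The hadoop command did not run successfully on this machine.');
end
```

Running this on the edge node and on a worker node should show whether the variables and the client are visible in both places.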
Onomitra Ghosh on 2 Oct 2019
Brian, thanks for finding this. You have uncovered an issue in the way we look up the Hadoop version. We will work on finding a better solution. Until then, please keep the HDFS installation on the worker nodes as a workaround. We will reach out to you if we need more information.


More Answers (1)

Brian Coghlan on 9 Aug 2019
Edited: Brian Coghlan on 9 Aug 2019
I solved the issue by using a Linux-based edge node for the cluster and installing an HDFS client on it for access to the cluster. With that in place, setting the environment variables per the MATLAB documentation and then using datastore() to load HDFS data worked perfectly the first time.
MATLAB R2019a is compatible with Hortonworks HDP 3.1.1 for loading data. However, it needs its own local installation of an HDFS client for cluster access, along with correctly set MATLAB Hadoop environment variables. To the best of my knowledge, that HDFS client can only be installed on a Linux-based machine.
I am still running into issues when executing any mapreduce functions on the data: I get a similar error in which MATLAB says it cannot find Hadoop and asks for the Hadoop environment variables to be set. If anyone has any ideas about this, please let me know. I'd be happy to start a new thread, but it still seemed relevant here.
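A minimal sketch of that setup, assuming the HDFS client is installed locally and that `HADOOP_HOME` is the variable being set. The install path is a placeholder, and the host, port, and file path are the placeholder values from the question, not real locations:

```matlab
% Assumed path: replace with the root folder of the local HDFS client install.
setenv('HADOOP_HOME', '/path/to/hadoop/install');

% Same HDFS URL form as in the question; host, port, and path are placeholders.
ds = datastore('hdfs://NamenodeServer.local:8020/path/to/data.csv');

% Read a few rows to confirm that the file is reachable.
preview(ds)
```

The variable has to be set before the first datastore call in the MATLAB session.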
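One thing that may be worth trying for the mapreduce case is to pass the Hadoop location to the cluster object explicitly instead of relying only on the environment variable. The sketch below is an assumption on my part and not a confirmed fix for this error: it requires Parallel Computing Toolbox and MATLAB Parallel Server, and the install path is a placeholder.

```matlab
% Assumed path: root folder of the Hadoop client installed on this machine.
hadoopFolder = '/path/to/hadoop/install';
setenv('HADOOP_HOME', hadoopFolder);

% Create a Hadoop cluster object that points at the same installation folder.
cluster = parallel.cluster.Hadoop('HadoopInstallFolder', hadoopFolder);

% Make the Hadoop cluster the execution environment for mapreduce and tall arrays.
mapreducer(cluster);
```

Given the accepted answer, the worker nodes probably also need their own HDFS client installation for this to work.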
