I have a cluster of Linux machines running HDP 3.1.1. All the HDFS documentation I can find says that only Hadoop versions 1.x and 2.x are supported. Will I have compatibility issues because I am using a 3.x version of Hadoop? When I try to read HDFS files on the cluster from a Windows edge machine, I consistently get the following error from datastore:
Error using datastore (line 120)
Cannot determine the Hadoop version. Verify that HADOOP_PREFIX, HADOOP_HOME, or the MATLAB_HADOOP_INSTALL environment variable is set to the root of your Hadoop installation folder.
The Windows machine I am accessing the cluster from does not have Hadoop installed, so I point the required environment variable(s) to the root of the Hadoop installation on the namenode of the cluster. I have spent considerable time trying every fix I could find, on the assumption that Hadoop 3.1.1 is compatible with MATLAB, but I now suspect that the version mismatch may be the root of the problem.

Onomitra Ghosh on 1 Oct 2019
Brian, can you provide some more details on the problem you are running into when running the mapreduce command? A code example showing how you are setting the environment variable, creating the datastore, and running mapreduce would be helpful. Also, is the error message exactly the same as before? Is this the complete error stack?
Onomitra Ghosh on 2 Oct 2019
Brian, thanks for finding this. You have uncovered an issue in the way we look up the Hadoop version. We will work on finding a better solution. Until then, please keep the HDFS installation on the worker nodes as a workaround. We will reach out to you if we need more information.

Brian Coghlan on 9 Aug 2019
Edited: Brian Coghlan on 9 Aug 2019
I solved the issue by using a Linux-based edge node for the cluster and installing an HDFS client on it for access to the cluster. With that in place, setting the environment variables per the MATLAB documentation and then using datastore() to load HDFS data worked on the first try.
MATLAB R2019a is compatible with Hortonworks HDP 3.1.1 for loading data. However, it needs its own local installation of an HDFS client for cluster access, and the MATLAB Hadoop environment variables must be set correctly. To the best of my knowledge, that HDFS client can only be installed on a Linux-based machine.
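For anyone following along, here is a minimal sketch of the kind of setup described above. It is not the exact code I ran: the client path, namenode host, port, and file location are all hypothetical placeholders, so substitute the values for your own cluster. The environment variable name is taken from the error message in the question.

```matlab
% Hypothetical path: root of the local HDFS client installed on the Linux edge node.
hadoopRoot = '/usr/hdp/current/hadoop-client';

% The error message names HADOOP_PREFIX, HADOOP_HOME, and MATLAB_HADOOP_INSTALL;
% this sketch assumes that setting HADOOP_HOME is enough.
setenv('HADOOP_HOME', hadoopRoot);

% Hypothetical namenode host, port, and file location on HDFS.
ds = datastore('hdfs://namenode.example.com:8020/data/sample/*.csv');

% Read the first few rows to confirm that the cluster is reachable.
preview(ds)
```

Note that the variable must point to a Hadoop installation on the machine where MATLAB is running, not to a path that exists only on the namenode.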
I am still running into issues when executing any mapreduce function on the data. I get a similar error in which MATLAB says it cannot find Hadoop and asks for the Hadoop environment variables to be set. If anyone has any ideas about this, please let me know. I'd be happy to start a new thread, but it still seemed relevant here.


