Some questions about House Pricing problem

I am studying the MATLAB ANN toolbox and am currently looking at its tutorial example, the House Pricing problem. This is the link: http://www.mathworks.co.uk/help/nnet/gs/fit-data-with-a-neural-network.html#f9-33554 I want to find out what happens if I add some extra random numbers as inputs:
load house_dataset
inputs = houseInputs;
targets = houseTargets;
inputs(14:20,1:506) = rand(7,506); % add 7 extra rows of random numbers as inputs
The rest of the code stays the same.
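For reference, a minimal sketch of what the full experiment might look like. It assumes the rest of the tutorial code amounts to a default fitnet with 10 hidden neurons; the variable colNorm and the idea of comparing first-layer weight norms are illustrative additions, not part of the tutorial.

```matlab
% Sketch: train on the augmented inputs, then compare first-layer
% weight magnitudes for the original inputs and the random ones.
rng(0);                               % reproducible random inputs and initial weights
load house_dataset                    % houseInputs (13x506), houseTargets (1x506)
inputs  = houseInputs;
targets = houseTargets;
inputs(14:20, 1:506) = rand(7, 506);  % 7 extra rows of random numbers

net = fitnet(10);                     % assumed: 10 hidden neurons
net.trainParam.showWindow = false;    % suppress the training GUI
net = train(net, inputs, targets);

% IW{1,1} has one column per input. The default input processing
% rescales every input to a common range, so column norms are comparable.
colNorm = sqrt(sum(net.IW{1,1}.^2, 1));
fprintf('mean weight norm, original inputs: %.3f\n', mean(colNorm(1:13)));
fprintf('mean weight norm, random inputs:   %.3f\n', mean(colNorm(14:20)));
```

Plain MSE training has no term that pushes weights to exactly zero, so small norms for rows 14 to 20 would be the most one could expect, and results will vary from run to run.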
I have the following questions and would appreciate any help.
1: In theory, a 3-layer ANN (one hidden layer) can approximate any function, so I did not change the number of layers in the example. However, if I add extra inputs (which might not be useful for training the network), will the network ignore them (for example, by assigning zero weights to those useless inputs after training)?
2: The House Pricing problem has 13 inputs. Is it possible to tell which input contributes the most? For example, could changing one input, or a pair of inputs, affect the performance of the network a lot?
3: How should the network size be chosen as the number of inputs increases?
The reason I ask is that I have a similar fitting problem. I have 100 inputs, but I do not know which of them are useful. Some inputs may be related to each other, so adding them all to the network might improve performance more than adding only one of them. Also, when I compare networks with different numbers of inputs, how should I choose the network size to make the comparison fair? Is there a way to measure which input, or combination of inputs, improves the performance of the network?
Thanks in advance.

Accepted Answer

Greg Heath on 4 Jun 2014
Edited: Greg Heath on 4 Jun 2014
1. Standardize all inputs and outputs to zero-mean/unit-variance
2. Remove or modify input and output outliers so that (help minmax)
inputlb <= input <= inputub
outputlb <= output <= outputub
3. Use STEPWISEFIT in the backward search mode to obtain the significant coefficients of a linear model. You should find that variables 3 and 7 are not significant and that Rsquare and adjusted Rsquare are both ~0.73, i.e. the linear model accounts for ~73% of the target variance.
3a. (Optional) For comparison, repeat with the default forward search mode.
4. If you use CORRCOEF to look at the two-way correlations of the output with the 13 input variables, you will see from the P values that all of the coefficients are significant. If the insignificance of variables 3 and 7 is caused by higher-order correlations, it is not evident from looking at the full two-way correlation matrix.
5. For the FITNET neural model, find the smallest size of the hidden layer that will yield an acceptable degree-of-freedom adjusted Rsquare (e.g., R2a > 0.99). To account for poor random initial weights, design Ntrials = 10 candidates for each size and choose the best of the 10.
6. Rank the variables by the MSE that occurs when only that variable is replaced by zeros. Also find the MSE when the net is retrained starting from the revised configuration.
6a. (Optional) Repeat with randn replacing the zeros.
7. Attempting a sequential backward (or forward) search similar to what is done in stepwisefit for the Linear Model comes to mind. You might be able to use SEQUENTIALFS. However, I am not familiar with it.
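Steps 1, 3 and 4 above can be sketched roughly as follows. This is only an illustration, not code from the answer: it assumes the Statistics Toolbox is available (zscore, stepwisefit), and it assumes that "backward search mode" means starting stepwisefit with every variable in the model via the 'inmodel' option. All variable names are made up for the example.

```matlab
load house_dataset                    % houseInputs (13x506), houseTargets (1x506)
X = zscore(houseInputs');             % step 1: observations in rows, zero mean, unit variance
y = zscore(houseTargets');
n = numel(y);

% Step 3: start with all 13 variables in the model and let stepwisefit
% remove the ones that are not significant.
[b, se, pval, inmodel, stats] = stepwisefit(X, y, ...
    'inmodel', true(1, size(X, 2)), 'display', 'off');
removed = find(~inmodel);             % indices of the variables that were dropped
R2  = 1 - stats.SSresid / stats.SStotal;
R2a = 1 - (stats.SSresid / stats.dfe) / (stats.SStotal / (n - 1));
fprintf('removed: %s, R2 = %.3f, R2a = %.3f\n', mat2str(removed), R2, R2a);

% Step 4: two-way correlations of every input with the target,
% together with their P values.
[R, P] = corrcoef([X, y]);
rWithTarget = R(1:end-1, end);
pWithTarget = P(1:end-1, end);
disp([(1:size(X, 2))', rWithTarget, pWithTarget]);
```

For step 3a, dropping the 'inmodel' option gives the default forward search, which starts from an empty model.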
Hope this helps.
Thank you for formally accepting my answer
Greg
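A rough sketch of steps 5 and 6 of the answer, under several assumptions that are not in the original: the search range Hmax = 10, training on all of the data with 'dividetrain' so that the degree-of-freedom adjustment applies to the whole set, and mapstd for the standardization in step 1. The retraining part of step 6 is left out, and every variable name is illustrative.

```matlab
load house_dataset
x = mapstd(houseInputs);              % step 1: each row to zero mean, unit variance
t = mapstd(houseTargets);
[I, N] = size(x);
O = size(t, 1);
MSE00a  = mean(var(t', 0));           % adjusted MSE of the constant (mean) model
Ntrials = 10;                         % candidates per hidden layer size (from the answer)
R2goal  = 0.99;                       % example threshold from the answer
Hmax    = 10;                         % assumed upper limit of the search

rng(0);
best.H = NaN;  best.R2a = -Inf;  best.net = [];
for H = 1:Hmax
    Nw = (I + 1)*H + (H + 1)*O;       % number of weights and biases
    for k = 1:Ntrials
        net = fitnet(H);
        net.divideFcn = 'dividetrain';        % assumed: no validation/test split
        net.trainParam.showWindow = false;
        net = train(net, x, t);
        e = t - net(x);
        R2a = 1 - (sum(e(:).^2) / (N*O - Nw)) / MSE00a;
        if R2a > best.R2a
            best.H = H;  best.R2a = R2a;  best.net = net;
        end
    end
    if best.R2a >= R2goal, break; end % smallest H that reaches the goal
end
fprintf('H = %d, R2a = %.4f\n', best.H, best.R2a);

% Step 6: MSE when one input at a time is replaced by zeros, which is
% that input's mean after standardization.
mseZero = zeros(I, 1);
for i = 1:I
    xi = x;
    xi(i, :) = 0;
    ei = t - best.net(xi);
    mseZero(i) = mean(ei(:).^2);
end
[~, order] = sort(mseZero, 'descend'); % largest increase first
disp([order, mseZero(order)]);
```

If the goal is never reached within Hmax, the loop simply keeps the candidate with the highest R2a. For step 6a, replacing the zeros with randn(1, N) gives the optional variant.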
