Glitch8
- ago
I'm starting this topic as a jumping-off point to discuss how many Hidden Layers and Hidden Neurons to use, in preparation for the release of NeuroLab. I'm documenting NeuroLab now, and realize there's no hard-and-fast rule regarding these questions. Let's use this topic to collect and share experience around neural network architecture.

- ago
#1
How do you define hidden layers and neurons?
0
- ago
#2
From an information theory point of view, the middle layer (hidden layer) shouldn't have more nodes than the number of inputs. To do so would only overfit (and destabilize) the model. I would think of it as one middle-layer node for each orthogonal input to maximize the "par" (i.e. R² fit) of the model. I would let this be the default setting. Knowledgeable users can manually override this at the risk of overfitting the model. Overfitting will significantly diminish the forecasting ability of the model with unseen data.

Now you can apply as much training data as you want. What that does--in effect--is increase the precision of the weights going to and from the middle layer, which is a good thing and won't overfit the model as long as "proper model par" (i.e. the number of middle-layer nodes) is maintained.
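As a rough illustration of the default I'm proposing (just a sketch, not anything NeuroLab actually implements; the names are made up), the sizing rule could look like this:

using System;

static class HiddenLayerSizing
{
    // Suggested default: one middle-layer node per (ideally orthogonal) input.
    static int DefaultHiddenNodes(int inputCount) => inputCount;

    // A user override is allowed, but warn when it exceeds the input count,
    // since that is where overfitting starts.
    public static int HiddenNodes(int inputCount, int? userOverride = null)
    {
        if (userOverride > inputCount)
            Console.WriteLine("Warning: more hidden nodes than inputs risks overfitting.");
        return userOverride ?? DefaultHiddenNodes(inputCount);
    }
}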
1
- ago
#3
QUOTE:
How do you define hidden layers and neurons?
Wikipedia has a crash-course description (about 15 pages) of NN design. The first 3-4 pages should cover your topology questions.
https://en.wikipedia.org/wiki/Artificial_neural_network
0
- ago
#4
QUOTE:
I would think of it as one middle-layer node for each orthogonal input to maximize the "par" (i.e. R² fit) of the model. I would let this be the default setting. Knowledgeable users can manually override this at the risk of overfitting the model.


I believe this is only generally true. However, I know that the code Glitch is using has L1/L2 regularization (and may also have dropout regularization [Glitch, please chime in here]), and may also permit noise injection into the hidden layers. Why do I mention all of this? If that is the case, the heuristic approach you suggest, @superticker, may not be valid. I do believe that under the right conditions, with the correct NN algorithm, the maximum number of neurons for the hidden layers is closer to n-squared (the square of the number of inputs). This allows the coding of some "second-order" non-linear characteristics contained in the data (non-linear cross-products).
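To make the dropout point concrete, here is a rough sketch of the idea (not the code Glitch is using; the drop rate is a made-up placeholder). During training, each hidden activation is zeroed at random and the survivors are rescaled, which keeps the net from leaning too hard on any single connection:

using System;

static class DropoutSketch
{
    static readonly Random Rng = new Random();

    // "Inverted" dropout: zero each activation with probability dropRate and
    // rescale the survivors so the expected activation is unchanged at inference.
    public static double[] ApplyDropout(double[] activations, double dropRate = 0.5)
    {
        var output = new double[activations.Length];
        for (int i = 0; i < activations.Length; i++)
            output[i] = Rng.NextDouble() < dropRate
                ? 0.0
                : activations[i] / (1.0 - dropRate);
        return output;
    }
}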

The question of the number of hidden layers is mostly a question of training speed. In the absence of GPU support, I doubt that anything more than 2-3 layers could be trained in a reasonable amount of time. Using a GPU, I have experimented with training complex systems with a dozen layers in a reasonable time period.

Vince

1
- ago
#5
QUOTE:
maximum number of neurons for the hidden layers is closer to n-squared (the square of the number of inputs)
So you're saying if one has six orthogonal inputs, you have 36 degrees of freedom in the model? I totally disagree with that. Sorry. Such an overfitted model would have poor forecasting for unseen data.

You can't pull significant terms from thin air with some kind of transform (or indicator). It doesn't work that way. The raw orthogonal data defines the number of significant terms. You can reduce noise, and that will improve the precision of the fit (particular solution), but that won't change the degrees of freedom of the fit (form of the general solution).

----
But the point we are all making is that the choice of the number of middle-layer (hidden) nodes should be user adjustable. I think we all agree with this (for one reason or another). :-)
0
- ago
#6
If we were dealing with linear systems I would be in total agreement with you, superticker. But orthogonality is a linear construct that I do not believe belongs in a discussion of modeling high-dimensionality nonlinear systems.

In early discussions of NNs (30-40 years ago), Principal Component Analysis (PCA), a linear process, was touted as a great way to "reduce" the dimensionality of the input data used to train an NN. Only in the last 20 years have people discovered the concept of "non-linear correlation", sometimes called "local correlation" (really an outgrowth of modern ML models, which have shown that they can extract more information from the data than previously thought). As a side note, this is why "trees", particularly Leo Breiman's Random Forests, are so effective at modeling high-dimensional non-linear systems, as almost all of the Kaggle competitions have demonstrated.

I realize that, in the current absence of rigorous models underlying the foundations of ML, we must use a number of heuristics to build models effectively, but I think we risk carrying too many of our notional experiences from the modeling of linear systems into the process.

Vince
0
- ago
#7
QUOTE:
But the point we are making is that the choice of the number of middle-layer (hidden) nodes should be user adjustable. I think we all agree with this (for one reason or another). :-)


TOTAL AGREEMENT! :)

Vince
0
- ago
#8
QUOTE:
Degrees of Freedom
This again is a carryover from linear modeling which, I feel, is not particularly useful in most ML modeling systems that look to tease out and exploit "hidden" nonlinearities.

Vince
0
- ago
#9
QUOTE:
I realize that, in the current absence of rigorous models underlying the foundations of ML, we must use a number of heuristics to build models effectively,...
I follow your point. Perhaps I'm not willing to bet my money on subtle data behavior or computational heuristics. I can't estimate/control my risk when heuristics are involved.

And how to apply heuristics (or risk, for that matter) to investing is a religious issue for the individual investor. All the more reason why the number of middle-layer nodes should be adjustable.
1
- ago
#10
Good discussion superticker!

Glitch, are you sorry now that you asked?? ;)

Vince
0
- ago
#11
In my strategy I use 8 indicators, each with a weighting of one. At each bar I take a poll of the pluses and minuses, add them up, and send a buy signal when the plus poll is greater than the minus poll. Is this something that would be used here, or don't I know what I'm talking about and this has nothing to do with hidden layers and nodes? From the discussion it would seem that I have 64 layers here.
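Roughly, per bar, the logic is something like this sketch (the vote values are placeholders for my eight indicator readings):

using System;
using System.Linq;

static class IndicatorPoll
{
    // Each of the eight indicators casts a +1 or -1 vote on the current bar;
    // send a buy signal when the pluses outnumber the minuses.
    public static bool BuySignal(int[] votes) => votes.Sum() > 0;

    static void Main()
    {
        int[] votes = { +1, -1, +1, +1, -1, +1, +1, -1 };   // one placeholder vote per indicator
        Console.WriteLine(BuySignal(votes) ? "Buy" : "No trade");
    }
}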
0
Glitch8
- ago
#12
Vince, no, this is exactly the kind of discussion I was hoping to ignite!
Pesto, when I said layers and neurons I was talking more specifically about those things in relation to neural network architecture.
0
- ago
#13
QUOTE:
Vince, no, this is exactly the kind of discussion I was hoping to ignite!


Well then, you got what you wanted!! :)

Vince
0
- ago
#14
Aren't the layer count and the neurons-per-layer count options that are up to the user?

As usual, you can experiment with your architecture - changing layer types, neuron counts, and activation functions has some effect - but you will get the best results by downloading a state-of-the-art model, like transformers or something.

So having built-in NN templates (architectures) for some kind of SotA model would be good.
1
- ago
#15
(It took me a bit of time to remember where I had read this...)
QUOTE:
Empirically, greater depth does seem to result in better generalization for a wide variety of tasks. […] This suggests that using deep architectures does indeed express a useful prior over the space of functions the model learns.

Source: Deep Learning by Ian Goodfellow, Yoshua Bengio, Aaron Courville, page 201

Glitch, this further makes a case for adding GPU support to WL7 eventually.

Vince
2
- ago
#16
Is anyone still using fully connected layers for time series and stock data analysis or anything else for that matter? I would think using convnets on graphs, transformer networks or RNNs for time series data, or other current architectures would be more up to date.
0
Glitch8
- ago
#17
MustPlayOptions, I would love to see some of your ideas developed, perhaps as a third-party extension? Hopefully NeuroLab will provide some value despite your complete dismissal. 🤷🏼‍♂️
0
- ago
#18
Modern NN architectures use many techniques to reduce the connection weights (to zero, in the case of Dropout Regularization) to help improve generalization performance, so "fully-connected layers" are only of historical interest.

It really depends on what you are trying to do with your model. If it is prediction, then architectures such as RNNs are most appropriate. If you are looking to do "stock analysis", such as ranking (regression), then that is much closer to classification; NNs and trees seem to work best in those cases.

The problem defines the approach.

Vince
0
Glitch8
- ago
#19
NeuroLab will try to predict an output time series based on fully connected layers. Maybe it's out of fashion, but I've seen encouraging results so far. Rather than guessing which indicators may or may not be predictive, we can let NeuroLab figure it out. Plus, the interface is solid and can serve as a good base to push forward with other techniques post-release.
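For anyone curious about the general shape, a fully connected net with one hidden layer maps the indicator values to a single predicted value roughly like the sketch below (a generic forward pass, not the actual NeuroLab implementation; the sizes and weights are placeholders that would come from training):

using System;

static class FullyConnectedSketch
{
    // inputs (indicator values) -> hidden layer (tanh) -> one output
    // (the predicted value of the target time series).
    public static double Predict(double[] inputs, double[,] w1, double[] b1,
                                 double[] w2, double b2)
    {
        int hidden = b1.Length;
        var h = new double[hidden];
        for (int j = 0; j < hidden; j++)
        {
            double sum = b1[j];
            for (int i = 0; i < inputs.Length; i++)
                sum += w1[i, j] * inputs[i];
            h[j] = Math.Tanh(sum);   // hidden activation
        }
        double output = b2;
        for (int j = 0; j < hidden; j++)
            output += w2[j] * h[j];  // linear output layer
        return output;
    }
}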
1
- ago
#20
Glitch,

While the NN will start fully connected, during the training process regularization will attempt to drive all of the weights toward zero, and only those weights that are continually reinforced will survive. I suspect that very few well-trained nets will be anything close to fully connected.
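A quick way to check that after training (just a sketch; the 1e-3 cutoff and the weight array are illustrative assumptions, not anything NeuroLab exposes) is to count how many weights kept a meaningful magnitude:

using System;
using System.Linq;

static class SparsityCheck
{
    // Fraction of trained weights that regularization has effectively driven to zero.
    public static double Sparsity(double[] weights, double threshold = 1e-3)
        => 1.0 - (double)weights.Count(w => Math.Abs(w) > threshold) / weights.Length;
}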

Vince
1
Glitch8
- ago
#21
Makes sense, thanks Vince!
0
- ago
#22
LOL Glitch. I'm not dismissing it, just saying there are much stronger things out there. And I wish I could program that stuff in C#, but it's all already available in Python, and I'm having a hard enough time getting things to work the way they did in WL6.

With or without dropout, the point about an FCN is not how sparse it is but how much potential it has for generalizable pattern recognition. FCNs pale in comparison to transformer networks, reinforcement-learning networks, etc.
0
- ago
#23
MPO,
QUOTE:
there are much stronger things out there

Agreed, but not in retail-market trading software. In the professional market much of what you describe is commonplace, but for most people the cost of that software is out of reach. Integrating better ML software into WL7 is probably not trivial, especially in a multiprocessing environment. (I am not a good C# programmer, so this is based on discussions with more knowledgeable folks). I hope that the suite of ML software increases with time, but Glitch has a LOT on his plate at the moment.

Vince
0
- ago
#24
I don't disagree and wasn't asking for that support at this time per se. There are a lot of other things I'd rather have first.

If I knew more about C# programming and dll's, I would know the answer to this question:

Is it possible to create a Python library that can be accessed from C# in Wealth-Lab? If so, then you don't really need specific ML support in WL.
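For what it's worth, one route I've seen discussed (assuming the pythonnet / Python.Runtime NuGet package and a local CPython install; I haven't tried it inside Wealth-Lab) looks roughly like this:

using System;
using Python.Runtime;   // "pythonnet" NuGet package

static class PythonInteropSketch
{
    static void Main()
    {
        // Recent pythonnet versions may also require pointing Runtime.PythonDLL
        // at the local CPython DLL before initializing.
        PythonEngine.Initialize();
        using (Py.GIL())
        {
            dynamic np = Py.Import("numpy");                  // any installed Python library
            dynamic mean = np.mean(new[] { 1.0, 2.0, 3.0 });  // arguments are converted automatically
            Console.WriteLine((double)mean);
        }
        PythonEngine.Shutdown();
    }
}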
0
- ago
#25
If you want to see more ML capabilities in WL7, there is an item on the WishList, "Additional Machine Learning Algorithms and GPU support", that anyone can vote for to improve its status.

Vince
1
- ago
#26
As someone trying out the NeuroLab extension for the first time, and new to neural networks, it's difficult to tackle the configuration of the hidden layers. I appreciate superticker's suggestion to keep the number of hidden neurons from exceeding the number of inputs, as it at least provides a starting point. Is there any similar rule of thumb when choosing the number of hidden layers?
0
mjj38
- ago
#27
Perhaps it would be worthwhile to have Python-to-.NET functionality so Dion doesn't have to reinvent the wheel for everything. Have any of you looked into using IronPython?

https://ironpython.net/
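The hosting API is pretty small; here is a minimal sketch (note that IronPython runs on the DLR, so C-extension libraries like numpy are not available):

using System;
using IronPython.Hosting;              // "IronPython" NuGet package
using Microsoft.Scripting.Hosting;

static class IronPythonSketch
{
    static void Main()
    {
        ScriptEngine engine = Python.CreateEngine();
        ScriptScope scope = engine.CreateScope();
        engine.Execute("def score(x):\n    return x * 2\n", scope);
        dynamic score = scope.GetVariable("score");
        Console.WriteLine(score(21));   // prints 42
    }
}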
0
