I have some suggestions to make it even more useful.

**Better Average Line**

Often I see nonlinear dependencies like the one in this image:

The blue line (the constant overall average) doesn't help much, nor does the red line (the overall linear regression).

Instead I suggest a "moving average" which connects the averages of the bins (the average of all dots for Trading Hour 7, then the average of all dots for Trading Hour 8, and so on).

This averaging should take care to use at least 100 dots for each average, probably by merging several adjacent bins.
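The merging idea above could be sketched roughly as follows. This is a hypothetical illustration, not WL8 code: the function name, the `(bin_key, value)` input shape, and the `min_count` parameter are all assumptions for the sketch.

```python
from collections import defaultdict

def binned_averages(points, min_count=100):
    """points: iterable of (bin_key, value) pairs, e.g. (trading_hour, profit).
    Returns a list of (bin_center, average) points for the "moving average" line,
    merging adjacent bins until each group holds at least min_count dots."""
    bins = defaultdict(list)
    for key, value in points:
        bins[key].append(value)

    result, keys, vals = [], [], []
    for key in sorted(bins):
        keys.append(key)
        vals.extend(bins[key])
        if len(vals) >= min_count:  # enough dots: emit one averaged point
            result.append((sum(keys) / len(keys), sum(vals) / len(vals)))
            keys, vals = [], []
    if vals:  # undersized tail becomes its own (less reliable) point
        result.append((sum(keys) / len(keys), sum(vals) / len(vals)))
    return result
```

Connecting the returned points with straight segments would give the suggested average line; a cosmetic smoothing pass could be layered on top.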

**Insample / Out-of-Sample**

I'd love to see the results for two distinct backtest intervals: Insample and Out-of-Sample. Probably with different colors:

red/green for Insample,

orange/blue for Out-of-sample.

This would make the interpretation of the results much more meaningful and robust.

This requires settings for these intervals, which could be borrowed from the IS/OS Scorecard:

QUOTE:

Insample / Out-of-Sample

Either I don't understand your idea or it doesn't make much sense in this context. You're suggesting that this particular visualizer should start behaving like a WFO optimization, in fact performing

*two* backtest runs: one for IS and one for OOS. But IS/OOS is not something inherent or specific to the Position Metrics. Why this and not the others?

QUOTE:

Why this and not the others?

In fact, this is the start of quite a few more suggestions:

Use IS/OS metrics throughout WL8 - everywhere!

It is not necessary to run two separate backtests.

Just run a single backtest.

Then calculate and collect all results and metrics for the two parts of this backtest interval. First part: Insample, second part: Out-of-sample.
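The single-backtest split described above could be sketched like this. A purely illustrative example, not the IS/OS Scorecard implementation: the trade tuples, the split date, and the `net_profit` metric are assumptions.

```python
from datetime import date

def split_metrics(trades, split_date, metric):
    """trades: list of (exit_date, pnl) from ONE backtest run.
    Returns (insample_value, out_of_sample_value) for the given metric,
    computed separately on the two parts of the backtest interval."""
    insample = [pnl for d, pnl in trades if d < split_date]
    out_of_sample = [pnl for d, pnl in trades if d >= split_date]
    return metric(insample), metric(out_of_sample)

# Illustrative trades; the split date divides IS (before) from OOS (after).
trades = [(date(2022, 3, 1), 120.0), (date(2022, 9, 15), -40.0),
          (date(2023, 2, 10), 80.0), (date(2023, 8, 5), 30.0)]
net_profit = lambda pnls: sum(pnls)
is_val, oos_val = split_metrics(trades, date(2023, 1, 1), net_profit)
# is_val -> 80.0, oos_val -> 110.0
```

The same split function could be reused for any metric (profit factor, win rate, etc.), which is what makes a "use IS/OS everywhere" approach cheap: one backtest, two evaluations per metric.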

This is how the IS/OS scorecard works. It is quite helpful to

**avoid over-optimization** during manual improvements.

QUOTE:

... calculate ... results and metrics for ... two parts of this backtest .... First part: In-sample, second part: Out-of-sample. It is ... helpful to avoid over-optimization ...

This is about an optimization end test.

What he's really asking is to "contrast" the In-sample case against the Out-of-sample case to see how they are different. By "contrast" I mean take the difference of their performance slopes. (I do

__not__ mean "contrast" in the statistical sense, where we take the ratio of variances and perform an F-test for significance. Sorry about that confusion.)

I suppose a statistical contrast could also be done. That would please research types for publication, but "most" WL users aren't researchers.
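The slope-difference contrast mentioned above could be computed with a plain least-squares fit on each equity series. A minimal sketch, assuming the two series are sampled on comparable index scales; the function names are hypothetical.

```python
def slope(ys):
    """Ordinary least-squares slope of ys against the index 0..n-1."""
    n = len(ys)
    mx, my = (n - 1) / 2, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

def contrast(insample_equity, oos_equity):
    """Difference of the two performance slopes: near zero suggests the
    strategy behaves similarly out of sample; a large gap hints at
    over-optimization."""
    return slope(insample_equity) - slope(oos_equity)
```

A statistical F-test on the residual variances would be the more formal alternative, but as noted, the simple slope difference is likely what most users would want to see.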

---

I think the end test for the optimizer is something the optimizer should do. But I also think performing a "formal" F-test as part of an optimizer end test is overkill. I've only seen that done in stepwise linear regression algorithms to

__steer__ the model permutation stepping automatically.
