- ago
I could be wrong (known to happen alot), but it seems like with Build 91 backtesting may have slowed down. Today, I was running in offline mode with an intraday strategy, with one minute scale, that had a timeframe of approximately 7 days of data. I have the perception that the backtest was slow given my prior backtesting of the strategy.

So, I used JetBrains dotTrace profiler to look for any bottlenecks. I assumed I had done something bad in my code. But, after profiling a backtest, I found there may be an issue with some WL method named WealthLab.Backtest.Backtester.FlushIssuer. Its apparently chewing up a lot of time in a call to IndexOf.

Here's the call tree produced by the profiler. Notice out of the 12.5 seconds for the thread, 9.8 seconds was chewed up by FlushIssuer and 9.1 seconds of that was a call to IndexOf. 71.9% of the thread's time was spent in that IndexOf call...

CODE:
100% .NET TP Worker • 12,596 ms 99.8% WorkerThreadStart • 12,577 ms • System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() 83.6% OnDoWork • 10,534 ms • WealthLab.WPF.AbortableBackgroundWorker.OnDoWork(DoWorkEventArgs) 83.6% ViewCollection • 10,534 ms • WealthLab8.cwStrategy.ViewCollection(Object, DoWorkEventArgs) 83.6% CustomizeCollection • 10,534 ms • WealthLab8.cwStrategy.CustomizeCollection(StrategyExecutionMode) 81.2% RunBenchmarkBacktest • 10,228 ms • WealthLab.Backtest.Backtester.RunBenchmarkBacktest(BarHistory) 81.2% GetBenchmarkBacktester • 10,228 ms • WealthLab.Backtest.Backtester.GetBenchmarkBacktester(BarHistory, TimeSeries, Int32, BacktestSettings, PositionSize, StrategyExecutionMode, Strategy, String) 81.2% RunBacktest • 10,228 ms • WealthLab.Backtest.Backtester.RunBacktest(StrategyBase, List) 81.2% RunBacktestInternal • 10,228 ms • WealthLab.Backtest.UserStrategyExecutor.RunBacktestInternal(List) 78.2% LogoutTag • 9,853 ms • WealthLab.Backtest.UserStrategyExecutor.LogoutTag(List, DateTime) 78.2% Execute • 9,853 ms • WealthLab.Backtest.BuyAndHoldStrategy.Execute(BarHistory, Int32) 78.1% HasOpenPosition • 9,832 ms • WealthLab.Backtest.StrategyBase.HasOpenPosition(BarHistory, PositionType) 78.1% FindPosition • 9,832 ms • WealthLab.Backtest.Backtester.FindPosition(BarHistory, PositionType, Int32) 78.1% AssetIssuer • 9,832 ms • WealthLab.Backtest.Backtester.AssetIssuer(BarHistory, PositionType, Int32) 78.1% FlushIssuer • 9,832 ms • WealthLab.Backtest.Backtester.FlushIssuer(BarHistory, PositionType, Int32) 71.9% IndexOf • 9,058 ms • System.Array.IndexOf(T[], T, Int32, Int32) ► 3.64% Sort • 458 ms • System.Collections.Generic.List`1.Sort(Comparison) 0.36% StelemRef • 45 ms • System.Runtime.CompilerServices.CastHelpers.StelemRef(Array, IntPtr, Object) 0.15% AddWithResize • 19 ms • System.Collections.Generic.List`1.AddWithResize(T) 0.15% get_PositionType • 19 ms • WealthLab.Backtest.Position.get_PositionType() 0.10% PrintStatus • 13 ms • WealthLab.Backtest.Position.PrintStatus() 0.06% get_Symbol • 6.9 ms • WealthLab.Core.BarHistory.get_Symbol() 0.05% get_OpenPositions • 6.0 ms • WealthLab.Backtest.Backtester.get_OpenPositions() ► 0.16% get_CurrentBuyingPower • 21 ms • WealthLab.Backtest.StrategyBase.get_CurrentBuyingPower() ► 1.40% ProcessMarketClose • 176 ms • WealthLab.Backtest.StrategyBase.ProcessMarketClose(DateTime) ► 1.21% GetTag • 153 ms • WealthLab.Backtest.UserStrategyExecutor.GetTag(List) ► 0.32% ProcessMarketOpen • 40 ms • WealthLab.Backtest.StrategyBase.ProcessMarketOpen(DateTime, List) 0.06% Next • 7.0 ms • WealthLab.Core.Synchronizer.Next() ► 2.43% RunBacktest • 306 ms • WealthLab.Backtest.Backtester.RunBacktest(StrategyBase, List) ► 0.81% OrderAdapter • 103 ms • WealthLab8.MainWindow.OrderAdapter() 0.15% [Unsafe stack walking] • 19 ms #stacktrace


Maybe that's just the way it goes. I don't know. But, I thought the WL team should know in case there is a problem.
0
622
17 Replies

Reply

Bookmark

Sort
- ago
#1
CODE:
System.Array.IndexOf(T[], T, Int32, Int32)
Well, unless the entire System.Array can fit into processor cache memory (Can it?), doing a linear search is going to be slow. How many positions are in your backtest, which you are trying to search through? If you reduce that number of positions so it all fits into processor cache, you should get a 10 times speed up. Have you tried that?

I'm gathering you're suggesting to use a different datatype (not System.Array) to avoid the linear search. We know a B-tree would be the fastest lookup (but it takes extra time and memory to build that index). Did you have an alternative datatype in mind? .NET 8 does document the complexity (in Oh notation) of each .NET datatype for searching.

I'm a hardware guy, so my solution would be to buy another processor chip that maximizes on-chip cache memory while minimizing the number of processor cores.
0
- ago
#2
That's not my code. That's Wealth-Lab code. Look at the stack.
0
- ago
#3
QUOTE:
That's not my code.

I appreciate that. But you're saying you want something faster than a System.Array.IndexOf linear search. Right? Or am I reading this wrong?
0
- ago
#4
Thanks for the heads-up Paul. I haven't noticed backtesting slowdown with production B91 so we have to investigate into your findings yet. However, I believe that it may have to do with obfuscator.
0
- ago
#5
@Eugene - I understand.

I did some more testing today and found some very odd behavior with backtesting and performance. So, this is kind of a lengthy spiel, so bear with me here.

I created a small test strategy (below) and ran tests with profiling, but found no new information.

So, after a while, I experimented with dates. I found that minor changes to dates had a noticeable effect on performance. And, the performance issue seems to be occurring after BacktestComplete.

So, to test this you use the strategy below (its a simple RSI strategy)...
Dataset consists of symbols CLSK and MARA (wild intraday stocks, btw.)
Scale: one minute
Date range: more on this below
Filter Pre/Post Market Data not checked
Benchmark symbol: CLSK
Retain NSF positions checked
Use Granular Limit/Stop Processing not checked
Starting capital: $100,000
Position Size: Percent of equity
Percent: 10
Basis Price: Market Close this Bar
Margin Factor: 2.0
Max Open Pos, etc. all zero.

Offline mode is checked (under the File menu) because I am using TD Ameritrade as my intraday data source, and of course that is no longer available. However, I do have one-minute data for symbols CLSK and MARA up through 2024/05/09.

Slow date ranges (for me at least) in mm/dd/yyyy format...
04/24/2024 to 05/03/2024
04/24/2024 to 05/08/2024
05/01/2024 to 05/08/2024

Fast date ranges:
04/23/2024 to 05/03/2024 (only one day different than a date range above in the slow list)
04/22/2024 to 05/03/2024
04/23/2024 to 05/08/2024
04/25/2024 to 05/03/2024

When it runs fast you can see the backtest complete and the profit show up real fast (sub-second). For the slow date ranges you see a pause occur for several seconds before seeing the backtest complete and profit show up. Notice how the differences between the slow and fast date ranges are minor.

So, to re-iterate, the really strange behavior is that 04/23/2024 to 05/03/2024 is fast, but moving the start date to one day later (04/24/2024) it has the pause.

The debug log shows that between BacktestBegin and BacktestComplete only about 9 to 13 milliseconds occur (on my machine - AMD 5950x).

Anyhow, it seems something strange is going on. Hope this helps, if there is an internal problem with WL.

CODE:
using System.Diagnostics; using WealthLab.Backtest; using WealthLab.Core; using WealthLab.Indicators; namespace WealthLabStrategies.MovingAverage { public class MyStrategy : UserStrategyBase { //declare private variables below private RSI _rsi; private static readonly Stopwatch Sw = new(); //create indicators and other objects here, this is executed prior to the main trading loop public override void Initialize(BarHistory bars) { _rsi = RSI.Series(bars.Close, 4); } public override void BacktestBegin() { Sw.Restart(); base.BacktestBegin(); } public override void BacktestComplete() { Sw.Stop(); WriteToDebugLog($"Backtest took {Sw.ElapsedMilliseconds} milliseconds"); } //execute the strategy rules here, this is executed once for each bar in the backtest history public override void Execute(BarHistory bars, int idx) { var position = FindOpenPosition(-1); if (position == null) { if (_rsi.CrossesOver(30, idx)) { PlaceTrade(bars, TransactionType.Buy, OrderType.Market, 0, "Buy"); } } else { if (_rsi.CrossesUnder(70, idx)) { ClosePosition(position, OrderType.Market, 0, "Sell"); } } } } }
0
- ago
#6
QUOTE:
So, to re-iterate, the really strange behavior is that 04/23/2024 to 05/03/2024 is fast, but moving the start date to one day later (04/24/2024) it has the pause.

This is unlikely a software problem.
QUOTE:
... it may have to do with obfuscator.

Well, I can't speak for that possibility. You got me there.

For the linear search into the System.Array to be fast, it needs to fit into processor on-chip cache. Once it no longer fits, there's going to be a 10 times slow down. Anything that reduces that System.Array size (like less trades or a smaller date range) will speed things up dramatically. That's to be expected.
0
Cone8
 ( 4.98% )
- ago
#7
QUOTE:
after BacktestComplete.
I'm not following this too closely yet, but this would make me look to the Performance Visualizers in use. If disabling all the Visualizers makes the issue go away, you could isolate the one with the critical path.
0
- ago
#8
Hello, Cone. I turned off all of the Performance Visualizers and ran some tests. Its still slow for certain dates. Whether a bevy of visualizers are on, or all of them off, the results are the same. Using the settings shown in Post #5:

Data ranges:
2024-04-23 to 2024-05-03: fast
2024-04-24 to 2024-05-03: slow (odd)
2024-04-25 to 2024-05-03: fast

I realize you guys probably have many other priorities, but if you'd like me to try anything, please let me know.
1
- ago
#9
QUOTE:
Whether a bevy of visualizers are on, or all of them off, the results are the same.

Is the code still performing a linear search with System.Array.IndexOf with all the Performance Visualizers turned off?

If you reduce the Positions.Count of the linear search, does the problem go away?
0
- ago
#10
I simplified the test strategy a little bit, so that it wouldn't typically run from one day into the next (its a one-minute strategy). It is way below. Anyhow, with the settings given in Post #5 and all visualizers off we get:

2024-04-23 to 2024-05-03 - 552 positions - fast
2024-04-24 to 2024-05-03 - 495 positions - slow
2024-04-25 to 2024-05-03 - 428 positions - fast

For 2024-04-24 to 2024-05-03 (the slow case) the same profiler generated stack appears as in the original post. IndexOf is used on something only Cone, Eugene and Glitch know. :)

CODE:
100% .NET TP Worker • <strong>91,199</strong> ms 100% WorkerThreadStart • 91,199 ms • System.Threading.PortableThreadPool+WorkerThread.WorkerThreadStart() 96.2% OnDoWork • 87,763 ms • 2 calls • WealthLab.WPF.AbortableBackgroundWorker.OnDoWork(DoWorkEventArgs) 96.2% OnDoWork • 87,763 ms • 2 calls • System.ComponentModel.BackgroundWorker.OnDoWork(DoWorkEventArgs) 95.9% ViewCollection • 87,431 ms • 1 call • WealthLab8.cwStrategy.ViewCollection(Object, DoWorkEventArgs) 95.9% CustomizeCollection • 87,431 ms • 1 call • WealthLab8.cwStrategy.CustomizeCollection(StrategyExecutionMode) 94.6% RunBenchmarkBacktest • 86,235 ms • 1 call • WealthLab.Backtest.Backtester.RunBenchmarkBacktest(BarHistory) 94.6% GetBenchmarkBacktester • 86,235 ms • 1 call • WealthLab.Backtest.Backtester.GetBenchmarkBacktester(BarHistory, TimeSeries, Int32, BacktestSettings, PositionSize, StrategyExecutionMode, Strategy, String) 94.5% RunBacktest • 86,194 ms • 1 call • WealthLab.Backtest.Backtester.RunBacktest(StrategyBase, List) 94.5% RunBacktestInternal • 86,161 ms • 1 call • WealthLab.Backtest.UserStrategyExecutor.RunBacktestInternal(List) 87.1% LogoutTag • 79,448 ms • 6 313 calls • WealthLab.Backtest.UserStrategyExecutor.LogoutTag(List, DateTime) 87.1% Execute • 79,412 ms • 6 313 calls • WealthLab.Backtest.BuyAndHoldStrategy.Execute(BarHistory, Int32) 86.8% HasOpenPosition • 79,185 ms • 6 313 calls • WealthLab.Backtest.StrategyBase.HasOpenPosition(BarHistory, PositionType) 86.8% FindPosition • 79,185 ms • 6 313 calls • WealthLab.Backtest.Backtester.FindPosition(BarHistory, PositionType, Int32) 86.8% AssetIssuer • 79,184 ms • 6 313 calls • WealthLab.Backtest.Backtester.AssetIssuer(BarHistory, PositionType, Int32) 86.8% FlushIssuer • 79,184 ms • 6 313 calls • WealthLab.Backtest.Backtester.FlushIssuer(BarHistory, PositionType, Int32) <strong> 75.5% IndexOf • 68,856 ms • 3 923 793 calls • System.Array.IndexOf(T[], T, Int32, Int32)</strong> ► 9.72% Sort • 8,866 ms • 6 313 calls • System.Collections.Generic.List`1.Sort(Comparison)


Oddly, for the fast case 2024-04-23 to 2024-05-03, which has more positions than the slow case, the profiler generated stack is noticeably different in terms of what is chewing up the most time:

CODE:
100% .NET TP Worker • <strong>6,889</strong> ms 100% StartCallback • 6,889 ms • System.Threading.Thread.StartCallback() 42.7% OnDoWork • 2,942 ms • 1 call • WealthLab.WPF.AbortableBackgroundWorker.OnDoWork(DoWorkEventArgs) 42.7% OnDoWork • 2,942 ms • 1 call • System.ComponentModel.BackgroundWorker.OnDoWork(DoWorkEventArgs) 42.7% ViewCollection • 2,942 ms • 1 call • WealthLab8.cwStrategy.ViewCollection(Object, DoWorkEventArgs) 42.7% CustomizeCollection • 2,942 ms • 1 call • WealthLab8.cwStrategy.CustomizeCollection(StrategyExecutionMode) 25.2% RunBenchmarkBacktest • 1,739 ms • 1 call • WealthLab.Backtest.Backtester.RunBenchmarkBacktest(BarHistory) 25.2% GetBenchmarkBacktester • 1,739 ms • 1 call • WealthLab.Backtest.Backtester.GetBenchmarkBacktester(BarHistory, TimeSeries, Int32, BacktestSettings, PositionSize, StrategyExecutionMode, Strategy, String) 24.6% RunBacktest • 1,693 ms • 1 call • WealthLab.Backtest.Backtester.RunBacktest(StrategyBase, List) 22.7% RunBacktestInternal • 1,566 ms • 1 call • WealthLab.Backtest.UserStrategyExecutor.RunBacktestInternal(List) 14.3% GetTag • 986 ms • 1 call • WealthLab.Backtest.UserStrategyExecutor.GetTag(List) 14.3% TypeLoader..ctor • 986 ms • 1 call • WealthLab.Core.TypeLoader..ctor(String, String) 12.1% InvokeRule • 837 ms • 1 call • WealthLab.Core.TypeLoader.InvokeRule(String, String, String, Boolean) 9.90% ConductSearch • 682 ms • 2 calls • WealthLab.Core.TypeLoader.ConductSearch(String, String, Boolean) 9.85% LoadTypes • 678 ms • 87 calls • WealthLab.Core.TypeLoader.LoadTypes(Assembly) 4.13% IsDefined • 285 ms • 32 798 calls • System.Reflection.CustomAttribute.IsDefined(RuntimeAssembly, RuntimeType) ► 3.46% GetCustomAttributes • 238 ms • 18 835 calls • System.Reflection.CustomAttribute.GetCustomAttributes(RuntimeModule, Int32, Int32, RuntimeType) ► 1.59% RegisterAssembly • 109 ms • 17 182 calls • WealthLab.Backtest.CodeBasedStrategyReferences.RegisterAssembly(Assembly)


Anyhow, here is the simplified test strategy:

CODE:
using System.Diagnostics; using WealthLab.Backtest; using WealthLab.Core; using WealthLab.Indicators; namespace WealthLabStrategies.MovingAverage { public class MyStrategy : UserStrategyBase { //declare private variables below private RSI _rsi; private static readonly Stopwatch Sw = new(); //create indicators and other objects here, this is executed prior to the main trading loop public override void Initialize(BarHistory bars) { _rsi = RSI.Series(bars.Close, 4); } public override void BacktestBegin() { Sw.Restart(); base.BacktestBegin(); } public override void BacktestComplete() { Sw.Stop(); WriteToDebugLog($"Backtest took {Sw.ElapsedMilliseconds} milliseconds"); } //execute the strategy rules here, this is executed once for each bar in the backtest history public override void Execute(BarHistory bars, int idx) { var position = FindOpenPosition(-1); if (position == null) { if (_rsi.CrossesOver(30, idx)) { PlaceTrade(bars, TransactionType.Buy, OrderType.Market, 0, "Buy"); } } else { if (idx == position.EntryBar + 10) { ClosePosition(position, OrderType.Market, 0, "Sell"); } } } } }


Of course, without the WL source code (which I know I'll never have), I'm running a bit in the dark with the performance profiler. Anyhow, hope this helps.
0
- ago
#11
QUOTE:
... for the fast case 2024-04-23 to 2024-05-03, which has more positions than the slow case, the profiler generated stack is noticeably different ...

Just to clarify, are you saying the System.Array.IndexOf linear search is only part of the slow cases, but not the fast cases? That would make sense to me if it's a processor caching bottleneck.

But why WL would be doing a linear search if there aren't any Performance Visualizers involved is a mystery to me. I know the Performance Visualizer that produces the "By Symbol" tabulation can slow things down.
0
- ago
#12
QUOTE:
Just to clarify, are you saying the System.Array.IndexOf linear search is only part of the slow cases, but not the fast cases?


From the few simple tests, that seems to be the case. What's odd is a fast sample has MORE trades (552) than a slow sample (495). Of course, another variable here is the impact of the profiler. But, even when running without the profiler, the samples have the same performance characteristics. The slow case is noticeably slower than the other two (fast cases).

From my experience, I've found is applying performance enhancements is simply a matter of using a profiler, performing code examination and changes and then lather/rinse/repeat.

For example, maybe that array could be replaced by a Dictionary, but only the WL crew knows. I don't know, without the code, because I have no idea what is in that array. Without the code, I can't make any assumptions that it is a CPU cache issue because maybe its a puny array. Notice that it seems on average that an invocation of FlushIssuer is calling IndexOf 621 times (3,923,793 / 6,313).

I think the best I can do is run tests and supply the results to the WL crew should they choose to investigate further. Beyond that, without the code, I'm just guessing at what's causing the problem.
0
- ago
#13
QUOTE:
maybe that array could be replaced by a Dictionary

Well, a Dictionary must have unique keys. I was thinking a datatype where the vector table (the index) is cached in memory separately from the payload to improve the Principle of Locality within the processor cache. And the index can even be sorted with some .NET datatypes for a quicker binary search (instead of a linear search).

But I agree, the original programmers need to make the final call here because we don't know how this data is going to be accessed later.

QUOTE:
I can't make any assumptions that it is a CPU cache issue because maybe its a puny array.

Agreed, but if they're searching an Array, it probably contains both the key and the payload, which means it's not so puny. Gee, it gets slow with only 552 elements (Positions)? The payload must be huge, which is more of a reason to separate the index from the payload to improve cache hit ratio when searching the index.
0
- ago
#14
If my guess is good then B92 may fix this by virtue of obfuscator's new run.
0
- ago
#15
The slowness is still occurring in B92. I ran the three date ranges as described above and WL exhibits the same characteristics. I didn't take any profiler measurements because that takes time. I can easily tell just from running it that the slowdown occurs.

I understand this is probably not a high priority, but nevertheless, I thought you should know it did not change in B92.
0
Cone8
 ( 4.98% )
- ago
#16
I did the test you specified in Post #5 and didn't see any noticeable difference between the two runs when changing the date between 4/23 and 4/24 - both sub-second. Maybe it has something to do with the TD Data provider, or the data itself.

... used a production install with IQFeed data. Tested 7 other starting dates, all fast.
1
- ago
#17
QUOTE:
I did the test you specified in Post #5 and didn't see any noticeable difference between the two runs when changing the date between 4/23 and 4/24 - both sub-second.

But did you compare Paul's L3 processor cache size with yours? The processor cache size is definitely in the equation here, so unless both of you have exactly the same L3 (and maybe L2) processor cache size, you are going to see differences between your two machines. And that's to be expected.

My guess is Paul has more cores and less L3 cache than Cone, so Cone won't see the problem. Paul, can you simply try running your tests on a processor chip with more L3 cache (and less cores) than you have now. That should make the difference.

There "might" be a software fix if you replace System.Array with another datatype that allows the index to be cached but does not cache the payload. The payload must be very large if you're getting cache misses with only 552 elements (Positions in the array). Can you invert transpose (or reorganize) System.Array such that the search index/key is the fastest moving variable in the array to improve cache hits during the search?

Remember, this is fundamentally a hardware problem with the System.Array not fitting entirely (index and payload) in the L3 cache of the processor during a linear search algorithm.
0

Reply

Bookmark

Sort