We have a SQL Server 2008 Analysis Services instance on Windows Server 2003 that experiences near-daily performance problems (query timeouts) during peak activity. A second, identically configured instance queried by the same application (which uses round-robin IP load balancing) doesn't have the issue. We've even swapped the two instances, and the problem stays with the physical server, so we know it isn't an SSAS configuration issue. The cube is rebuilt once per day.
Here's what we've determined so far: the timeouts coincide with an unexpected drop in the Memory\System Cache Resident Bytes perfmon counter (we've been calling this a "cache flush"). This morning, for example, it dropped from about 14GB to 8.4GB in roughly 2 minutes. When this happens, Memory\Page Reads/sec jumps from 0 to 800-1200, and things start going sideways.
The immediate impact shows up in the MSOLAP counters: Storage Engine Query\Data reads/sec drops sharply from ~50,000 to ~5,000 (those reads obviously weren't physical; they were being served from the system cache, which a matching drop in Process(msmdsrv)\IO Read Operations/sec confirms). Storage Engine Query\Avg time/query climbs from 500-2,000 up to 20,000-50,000, and queries start timing out. Threads\Processing pool busy threads jumps from ~50 to the configured maximum of 240, and Threads\Processing pool job queue length starts climbing.
Queries continue timing out over the next 30 minutes or so, until (I presume) the file cache is re-filled with the missing data. Memory\Page Reads/sec goes back down to near zero, and things appear to return to "normal".
During this entire episode there is no significant change in Process(msmdsrv)\Working Set (~10.5GB), Process(_Total)\Working Set (~13.9GB), or MSOLAP:Memory\Memory Usage KB (~50GB). This is a 48-core server with 128GB RAM running only SSAS (the server that isn't having trouble is a 24-core box; we haven't found any other significant differences between the two).
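For what it's worth, here is a minimal Python sketch of the kind of watcher one could run to catch the event as it happens. It samples the two Memory counters above via typeperf (which ships with Windows); the 2GB-drop and 500-reads/sec thresholds are guesses based on our numbers, not tuned values.

```python
import csv
import io
import subprocess
import time

# Counters from the observations above; the "cache flush" signature is a
# sudden drop in System Cache Resident Bytes plus a spike in hard faults.
COUNTERS = [r"\Memory\System Cache Resident Bytes",
            r"\Memory\Page Reads/sec"]

def sample():
    """Take a formatted sample of each counter via typeperf."""
    out = subprocess.run(["typeperf", *COUNTERS, "-sc", "2"],
                         capture_output=True, text=True, check=True).stdout
    last = None
    for row in csv.reader(io.StringIO(out)):
        # Data rows are a timestamp plus one value per counter; this skips
        # the "(PDH-CSV 4.0)" header and typeperf's status lines.
        if len(row) == len(COUNTERS) + 1 and "PDH-CSV" not in row[0]:
            last = row
    if last is None:
        raise RuntimeError("could not parse typeperf output")
    return [float(v) for v in last[1:]]

prev_cache = None
while True:
    cache_bytes, page_reads = sample()
    # Thresholds are guesses from the episode described above: a multi-GB
    # drop within one polling interval, paired with heavy hard page faults.
    if (prev_cache is not None
            and prev_cache - cache_bytes > 2 * 1024**3
            and page_reads > 500):
        print(f"possible cache flush: {prev_cache / 1e9:.1f}GB -> "
              f"{cache_bytes / 1e9:.1f}GB, page reads/sec = {page_reads:.0f}")
    prev_cache = cache_bytes
    time.sleep(30)
```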
Instance memory settings are as follows:
    <TotalMemoryLimit>80000000000</TotalMemoryLimit>
    <LowMemoryLimit>70000000000</LowMemoryLimit>
    <PreAllocate>40000000000</PreAllocate>

Based especially on Memory Usage KB, I don't believe that memory is an issue, other than figuring out why SSAS isn't using its internal memory to cache the entire cube (the cube is about 32GB).
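(For reference, a quick sanity check in Python, assuming values above 100 in these settings are read as absolute byte counts, which is how I understand the documentation:)

```python
# SSAS memory settings above 100 are interpreted as absolute byte counts;
# sanity-check them against the 128GB box and the ~50GB Memory Usage KB.
GiB = 1024**3
settings = {"TotalMemoryLimit": 80_000_000_000,
            "LowMemoryLimit":   70_000_000_000,
            "PreAllocate":      40_000_000_000}
for name, value in settings.items():
    print(f"{name}: {value / GiB:.1f} GiB ({value / (128 * GiB):.0%} of RAM)")
# TotalMemoryLimit ~74.5 GiB, LowMemoryLimit ~65.2 GiB; Memory Usage KB
# (~50GB) sits well below LowMemoryLimit, so msmdsrv shouldn't be under
# internal memory pressure.
```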
We are currently working to optimize the cube and query design (which I highly suspect will make the problem disappear entirely).
In the meantime, we need to
- understand this Windows file cache behavior
- figure out why it's consistently a problem on one physical server and not the other
Here are my questions:
I assume this is "normal" behavior by SSAS? Is there any way to force the entire 32GB cube into SSAS memory, eliminating the need to read from the Windows file cache entirely?
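(If pinning the cube in msmdsrv's own memory isn't possible, one workaround I've sketched, but not yet tested, is to pre-warm the Windows file cache right after the nightly rebuild by sequentially reading everything under the instance's Data directory. The path below is hypothetical; substitute your DataDir. Note this warms the OS cache, not SSAS's internal cache; warming the latter would mean replaying representative MDX queries after processing.)

```python
import os

# Hypothetical path: point this at the instance's DataDir.
DATA_DIR = r"D:\Program Files\Microsoft SQL Server\MSAS10.MSSQLSERVER\OLAP\Data"

def warm(path, chunk=8 * 1024 * 1024):
    """Sequentially read a file so its pages land in the system cache."""
    with open(path, "rb") as f:
        while f.read(chunk):
            pass

for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        try:
            warm(os.path.join(root, name))
        except OSError:
            pass  # skip anything msmdsrv has locked exclusively
```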
Is there any way to determine what exactly is causing this cache flush, and why it doesn't occur on the (very similar) other host?
Are there any other ways to mitigate this problem?
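(One mitigation I'm evaluating, sketched below but untested on this workload: Windows exposes the system file cache working-set limits through the Win32 GetSystemFileCacheSize/SetSystemFileCacheSize APIs, and SetSystemFileCacheSize accepts a FILE_CACHE_MIN_HARD_ENABLE flag that enforces a hard floor, which in principle could keep the cache from being trimmed. Reading the current limits from Python via ctypes:)

```python
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)

min_size = ctypes.c_size_t()
max_size = ctypes.c_size_t()
flags = wintypes.DWORD()

# GetSystemFileCacheSize reports the system file cache working-set
# limits and whether hard min/max enforcement is currently enabled.
if not kernel32.GetSystemFileCacheSize(ctypes.byref(min_size),
                                       ctypes.byref(max_size),
                                       ctypes.byref(flags)):
    raise ctypes.WinError(ctypes.get_last_error())

print(f"file cache min = {min_size.value / 1024**3:.1f} GiB, "
      f"max = {max_size.value / 1024**3:.1f} GiB, flags = {flags.value:#x}")

# SetSystemFileCacheSize(min, max, FILE_CACHE_MIN_HARD_ENABLE) could pin a
# floor under the cache, but it requires SeIncreaseQuotaPrivilege and I
# haven't verified its effect on this workload.
```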
I found some other references to SSAS and the Windows file cache (the links found here, for example), but they seem to imply that file caching is a bad thing, and they describe a utility for deliberately flushing the cache, which seems like the opposite of what I need.
Edit: Added a 100pt bounty