Suspected Memory Leak in Database Rebuild - Build 77412



ianwoodmore
July 23rd, 2015, 11:06 PM
This is a follow-up report for Build 77412. (The bug was originally reported in the Build 77241 thread, posts #36 and #37, with Chris's request for a follow-up at #38.)

During a Database Rebuild (DBR) of a maxi install, after installing build 77412 and pointing it to the previous TANE Userdata folder from build 77241, I progressively recorded '%CPU', '%RAM', 'Memory Commit' and 'Hard Faults/sec'.

%CPU and %RAM were taken from my Logitech keyboard LCD. TAD size was 356,000 (356K).

During the early phases of the DBR (i.e. 'Delete', 'Add' and 'Modify') there was no unusual behaviour, and %RAM used stayed below the maximum available physical memory.

As 'Check faulty' progressed there was a build-up of 'Memory Commit' to a whopping 54GB by the end of this phase.

My RAM is 32GB and I have multiple pagefiles slightly larger than that on all disks both SSD and HDD.

At 198K assets of 356K checked - 85% RAM (relates to 23GB in use for all applications and services including TANE)

At 226K assets checked - 87% RAM, CPU 5-22%, 26GB memory commit, and pagefile access became noticeable with 66 Hard Faults/sec (HDF/s). The HDF/s were against 'livecom.exe' not TANE.

At 250K assets checked - Still at 87% RAM, and this didn't vary much right up to the end of the DBR, probably because of other processes with higher priorities. However, Memory Commit is almost 32GB and HDF/s has climbed to 131. TANE is now suffering a continuous 65 HDF/s. Considerable data is being swapped out to a pagefile. According to Microsoft, when multiple pagefiles are set, any one of them may be used on a first-come, first-served basis. Unfortunately, for this DBR it happened to be a 6Gbps SATA 3 HDD, not an SSD, and a slowing down of the DBR was becoming evident. If it had chosen either of the SSDs, the results might have been significantly different. This may explain the puzzling randomness of this bug.

At 269K assets checked - The Building TAD window showed a stationary green progress bar and the log beneath it had stopped. To all intents and purposes the DBR appeared to have stalled, and I believe this is what most Trainzers would surmise. BUT the Developer Show Log display showed that checking was still occurring, albeit at an ever-decreasing pace.

From here until what I judged to be completion of DBR, as indicated when CM Trainz Content Installed display showed data, the DBR got slower and slower as more swapping with pagefile occurred with the final memory commit at 54GB.
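
To put a rough number on the build-up described above, here is a minimal sketch that works out how fast Memory Commit grew between the readings quoted in this post. The sample values are read straight off the figures above, with two assumptions: "almost 32GB" is rounded to 32, and the final 54GB is taken to fall at the full 356K assets.

```python
# Samples from the post: (assets checked in thousands, memory commit in GB).
# Assumptions: "almost 32GB" is rounded to 32.0, and the final 54 GB is
# taken to coincide with the full 356K assets.
samples = [(226, 26.0), (250, 32.0), (356, 54.0)]


def growth_rates(points):
    """Return commit growth in GB per thousand assets between adjacent samples."""
    rates = []
    for (a0, c0), (a1, c1) in zip(points, points[1:]):
        rates.append((c1 - c0) / (a1 - a0))
    return rates


rates = growth_rates(samples)
for (a0, _), (a1, _), rate in zip(samples, samples[1:], rates):
    print(f"{a0}K -> {a1}K assets: {rate:.3f} GB per 1K assets")
```

On these figures the commit charge grows by roughly 0.2 to 0.25 GB per thousand assets checked, with no sign of levelling off. That is consistent with memory not being released during 'Check faulty', though it does not by itself prove a leak.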

This did not decrease over the next five minutes and Building TAD window was still not updated. I terminated TANE via Task Manager and restarted after the obligatory wait for CPU processes queue to complete.

On restart of CM the AND and AND NOT operator saga became evident. I'm certain this was a separate issue.

I believe we have a memory leak.

JCitron
July 23rd, 2015, 11:30 PM
Ian,

I had one in the retail version a while ago. The message was in regard to the CTDs but who knows...

Here's the text from my post on it:

I might have stumbled on to something tonight, though not in Surveyor, but in Content Manager during a rather lengthy database repair.

As the database repair was reaching the final laps, I heard a ding sound and a message appeared on the screen stating that Windows is low on memory and is shutting down applications and has recovered... blah, blah, blah. I know I should have screen-captured the message, but did not. Essentially, since T:ANE was the only application running, it was closed in the midst of the database repair!

I did, however, check the Event Viewer after I rebooted. I figured that at that point, the application was probably compromised and wouldn't be stable so it was best to fresh brains the computer because who knows what else could be junk at that point.

Here's the text from the error in the System Event Viewer log.

Windows successfully diagnosed a low virtual memory condition. The following programs consumed the most virtual memory: TANE.exe (1368) consumed 55439831040 bytes, dwm.exe (972) consumed 4400648192 bytes, and explorer.exe (4160) consumed 314273792 bytes.
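
For anyone wanting to read these byte counts more easily, here is a minimal sketch that pulls the process names, PIDs and sizes out of the message text above and converts them to GiB. The regular expression is only an assumption based on the wording of this one message; other Windows versions may format it differently.

```python
import re

# The event text exactly as quoted in the post above.
MESSAGE = (
    "Windows successfully diagnosed a low virtual memory condition. "
    "The following programs consumed the most virtual memory: "
    "TANE.exe (1368) consumed 55439831040 bytes, "
    "dwm.exe (972) consumed 4400648192 bytes, "
    "and explorer.exe (4160) consumed 314273792 bytes."
)

# Assumed pattern: "<name>.exe (<pid>) consumed <n> bytes".
PATTERN = re.compile(r"(\S+\.exe) \((\d+)\) consumed (\d+) bytes")


def parse_consumers(message):
    """Return a list of (process name, pid, bytes) tuples found in the message."""
    return [(name, int(pid), int(size)) for name, pid, size in PATTERN.findall(message)]


consumers = parse_consumers(MESSAGE)
for name, pid, size in consumers:
    # 2**30 bytes = 1 GiB
    print(f"{name} (PID {pid}): {size / 2**30:.2f} GiB")
```

TANE.exe works out to roughly 51.6 GiB (about 55.4 GB in decimal units), which is in the same range as the 54GB Memory Commit reported in the first post.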

Now I have 32GB of RAM and had a system-managed swap file. Since this error, I have added two additional paging files on my two other hard drives. (I have 3 drives). I left the first as system-managed, as that is needed to capture memory.dmp files, and added two others at 16535MB min - 32768MB max. This will give my system between 45 GB and 82 GB of paging files. This may be overkill, but it's worth experimenting with at this point. If I no longer have that issue, I will tweak the settings down to see what happens.

This may very well be the same issue that is causing T:ANE to quit even though Windows has not put up a warning message. The way to find out is to check the Event Viewer and the System log, not the Application log. Look for any errors and view the contents for things related to T:ANE and possibly out of memory conditions.

John

WindWalkr
July 23rd, 2015, 11:32 PM
Ian,

I'm not seeing any similar issue here. Memory usage during a large repair is hovering at about 2.7GB (in debug, so the release builds may be lower.)

A memory leak would not cause the GUI to lock up or the requirement to use Task Manager to terminate the app. It sounds feasible that one of your threads has locked up, and memory that should be released by that thread is staying resident. It could of course be something else entirely, or it could be that the "lock up" is not a legit hang but is just the user experience of the machine running extremely slowly due to the memory usage.

Please keep your eye out for a repeat of this behaviour.

chris