Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0000357XdebugUsage problemspublic2008-03-17 17:282011-03-18 09:30
ReporterPeterJanRoes 
Assigned To 
PrioritynormalSeveritymajorReproducibilityalways
StatusnewResolutionopen 
PlatformOSOS Version
Product Version 
Target Version2.2devFixed in Version 
Summary0000357: Profiler shows huge times for certain calls
DescriptionThe cachegrind.out output of the profiler contains some lines with huge values for the time spent. These values are clearly wrong and mess up the overall timing information (KCacheGrind shows weird percentages as well). The values are very suspicious as they lie close to the maximum value of an unsigned integer. Maybe some integer overflow is occurring during one of the calculations producing the erroneous values?

Please note that the script being profiled produces an output file of around 80Mb and the functions having the incorrect timing values are called thousands of times.
Additional InformationHere are some lines from the output file, please note the huge timing values:

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=WrappingEditor->getAttribute
223 23
cfn=WrappingEditor->getAttribute
calls=1 0 0
1742 33

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=AttributeEditorMap->addEditor
272 4293967322
cfn=WrappingEditor->getAttribute
calls=1 0 0
223 56

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=DefaultEditor->addEditor
636 1000028
cfn=AttributeEditorMap->addEditor
calls=1 0 0
272 4293967379

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=DefaultEditor->getAttribute
1742 2

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=WrappingEditor->getAttribute
636 23
cfn=DefaultEditor->getAttribute
calls=1 0 0
1742 2
TagsNo tags attached.
Operating SystemWindows XP
PHP Version5.2.5
Attached Files

- Relationships
has duplicate 0000547resolvedderick Cachegrind file reports cost ~4293967300 

-  Notes
(0000829)
PeterJanRoes (reporter)
2008-03-18 12:17

I have checked out the XDebug profiler code and tried to find the source of the problem. I have not tested anything or changed any code but these are my current insights into the problem by just reviewing the code.

The profiler uses a structure "xdebug_profile" which contains two doubles: "time" and "mark". These doubles are used in various calculations and later on converted (indirectly) to an "unsigned long". Performing floating point arithmetic is always prone to numeric instability, especially when large and small numbers are used together many times in calculations. You might loose some precision somewhere along the road and that might explain the strange timings I encounter. Wouldn't it be better to store "unsigned longs" in the "xdebug_profile" structure instead? This would allow for exact calculations and hopefully resolve the problem.

Please let me know if the above isn't true and if I should provide some more information on reproducing the problem.
(0001382)
c960657 (reporter)
2010-03-15 12:09

I see this too running PHP 5.2.11 with Xcache 1.3.0 on WinXP with Xdebug 2.0.3 as well as with 2.1.0beta3 when profiling Drupal (i.e. with very large cachegrind files).

This user seems to experience this under SuSE Linux:
http://xdebug.org/archives/xdebug-general/1282.html [^]
(0001539)
eis_os (reporter)
2010-07-23 15:58

Hello

This a very hard bug to debug in an easy reliable test case, the actually problem is the unstable clock source of windows (Server 2003 / XP) together with AMD CPU.

Internally XDebug uses php gettimeofday, and that uses QueryPerformanceCounter

So magical your xdebugs microtime goes backwards a bit, and the result is a negative number.

Solution for AMD CPU:
http://support.microsoft.com/kb/895980 [^]

Quote from MSDN QueryPerformanceCounter:
"On a multiprocessor computer, it should not matter which processor is called. However, you can get different results on different processors due to bugs in the basic input/output system (BIOS) or the hardware abstraction layer (HAL)."
(0001603)
ped (reporter)
2010-11-11 13:54

I have the issue too:
SW: WinXP, PHP 5.3.3, xdebug2.1.0-5.3-vc6
HW: Intel Core2 (dual core)

The wrong times do look like negative numbers converted to uint32.

Looking at the results, they usually come in pairs, one fn= has ~4293967300, and some other near around has 1mil+xxx which is wrong too. So it looks to me like +-1000000 (exactly) happens as some sort of hiccup.

I tried to nail php.exe down to single core ASAP (by hand in task manager, I'm testing the unit test run which takes 20+ seconds without profiling, almost 1min with profiling enabled), it doesn't look like it did help to make the issue less frequent, so it's maybe not that much related to core switching, maybe it's more about unreliable perf counters.
(0001604)
ped (reporter)
2010-11-11 14:19

Checking the source vs this reference:
http://rabbit.eng.miami.edu/info/functions/time.html#gtod [^]
(see example for gettimeofday function)

usefulstuff.c lines 183 - 185:
183 if (msec >= 1.0) {
184 msec -= (long) msec;
185 }

are simply wrong? I can't figure out their purpose, the tv_usec should be less then 1mil anyway, and even if due to some inaccuracy it would be GRE 1mil, it's supposed to be added to seconds anyway, as it forms total time together, IMHO.
This would make sense to me only if the implementation of gettimeofday has known bug which requires to subtract 1sec if the tv_usec is above 1mil.

Also if I understand it correctly, the xdebug_get_utime returns double, which is then later printed into log file as integer again, doing the conversion back and forth without any good reason (except doing add/sub with doubles is more simple, but that's not good reason).

I think the timing should work with original integers pair, i.e. creating a struct to store them both and helper functions to add/subtract two of them and finally a function to output correct total number of usecs.

If the maintainers of this project are interested, I may try to produce such patch, let me know in this thread.
(0001605)
eis_os (reporter)
2010-11-11 15:28

gettimeofday of php-src/win32/time.c will be used under Windows, that is a emulation function using QueryPerformanceCounter. I tested this in xdebug code and the calls to PHP gettimeofday can vary. Using /pmtimer made the timing stable for my use case, but my be still unstable for others.

As tested, the php gettimeofday may return earlier times in later calls, and at this point the value is still a double.

Maybe useful:
http://developer.nvidia.com/object/timer_function_performance.html [^]
(0001606)
ped (reporter)
2010-11-12 08:36

eis_os: I don't understand what you talk about.
It's not important where is gettimeofday implemented, as long as it's implemented correctly and it returns always same or greater time then by previous call.
If you have a well documented problem with AMD multicore CPU, then it's obvious it may return wrong numbers, but I have Intel Core2, where such problem shouldn't arise AFAIK (at least I didn't find it documented anywhere).

I did take a quick peek at win32/time.c from PHP sources implementation, and ... it looks ok.
I would change "if (time_Info->tv_usec > 1000000) {" on line 108 to >=, but that's not critical fix in global sense (although this may be the trigger point to expose the bug in xdebug/usefulstuff.c !)

Actually this implementation probably hides even BIOS/multicore problems for the price of low accuracy of the result. Notice the L92 "if (timer < PW32G(lasttime)) {".
This is originally planned as a reinit of usec timing in case the performance counters overflow, but if you have multi core system which makes QueryPerformanceCounter unreliable, it will just cause the reinit trough getfilesystemtime(..) to happen more often. It's very likely that one with additional execution overhead will yield positive result, so no history trip happens then.

OTOH the xdebug/usefulstuff.c double xdebug_get_utime(void) is quite opposite, it is IMO obviously wrong (that if at line 183 shouldn't be there if I understand it correctly) + not well thought out, actually it looks like it was done by somebody who didn't have a clue what he's doing. (converting forth and back from integers to double, so wasting both CPU and precision without any reason)

"I tested this in xdebug code and the calls to PHP gettimeofday can vary."

How did you test that? I can't compile the source at my machine, but by reading the source I strongly doubt such behavior (maybe on multicore PC with the described BIOS problems, but as I said the reinit would very likely hide even that, unless the difference between cores is insanely huge, like over 500 ticks).

Anyway, I don't have problems with such short time shifts to history, I have +-1 second(!) variation in xdebug results, and I'm more and more sure it's because of bug in xdebug/usefulstuff.c (when converting the gettimeofday to double). Actually I'm now sure the code is wrong.

"and at this point the value is still a double"
And this doesn't make sense to me at all. There's no double value involved with gettimeoftheday (except it's internal implementation, but I'm talking about API), the first double the xdebug encounters is its own converted value, and the conversion itself causes the bug to happen. So "still a double" ... ?? still since when, since conversion?

Re-reading after myself... I sound way too aggressive and harsh, sorry for the tone. (and sorry for not rewriting it, but I don't have all day for this + factually it's ok)
(0001607)
ped (reporter)
2010-11-12 08:52

The nvidia link is nice overview... and it's interesting to see how slow the QueryPerformanceCounter function is, I did expect it to be much faster. But in the end for profiling you can't use GetTickCount() 5~16ms resolution timer, so it has no practical implication on current xdebug implementation.

Maybe the only practical implication is to drop gettimeofday on win completely and rewrite xdebug profiler to use QueryPerformanceCounter directly. I don't think it's worth the effort (putting away just one layer of call and a bit of useless double conversions, would make some sense, but it's considerable effort with little gain).

Still if maintainer of xdebug is interested, I can rewrite it (but I need somebody who can compile it to do the testing ... plus while I was looking at xdebug source, I figured out the C now extremely annoys me, I got too much used to C++ and classes encapsulation, so my source would require very likely tidying variables/API names and overall style).
(0001608)
eis_os (reporter)
2010-11-12 10:08
edited on: 2010-11-12 10:35

I did add a forced hang(loop) if oldtime > newtime in xdebug utime. And then attached a debugger to see if it hangs there, I compiled with a trunk php version ~ 5.3.1 (so it's a moving target) and Windows SDK VC9 Compiler. I am sorry you can't compile it yourself, but that is not my fault.

I am not saying xdebug is correct with it's timing code, but before you start rewriting xdebug you may need to verify if php gettimeofday and xdebug utime is reliable enough, otherwise you still have the same problem. More I didn't want to say.
Anyway, that's how it looks like in my local tree based on 2.2dev:

double xdebug_get_utime(void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tp;
    long sec = 0L;
    double msec = 0.0;

    if (gettimeofday((struct timeval *) &tp, NULL) == 0) {
#if 1
        return ((double)(tp.tv_sec + tp.tv_usec / MICRO_IN_SEC));
#else
        sec = tp.tv_sec;
        msec = (double) (tp.tv_usec / MICRO_IN_SEC);

        if (msec >= 1.0) {
            msec -= (long) msec;
        }
        return msec + sec;
#endif
    }
#endif
    return 0;
}


I am quite sure I still use this build, so it may be right that xdebug_get_utime may be broken specially on win32, but I need to decompile it to verify. -edit- A recompile looks the same disassembled, so I use xdebug r3323 (2.2-dev) with php5.3 revision 301742 patched with the above code.



About Core2Duo I found this (Specially Message 30): http://www.osronline.com/showthread.cfm?link=189427#30 [^]

And this, so it seems there are different tsc features: http://www.gossamer-threads.com/lists/xen/devel/185419 [^]

(0001609)
ped (reporter)
2010-11-12 10:59

PHP bug URL for tracking purposes: http://bugs.php.net/bug.php?id=53297 [^]
(0001610)
eis_os (reporter)
2010-11-12 11:26

Yes, the output isn't problematical for my code above, for xdebug we don't need the separation at all.

The actual code for xdebug should be:

(Otherwise we lose our overflowing second here)

if (msec >= 1.0) {
   sec += (long) msec;
   msec -= (long) msec;
}
return msec + sec;

I still think my version is much better for win32:

return ((double)(tp.tv_sec + tp.tv_usec / MICRO_IN_SEC));

This compiles down to 3 FPU instructions without jump, regardless if gettimeofday is broken or not. Still PHP should be fixed...
(0001611)
ped (reporter)
2010-11-12 12:32

eis_os: yes, your version is basically what I suggested (to delete the offending if block) + written as single line. And I can just agree.

PHP bug filed, so let's see if/when they react.

I did talk with derick on IRC and he now pretty much shares the same view. (now I'm just waiting for either new PHP or xdebug DLL, whichever will react faster :))

Thanks for helping with the issue and pointing out the other problems with win32 performance counters on multi core CPUs, this may be valuable knowledge resource for people who will run into those.
(I'm not affected by this, but others may be)
(0001612)
ped (reporter)
2010-11-14 20:14

PHP fix accepted to SVN. (So 5.3.4 will contain the fix? I have no idea how long it takes to propage new SVN change to public release)
(0001693)
derick (administrator)
2011-03-17 17:59

I just read over this again, and with PHP 5.3.6 being out, this should now be resolved, right?

cheers,
Derick
(0001694)
ped (reporter)
2011-03-18 09:30

derick: the bug in xdebug is now no more triggered by bad gettimeofday result (since PHP5.3.4).
Still there was bug in xdebug as described in my note http://bugs.xdebug.org/view.php?id=357#c1604 [^] ... did you remove that already?


I can't verify the claims about AMD CPUs, but I would suggest to recreate another bug report for those anyway, because the +-1sec hiccup was purely SW problem in PHP and xdebug and it's now hidden by PHP fix.

- Issue History
Date Modified Username Field Change
2008-03-17 17:28 PeterJanRoes New Issue
2008-03-18 12:17 PeterJanRoes Note Added: 0000829
2010-03-15 12:09 c960657 Note Added: 0001382
2010-03-17 00:18 derick Relationship added has duplicate 0000547
2010-03-20 23:47 derick Target Version => 2.2dev
2010-07-23 15:58 eis_os Note Added: 0001539
2010-11-11 13:54 ped Note Added: 0001603
2010-11-11 14:19 ped Note Added: 0001604
2010-11-11 15:28 eis_os Note Added: 0001605
2010-11-12 08:36 ped Note Added: 0001606
2010-11-12 08:52 ped Note Added: 0001607
2010-11-12 10:08 eis_os Note Added: 0001608
2010-11-12 10:35 eis_os Note Edited: 0001608 View Revisions
2010-11-12 10:59 ped Note Added: 0001609
2010-11-12 11:26 eis_os Note Added: 0001610
2010-11-12 12:32 ped Note Added: 0001611
2010-11-14 20:14 ped Note Added: 0001612
2011-03-17 17:59 derick Note Added: 0001693
2011-03-18 09:30 ped Note Added: 0001694


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker