View Issue Details

IDProjectCategoryView StatusLast Update
0000357XdebugUsage problems (Wrong Results)public2016-11-27 19:52
ReporterPeterJanRoesAssigned Toderick 
PrioritynormalSeveritymajorReproducibilityalways
Status resolvedResolutionno change required 
Product Version 
Target Version2.2devFixed in Version 
Summary0000357: Profiler shows huge times for certain calls
DescriptionThe cachegrind.out output of the profiler contains some lines with huge values for the time spent. These values are clearly wrong and mess up the overall timing information (KCacheGrind shows weird percentages as well). The values are very suspicious as they lie close to the maximum value of an unsigned integer. Maybe some integer overflow is occurring during one of the calculations producing the erroneous values?

Please note that the script being profiled produces an output file of around 80Mb and the functions having the incorrect timing values are called thousands of times.
Additional InformationHere are some lines from the output file, please note the huge timing values:

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=WrappingEditor->getAttribute
223 23
cfn=WrappingEditor->getAttribute
calls=1 0 0
1742 33

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=AttributeEditorMap->addEditor
272 4293967322
cfn=WrappingEditor->getAttribute
calls=1 0 0
223 56

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=DefaultEditor->addEditor
636 1000028
cfn=AttributeEditorMap->addEditor
calls=1 0 0
272 4293967379

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=DefaultEditor->getAttribute
1742 2

fl=C:\Documents and Settings\Peter-Jan Roes\workspace\WikiProtein\extensions\Wikidata\OmegaWiki\Editor.php
fn=WrappingEditor->getAttribute
636 23
cfn=DefaultEditor->getAttribute
calls=1 0 0
1742 2
TagsNo tags attached.
Operating SystemWindows XP
PHP Version5.2.5

Relationships

has duplicate 0000547 resolvedderick Cachegrind file reports cost ~4293967300 

Activities

PeterJanRoes

2008-03-18 12:17

reporter   ~0000829

I have checked out the XDebug profiler code and tried to find the source of the problem. I have not tested anything or changed any code but these are my current insights into the problem by just reviewing the code.

The profiler uses a structure "xdebug_profile" which contains two doubles: "time" and "mark". These doubles are used in various calculations and later on converted (indirectly) to an "unsigned long". Performing floating point arithmetic is always prone to numeric instability, especially when large and small numbers are used together many times in calculations. You might loose some precision somewhere along the road and that might explain the strange timings I encounter. Wouldn't it be better to store "unsigned longs" in the "xdebug_profile" structure instead? This would allow for exact calculations and hopefully resolve the problem.

Please let me know if the above isn't true and if I should provide some more information on reproducing the problem.

c960657

2010-03-15 12:09

reporter   ~0001382

I see this too running PHP 5.2.11 with Xcache 1.3.0 on WinXP with Xdebug 2.0.3 as well as with 2.1.0beta3 when profiling Drupal (i.e. with very large cachegrind files).

This user seems to experience this under SuSE Linux:
http://xdebug.org/archives/xdebug-general/1282.html

eis_os

2010-07-23 14:58

reporter   ~0001539

Hello

This a very hard bug to debug in an easy reliable test case, the actually problem is the unstable clock source of windows (Server 2003 / XP) together with AMD CPU.

Internally XDebug uses php gettimeofday, and that uses QueryPerformanceCounter

So magical your xdebugs microtime goes backwards a bit, and the result is a negative number.

Solution for AMD CPU:
http://support.microsoft.com/kb/895980

Quote from MSDN QueryPerformanceCounter:
"On a multiprocessor computer, it should not matter which processor is called. However, you can get different results on different processors due to bugs in the basic input/output system (BIOS) or the hardware abstraction layer (HAL)."

ped

2010-11-11 13:54

reporter   ~0001603

I have the issue too:
SW: WinXP, PHP 5.3.3, xdebug2.1.0-5.3-vc6
HW: Intel Core2 (dual core)

The wrong times do look like negative numbers converted to uint32.

Looking at the results, they usually come in pairs, one fn= has ~4293967300, and some other near around has 1mil+xxx which is wrong too. So it looks to me like +-1000000 (exactly) happens as some sort of hiccup.

I tried to nail php.exe down to single core ASAP (by hand in task manager, I'm testing the unit test run which takes 20+ seconds without profiling, almost 1min with profiling enabled), it doesn't look like it did help to make the issue less frequent, so it's maybe not that much related to core switching, maybe it's more about unreliable perf counters.

ped

2010-11-11 14:19

reporter   ~0001604

Checking the source vs this reference:
http://rabbit.eng.miami.edu/info/functions/time.html#gtod
(see example for gettimeofday function)

usefulstuff.c lines 183 - 185:
183 if (msec >= 1.0) {
184 msec -= (long) msec;
185 }

are simply wrong? I can't figure out their purpose, the tv_usec should be less then 1mil anyway, and even if due to some inaccuracy it would be GRE 1mil, it's supposed to be added to seconds anyway, as it forms total time together, IMHO.
This would make sense to me only if the implementation of gettimeofday has known bug which requires to subtract 1sec if the tv_usec is above 1mil.

Also if I understand it correctly, the xdebug_get_utime returns double, which is then later printed into log file as integer again, doing the conversion back and forth without any good reason (except doing add/sub with doubles is more simple, but that's not good reason).

I think the timing should work with original integers pair, i.e. creating a struct to store them both and helper functions to add/subtract two of them and finally a function to output correct total number of usecs.

If the maintainers of this project are interested, I may try to produce such patch, let me know in this thread.

eis_os

2010-11-11 15:28

reporter   ~0001605

gettimeofday of php-src/win32/time.c will be used under Windows, that is a emulation function using QueryPerformanceCounter. I tested this in xdebug code and the calls to PHP gettimeofday can vary. Using /pmtimer made the timing stable for my use case, but my be still unstable for others.

As tested, the php gettimeofday may return earlier times in later calls, and at this point the value is still a double.

Maybe useful:
http://developer.nvidia.com/object/timer_function_performance.html

ped

2010-11-12 08:36

reporter   ~0001606

eis_os: I don't understand what you talk about.
It's not important where is gettimeofday implemented, as long as it's implemented correctly and it returns always same or greater time then by previous call.
If you have a well documented problem with AMD multicore CPU, then it's obvious it may return wrong numbers, but I have Intel Core2, where such problem shouldn't arise AFAIK (at least I didn't find it documented anywhere).

I did take a quick peek at win32/time.c from PHP sources implementation, and ... it looks ok.
I would change "if (time_Info->tv_usec > 1000000) {" on line 108 to >=, but that's not critical fix in global sense (although this may be the trigger point to expose the bug in xdebug/usefulstuff.c !)

Actually this implementation probably hides even BIOS/multicore problems for the price of low accuracy of the result. Notice the L92 "if (timer < PW32G(lasttime)) {".
This is originally planned as a reinit of usec timing in case the performance counters overflow, but if you have multi core system which makes QueryPerformanceCounter unreliable, it will just cause the reinit trough getfilesystemtime(..) to happen more often. It's very likely that one with additional execution overhead will yield positive result, so no history trip happens then.

OTOH the xdebug/usefulstuff.c double xdebug_get_utime(void) is quite opposite, it is IMO obviously wrong (that if at line 183 shouldn't be there if I understand it correctly) + not well thought out, actually it looks like it was done by somebody who didn't have a clue what he's doing. (converting forth and back from integers to double, so wasting both CPU and precision without any reason)

"I tested this in xdebug code and the calls to PHP gettimeofday can vary."

How did you test that? I can't compile the source at my machine, but by reading the source I strongly doubt such behavior (maybe on multicore PC with the described BIOS problems, but as I said the reinit would very likely hide even that, unless the difference between cores is insanely huge, like over 500 ticks).

Anyway, I don't have problems with such short time shifts to history, I have +-1 second(!) variation in xdebug results, and I'm more and more sure it's because of bug in xdebug/usefulstuff.c (when converting the gettimeofday to double). Actually I'm now sure the code is wrong.

"and at this point the value is still a double"
And this doesn't make sense to me at all. There's no double value involved with gettimeoftheday (except it's internal implementation, but I'm talking about API), the first double the xdebug encounters is its own converted value, and the conversion itself causes the bug to happen. So "still a double" ... ?? still since when, since conversion?

Re-reading after myself... I sound way too aggressive and harsh, sorry for the tone. (and sorry for not rewriting it, but I don't have all day for this + factually it's ok)

ped

2010-11-12 08:52

reporter   ~0001607

The nvidia link is nice overview... and it's interesting to see how slow the QueryPerformanceCounter function is, I did expect it to be much faster. But in the end for profiling you can't use GetTickCount() 5~16ms resolution timer, so it has no practical implication on current xdebug implementation.

Maybe the only practical implication is to drop gettimeofday on win completely and rewrite xdebug profiler to use QueryPerformanceCounter directly. I don't think it's worth the effort (putting away just one layer of call and a bit of useless double conversions, would make some sense, but it's considerable effort with little gain).

Still if maintainer of xdebug is interested, I can rewrite it (but I need somebody who can compile it to do the testing ... plus while I was looking at xdebug source, I figured out the C now extremely annoys me, I got too much used to C++ and classes encapsulation, so my source would require very likely tidying variables/API names and overall style).

eis_os

2010-11-12 10:08

reporter   ~0001608

Last edited: 2010-11-12 10:35

View 2 revisions

I did add a forced hang(loop) if oldtime > newtime in xdebug utime. And then attached a debugger to see if it hangs there, I compiled with a trunk php version ~ 5.3.1 (so it's a moving target) and Windows SDK VC9 Compiler. I am sorry you can't compile it yourself, but that is not my fault.

I am not saying xdebug is correct with it's timing code, but before you start rewriting xdebug you may need to verify if php gettimeofday and xdebug utime is reliable enough, otherwise you still have the same problem. More I didn't want to say.
Anyway, that's how it looks like in my local tree based on 2.2dev:

double xdebug_get_utime(void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tp;
    long sec = 0L;
    double msec = 0.0;

    if (gettimeofday((struct timeval *) &tp, NULL) == 0) {
#if 1
        return ((double)(tp.tv_sec + tp.tv_usec / MICRO_IN_SEC));
#else
        sec = tp.tv_sec;
        msec = (double) (tp.tv_usec / MICRO_IN_SEC);

        if (msec >= 1.0) {
            msec -= (long) msec;
        }
        return msec + sec;
#endif
    }
#endif
    return 0;
}


I am quite sure I still use this build, so it may be right that xdebug_get_utime may be broken specially on win32, but I need to decompile it to verify. -edit- A recompile looks the same disassembled, so I use xdebug r3323 (2.2-dev) with php5.3 revision 301742 patched with the above code.



About Core2Duo I found this (Specially Message 30): http://www.osronline.com/showthread.cfm?link=189427#30

And this, so it seems there are different tsc features: http://www.gossamer-threads.com/lists/xen/devel/185419

ped

2010-11-12 10:59

reporter   ~0001609

PHP bug URL for tracking purposes: http://bugs.php.net/bug.php?id=53297

eis_os

2010-11-12 11:26

reporter   ~0001610

Yes, the output isn't problematical for my code above, for xdebug we don't need the separation at all.

The actual code for xdebug should be:

(Otherwise we lose our overflowing second here)

if (msec >= 1.0) {
   sec += (long) msec;
   msec -= (long) msec;
}
return msec + sec;

I still think my version is much better for win32:

return ((double)(tp.tv_sec + tp.tv_usec / MICRO_IN_SEC));

This compiles down to 3 FPU instructions without jump, regardless if gettimeofday is broken or not. Still PHP should be fixed...

ped

2010-11-12 12:32

reporter   ~0001611

eis_os: yes, your version is basically what I suggested (to delete the offending if block) + written as single line. And I can just agree.

PHP bug filed, so let's see if/when they react.

I did talk with derick on IRC and he now pretty much shares the same view. (now I'm just waiting for either new PHP or xdebug DLL, whichever will react faster :))

Thanks for helping with the issue and pointing out the other problems with win32 performance counters on multi core CPUs, this may be valuable knowledge resource for people who will run into those.
(I'm not affected by this, but others may be)

ped

2010-11-14 20:14

reporter   ~0001612

PHP fix accepted to SVN. (So 5.3.4 will contain the fix? I have no idea how long it takes to propage new SVN change to public release)

derick

2011-03-17 17:59

administrator   ~0001693

I just read over this again, and with PHP 5.3.6 being out, this should now be resolved, right?

cheers,
Derick

ped

2011-03-18 09:30

reporter   ~0001694

derick: the bug in xdebug is now no more triggered by bad gettimeofday result (since PHP5.3.4).
Still there was bug in xdebug as described in my note http://bugs.xdebug.org/view.php?id=357#c1604 ... did you remove that already?


I can't verify the claims about AMD CPUs, but I would suggest to recreate another bug report for those anyway, because the +-1sec hiccup was purely SW problem in PHP and xdebug and it's now hidden by PHP fix.

derick

2016-11-27 19:52

administrator   ~0003799

I am closing this as nobody has mentioned it for a long long time, and I could never reproduce this.

Issue History

Date Modified Username Field Change
2008-03-17 17:28 PeterJanRoes New Issue
2008-03-18 12:17 PeterJanRoes Note Added: 0000829
2010-03-15 12:09 c960657 Note Added: 0001382
2010-03-17 00:18 derick Relationship added has duplicate 0000547
2010-03-20 23:47 derick Target Version => 2.2dev
2010-07-23 14:58 eis_os Note Added: 0001539
2010-11-11 13:54 ped Note Added: 0001603
2010-11-11 14:19 ped Note Added: 0001604
2010-11-11 15:28 eis_os Note Added: 0001605
2010-11-12 08:36 ped Note Added: 0001606
2010-11-12 08:52 ped Note Added: 0001607
2010-11-12 10:08 eis_os Note Added: 0001608
2010-11-12 10:35 eis_os Note Edited: 0001608 View Revisions
2010-11-12 10:59 ped Note Added: 0001609
2010-11-12 11:26 eis_os Note Added: 0001610
2010-11-12 12:32 ped Note Added: 0001611
2010-11-14 20:14 ped Note Added: 0001612
2011-03-17 17:59 derick Note Added: 0001693
2011-03-18 09:30 ped Note Added: 0001694
2016-07-31 12:36 derick Category Usage problems => Usage problems (Crashes)
2016-07-31 12:38 derick Category Usage problems (Crashes) => Usage problems (Wrong Results)
2016-11-27 19:52 derick Note Added: 0003799
2016-11-27 19:52 derick Status new => resolved
2016-11-27 19:52 derick Resolution open => no change required
2016-11-27 19:52 derick Assigned To => derick