histogram binning - Steema vs MATLAB

TeeChart VCL for Borland/CodeGear/Embarcadero RAD Studio, Delphi and C++ Builder.
philip
Newbie
Posts: 15
Joined: Tue Jun 26, 2007 12:00 am

Post by philip » Mon Jul 26, 2010 8:59 pm

I'd appreciate some insight into why there is such a discrepancy between data binned by THistogramFunction and data binned by MATLAB. The attached code uses my own binning method (a standard one), which reproduces MATLAB's output exactly. The differences are even more remarkable and disconcerting with much larger data sets (I regularly use 70000+ raw data points), with significant consequences for statistics applied to the binned data. The differences between the methods are not confined to the beginning of the distributions; they appear throughout.

Clearly I'd like to discover that I'm doing something fundamentally wrong in my use of THistogramFunction which, if rectified, would match manual/MATLAB binning, but I'm not confident that this is where the problem lies. At the moment this is a major problem and cause for concern; any help would be very much appreciated indeed.

Attachment is for C++ Builder 5.1 and TChart 8.0.7; I get identical results in RAD C++ 2009.
Attachments
CB5 binning.7z (source and data file)

Post by philip » Tue Jul 27, 2010 1:32 pm

The source of the discrepancy is in the binning algorithm of TeeHistogram.pas. In

Code: Select all

procedure Histogram(Data: TChartValues; var bins,counts: TChartValues; Min,Max: Double; nbins: Integer);
the use of the Round() function seems inappropriate to me. Round() probably uses banker's rounding, which means odd and even numbers are treated differently. Replacing Round() with Trunc() has the desired effect, making the values compatible with C/C++, MATLAB, etc. Thus:

Code: Select all

    j := Trunc((data[i]-min)*invbinwidth);
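
For anyone following along outside Delphi, here is a minimal C++ sketch of the two index rules. It is not Steema's code: the names `IndexRule` and `binCounts` are made up for illustration, `std::lrint` (which rounds ties to even in the default floating-point rounding mode) stands in for Delphi's Round(), and clamping out-of-range indices is an assumption of this sketch. Apart from how exact ties are broken, rounding moves every bin boundary by half a bin width relative to truncation, which would explain why the counts differ throughout the distribution and not only at the start.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative names only; none of this is taken from TeeHistogram.pas.
enum class IndexRule { Truncate, Round };

// Counts how many values fall into each of nbins equal-width bins on [lo, hi].
std::vector<int> binCounts(const std::vector<double>& data,
                           double lo, double hi, int nbins, IndexRule rule)
{
    assert(nbins > 0 && hi > lo);
    const double invBinWidth = nbins / (hi - lo);
    std::vector<int> counts(static_cast<std::size_t>(nbins), 0);

    for (double x : data) {
        const double scaled = (x - lo) * invBinWidth;
        // Truncate: bin j covers [j, j+1) in scaled units.
        // Round:    bin j covers roughly [j-0.5, j+0.5); exact ties go to the
        //           nearest even integer (default rounding mode assumed).
        long j = (rule == IndexRule::Truncate) ? static_cast<long>(scaled)
                                               : std::lrint(scaled);
        // Assumption of this sketch: clamp, so that x == hi lands in the last bin.
        if (j < 0) j = 0;
        if (j >= nbins) j = nbins - 1;
        ++counts[static_cast<std::size_t>(j)];
    }
    return counts;
}
```

With the values 0.1, 0.3, 0.6, 0.8, 1.2, 1.6, 1.9 on the range 0 to 2 and two bins, the Truncate rule gives counts of 4 and 3, while the Round rule gives 2 and 5.
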
I'm also not convinced by the use of 0.5 when setting the bin centerpoints a few lines above in the same file. Using

Code: Select all

    bins[i] := min + i*binwidth;
gives more intuitive results (others may disagree, and my impression may just be contextual).
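
To make the two labelling conventions concrete, here is a small C++ sketch. The function names are hypothetical and nothing here is copied from TeeHistogram.pas. Assuming the index is computed by truncation, bin i covers the interval from min + i*binwidth up to (but not including) min + (i+1)*binwidth, so min + i*binwidth is that bin's left edge and min + (i + 0.5)*binwidth is its midpoint. Which label reads more naturally is a matter of context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Left edge of each bin: lo + i * binWidth.
std::vector<double> binLeftEdges(double lo, double hi, int nbins)
{
    assert(nbins > 0 && hi > lo);
    const double binWidth = (hi - lo) / nbins;
    std::vector<double> edges(static_cast<std::size_t>(nbins));
    for (int i = 0; i < nbins; ++i)
        edges[static_cast<std::size_t>(i)] = lo + i * binWidth;
    return edges;
}

// Midpoint of each bin: lo + (i + 0.5) * binWidth.
std::vector<double> binCenters(double lo, double hi, int nbins)
{
    assert(nbins > 0 && hi > lo);
    const double binWidth = (hi - lo) / nbins;
    std::vector<double> centers(static_cast<std::size_t>(nbins));
    for (int i = 0; i < nbins; ++i)
        centers[static_cast<std::size_t>(i)] = lo + (i + 0.5) * binWidth;
    return centers;
}
```

For the range 0 to 10 with five bins, the left edges are 0, 2, 4, 6, 8 and the midpoints are 1, 3, 5, 7, 9.
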

One could edit TeeHistogram.pas in the source folder, or create a modified version of the file and function (and edit some of the other source/Make/dpk files to make them aware of the new function), and then run TeeRecompile.exe to rebuild and install the mod.
Attachments
trunc.jpg: using Trunc()
round.jpg: using Round() - the difference is obvious

Yeray
Site Admin
Posts: 9614
Joined: Tue Dec 05, 2006 12:00 am
Location: Girona, Catalonia

Post by Yeray » Wed Jul 28, 2010 2:52 pm

Hi Philip,

We appreciate your effort and detailed study.
I've added it to the wish list to be reviewed as soon as possible and included in an upcoming maintenance release (TV52015054).
Best Regards,
Yeray Alonso
Development & Support
Steema Software
Av. Montilivi 33, 17003 Girona, Catalonia (SP)

Narcís
Site Admin
Posts: 14730
Joined: Mon Jun 09, 2003 4:00 am
Location: Banyoles, Catalonia

Post by Narcís » Thu Jul 29, 2010 9:08 am

Hi Philip,

Thanks for your feedback.
philip wrote: the use of Round() function seems to me to be inappropriate. Round() probably uses Banker's rounding which means odd and even numbers are treated differently.

Yes, that's correct; see Delphi's Round method documentation.

philip wrote: Replacing Round() with Trunc() has the desired effect, making the values compatible with C/C++, MATLAB etc.

It's not necessary, as our current v8 and v2010 (aka v9) sources already produce the results you'd expect. I bet this is due to the fix for a bug (TV52012772), which was discussed here. Actually, TV52012772 was fixed for v8.07, as can be seen in the release notes. Can you please confirm you are using v8.07? Anyway, I will also send you an e-mail with our current version of TeeHistogram.pas so that you can check whether it fixes the issue at your end.

I attach a Delphi example, similar to yours, which I created to be able to easily debug sources and which produces this chart:
histogram.jpg
Attachments
D2009_v8.zip
Best Regards,
Narcís Calvet / Development & Support
Steema Software
Avinguda Montilivi 33, 17003 Girona, Catalonia
Tel: 34 972 218 797
http://www.steema.com

Post by philip » Thu Jul 29, 2010 9:38 am

I'm using

Release Notes 13th April 2010
TeeChart VCL version 8
Build 8.07.70413

In that source package, TeeHistogram.pas is definitely using Round().

I was aware of TV52012772.

I have downloaded

TeeChart8.07SourceCode.exe
April 13, 2010
Build 8.07.70413
File size - 6,61 MB

again just now. The Round() function still appears in TeeHistogram.pas.

Post by Narcís » Thu Jul 29, 2010 9:47 am

Hi Philip,

Yes, I know Round is still in the Histogram method. However, there have been some recent changes in TeeHistogram.pas. Have you received the file I sent you? Does it work as expected?

Post by philip » Thu Jul 29, 2010 1:54 pm

Narcís wrote: Hi Philip,

Yes, I know Round is still in the Histogram method. However, there have been some recent changes in TeeHistogram.pas. Have you received the file I sent you? Does it work as expected?

Yes, I received the TeeHistogram.pas file you sent as an attachment. It is identical to the one in the VCL 8.07 source package. (In case I had missed something, I recompiled the attached file into the library, and I still get the same binning error.)

Post by Narcís » Thu Jul 29, 2010 2:06 pm

Hi Philip,

Do you have any Delphi version with which you could try the project I attached and check whether it works for you? At the URL below you can download the exe I generated from my sample project. Can you please check whether it works as expected at your end?

http://www.teechart.net/files/public/su ... amBins.zip

Thanks in advance.

Post by philip » Thu Jul 29, 2010 2:34 pm

Narcís wrote: Do you have any Delphi version with which you could try the project I attached and check whether it works for you? At the URL below you can download the exe I generated from my sample project. Can you please check whether it works as expected at your end?

Yes (RAD 2009). I was looking at it just now. It compiles and runs fine. The THistogramFunction chart (left) and the manual binning chart (middle) are identical, with no difference (right chart).

However, the manual binning routine in

Code: Select all

procedure TForm1.Edit1Change(Sender: TObject);
uses Round(), and so is bound to generate the same result as the left graph. [Changing this to Trunc() makes the middle graph compatible with using int() in C/C++ and with MATLAB's hist() function.]

Post by Narcís » Fri Jul 30, 2010 10:42 am

Hi philip,

Oh, I see, thanks. We are a little worried because replacing Round with Trunc could unexpectedly change many customers' charts. We will do some research on the file and consider calculating the histogram function from truncated data while also offering the option of rounding it, for example by adding a RoundedData property set to false by default.

Post by Narcís » Fri Jul 30, 2010 11:39 am

Hi philip,

Continuing with what I said above, we decided to add a new property to THistogramFunction called DataStyle, of type TDataStyle, an enum with two possible values: hdsTruncate and hdsRound, the first being the default. So, from now on, you'll get histograms calculated by default as in the MATLAB-imitating code you sent. To get the histograms produced by previous versions, you can set DataStyle to hdsRound, for example:

Code: Select all

  TeeFunction1.DataStyle:=hdsRound;
I'll send you TeeHistogram.pas so that you can test this new feature at your end. This property has been added both in v8 and v2010.

Post by philip » Fri Jul 30, 2010 2:45 pm

Narcís wrote: Hi philip,

Continuing with what I said above, we decided to add a new property to THistogramFunction called DataStyle, of type TDataStyle, an enum with two possible values: hdsTruncate and hdsRound, the first being the default. So, from now on, you'll get histograms calculated by default as in the MATLAB-imitating code you sent. To get the histograms produced by previous versions, you can set DataStyle to hdsRound, for example:

Code: Select all

  TeeFunction1.DataStyle:=hdsRound;
I'll send you TeeHistogram.pas so that you can test this new feature at your end. This property has been added both in v8 and v2010.

Received. The DataStyle property isn't available after I recompile 8.07 with the new TeeHistogram.pas file, in either C++ or Delphi. The compiled headers show the variable. Something's awry, but I can't figure out what it is.

Post by Narcís » Fri Jul 30, 2010 3:18 pm

Hi philip,

Really? That's strange, it works fine for me here in v8 directly referencing the sources from Delphi. You could try adding the source code path at Tools -> Options -> Environment Options -> Delphi Options -> Library - Win32 -> Library path. Does this work for you?

Post by philip » Fri Jul 30, 2010 3:52 pm

Narcís wrote: Hi philip,

Really? That's strange, it works fine for me here in v8 directly referencing the sources from Delphi. You could try adding the source code path at Tools -> Options -> Environment Options -> Delphi Options -> Library - Win32 -> Library path. Does this work for you?

Paths were OK. I purged the compiler of add-in components and checked the .bpr for vestigial v2010 references. Whatever non-standard thing I did, it worked. You can safely assume it was a local issue.

Anyway, it [your .pas modification] works fine (ChartEditor still to do) - as it should.


I want to say something about Steema - and that simply is that your customer support is exceptionally good - it differentiates you from the others; professional, courteous, thorough and timely. As well as having a damn fine product in your hands, you care about it and it's a pleasure to use it.

Post by Yeray » Mon Aug 02, 2010 5:40 pm

Hi philip,

We are very pleased to hear positive opinions like yours. Thank you.