Axis labeling code described in “An Extension of Wilkinson’s Algorithm for Positioning Tick Labels on Axes”.
Updated version of the paper with corrections to typos in the weighting formula in section 4 and the density formula in section 4.3. Thanks to Ahmet Karaham for help catching them.
R package containing our implementation of Heckbert’s labeling algorithm, Wilkinson’s, our extensions, and a number of others. This does not yet include the legibility component. You should be able to install it from within R using: install.packages("labeling", repos="http://R-Forge.R-project.org").
matplotlib code (change the extension to .py). Has not been tested in any detail. I’m not a matplotlib user, so I’m not sure that I’ve got the integration with matplotlib correct. I’d appreciate feedback on how to improve this.
A C# implementation is available on github that implements the full algorithm in the paper including format changes. I had to refactor this from a much larger code base, so it may have some small bugs. Let me know if you run into any problems.
A Java implementation, not including the formatting stuff, by Ahmet Karahan.
Hi Justin,
I just read your paper on axis labeling and was very impressed by the results of the algorithm. I am a recent Biochemistry Ph.D. and am interested in developing a specific type of scatter plot program. I have no background in CS other than what I’ve been able to teach myself. I am writing to ask if your axis labeling code will be available in an open source Java implementation? I would like to try to implement your code in my program if possible. Thanks, and good luck with getting your Ph.D.!
I don’t have a Java implementation at the moment. I’m preparing a C# implementation for release in the next few weeks. I have a limited R version working as well. It should be pretty straightforward to port the C# implementation to Java. I’ll let you know when it’s available.
Hi,
I’m working on a Java implementation of your paper “extended Wilkinson axes labeling”. However in your pseudo code of searching algorithm I could not decipher the different notations of the same functions. You gave a definition of simplicity, density and coverage, and I wrote functions with appropriate parameters. But what about the functions of the same name in the inner most loop of your pseudo code.
What you mean for example with a simplicity calculation simplicity(q,j,lmin,lmax,lstep) I can calculate it with Q, i, j, v() as you defined in the paper, What about the unneccary parameters q (if it’s not Q), lmin, lmax, lstep. Same questions about the other two…
Please clarify all such things to be able to usefull. I think your article of Extented Wilkinson algorithm is full with excellent ideas, but it hides a lot due to strange notations and extremely terse sentences. One more example the parameter i. The first question was if it starts from 0 (like C/Java) or from 1. The answer is not found in your paper, but in Wilkinson’s book of GoG.
Sorry you’re having trouble; you point out some valid problems with the description in the paper. I’ll try to get my code out quickly since that would be the clearest answer. However, I’ll try to address your specific questions.
The simplicity, density, and coverage functions in the inner loop are the functions defined in sections 4.1-4.3. For your question about the parameters, I agree that the paper is not completely clear. However, the parameters are used. For example, in the simplicity function I use lmin, lmax, and lstep to determine the value of *v* (whether or not the sequence includes 0). The variable q is the element of Q being used to generate the sequence and it is necessary to compute *i* (e.g. q = Q[i]). However, depending on how you iterate through Q in the outer loop, you may have i already available and you can pass it in directly.
The simplicity_max, density_max, and coverage_max functions used in the outer loops are approximations to the actual functions. They are used to restrict the search space, but not in the final score (that’s why the actual functions have to get called in the inner loop). The approximations are described in the text of section 5. This, I think, is one of the harder things to get directly from the paper.
i does start at 1. I doubt that starting at 0 would have a large impact on the results, but that is a detail I should have included.
Good luck on the implementation. I’ll let you know when my code is out.
Thank you for your relatively rush answer. It clears up my questions some degree. I use a helper function v() in my implementation and that’s why I never thought that these parameters required to calculate v(). My implementation determines v as following:
public int v() {
return (offset == 0 || offset-stepSize == 0) ? 1 : 0;
}
I have “i” already available, and do not require q.
In my current work I’ve done with the nice step size generator, and tested it comparing with Table 1 in a unit test. And I had trouble with search algorithm as I wrote here. If you’ evaluate the search algorithm within your code, please send me this section earlier.
I’ll send you my Java implementation when I’ve done with it, so the people benefit both of them approximately at same time.
Thank you again.
Would this algorithm be included in your sister project ‘Protovis’
We’ve discussed it, but have not made any actual progress incorporating it yet. It should be trivial to port from R to JavaScript, I just haven’t had time yet.
Hi,
I’ve translated the R code to C, after reading your very interesting paper.
I don’t know how to translate k<Inf … sure I will not pretend to understand your algo
but well, using <1000 for <Inf, I get very good results !
Could you explain these break conditions?
And what about le Q numbers… how to set them ? Could you detail in a more simple way
Well, after achieved a better quality code, I could post it for anyone interested. I use Qvector class from Qt framework but it can be easily translated in STL, BOOST or other
regards,
Michel
Hi Justin
When will the c# code be available?
Thanks
Could you please share the dataset you used to generate the graphs in the paper.
Thanks
TIm
Tim, the dataset was randomly generated using code that you can find the in R library that’s been released.
Real soon, I promise.
Inf is just infinity. In C, you could either use the true floating point infinity, or just set it to a really large value like you did.
The break conditions are used to shortcut the search. It’s pretty straightforward to prove that if the break condition succeeds that any further labelings generated by that loop would be worse than the best one found so far. So we quit out of that loop and go on to the next iteration of the containing loop.
Id be glad to see your code.
Hi,
I am interested in using your labeling approach for our visualization framework which is in Java. I was wondering if there is a Java implementation might be available sometime soon, or I should try to port the C# version myself. Also wondering if Ahmet Karahan has finished his Java implementation and if he is willing to share it.
Thanks!
so… the answer is yes?
Here’s Ahmet’s Java code which he’s given me permission to post. I haven’t looked at it too carefully, yet. If you run into any problems, let me know and I’ll pass the message to Ahmet.
very interesting work. i have implemented the majority of your stuff in matlab and if interested will send you a copy (each method extended, wilkinson… are separate functions) if you are interested.
minor bug in the gnuplot portion code (at least the R), towards the bottom where graphmin & graphmax are set you have the values min & max vs dmin & dmax. I have see numerous references to R before but this is the first time i have played with it.
have you attempted to apply any of this to log scales? (i simply tried each one and using the actual values 1.0e-12 to 1.0e-6 did not work and trying -12 to -6 sometimes gives me non integer exponents). thanks
Hi: In hopes to get some insight, I would like to know if there is a way to force the zero value to be an axis label. I have found that in some types of charts it looks weird if the zero value along the y axis is not displayed. This is the case for the bar chart, amongst others, specially if you have both positive and negative values.
I know it is possible to tweak the settings, but this will depend on the data ranges being assigned. If anyone would like to share information on how this could be achieved predictably, independent of the other settings, I would be immensely grateful.
The simplicity component of the optimization score includes a term for whether or not the labeling includes 0. You can increase the weight on this term if you like.
Thanks, I’ll check the gnuplot code. I haven’t used this for log scales, but using the exponents (e.g. -12 to -6) should provide plausible, if not optimal, results. You’re right that for log scales, non-integer numbers are much less nice than for standard scales. You would have to adjust the niceness score to account for this.