Optimizing the vector math

Discuss issues specific to the Scala port of Box2D
Villane
Posts: 101
Joined: Mon Sep 01, 2008 6:32 am

Optimizing the vector math

Postby Villane » Fri Jun 26, 2009 1:26 am

Like I've said earlier, I would like to optimize ScalaBox2D (mostly to reduce vector creations) while leaving the math code as readable as possible. Unfortunately, so far I've gotten the most improvement from manually inlining the math.

I was really pleasantly surprised how much JBox2D could reduce vector creations by using *ToOut methods in calculations. But I don't want to do the same in Scala; I want to keep the math code as clean as I can and still use binary operators (and get rid of the manual inlining at some point). I'm not really happy that the current choice in the Scala port is either to write clean math code or to write fast math code, but not both at the same time.

So I think there are basically 3 options:
  • Write an optimizing compiler plug-in which does roughly the same optimizations as manual inlining. This means getting to know the Scala compiler. Might be a lot of work, but in the end it also might be easier than the other options.
  • Write a byte-code manipulator (with ASM?) that does the same thing that the compiler would. This is initially easier (more documentation & samples available) but might turn out more difficult in the end due to working on a lower level than the result of some intermediary compiler phase.
  • Try to come up with some mutable vector API that remains similar to the current one. The only one I've come up with so far uses lazy calculation structures, but they can easily create even more temporary objects and may end up having worse performance in some cases.

What do you think? I started playing with ASM, but while it is easy to get started with, it looks too low-level, and I'm currently leaning towards the compiler plugin.

The Lazy calculations would look something like this:

Code: Select all

trait V2 {
  def x: Float;
  def y: Float;
  def +(v: V2) = AddVV(this, v)
  def -(v: V2) = SubVV(this, v)
  def +(a: Float) = AddVF(this, a)
  // ...
}
abstract class BinOpVV(v1: V2, v2: V2)
case class AddVV(v1: V2, v2: V2) extends BinOpVV(v1, v2) with V2 {
  def x = v1.x + v2.x
  def y = v1.y + v2.y
}
abstract class BinOpFV(a: Float, v: V2)
case class CrossFV(a: Float, v: V2) extends BinOpFV(a, v) with V2 {
  def x = -a * v.y
  def y = a * v.x
}
// ... lots of calculation classes
class MutableV2 extends V2 {
  var x = 0f; var y = 0f
  def :=(v: V2) = { x = v.x; y = v.y }
  def :=(_x: Float, _y: Float) = { x = _x; y = _y }
}

val dv = new MutableV2
// this expression
dv := v2 + w2 × ccp.r2 - v1 - (w1 × ccp.r1)
// would then turn into this lazy calculation structure:
dv := SubVV(SubVV(AddVV(v2, CrossFV(w2, ccp.r2)), v1), CrossFV(w1, ccp.r1))


So this is currently even slower. But the following might be faster thanks to the inlining the compiler/JIT does (I haven't tested it yet):

Code: Select all

dv := v2
dv += w2 × ccp.r2
dv -= v1
dv -= w1 × ccp.r1

toucansam
Posts: 356
Joined: Mon Jun 08, 2009 12:21 pm
Contact:

Re: Optimizing the vector math

Postby toucansam » Fri Jun 26, 2009 10:13 pm

Villane wrote:I'm not really happy that the current choice in the Scala port is to either write clean math code or to write fast math code, not both at the same time.

It's very hard to find a way to write clean yet fast math code.

Villane wrote:So I think there are basically 3 options:
  • Write an optimizing compiler plug-in which does roughly the same optimizations as manual inlining. This means getting to know the Scala compiler. Might be a lot of work, but in the end it also might be easier than the other options.
  • Write a byte-code manipulator (with ASM?) that does the same thing that the compiler would. This is initially easier (more documentation & samples available) but might turn out more difficult in the end due to working on a lower level than the result of some intermediary compiler phase.
  • Try to come up with some mutable vector API that remains similar to the current one. The only one I've come up with so far uses lazy calculation structures, but they can easily create even more temporary objects and may end up having worse performance in some cases.

What do you think? I started playing with ASM, but while it is easy to get started with, it looks too low-level, and I'm currently leaning towards the compiler plugin.


First of all, how would the lazy calculations speed things up? You are still creating objects.

Hm... a compiler plugin sounds very interesting; you could even create it as a separate project, and I'm sure people would love to use it on their projects. I can see it getting pretty hairy though.

Code: Select all

dv := v2
dv += w2 × ccp.r2
dv -= v1
dv -= w1 × ccp.r1

This is basically what all the ToOut methods did, where you're always putting the result into an existing object.

In terms of ASM, I've been thinking recently about creating something to pool chosen objects as "static final" fields in the class that uses them. That's what I'm doing manually right now, and although it's strictly single-threaded (especially because multiple objects are using that same pool object), it really speeds things up. But I always end up using the ToOut methods with these objects, so this line of approach wouldn't really work for you.
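
In Scala, that pooling pattern would look roughly like this (just a sketch built on the MutableV2 from the first post; the object and method names are made up for illustration, and it is single-threaded only):

Code: Select all

object CollisionHelper {
  // a single shared scratch vector, the equivalent of a "static final" pool of one
  private val temp = new MutableV2

  // computes v2 - v1 into the pooled vector instead of allocating a new one;
  // the caller must consume or copy the result before the next call
  def subToTemp(v1: V2, v2: V2): MutableV2 = {
    temp.:=(v2.x - v1.x, v2.y - v1.y)
    temp
  }
}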

In the end, you need some way to reuse objects, or just inline all the math. If you don't like the ToOut methods (which seem pretty clean to me), then maybe you can achieve the same result with an operator. Maybe something like

Code: Select all

v2 -~ v1, temp;
temp x~ v2, dv;

Where the squiggly signifies that the result is put in the second argument.
Or, can you do something like this?

Code: Select all

v2 - v1 ~ temp;
temp x a ~ dv;
mat * dv ~ dv;

Where the squiggly signifies that the result of the previous operation is stored in the next variable, without creating a new object. Maybe that's something you could write with ASM or in the compiler plugin.
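
For what it's worth, the second form can almost be expressed directly with the lazy V2 classes from the first post, since every lazy node already exposes x and y. Something like this (just a sketch; note that the lazy nodes themselves are still allocated, so this only avoids allocating the final result, not the temporaries):

Code: Select all

trait V2 {
  def x: Float
  def y: Float
  def -(v: V2): V2 = SubVV(this, v)
  // ... other operators as in the first post ...

  // "store into": evaluate this (possibly lazy) expression into an existing
  // mutable vector and return that vector
  def ~(out: MutableV2): MutableV2 = { out := this; out }
}

// usage; parentheses are needed because ~ binds tighter than - in Scala:
// (v2 - v1) ~ temp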

ewjordan
Posts: 803
Joined: Sun Sep 23, 2007 2:35 pm

Re: Optimizing the vector math

Postby ewjordan » Sat Jun 27, 2009 4:13 am

Re: ASM/compiler plugin, if you go this route my suggestion would be to annotate variables which can be stack allocated/scalarized and then do so in the plugin based on the annotation. It's usually pretty easy to get the current block scope with any sort of bytecode framework, so that will tell you where to put the get/release calls.

The real trick is to make anything like this functionally equivalent without running the instrumentation task, at least in my opinion. From what I remember, this is a slight problem with JStackAlloc: it doesn't degrade gracefully, and if the task is not run, the program can't run either (maybe I'm wrong about that now? I only saw an early version). To me, at least, a better situation would be that if someone compiles the stuff without running the optimization task, it will still run perfectly, even if it doesn't get the optimizations. Variable annotations are ideal for that kind of task, though I'm not sure whether they add to the object overhead or not...

Scala doesn't have any mechanism for destructors, either available or in the works, does it? A real destructor, called immediately when a reference goes out of scope, would make this sort of thing absolutely trivial within the language, and it can be useful in other situations as well (I've always liked having the option of scoped locks with Boost threads, for instance, since it cuts the code you need in half).
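
For the scoped-lock case specifically, the closest Scala equivalent is probably the "loan pattern"; a rough generic sketch, nothing ScalaBox2D-specific:

Code: Select all

import java.util.concurrent.locks.Lock

// "loan pattern": the lock is always released when the block exits,
// which covers the scoped-lock use case without needing a destructor
def withLock[A](lock: Lock)(body: => A): A = {
  lock.lock()
  try body finally lock.unlock()
}

// usage: withLock(myLock) { /* critical section */ }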

Villane
Posts: 101
Joined: Mon Sep 01, 2008 6:32 am

Re: Optimizing the vector math

Postby Villane » Sun Jun 28, 2009 11:27 am

I decided to take the compiler plug-in route. It's quite hard to understand what is going on in that code at times, but I'm starting to get some idea of how it works. Well, I've never studied compilers, so it's bound to be hard, I guess :)

toucansam, yes, you are right. The lazy calculations can't speed things up. It was a stupid idea, but a very simple test I did made it seem like it could work. In a real situation it only got worse.
(My idea was that maybe the JIT/escape analysis could more easily determine that the temporary calculation objects do not escape their scope and do some magic.)

Villane
Posts: 101
Joined: Mon Sep 01, 2008 6:32 am

Re: Optimizing the vector math

Postby Villane » Mon Jun 29, 2009 12:54 pm

ewjordan wrote:Re: ASM/compiler plugin, if you go this route my suggestion would be to annotate variables which can be stack allocated/scalarized and then do so in the plugin based on the annotation.


I'm trying scalarizing at the moment. Currently I'm thinking of scalarizing every variable in an annotated method, but if that turns out suboptimal, I'll look into variable annotations. I would prefer as few annotations as possible, and method level seems sensible for now.

Villane
Posts: 101
Joined: Mon Sep 01, 2008 6:32 am

Re: Optimizing the vector math

Postby Villane » Mon Jun 29, 2009 2:12 pm

Damn, it's actually pretty fun working on this optimizer, after I got over the initial confusion of not understanding the errors I got from the compiler. I've already got a few very basic cases working:
Vector +-*/ Float
Vector +- Vector

But it doesn't change references in the rest of the code yet, so I haven't gotten to the hard parts. These are some tests that already work:

Code: Select all

  @inline def compute(center: Vector2) = {
    val v1 = center * 2f - 2f
    v1
  }

  @inline def compute2(v1: Vector2, v2: Vector2) = {
    val v = v1 * 2f - v2 * 2f
    v
  }
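
For reference, the result I'm aiming for is roughly this hand-written scalar-replaced form of compute2 (a sketch, assuming Vector2 has an (x, y) constructor; only the returned vector still has to be allocated):

Code: Select all

def compute2$sr(v1: Vector2, v2: Vector2): Vector2 = {
  val v$x = v1.x * 2f - v2.x * 2f
  val v$y = v1.y * 2f - v2.y * 2f
  new Vector2(v$x, v$y)
}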

BorisTheBrave
Posts: 1911
Joined: Mon Jan 07, 2008 10:51 am
Contact:

Re: Optimizing the vector math

Postby BorisTheBrave » Tue Jun 30, 2009 2:32 pm

Vector + float?

Villane
Posts: 101
Joined: Mon Sep 01, 2008 6:32 am

Re: Optimizing the vector math

Postby Villane » Tue Jun 30, 2009 3:53 pm

You're right, that doesn't make a lot of sense, except as a convenience method when computing AABBs (or similar things).
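
For example, padding an AABB by a skin margin stays a one-liner with Vector2 +/- Float (a sketch, assuming an AABB class with lowerBound/upperBound Vector2 fields and a matching constructor):

Code: Select all

// pad an AABB outwards by a scalar margin using Vector2 +/- Float
def padded(aabb: AABB, margin: Float): AABB =
  new AABB(aabb.lowerBound - margin, aabb.upperBound + margin)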

Villane
Posts: 101
Joined: Mon Sep 01, 2008 6:32 am

Re: Optimizing the vector math

Postby Villane » Wed Jul 01, 2009 2:56 pm

Making some progress. The optimizer is already almost able to optimize an actual method in Box2D (except I have some work to do to support scoping). I'm using an annotation @sr (for Scalar Replacement) to mark methods for optimization.
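
The annotation itself can be just a plain marker, so code compiled without the plugin still runs unchanged (which also covers ewjordan's point about degrading gracefully). Roughly like this (a sketch; the package name is made up and may not match the real definition):

Code: Select all

package vecmath.optimizer

import scala.annotation.StaticAnnotation

// marker annotation read by the compiler plugin; it carries no behaviour of
// its own, so annotated code compiles and runs normally when the plugin
// is not enabled
class sr extends StaticAnnotation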

Also, note that I'm separating the vecmath package from the engine into a new project. This is because:
* I always planned to do that anyway
* I think a vecmath library usable for physics, graphics, and game engines alike is a good idea (of course, this needs a lot of work and the addition of 3D math)
* Otherwise there would be a circular dependency: box2d <- optimizer <- box2d. With this separation it becomes vecmath <- optimizer <- box2d.

Here's Segment.testSegment as it is originally (except more verbose, as this is the output after the compiler has already run some phases):

Code: Select all

@sr def testSegment(segment: Segment, maxLambda: Float) = {
  val s: Vector2 = segment.p1;
  val r: Vector2 = segment.p2.-(s);
  val d: Vector2 = Segment.this.p2.-(Segment.this.p1);
  val n: Vector2 = d.cross(1.0);
  val slop: Float = 100.0.*(Settings.Epsilon);
  val denom: Float = r.dot(n).unary_-;
  if (denom.>(slop))
    {
      val b: Vector2 = s.-(Segment.this.p1);
      val a: Float = b.dot(n);
      if (0.0.<=(a).&&(a.<=(maxLambda.*(denom))))
        {
          val mu2: Float = b.cross(r);
          if (slop.unary_-.*(denom).<=(mu2).&&(mu2.<=(denom.*(1.0.+(slop)))))
            return SegmentCollide.hit(a./(denom), n.normalize)
        }
    };
  SegmentCollide.Miss
}

And this is the optimized code (with the scoping-related strangeness removed):

Code: Select all

@sr def testSegment(segment: Segment, maxLambda: Float) = {
  val s$x: Float = segment.p1.x;
  val s$y: Float = segment.p1.y;
  val r$x: Float = segment.p2.x.-(s$x);
  val r$y: Float = segment.p2.y.-(s$y);
  val d$x: Float = Segment.this.p2.x.-(Segment.this.p1.x);
  val d$y: Float = Segment.this.p2.y.-(Segment.this.p1.y);
  val n$x: Float = d$y.*(1.0);
  val n$y: Float = d$x.*(1.0.unary_-);
  val slop: Float = 100.0.*(Settings.Epsilon);
  val denom: Float = r$x.*(n$x).+(r$y.*(n$y)).unary_-;
  if (denom.>(slop))
    {
      val b$x: Float = s$x.-(Segment.this.p1.x);
      val b$y: Float = s$y.-(Segment.this.p1.y);
      val a: Float = b$x.*(n$x).+(b$y.*(n$y));
      if (0.0.<=(a).&&(a.<=(maxLambda.*(denom))))
        {
          val mu2: Float = b$x.*(r$y).-(b$y.*(r$x));
          if (slop.unary_-.*(denom).<=(mu2).&&(mu2.<=(denom.*(1.0.+(slop)))))
            return SegmentCollide.hit(a./(denom), n.normalize)
        }
    };
  SegmentCollide.Miss
}


BTW, with this kind of access to an intermediary compiler phase, and the fact that you can print source code from that phase, one could first write the higher-level code, then have the plugin print out/save the modified source, and apply the optimizations at the source level. But of course that's not what I want to do.

BorisTheBrave
Posts: 1911
Joined: Mon Jan 07, 2008 10:51 am
Contact:

Re: Optimizing the vector math

Postby BorisTheBrave » Thu Jul 02, 2009 12:54 pm

That's AWESOME. Can it handle vectorizing expressions (a*b + c) and matrices?

