In my last blog I found out that Qt is being evil when using QPainter::eraseRect() with a QImage based textured brush. How evil? Well, calling QPainter::fillRect() with the same brush results in something like a 30-50% speedup while achieving the exact same results. Not only that, but the QPainter::eraseRect() codepath makes QImage not thread safe for painting outside the main thread because it is silently using QPixmap behind the scenes. However, this isn't the whole story. I was surprised that even with all this fixed the algorithm is still not optimal.
In fact, you can see a very substantial increase in performance using a fractal based filling and tiling algorithm. Here are the results for a 10000x10000 QImage for tiling operations:
Algorithm Brush msecs
----------------------------------------
naive checkered brush 1186
eraseRect checkered brush 4445
fillRect checkered brush 704
fractalFill checkered brush 284As you can see, the QPainter::eraseRect() is evil. Just changing to QPainter::fillRect() provides a very big decrease in the amount of time to paint a pattern. However, the fractalFill algorithm is still much faster. It cuts that time by more than half. So what is this algorithm?
void ...
{
QImage image(WIDTH, HEIGHT, FORMAT);
/* create our pattern */
QImage base(20, 20, FORMAT);
QPainter p1(&base);
p1.setCompositionMode(QPainter::CompositionMode_Source);
p1.fillRect(0, 0, 10, 10, Qt::gray);
p1.fillRect(10, 0, 10, 10, Qt::white);
p1.fillRect(0, 10, 10, 10, Qt::white);
p1.fillRect(10, 10, 10, 10, Qt::gray);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
/* draw the pattern once at 0,0 */
p.drawImage(0, 0, base);
const int imageW = image.width();
const int imageH = image.height();
int w = base.width();
int h = base.height();
while (w < imageW || h < imageH) {
if (w < imageW) {
/* Copy and draw the existing pattern to the right */
p.drawImage(QRect(w, 0, w, h), image, QRect(0, 0, w, h));
/* Update width of our pattern */
w *= 2;
}
if (h < imageH) {
/* Copy and draw the existing pattern to the bottom */
p.drawImage(QRect(0, h, w, h), image, QRect(0, 0, w, h));
/* Update height of our pattern */
h *= 2;
}
}
}This would be even faster if Qt were smarter about the overloads of QPainter::drawImage(). The real inline drawImage overload that does the heavy lifting takes a QRect. This forces the other overloads to create and destroy a QRect which is minimal overhead, but in a tight loop this cost starts starts to grow.
Implementing this algorithm and seeing the improvement led icefox and me to investigate whether we could use the same idea to speedup the general case of QPainter::fillRect() with a solid brush. We didn't have much hope it could be sped up as this is a fundamental painting routine in Qt, but surprise this algorithm does give a nice speedup 
Algorithm Brush msecs
----------------------------------------
fillRect solid brush took 508
fractalFill solid brush took 293This is a large performance increase in a fundamental painting routine. This same idea could be used inside of QPainter::eraseRect(), QPainter::fillRect(), and hopefully a future QPainter::drawTiledImage(). Hopefully the Trolls can investigate and see if something like this should go into Qt 4.5.
Here are the two methods side by side:
QImage fillRectSolid()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(QRect(0, 0, WIDTH, HEIGHT), Qt::red);
return image;
}
QImage fractalFillSolid()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(0, 0, 1, 1, Qt::red);
const int imageW = image.width();
const int imageH = image.height();
int w = 1;
int h = 1;
while (w < imageW || h < imageH) {
if (w < imageW) {
p.drawImage(QRect(w, 0, w, h), image, QRect(0, 0, w, h));
w *= 2;
}
if (h < imageH) {
p.drawImage(QRect(0, h, w, h), image, QRect(0, 0, w, h));
h *= 2;
}
}
return image;
}A few notes: this speedup only occurs with non-alpha channel QImage::Format's or at least pre-multiplied. With the inline overloads rearranged again it'd be even faster. I hope this is useful. Read on for the source to all of these tests...
Here you go. Happy playing!
#include <QtGui>
#define WIDTH 10000
#define HEIGHT 10000
#define FORMAT QImage::Format_RGB16
QImage naiveCheckered()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
int w = image.width();
int h = image.height();
QBrush brush(Qt::gray);
for (int j = 0; j < h; j+= 10)
for (int i = (j % 20 == 0) ? 0 : 10; i < w;i += 20)
p.fillRect(i, j, 10, 10, brush);
return image;
}
QImage eraseRectCheckered()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QImage base(20, 20, QImage::Format_RGB16);
QPainter p1(&base);
p1.fillRect(0, 0, 10, 10, Qt::gray);
p1.fillRect(10, 0, 10, 10, Qt::white);
p1.fillRect(0, 10, 10, 10, Qt::white);
p1.fillRect(10, 10, 10, 10, Qt::gray);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.setBackground(QBrush(base));
p.eraseRect(QRect(0, 0, image.width(), image.height()));
return image;
}
QImage fillRectCheckered()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QImage base(20, 20, QImage::Format_RGB16);
QPainter p1(&base);
p1.fillRect(0, 0, 10, 10, Qt::gray);
p1.fillRect(10, 0, 10, 10, Qt::white);
p1.fillRect(0, 10, 10, 10, Qt::white);
p1.fillRect(10, 10, 10, 10, Qt::gray);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(QRect(0, 0, image.width(), image.height()), QBrush(base));
return image;
}
QImage fractalFillCheckered()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QImage base(20, 20, FORMAT);
QPainter p1(&base);
p1.setCompositionMode(QPainter::CompositionMode_Source);
p1.fillRect(0, 0, 10, 10, Qt::gray);
p1.fillRect(10, 0, 10, 10, Qt::white);
p1.fillRect(0, 10, 10, 10, Qt::white);
p1.fillRect(10, 10, 10, 10, Qt::gray);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.drawImage(0, 0, base);
const int imageW = image.width();
const int imageH = image.height();
int w = base.width();
int h = base.height();
while (w < imageW || h < imageH) {
if (w < imageW) {
p.drawImage(QRect(w, 0, w, h), image, QRect(0, 0, w, h));
w *= 2;
}
if (h < imageH) {
p.drawImage(QRect(0, h, w, h), image, QRect(0, 0, w, h));
h *= 2;
}
}
return image;
}
QImage fillRectSolid()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(QRect(0, 0, WIDTH, HEIGHT), Qt::red);
return image;
}
QImage fractalFillSolid()
{
QImage image(WIDTH, HEIGHT, FORMAT);
QPainter p(&image);
p.setCompositionMode(QPainter::CompositionMode_Source);
p.fillRect(0, 0, 1, 1, Qt::red);
const int imageW = image.width();
const int imageH = image.height();
int w = 1;
int h = 1;
while (w < imageW || h < imageH) {
if (w < imageW) {
p.drawImage(QRect(w, 0, w, h), image, QRect(0, 0, w, h));
w *= 2;
}
if (h < imageH) {
p.drawImage(QRect(0, h, w, h), image, QRect(0, 0, w, h));
h *= 2;
}
}
return image;
}
void makeTab(QTabWidget *w, const QImage &image, const QString &text)
{
QLabel *label = new QLabel;
label->setPixmap(QPixmap::fromImage(image));
w->addTab(label, text);
}
int main (int argc, char **argv)
{
QApplication application (argc, argv);
qDebug() << "starting tests...";
QTime time;
time.start();
QImage naiveCheckImage = naiveCheckered();
qDebug() << "naive \t\tcheckered brush took\t" << time.restart();
QImage eraseRectCheckImage = eraseRectCheckered();
qDebug() << "eraseRect \tcheckered brush took\t" << time.restart();
QImage fillRectCheckImage = fillRectCheckered();
qDebug() << "fillRect \tcheckered brush took\t" << time.restart();
QImage fractalFillCheckImage = fractalFillCheckered();
qDebug() << "fractalFill \tcheckered brush took\t" << time.restart();
QImage fillRectSolidImage = fillRectSolid();
qDebug() << "fillRect \tsolid brush took\t" << time.restart();
QImage fractalFillSolidImage = fractalFillSolid();
qDebug() << "fractalFill \tsolid brush took\t" << time.restart();
#if 0
QTabWidget w;
makeTab(&w, naiveCheckImage, QLatin1String("naive checkered"));
makeTab(&w, eraseRectCheckImage, QLatin1String("eraseRect checkered"));
makeTab(&w, fillRectCheckImage, QLatin1String("fillRect checkered"));
makeTab(&w, fractalFillCheckImage, QLatin1String("fractalFill checkered"));
makeTab(&w, fillRectSolidImage, QLatin1String("fillRect solid"));
makeTab(&w, fractalFillSolidImage, QLatin1String("fractalFill solid"));
w.show();
return application.exec();
#else
return 0;
#endif
}
The reason the fractal fill
The reason the fractal fill is faster with ARGB32_PM for you, is (I suspect) because you still have the RGB16 images as the source. This means that there will be conversions from RGB16 -> ARGB32 during the blending. The fracal routine doesn't have this as it continously draws ARGB32 into ARGB32. If you switch the source images to be ARGB32, you should see normal fillrect performing roughly the same.
As for drawTiledImage(), there are no plans. The feature is already there with fillRect() using an image brush.
The only reason I would suggest drawTiledImage...
Is so that eraseRect doesn't silently use drawTiledPixmap behind your back which is just wrong as noted in previous blog.
Also, I was using ARGB32_PM for both the source and destination images so there was no conversion going on. You can see from the source that there is a define at the top: FORMAT which sets the format for both the source and destination.
Cheers,
Adam
My point was your
My point was your fillRectCheckered and some of the other functions hardcodes the format of "QImage base" to RGB16, disregarding the FORMAT define. Also, the pixmap conversion in erase() will be removed.
Numbers on my system
Maybe of little interest now, since it turns out that Qt 4.5 will have a faster implementation yet, but I ran the benchmark and thought you might be interested in seeing the numbers. Of course then I had to wait for my account to clear before I could post a comment
The numbers are at:
http://pastebin.com/m40bd6a5a (MacOS X, Macbook Pro Core 2 Duo)
Cool that you are providing
Cool that you are providing these benchmarks, we're always looking for points of improvement in the Qt graphics system.
For 4.5 we are investigating several performance areas and this case is already covered. At least on my local build of Qt 4.5 optimization branch, the fractal filling runs slower than a plain fillRect() or drawTiledPixmap when done on a QImage.
Also, I noticed that you are using RGB16 as the format. Which format you use usually have a lot to say. For instance the ARGB32_Premultiplied format is highly optimized and runs fillRect() roughly twice as fast, because of MMX and SSE, at least on x86.
Formats
Yes, I ran it on ARGB32_Premultiplied and the fractal fill is still faster. I'm glad to hear that your local branch of Qt 4.5 optimization is even faster still. That's awesome! Do you have any idea when this will hit the snapshots?
Also, are you planning on introducing drawTiledImage? As no matter if it is faster, it is just plain wrong to convert a QImage to a QPixmap for tiling purposes especially when the canvas device you are drawing to is QImage.
When playing with KDE4 I
When playing with KDE4 I noticed a massive CPU usage when simply hovering over listviews (not sure what exact class that is), for instance the plugin lists in KRunner config or KWin effects config, also the process list in the process manager (accessible via KRunner). Simply hovering over the lists elements makes my CPU (Pentium M 1,4GHz) go up to 80-90% (half Xorg and half application). Could this be the same cause?
What???
What???
You talked about inefficient
bug
Did you file a bug at Qt's bugtracker? I suppose it isn't possible to add a patch over there. Maybe something Nokia should think about adding, after all, opening up is good, right?