I believe Matthias is referring to your original tile method. For instance, if you have 6 images you want to stitch together, that is 6 texture binds using the tile method. If you load the images as slices along the 3rd coordinate of a 3D texture, you save yourself the extra texture binds (which are expensive; many engines do a secondary sort of objects so that as many as possible are drawn with the same texture). You then only need to specify which "tile" to use as the 3rd texture coordinate for each quad.
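As a rough sketch of that last step: with normalized coordinates, the center of slice i in a 3D texture with N slices sits at (i + 0.5) / N. The class and method names below are made up for illustration and do not come from any GL binding.

```java
// Hypothetical helper (not part of any OpenGL binding): computes the third
// texture coordinate that addresses one "tile" stored as a slice of a 3D texture.
final class TileStack {
    /**
     * Normalized r coordinate of the center of slice `index`
     * in a 3D texture that holds `depth` slices.
     */
    static float layerCoord(int index, int depth) {
        if (depth <= 0 || index < 0 || index >= depth) {
            throw new IllegalArgumentException("index must be in [0, depth)");
        }
        // Sampling at the slice center avoids blending with neighboring slices.
        return (index + 0.5f) / depth;
    }

    public static void main(String[] args) {
        int depth = 6; // the 6 images from the example above
        for (int i = 0; i < depth; i++) {
            System.out.println("tile " + i + " -> r = " + layerCoord(i, depth));
        }
    }
}
```

Each quad would then pass its usual two texture coordinates plus this value as the third one. Note that hardware without NPOT support would likely need the depth padded to a power of two (8 instead of 6 here).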
The most time-consuming part of each frame is rendering the map, which is done by layers. The map has 3 layers, read in three nested for loops (layer > x > y). The tileset texture is bound once per layer, so it surely isn't a bottleneck. Between layers, sprites are drawn, but I have never tested my game with more than 3 sprites at the same time, usually only 1 (the player's sprite).
Each sprite has one texture with all the "poses", and only a part of the texture is drawn depending on the animation frame. Either way, it is always one bind per sprite here.
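To illustrate the "only a part of the texture is drawn" idea, here is a minimal sketch. It assumes the poses are laid out in a regular grid of equally sized frames, numbered row by row from the top-left; that layout and all the names are assumptions for the example, not necessarily how my sheets are organized.

```java
// Hypothetical sprite-sheet helper: assumes equally sized frames in a grid,
// numbered row-major starting at the top-left corner.
final class SpriteSheet {
    private final int columns;
    private final int rows;

    SpriteSheet(int columns, int rows) {
        if (columns <= 0 || rows <= 0) {
            throw new IllegalArgumentException("grid must be at least 1x1");
        }
        this.columns = columns;
        this.rows = rows;
    }

    /** Returns {u0, v0, u1, v1} in normalized [0, 1] texture coordinates. */
    float[] frameUV(int frame) {
        if (frame < 0 || frame >= columns * rows) {
            throw new IllegalArgumentException("frame out of range");
        }
        int col = frame % columns;
        int row = frame / columns;
        float w = 1f / columns; // width of one frame in texture space
        float h = 1f / rows;    // height of one frame in texture space
        return new float[] { col * w, row * h, (col + 1) * w, (row + 1) * h };
    }

    public static void main(String[] args) {
        SpriteSheet sheet = new SpriteSheet(4, 2);
        System.out.println(java.util.Arrays.toString(sheet.frameUV(5)));
    }
}
```

The texture is bound once and only the four coordinates change per animation frame. With texture rectangles the coordinates are in pixels rather than normalized, so the same values would have to be multiplied by the texture's width and height.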
Regarding using TextureRectangle, how much slower are we talking here? If it's reeaaaally slow, is there any chance you are accidentally reloading the texture in your loop?
FPS normally in the log-in scene of my game: ~670
FPS using the ARB thingy to draw the background: ~660
The difference is pretty much insignificant in this case; it might be more significant when drawing the map.
But I was expecting to see a bigger difference when drawing, a significant increase of the FPS. At least 20 more FPS for this one image.
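For scale, FPS differences at these rates are easier to judge as time per frame. A small sketch of the conversion (class and method names are just for illustration):

```java
// Converts the FPS figures quoted above into milliseconds per frame.
final class FrameTime {
    static double millisPerFrame(double fps) {
        if (fps <= 0) {
            throw new IllegalArgumentException("fps must be positive");
        }
        return 1000.0 / fps;
    }

    public static void main(String[] args) {
        double normal = millisPerFrame(670); // log-in scene, normal textures
        double arb = millisPerFrame(660);    // background drawn via the ARB extension
        double hoped = millisPerFrame(690);  // the +20 FPS I was hoping for
        System.out.printf("normal: %.4f ms, ARB: %.4f ms%n", normal, arb);
        System.out.printf("measured difference: %.4f ms%n", arb - normal);
        System.out.printf("hoped-for saving:    %.4f ms%n", normal - hoped);
    }
}
```

At roughly 1.49 ms versus 1.52 ms per frame, the measured difference is about 0.02 ms, and even the 20 FPS gain I hoped for would only be about 0.04 ms per frame.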
Note that the background in this case is drawn without resizing. Normally the background is resized to fit the screen. This should probably be the only image being resized in the entire game.
Things drawn in the log-in scene: an 800x600 32bpp background, a "log in box" with an image background (346x92 32bpp), two text input boxes which are in fact just two rectangles, and the logo (608x236 32bpp).
Slow TextureRectangle rendering could also be a sign of poor support by your GPU. Some hardware has issues with NPOT textures. Which GPU and which OS do you use?
GL_VENDOR: NVIDIA Corporation
GL_RENDERER: GeForce 9400 GT/PCI/SSE2
GL_VERSION: 3.2.0
GL_MAX_TEXTURE_SIZE: 8192
OS: Windows 7 64 bits
Available Processors:4
Available VRAM: 512MB
Available RAM: 2GB
Processor: AMD Phenom 9950 Quad-Core Processor 2.60 GHz
Texture stacks are available on nVidia starting with the 8800 series.
My game is running at over 600 FPS on my computer, so it should still reach 60 FPS on older computers.
It will be an MMORPG running in an applet, and the target audience is teenagers, who generally can't buy a new computer and may have a rather old one. Also, if a player is not at his own computer and wants to play on whatever computer is available, he should be able to. So the 8800 series would be too high a requirement for my game.
Another way to speed up rendering is to sort your sprites/tiles: either draw them sorted by texture (this only works when they don't overlap), or group the sprites/tiles by usage. This of course only helps if you then remove the redundant state changes, like glBindTexture, glBegin/End etc.
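A minimal sketch of why the sorting helps, counting how many binds a draw order would need if a bind is issued only when the texture changes. The texture ids are made-up positive integers and the names are illustrative; no GL calls are involved.

```java
import java.util.Arrays;

// Illustrative only: counts texture binds for a given draw order,
// assuming a bind is issued only when the texture id changes.
final class BindBatching {
    /** Number of binds needed to draw sprites in the given order (ids assumed positive). */
    static int countBinds(int[] textureIds) {
        int binds = 0;
        int bound = -1; // sentinel: nothing bound yet
        for (int id : textureIds) {
            if (id != bound) { // skip the redundant state change
                binds++;
                bound = id;
            }
        }
        return binds;
    }

    /** Returns a copy of the draw list ordered by texture id. */
    static int[] sortedByTexture(int[] textureIds) {
        int[] copy = textureIds.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        int[] drawOrder = { 1, 2, 1, 2, 1, 3 };
        System.out.println("unsorted binds: " + countBinds(drawOrder));
        System.out.println("sorted binds:   " + countBinds(sortedByTexture(drawOrder)));
    }
}
```

For the draw order in `main`, the unsorted list needs 6 binds and the sorted one needs 3. As the quoted advice says, reordering like this is only safe when the sprites do not overlap, because it changes which sprite is drawn on top.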
In the tested game scene, this is irrelevant, as there are only about 3 images, each using a different texture. This scene is pretty much irrelevant to the game itself, too. The rest of the game consists of rendering tiles and sprites, which I optimized as much as I could think of.
Also make sure that you render your tiles 1:1 (without scaling down), as texture rectangles don't have mipmaps. When you minify a texture without mipmaps you get a) aliasing artifacts and b) slower rendering compared to rendering with mipmaps.
Nothing is scaled. I'm 100% sure of this. Although, in other games I'm planning, I'd like to zoom in and out, like in Yoshi's Story or Metal Slug 6. (2D as well)
It also helps to disable unused features like alpha blending, depth tests/updates, stencil etc. Also never disable individual color channels (glColorMask).
All check.
---
Tried to answer everything on this one, phew.
Now, considering that my game seems to run fine on the GeForce 6000 series and I want to keep it that way, and that in my test the ARB thingy didn't seem to make any difference, should I recode a lot of what is already working to
use texture stacks?
use the ARB thingy?
Is it worth it?