Monday, 3 December 2012

WME Testing: Chivalry is not dead

As I've been a bit busy the past few months, the Wintermute-engine port for ScummVM hasn't seen as much progress as I would have liked, but I still think it's time to get the ball rolling a bit again: 

One of the games I tested against this summer was Chivalry is not dead. So far I haven't had any issues with this game, BUT, this game is also very open-ended, and thus a bit hard to test thoroughly. 

Thus any help testing this game would be greatly appreciated. 

Just grab a copy of the game from the link above, and a nightly build of ScummVM, and test the game. Report any issues to the ScummVM Bug-Tracker, in the group "Wintermute". I would also love to hear in this thread if you played through the game without any issues at all. 

Note: Chivalry uses the Arial-font currently, which we haven't yet worked around (eventually a similar FreeFont will be used), so to get decent fonts in the game, you'll have to put arial.ttf in the game's folder (on macs you can find it in /Library/Fonts, on Windows it should be in C:\Windows\Fonts, linux-users can either download a replacement from FreeFonts/OpenFonts, or get some solution involving Microsoft Core Fonts.)

Saturday, 8 September 2012

Aftermath of GSoC

Ok, so GSoC is over, I passed and a few things has happened since then:

Some games had issues with interlaced PNGs and TGAs, this should be less of an issue, as I posted pull-requests for using libpng (instead of the internal variation that was in use before), as well as a TGA-loader, both of which have been accepted and merged now. (This means that Five Magical Amulets and Five Lethal Demons now atleast start).

The Wintermute-engine was opened as a pull-request, which after some adjustment and discussion, ended up getting merged into the master-branch for ScummVM. This means that the Wintermute-engine is available in the latest daily builds, although it is still marked as unstable, and no specific games are officially supported at this point. (BUT, from personal experience atleast Dirty Split and Chivalry is not Dead should be completable). The development of the engine will continue there instead of in my own fork.

Speaking of which, I am now joining the ScummVM-team as a developer, to continue working on the Wintermute-engine (and hopefully getting a few games release-worthy). The road ahead for Wintermute isn't short, as there are plenty of bugs left to squash, and plenty of performance left to improve upon, so I guess I'll have my hands full for a while longer.

So, if you want to try out the results of this summer's work, go ahead and grab a daily build if you are on Windows, or build your own if you are on OS X or Linux.

Oh, and while I won't be accepting bug-reports for specific games for a while yet, feel free to send me detection-entries for the multitude of games I haven't gotten around to adding yet.

Einar Johan

Tuesday, 7 August 2012

A slow week

This has been a rather slow week, my exams are starting up, and I only got in some time to start looking into the slowness that still persists with this engine.

The major source of slowness at the moment, stems from the fact that the SDL-backend in ScummVM seems to want to down-convert 32bpp to 16bpp, and since the engine is still doing full-screen updates, that means a full-screen down-conversion for every frame, which takes quite a bit of time. The problems from this can be avoided by using the OpenGL-backend, although that still has problems of it's own (like not supporting the PixelFormat I originally started out with (RGBA)), it does a decent job as a way of testing the other parts of the drawing pipeline without drowning them out in conversion.

Getting the down-conversion out of the way, revealed that the single blitting-function I'd been using was indeed rather heavy, and I'd split it out earlier to two separate functions, where one supported colour-masking, and the other didn't, making it easier to see in a profiler how much of the drawing actually used colour-masking (very little). I added in an additional function for handling opaque drawing, which helped quite a bit on the speed (skipping atleast 3 multiplications, 3 shifts and 6 additions per pixel).

For Dirty Split the numbers I get on my computer are roughly (Using the OpenGL-backend):
27 % Opaque-draws
20 % Scale
9 % Alpha-blitting

Where percentages are of the total CPU-load for the process (which now actually can be different from the CPU-load for the entire core it runs on, at max FPS). Now, there is obviously room for improvement here, in particular, the scaling, which as a result of the changes I did to implement dirty rects, ended up being done for every draw (basically, the non-dirty rect version would create renderTickets that were instantly drawn, and then discarded, and since the renderTickets were responsible for keeping the scaled copies, they were rescaled every frame). Thus I ended up partially adding renderTickets to the full-screen update system:

Upon a render call, the render-queue is checked for a matching render-ticket, if one exists, that ticket is used for drawing, otherwise, a new ticket is generated, drawn and added to the render-queue. Any unused ticket in the queue (a misnomer really, as it's really a list, but hey...) is deleted when the screen-buffer is flipped onto the screen.

This gives the benefit of not having to rescale every frame, now the results after doing that weren't exactly comparable, as I didn't exactly set forward a specific enough test (simply watching a bit of the intro movie, and then letting it settle for a while in the first frame), but, atleast the scaling is almost nowhere to be seen:
26.6% Opaque-draws
25.8 % Alpha-blits
2% Scale

This makes for quite an improvement, and should possibly make the games playable on a bit slower computers than what has been the case so far.

Monday, 30 July 2012

Autumn Cleaning

The past week has seen quite a few deletions from the codebase, I've removed the debugger, the registry and a lot of smaller pieces here and there in the code, all stuff that isn't really necessary in a ScummVM-version of the engine (like in-engine fullscreen-switching-handling).

I've also found time for quite a bit of cleanup, with the assistance of the nifty astyle tool I added braces to all if-, for- and while-statements, breaking single-line ifs that looked like this:
if (conditional) doStuff();

into:
if (conditional) {
    doStuff();
}

which made for quite a bit more readable and consistent code.

I've also removed Base as super-class for a few classes that didn't really need to reference BaseGame (formerly CBGame), thus easing the cross-file-dependencies that were making the compile-times horrible.


File Management


The package management in the file manager has been refactored into a subclass of Common::Archive, which at this point means that it's quite close to being able to be added to SearchMan, and then just using SearchMan directly, one of the blockers for that is the filename-handling for absolute paths (i.e. redirecting C:\Windows\fonts somewhere usefull, another is that quite a few classes use the "readWholeFile"-function (those that just got a bad case of "endianness"-itch, can relax, it's only used for text files)).

Save games
Where formerly save games were taking ages and ages to load/save, they now should take quite a bit shorter, the speed-issue here came from the fact that the "load-progress-indicator" was drawn for every instance of every class that was loaded/saved, this in turn made a full screen update (as everything does for the moment, for lack of fully working dirty rect-updates). This meant that for every millisecond of usefull load/save-work, the engine would be doing hundreds of milliseconds of work updating the screen just to possibly move a progress bar another pixel.

I changed this behaviour to only update the exact region of the screen where the indicator is, and only as far as the indicator has progressed (and, only when the progress bar has changed at all). This might result in minor differencies in how the progress bar looks (as it isn't redrawn on top of itself for every single instance any longer). But that's a cost I think most users will be willing to take when the end result is a decrease in save/load-time by a factor of 10.

Lazy loading of images
The engine used to take a few seconds to startup a game on my computer, the reason for that was simply that all the images in the entire game would be loaded. I looked into this, and the only reason most of these images were loaded, was simply that their width and height was needed to set the default size of _rect in BaseSubFrame. I changed _rect to private and made ALL access to it use getters and setters (even internally in the class). Then I added the field _wantsDefaultRect, that kept track of whether _rect should have the width/height-values or something else. When getRect is triggered, and _wantsDefaultRect is true, the image will be loaded, to get at those values.

Now, this might sound like a rather odd approach for a simple getter, but the end result is that the state of the class is preserved WITHOUT loading every single image on startup, thus reducing both the load time and the memory footprint by quite a bit. (For Dirty Split I halved the memory footprint of the first screen, and reduced the load-time to between a fifth and a tenth).

One side-effect of this that might prove this solution quirky, is that there is no guarantee that all frames of an animation reside in memory now, which might result in some animations being slow or late the first time they are played.

Singleton and separation
Instead of passing the File Manager around to everyone and everything, it's now moved to a singleton that is supposed to keep it, and other stuff that survive between loading savegames. Since BaseGame is already quite a bit on the heavy side (clocking in at over 4 KLOCs), I think it's appropriate to try to move as much functionality as possible OUT of that class and put it elsewhere.

Minor stuff

  • I had forgotten to add the space-bar as a "printable" key in the events-checks, which made typing anything but rather long words in i.e. save-game titles impossible. That is fixed now.
  • Sounds that were playing when saving now resume properly on load.
  • Screen-fading now works.
  • Save game thumbnails now scale properly (thanks to some scaling code I got from clone2727), where before they would repeat the first column in the last few rows, owing to some integer-division gone haywire.
  • The settings that were formerly in the settings.xml-file have now been moved to ConfMan, which is responsible for the ScummVM-settings-file (.scummvmrc/ScummVM Preferences/scummvm.ini), any game-specific settings will be stored there with the prefix priv_ (for instance the subtitles-flag in Dirty Split is stored as priv_Subtitles).


Monday, 23 July 2012

Start your engines

Last week I listed a few issues that were on my "TODO"-list, well of course some progress has been made, so without further ado:

Volume-settings:

While formerly volume-settings were completely ignored, now I've removed the internal settings in the engine for sfx/speech/music-volume, and mapped those directly to the matching ScummVM-settings, which means that changing the volume in the GMM, or in-game will show up nicely in both places. Some games are a tad aggressive on updating the sliders, which makes for a bit of jumpiness on their sliders, but it works atleast. 

The engine also has a concept of "master-volume", while ScummVM doesn't, so I ended up deciding that the most consequent way to solve that, was to simply apply that as a multiplication on the other values, thus setting Master to 50% when Speech is at 50% would yield 25%, and still get that updated when the values are changed through the GMM.

Sound-format-support:

I added in RAW-WAVE support (no MSADPCM-support yet), which means that the white chamber now also plays all it's sound-effects. The engine will error out if it meets any other formats, a solution I chose to be able to pick up on missing format-support if some game needs any more than this.

TTF-Fonts:

As I said last week, the fonts now draw without being too dark, it remains to be checked if the line-lengths still match well (seems that Rosemary has a few problems with lines overflowing without increasing the background for the text.

Detection:

Ok, so this is where I've put a lot of my work the past week. While most of the games I have are added with proper MD5-detection entries, one of the bigger selling points of this engine is the ability to make your own games, and, well having to recompile ScummVM with new MD5s for every iteration of your development would be kind of a hassle, which is why I added a fallback-detector. Currently this works like follows for any given folder.:
  1. If the folder doesn't contain a "data.dcp"-file ignore the folder
  2. Otherwise, start up the basic initialization of the engine.
  3. Add in the DCPs in that folder to the engine (although, "data.dcp" should be enough)
  4. Parse "startup.settings" to get the GAME-variable (usually "default.game")
  5. Parse the file defined from "startup.settings"'GAME-variable, looking for NAME and CAPTION.
  6. Then work with those vars to create a gameid and a game-description (using the extra-field of the detection entry to fill in the Caption if it differs from the game-name)
This gives a bit of overhead for each game, as quite a bit of the engine needs to be loaded to use the file-manager to read from a DCP, and the same holds true for the parser used, in all fairness this only happens whenever a "data.dcp" is found in a folder AND that data.dcp doesn't match a known MD5, but still, long-term I guess a more clearcut solution would be in order. The only file that really gets parsed by the engine-parser right now is startup.settings though, while default.game gets put through Common::StringTokenizer simply because I only need 2 fields of that file.

One problem with this solution, is that I don't use the StringTables in the engines, thus any localized captions will not be localized but instead contain the placeholder-string. In any case this isn't too big of an issue, as using non MD5-listed games should become the rare case for developers, and the game name can always be added explicitly to the MD5-based detection.

File-access:

Changing the engine to be able to do detection without loading up the ENTIRE kitchensink of classes, while working with an explicit folder, lead me to investigate the file manager-system further. I had to change the use of SearchMan into a system of explicit FSNodes, which means that now paths for archive-less files are parsed with FSNodes, and all DCP-Packages now carry a FSNode to their location. (Instead of opening them with SearchMan when needed).

This might sound like an odd solution, but it has a decent use case, a few games have a "language" subfolder, that contains localizations for multiple languages, with a SearchMan-based solution I ended up registering ALL packages there, thus often ending up with czech ingame-text (alphabetic ordering of files...) To avoid this I needed a special-case for the "languages"-subfolder, filtering the dcp's there, to only load the really necessary language. Right now this is hardcoded to be "english.dcp" only, but it should be quite possible to use the language specified in the game descriptor to select properly.

Variable-renaming:

While I still find the odd variable with the wrong convention, I have handled most of them now, one of the bigger changes was replacing the semi-global variable "Game", which basically is a reference to the main "CBGame"-object, that every object carries around with it, that was left behind after the big rename a few weeks ago, mainly to make sure no big issues arose. But it has now been renamed to _gameRef.

I also changed all the class-name-conventions, removing the C-prefix and expanding on the classnames, thus any CBClass became BaseClass, and CAdClass became AdClass, and CSysClass became SystemClass, and so on. This was also followed up with renaming the entire file-hierarchy, to follow one filename-convention (i ended up with all_lower_case.{h,cpp}, as I'd had far too many case-typos when jumping between OS X and Linux in this project, this should atleast make it easy to remember the case-choice for any wme-filename).

Additional game detections added:

I added in quite a few more MD5-entries, to have more games to test with, here are the games I remember of the top of my head:
Note: Not all these games actually work, for reasons stated below in the "Remaining-issues" section.

Remaining-issues:

  • Videos still desynch, something I won't be looking much further into in this GSoC, as the videos are a bit outside the main scope of this project, and well, the same issues still happen in Sword25, from where I got the video_decoder I'm using anyhow.
  • Dirty-rect rendering is still sketchy at best. (and thus disabled by default.
  • Interlaced PNGs are not yet supported, but that is also a bit outside of my control, I did open up a semi-complete pull-request that added more LodePNG-code to the PNG-decoder, but it still has a few files it chokes on.
  • 32 bpp BMPs are not supported in ScummVM, shouldn't be impossible to implement, as I already did some tweaks to work past the crash that ensued (simply skipping the fourth byte of every pixel in the BMP atleast yields some results).
  • Non v.1.1 JPEGs are not supported in ScummVM, this I haven't worked or looked any on, but it blocks a few games from even starting.

Monday, 16 July 2012

What's left?

I'm back again, after a week of harvest-work, I'm now ready to start the second half of GSoC, a big question then, is what's left to do?


  • Volume-settings are currently ignored (well, mute works, but that's it)
  • Videos desynch a bit (I used the video-code from the Sword 2.5-engine, which states this to  be a known issue)
  • Only OGG-audio is supported at the moment, while games like "the white chamber" also use WAV.
  • Detection still needs a bit of work:
    • Detection is hardcoded for the games I have tested with, and doesn't allow for user-games with changing/unknown MD5's.
    • Detection is set with a common target, and Savegames use hardcoded filenames, thus making ALL WinterMute-games share savegame-namespace. This means that slot 3 in J.U.L.I.A. also is slot 3 in Dirty Split...
  • TTF-fonts still need a bit of adjusting:
    • TTF-fonts currently lack a decent fallback to theme-fonts (in ANY case text will be drawn, just not with a font that looks at all similar to what was intended when the games were made)
    • TTF-fonts drew a bit on the dark side (cheap Star Wars-jokes aside, this was because I drew 16bpp for some reason, and then converted that to 32bpp, I have a fix for it now)
  • Variable renaming:
    • There are still the odd variable name here and there that follows either the VarName-naming convention, or the style_where_you_put_in_a_bunch_of_these. I think I got most of the former. And the latter, well it will have to change as well.
  • Drawing is slow, or as _sev put it: 
<_sev> i don't agree it is slow 
<_sev> it is ├╝ber-slow
Well, this is the bit that I talked a bit about last time, sadly I have still not solved the dirty rect-thing (since I haven't really been working on GSoC the past week, owing to the mentioned harvest-work). I did go through the blit-function I use right now, and noticed two things: 

As fuzzie mentioned back when I originally tried to refactor the blit-code from the Sword25-engine for common usage, I changed the constant shifts to variable shifts (i.e. "pix << 24 & 0xff" became "pix << blueShift & 0xff"), which would disable any compiler-optimizations that could change that to byte-access, instead of a shift, accordingly I also changed the byte-writes to format.colorFromARGB-calls, which add even more shifts. Doing some profiling on thise code revealed that ~30% of the total runtime was spent in this function. Simply changing the shifts back to constant shifts, and the format.colorFromARGB-calls to byte-writes, reduced this to ~19%. Which now makes flip() the heaviest part of the entire code (since it does a complete redraw of the frame every frame), counting in at ~57% of the total runtime spent in g_system->updateScreen(). Finishing up dirty rects should reduce that load by quite a bit more.

I did do some minor game testing the past week, as meta kindly provided me a retail version of J.U.L.I.A. I was able to play on from the demo, the downside being that I found J.U.L.I.A. to be using sprite rotation, which is not supported in WME Lite, from which I have based this port. This makes it rather unlikely that the full J.U.L.I.A.-version will become completable with this port during GSoC, although, it isn't unlikely that I'll end up adding in sprite rotation at some future point.

Going forwards, I'll try to get the things listed above fixed, and hopefully end up with a reducing my list of TODOs, instead of finding new roadblocks.

Wednesday, 4 July 2012

Time to clean those rects up.

My current solution to the rendering in WME is rather slow, capping out at around 30 fps on my i7, which means I get lower frame rates in Dirty Split than I do in Diablo III. The major difference between the two would of course be partially explained by Dirty Split being rendered entirely in software. But that alone shouldn't excuse a non-changing 2D scene not managing to get more than 30 fps.

Now there are a few places that the rendering can be improved: First of all, is there any reason to redraw the screen if NOTHING has changed? Probably not. Is there any reason to produce as many frames per second as possible, maxing the CPU-load? Probably not.

So, I put in a framerate cap, for the time being this is capped at 25 fps, to keep my CPU from maxing (and thus my laptop from getting quite hot). In practice it should probably go a bit higher than that, but since the rendering itself limits it's possibility at the moment, that discussion is rather moot.

To get the rendering to go a bit smoother, I had to detect if the new frame was at all different from the last frame, and preferably also WHAT was different from the previous frame, and then only redraw the parts that were changed. (The "dirty rectangles")

The old renderer

Originally, all surfaces that wanted to be rendered would apply a scale to themselves and then pass the scaled surface to the renderer (the surfaces would also cache the last scale, to avoid reapplying the scale every frame if it was only drawn at one size). Every frame would start off with clearing the screen-buffer, then drawing the surfaces one by one. Thus there was almost no difference between drawing a very different frame, or the same frame again, as all the surfaces would need to be redrawn anyhow.

Replacing this means I have to take care to keep the same behaviour intact, which means:

  • Any area that was drawn last frame, but isn't drawn now needs a redraw
  • Any area that doesn't get anything drawn in it, should get filled with the clear-colour (which was originally drawn into the screen-buffer on clear anyhow)
  • If any element was drawn before element X, it still needs to be redrawn before element X if it needs to be redrawn.
  • If any element was drawn after element X, it still needs to be redrawn before element X f it needs to be redrawn.

Render-tickets

The name I chose for my solution is "renderTickets". Where before any draw-call would mean an immediate update to the screen-buffer, in the new system, a renderTicket makes note of the operation that was asked for (a ticket), as well as a copy of the data that was to be used as a source for the operation.

When adding a ticket, a search is done to see if the surface that asked to draw issued any tickets last frame, and if they are the same as the one's this frame. Any tickets that are unchanged will be reused, while any new tickets will trigger an update in their target screen region.

Additionally the renderer keeps track of the order of the tickets, a ticket is only accepted for reuse if it is asked for at the same point it was asked for last frame, with one exception: A new ticket that arrives as ticket N, will increment the expected order of the tickets that should arrive as N+1 and on, since a new ticket will get it's region redrawn completely anyhow, this should still keep the Z-order, but avoid having to redraw uneccesarrily areas that arrive in-order, but with new tickets before them. Any ticket
that triggers a redraw, has it's target-position added as a dirty rect (at this point, that means that the single rect used is scaled to include it's area)

Finally, when the engine asks for a flip from the back-buffer to the screen-buffer, the actual drawing starts:
  1. First, the list of render tickets is purged of all items that were drawn last frame, but did not receive requests for draw this frame.
  2. The dirty rect is filled with the clear-colour
  3. Any tickets that intersect the dirty rect get's that section redrawn
  4. Finally the back-buffer is copied to the screen buffer and onto the screen.

Issues

The current implementation has a few issues with Z-order, and uses a bit more memory than the old solution. The memory usage was expected, as every ticket has to keep a copy of the section it wants to draw (as the scale can differ between draws, or the Surface that asked for a draw can be destroyed before screen-flip). Thus the implementation isn't enabled by default at this point. 

There also is no code done yet for Fading/Line-drawing with dirty-rects, and I also might want to use multiple dirty rects, instead of scaling a single update-area. Happily though, the current solution allows the engine to idle without problem, which means that a screen that doesn't change doesn't even trigger a buffer-copy, and puts the CPU-usage down from 100% to less than 10% when nothing is happening.

Other developments

I did a quick test of the engine on my PPC-machine, and fixed an endian-assumption in the scripts, which means that I possibly might be the first person to ever start up J.U.L.I.A. and Dirty Split on a PPC Mac :D That did expose the need for a way more efficient render-solution though, because while the game did RUN fine, it wasn't fast enough to be very playable.

In some free moments the past week I have also done quite a bit of variable and function-renaming to move towards following the ScummVM convention (ScummVM uses "funcName()" and "varName", while WME uses "FuncName()" and "VarName"), thanks to _sev's earlier help, the "m_MemberVar" -> "_memberVar" rename is already done, although a few of those might need a bit of lookthrough to catch cases like _iD (which probably should either be _id or _ID).


Current engine status

While there are no fancy pictures this time around, I thought I'd list my estimates on the engine-status:
  • Graphics: 80-90% Works completely, but is very slow
  • Sound: 50-60% Works, but lacks volume-control and WAV-file support
  • Fonts: 70-80% Works fine, but is a bit darker than in the original, also no solution is in place for replacing system-fonts (that won't necessarily be available on non-Windows-platforms)
  • PPC-support: Can't guess at a number, but the engine starts, and seems to run OK
  • Video: Works, but has the same issues as Broken Sword 2.5 lists, video desyncs from audio, and is very slow
  • Savegames: 80-90% Currently broken by a mis-setting of version-numbers, but that should be a quick fix, otherwise works fine, and has no noticed memory-leaks/issues.
  • Renaming to ScummVM-convention: 40-50% would be my estimate, but I would guess at closer to 40 than 50.
  • Sprite mirroring: 100%