Thursday, September 17, 2009

In parser news...

The new version (dubbed 4.0.0) is in testing. The biggest change is support for the compendium scrape (big surprise). I also added the "append" option and the ability to extract only certain types of information, both of which were commonly requested during the SSPX Gen Con test.

So with the merger complete, SSPX is officially dead. Alas, we barely knew you.

There are a few things I'd like to fix or adjust in releases after 4.0.0:

* Traps are still rough, but the output is at least 80% of the way there. Getting the last 20% will take some effort, and I don't want to hold up the release for it; extracting powers, monsters, feats, items, etc. is worth a release on its own.

* I skipped skills. It's one block of information from the PHB, and the file format I use is crude, to say the least (skills were the first formatted item I did).

* Classes were skipped as well. This is more a function of the mismatch between how the compendium lists the data and how I'd like it to be listed. The compendium lists everything by first appearance, so Fighter shows up in the PHB, as do all the fighter options like Battle Rager and Tempest. Really, those should show up in Martial Power. I'm trying to figure out a good solution that doesn't involve scraping the entire class section every time.

* The selection method for combo items in the tree works poorly with a large number of choices (e.g. the source book selection). I'll have to rewrite it to use a context-based combobox instead of a plain context menu. The current method is clunky but functional.

* Scraping requires you to enter your password before each scrape. This is so I don't have to store it, but keying it in gets a bit annoying if you're going to scrape 50 different source books (believe me, I know). Realistically, after the initial release I can't see people scraping more than 5 source books a month, so it might be a non-issue.

* There are no scrape command files. I've considered them, but at the moment I don't see much need for anyone other than me to rescrape the same source book over and over. I've done it repeatedly for debugging purposes, but aside from errata or parser updates I can't see pulling the data again and again. For larger sets (like the PHB/AV/AV2) it's going to be much faster to change 4 or 5 powers by hand than to perform a complete rescrape. If you don't believe me, you will the first time you scrape the AV or AV2.

* I'd like to knock out bare-bones class and race files based on the scraped data. They are unlikely to include the dynamic keywords, but I can probably add the power name translations (e.g. powers with the Martial keyword are displayed as Exploits, Arcane as Spells, etc.). It should be simple, but my time is limited, so it didn't make the cut.
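As a rough illustration of the power name translation in the last item, here is a minimal Python sketch. The table, the `display_name` function, and the "Powers" fallback are hypothetical names, not anything in the parser; only the Martial/Exploits and Arcane/Spells pairings come from this post, and other power sources would be added to the same table.

```python
from typing import Sequence

# Hypothetical table mapping a power-source keyword to the name used when
# listing such powers. Only the two pairings named in the post are filled in.
POWER_NAME_BY_KEYWORD = {
    "Martial": "Exploits",
    "Arcane": "Spells",
}

# Assumed fallback for powers with no recognized source keyword.
DEFAULT_POWER_NAME = "Powers"


def display_name(keywords: Sequence[str]) -> str:
    """Return the display name for a power given its keyword list.

    The first keyword found in the table wins; anything else falls back
    to the generic name.
    """
    for keyword in keywords:
        name = POWER_NAME_BY_KEYWORD.get(keyword)
        if name is not None:
            return name
    return DEFAULT_POWER_NAME


if __name__ == "__main__":
    print(display_name(["Martial", "Weapon"]))  # Exploits
    print(display_name(["Arcane", "Fire", "Implement"]))  # Spells
    print(display_name(["Weapon"]))  # Powers
```

Keeping the translation in a plain lookup table means supporting another power source is a one-line change rather than new parsing logic.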

I touched on this above; I find it an annoyance, though others may not care. Everything in the compendium is listed under the source book in which it was first published. I feel it should be listed under EVERY source book that contains it. If a monster appears in H2 and is then reprinted in the MM2, I feel it should show up when I tell the compendium I want to see the monsters in the MM2. It doesn't. This is particularly annoying for modules, Dungeon issues, and RPGA adventures, because you only get the "non-stock" NPCs.

If you're like me and want to build complete, accurate representations of the source material, this will frustrate you. WotC would have to change the way they organize the compendium. If this bothers you, I suggest you let them know.
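To make the "every source book" idea concrete, the following is a hypothetical Python sketch of an index that files each entry under its first-publication book and under every reprint, so asking for the MM2 also returns a monster that first appeared in H2. `SourceIndex`, its methods, and the monster names are illustrative assumptions, not part of the compendium or the parser.

```python
from collections import defaultdict


class SourceIndex:
    """Hypothetical index that lists an entry under every book it appears in.

    The compendium effectively keeps only the first-publication book; this
    sketch also records reprints, so a query for a book returns everything
    printed in it.
    """

    def __init__(self) -> None:
        self._entries_by_book = defaultdict(set)

    def add(self, entry: str, first_published: str,
            reprinted_in: tuple = ()) -> None:
        # File the entry under its original book and under each reprint.
        for book in (first_published, *reprinted_in):
            self._entries_by_book[book].add(entry)

    def entries_in(self, book: str) -> list:
        # Sorted for stable output; an unknown book yields an empty list.
        return sorted(self._entries_by_book.get(book, set()))


if __name__ == "__main__":
    index = SourceIndex()
    index.add("Hypothetical Monster", first_published="H2", reprinted_in=("MM2",))
    index.add("Another Monster", first_published="MM2")
    print(index.entries_in("MM2"))  # ['Another Monster', 'Hypothetical Monster']
    print(index.entries_in("H2"))   # ['Hypothetical Monster']
```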

Finally, there is just no way for me to verify everything. Different source books sometimes have different underlying HTML formats (hell, different powers in the same source book sometimes do). I think I've done most of the big books, but I didn't extract every single issue of Dungeon, Dragon, every RPGA adventure, etc. I certainly didn't sit there and verify every single stat block of the stuff I did extract and parse. It'd be foolish to expect them all to work without issue.


Olodrin said...

Fantastic! And hey, don't sweat the small stuff; if everything worked perfectly, you wouldn't have the joy of answering every tedious question on the 4E FUM board.

I don't want to deny you that singular pleasure.

Olodrin said...

BTW, will this version of the parser etc have the master list functionality?

Olodrin said...

NM, I just updated the parser and saw the Catalog option