One of the biggest first steps of the Alsherok project, beyond basic high-level architecture and categorization, is understanding the raw data from the world. Parsing the data into a format that can be easily manipulated, searched, and aggregated is key to several aspects of the project:
- We’re going to need models for mobs
- We’re going to need models for objects (weapons, armor, bags, potions, food, etc.)
- We’re going to need models for the general environment
- I like playing with data
- Understanding aggregates can help with prioritization
- Cataloged data can be transformed to meet any need
Unfortunately, all of this data is stored in individual flat files, one per area. Sure, I could combine all of the files into one huge list, but that would be unwieldy and harder to look through. Besides, if I just wanted the data in one place for basic searches, I could just run grep against the folder.
Instead, I decided to parse all of the files into a set of PostgreSQL database tables. After playing with doing it in C#, I decided the iteration time was too slow, so I switched over to Python. I was able to get one set of data (the mobs) completed after a few evenings of poking it with a sharp stick, and I found that HOLY CRAP THAT IS A LOT OF MOBS! In fact, right at 2255! I won't need that many models, but it certainly does make the task of weeding through them a bit more time-consuming.
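To give a sense of what the load step looks like, here's a minimal sketch of the idea, not the actual script: the table layout and column names are made up for illustration, and the real mob records carry far more fields.

```python
import psycopg2

# Illustrative schema only; the real mob data has many more fields.
DDL = """
CREATE TABLE IF NOT EXISTS mobs (
    vnum      integer PRIMARY KEY,
    name      text,
    race      text,
    mob_class text,
    gender    text,
    zone      text
)
"""

def load_mobs(conn, mob_records):
    """Create the mobs table and insert parsed records.

    mob_records is assumed to be an iterable of dicts produced by
    the parsing step, keyed by the column names above.
    """
    with conn.cursor() as cur:
        cur.execute(DDL)
        for mob in mob_records:
            cur.execute(
                """INSERT INTO mobs (vnum, name, race, mob_class, gender, zone)
                   VALUES (%(vnum)s, %(name)s, %(race)s, %(mob_class)s,
                           %(gender)s, %(zone)s)""",
                mob,
            )
    conn.commit()
```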
That being said, I did have some fun playing around with aggregating the data on different points such as gender, race, class, and zone, just to see what I was working with. Unfortunately, I've already blown away all the tables because I am working on parsing out the objects now, and the script is set up as an all-or-nothing deal. I may refactor it soon to be more selective, but I am working on this with a priority on speed over reusability.
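For the curious, those aggregates were nothing fancier than queries along these lines (the connection string and column names are placeholders matching the hypothetical schema above, not the real one):

```python
import psycopg2

conn = psycopg2.connect("dbname=alsherok user=postgres")  # placeholder DSN

# Mob counts broken down by race; swap in gender, mob_class, or zone
# for the other views.
with conn.cursor() as cur:
    cur.execute("""
        SELECT race, COUNT(*) AS mob_count
        FROM mobs
        GROUP BY race
        ORDER BY mob_count DESC
    """)
    for race, count in cur.fetchall():
        print(f"{race:>12}: {count}")

conn.close()
```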
Speaking of speed… I did have some issues with how long parsing was taking: somewhere around 3 minutes for 111 files. I posted on r/Python for feedback and got some excellent pointers on places where I didn't need regex, plus an excellent referral to a Python profiler that instantly pointed out my bottleneck: the database connection. I was being slick and using a with statement so I wouldn't have to worry about cleanup, but that meant opening the connection for every write. Once I moved the database connection to the top of the function and handled everything with a try/except block, I got the script's execution time down to just over 3 seconds. Yay! But now I have to do objects, which will likely add a few more seconds.
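To make that concrete, here's roughly the difference between the two patterns (function names and the DSN are made up; this isn't the released script):

```python
import psycopg2

DSN = "dbname=alsherok user=postgres"  # placeholder connection string

# Before: the with statement is slick, but psycopg2.connect() runs
# for every single write, so parsing 111 files pays the connection
# setup cost thousands of times.
def save_mob_slow(mob):
    with psycopg2.connect(DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO mobs (vnum, name) VALUES (%s, %s)",
                (mob["vnum"], mob["name"]),
            )

# After: open the connection once, reuse it for every write, and
# handle cleanup explicitly.
def load_all_mobs(mobs):
    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur:
            for mob in mobs:
                cur.execute(
                    "INSERT INTO mobs (vnum, name) VALUES (%s, %s)",
                    (mob["vnum"], mob["name"]),
                )
        conn.commit()
    except psycopg2.Error:
        conn.rollback()
        raise
    finally:
        conn.close()
```

The with form is tidy, but connection setup dominates when it happens per row; opening once amortizes that cost across the whole run.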
Once my parsing code is complete, I will release it along with the DDL for the database tables, so anyone else playing around with the AFKMud codebase will have a jump start on analyzing their data, or a first step toward switching the code to store and retrieve from a database rather than flat files.