One of the biggest early steps of the Alsherok project, beyond basic high-level architecture and categorization, is understanding the raw data from the world. Parsing that data into a format that can be easily manipulated, searched, and aggregated is key to several aspects of the project.

  1. We’re going to need models for mobs
  2. We’re going to need models for objects (weapons, armor, bags, potions, food, etc.)
  3. We’re going to need models for the general environment
  4. I like playing with data
  5. Understanding aggregates can help with prioritization
  6. Cataloged data can be transformed to meet any need

Unfortunately, all of this data is stored in individual flat files for each area. Sure, I could combine all of the files into one huge list, but that would be unwieldy and harder to look through. Besides, if I only wanted the data in one place for basic searches, I could just use grep against the folder.

Instead, I decided to parse all of the files into a set of PostgreSQL database tables. After first attempting it in C#, I decided the iteration time was too slow, so I switched over to Python. I was able to get one set of data (the mobs) completed after a few evenings of poking it with a sharp stick, and I found that HOLY CRAP THAT IS A LOT OF MOBS! In fact, right at 2255! I won’t need that many models, but it certainly does make the task of weeding through them a bit more time-consuming.
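
For a sense of scale, the counting pass itself is pretty simple. Here’s a minimal sketch (not my actual script), assuming SMAUG/AFKMud-style area files where each mob in the #MOBILES section begins with a `#<vnum>` line and `#0` ends the section; the folder name, the `.are` extension, and the latin-1 encoding are all assumptions:

```python
import os

AREA_DIR = "areas"  # hypothetical folder holding the .are files

def count_mobs(area_dir):
    """Count mob entries, assuming each mob in a #MOBILES section
    begins with a line like '#1234' and '#0' terminates the section."""
    total = 0
    for name in os.listdir(area_dir):
        if not name.endswith(".are"):
            continue
        in_mobiles = False
        with open(os.path.join(area_dir, name), encoding="latin-1") as f:
            for raw in f:
                line = raw.strip()
                if line.startswith("#MOBILES"):
                    in_mobiles = True
                elif line == "#0" or (line.startswith("#") and line[1:].isalpha()):
                    in_mobiles = False  # section terminator or a new #SECTION header
                elif in_mobiles and line.startswith("#") and line[1:].isdigit():
                    total += 1  # a mob's vnum header
    return total

print(count_mobs(AREA_DIR))  # should land right around 2255
```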

That being said, I did have some fun playing around with aggregating the data on different points such as gender, race, class, and zone, just to see what I was working with. Unfortunately, I’ve already blown away all the tables because I’m now working on parsing out the objects, and the script is set up as an all-or-nothing deal. I may refactor it soon to be more selective, but I’m working on this with a priority on speed over reusability.
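
For the curious, those aggregates were nothing fancy once the data was in PostgreSQL; a simple GROUP BY gets you there. Here’s a rough sketch using psycopg2, where the database name and the mobs table’s columns are hypothetical stand-ins for whatever schema you end up with:

```python
import psycopg2  # assumes the psycopg2 driver; adjust for your setup

conn = psycopg2.connect(dbname="alsherok", user="postgres")  # hypothetical credentials
with conn, conn.cursor() as cur:
    # Breakdown of mob count by race -- swap in gender, class, or zone as needed.
    cur.execute("""
        SELECT race, COUNT(*) AS total
        FROM mobs
        GROUP BY race
        ORDER BY total DESC
    """)
    for race, total in cur.fetchall():
        print(f"{race}: {total}")
conn.close()
```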

Speaking of speed… I did have some issues with how long parsing was taking: somewhere around 3 minutes for 111 files. I posted on r/Python for some feedback and got some excellent pointers on places I didn’t need regex, plus a referral to a Python profiler that instantly pointed out my bottleneck: the database connection. I was being slick and using a with statement so I wouldn’t have to worry about cleanup, but the way I had structured it, that meant opening a fresh connection for every write. Once I moved the database connection to the top of the function and handled cleanup with a try/except block, I got the script’s execution time down to just over 3 seconds. Yay! But now I have to do objects, which will likely add a few more seconds.
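
To make the change concrete, here’s a hedged sketch of the before/after pattern, assuming psycopg2; the function names, table, and columns are placeholders rather than my actual script:

```python
import psycopg2

def save_mob_slow(mob):
    # Before: one connection per write. psycopg2's context manager commits
    # the transaction, but the connect() cost per call is what killed the runtime.
    with psycopg2.connect(dbname="alsherok") as conn:
        with conn.cursor() as cur:
            cur.execute("INSERT INTO mobs (vnum, name) VALUES (%s, %s)",
                        (mob["vnum"], mob["name"]))
    conn.close()  # note: 'with' on a psycopg2 connection commits, it doesn't close

def parse_all(mobs):
    # After: open one connection up front and reuse it for every write,
    # handling cleanup explicitly instead of with per-call context managers.
    conn = psycopg2.connect(dbname="alsherok")
    try:
        cur = conn.cursor()
        for mob in mobs:
            cur.execute("INSERT INTO mobs (vnum, name) VALUES (%s, %s)",
                        (mob["vnum"], mob["name"]))
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

The with form is still the tidier idiom for one-off queries; it just doesn’t belong inside a loop that runs thousands of times.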

Once my parsing code is complete, I will release it along with the DDL for the database tables, so anyone else playing around with the AFKMud codebase will have a jump start on analyzing their data, or a first step toward switching the code to store and retrieve from a database rather than flat files.
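
Until then, here’s a taste of what that DDL might look like, wrapped in Python for consistency with the rest of the script; the columns here are guesses based on the aggregates I mentioned above, not the released schema:

```python
import psycopg2

# Hypothetical mobs table -- just the fields used for the aggregates earlier.
DDL = """
CREATE TABLE IF NOT EXISTS mobs (
    vnum    integer PRIMARY KEY,  -- the mob's unique id from the area file
    name    text NOT NULL,
    gender  text,
    race    text,
    "class" text,
    zone    text                  -- which area file the mob came from
);
"""

conn = psycopg2.connect(dbname="alsherok")  # hypothetical database
with conn, conn.cursor() as cur:
    cur.execute(DDL)
conn.close()
```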