The opposite of good news…

One of the developers who was to be at the core of the Alsherok project is unable to devote the time necessary to focus on it. Given the project's size and complexity and the resources now available, I am placing it on hold. The work I completed parsing the flat files out into a database is not wasted, as I learned a good deal about both the process of parsing non-standard flat files and the flexibility of Postgres. This project will stay on my list, and I still plan on tackling it. Unfortunately, that will now happen a bit further down the road.

Finished parsing! Whew…

I finished the bulk of the flat-file parsing, at least for the files that matter at this stage. I made the process repeatable and as generic as I could in a short time. I am not entirely happy with the code as it stands, but it works for what I needed, and I think I can release it as open source shortly, after I clean up some of the testing portions.

I have some internal things to sort out. I will have an update out by the end of the week.

Oh what a flexible database

So apparently I’ve been doing way too much work with my flat-file parsing of the Alsherok files. I forgot a key feature of my database of choice, PostgreSQL: its ability to act as a non-relational database like MongoDB while retaining the power of an RDBMS! I know, that last sentence is kinda geeky. Essentially, I just needed to grab the section I wanted, turn it into JSON, and shove it directly into the database. After that, I can call the data back out using a hybrid of traditional SQL and Postgres's JSON operators.
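A minimal sketch of what I mean, assuming a mobs table with a single jsonb column named data (the table, column, and field names here are just for illustration):

```sql
-- Pull individual fields back out of the stored JSON document.
SELECT data->>'name'  AS name,
       data->>'race'  AS race,
       data->>'class' AS class
  FROM mobs
 WHERE data->>'zone' = 'bywater';
```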

That brings back the individual fields as expected.

I can even ensure uniqueness by adding a unique index on a particular branch of the JSON.
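For example, something like this (again assuming the mobs table sketched above; vnum as the unique key is just an illustrative choice):

```sql
-- Enforce uniqueness on one branch of the JSON document and
-- get an index to search against at the same time.
CREATE UNIQUE INDEX mobs_vnum_idx ON mobs ((data->>'vnum'));
```

The extra set of parentheses is required because this is an expression index rather than a plain column index.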

This provides a unique entry and fast searching because… you know… indexes and stuff.

Needless to say, I felt the need to refactor the other flat-file parsers I’ve finished and make them all spit out JSON instead of complex table structures 😉 Thankfully, that went much faster than writing them originally did. I may actually meet my end-of-the-month deadline after all.

On flat files and parsing…

One of the biggest first steps of the Alsherok project, beyond basic high-level architecture and categorization, is understanding the raw data from the world. Parsing the data into a format that can be easily manipulated, searched, and aggregated is key to several aspects of the project:

  1. We’re going to need models for mobs
  2. We’re going to need models for objects (weapons, armor, bags, potions, food, etc.)
  3. We’re going to need models for the general environment
  4. I like playing with data
  5. Understanding aggregates can help with prioritization
  6. Cataloged data can be transformed to meet any need

Unfortunately, all of this data is stored in an individual flat file for each area. Sure, I could combine all of the files into one huge list, but that would be unwieldy and hard to look through. Besides, if I only wanted the data in one place for basic searches, I could just use grep against the folder.

Instead, I decided to parse all of the files into a set of PostgreSQL database tables. After playing with doing it in C#, I decided the iteration time was too slow, so I switched over to Python. I was able to get one set of data (the mobs) completed after a few evenings of poking it with a sharp stick, and I found that HOLY CRAP THAT IS A LOT OF MOBS! In fact, right at 2255! I won’t need that many models, but it certainly does make the task of weeding through them a bit more time-consuming.

That being said, I did have some fun playing around with aggregating the data on different points such as gender, race, class, and zone, just to see what I was working with. Unfortunately, I’ve already blown away all of those tables because I am working on parsing out the objects now, and the script is set up as an all-or-nothing deal. I may refactor it soon to be more selective, but for now I am prioritizing speed over reusability.
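The aggregates themselves were nothing fancy; just GROUP BY queries along these lines (a sketch assuming a mobs table with one ordinary column per field, names illustrative):

```sql
-- Quick look at how the mob population breaks down by race and class.
SELECT race, class, count(*) AS total
  FROM mobs
 GROUP BY race, class
 ORDER BY total DESC;
```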

Speaking of speed… I did have some issues with how long the parsing was taking: somewhere around 3 minutes for 111 files. I posted on r/Python for feedback and got some excellent pointers on places where I didn’t need regex, along with a referral to a Python profiler that instantly pointed out my bottleneck: the database connection. I was being slick and using a with statement so I wouldn’t have to worry about cleanup, but that meant opening a new connection for every write. Once I moved the database connection to the top of the function and handled cleanup with a try/except block, the script’s execution time dropped to just over 3 seconds. Yay! But now I have to do objects, which will likely add a few more seconds.
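In rough strokes, the change looked something like this. This is a minimal sketch using psycopg2; parse_flat_file stands in for the real parsing logic, and the database and table names are invented for illustration:

```python
import psycopg2
from psycopg2.extras import Json

def load_files(paths):
    # Before: each write opened its own connection inside a with block,
    # paying the full connect/teardown cost on every single insert.
    # After: open the connection once, reuse it for every write, and
    # handle cleanup explicitly.
    conn = psycopg2.connect(dbname="alsherok")
    try:
        cur = conn.cursor()
        for path in paths:
            for record in parse_flat_file(path):  # hypothetical parser
                cur.execute("INSERT INTO mobs (data) VALUES (%s)",
                            [Json(record)])
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()
```

Batching the writes with executemany would probably shave off a bit more, but the single long-lived connection was the big win.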

Once my parsing code is complete, I will release it along with the DDL for the database tables, so that anyone else playing around with the AFKMud codebase will have a jump start on analyzing their data, or a first step toward switching the code to store and retrieve from a database rather than flat files.