Impact of URL index

March 3rd, 2008 | by bryan |

As part of the performance changes for the site, I changed the way that URLs are stored and looked up. Lookups had been done in the database, but by the time ReadPath reached 12 million stored items, the memory required to maintain that index had grown to over 3 GB. There are two other indices on the Content_Item table, for the primary key and Subscription_Id, but since we were changing the way we look up URLs, it was no longer necessary to keep the URL index.

After dropping the index, MySQL reported that the indices for that table were taking up only about 400 MB, roughly a 10x reduction in memory requirements, along with a speedup in overall throughput through the system.

The database is not always the best place to store all of the data in your system. Distributable flat files can be a promising alternative. The cost is the overhead of maintaining the files yourself, but there can be a huge payoff in performance and scalability.
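
To make the flat-file idea concrete, here is a rough Python sketch of a URL lookup that lives outside the database. It is an illustration only, not ReadPath's actual code: the class name FlatFileUrlStore, the shard count, the file naming, and the choice of a truncated SHA-256 digest are all assumptions. Each URL is hashed to a fixed-size digest and appended to one of several shard files, so a lookup only has to scan a single small file.

```python
import hashlib
import os


class FlatFileUrlStore:
    """Hypothetical flat-file URL index: fixed-size digests in sharded files."""

    DIGEST_SIZE = 16  # bytes kept per URL (assumed; truncated SHA-256)

    def __init__(self, directory, shards=256):
        self.directory = directory
        self.shards = shards
        os.makedirs(directory, exist_ok=True)

    def _digest(self, url):
        # Hash the URL so every record on disk has the same length.
        return hashlib.sha256(url.encode("utf-8")).digest()[: self.DIGEST_SIZE]

    def _path(self, digest):
        # Pick the shard file from the first four bytes of the digest.
        shard = int.from_bytes(digest[:4], "big") % self.shards
        return os.path.join(self.directory, f"urls-{shard:04d}.bin")

    def contains(self, url):
        digest = self._digest(url)
        path = self._path(digest)
        if not os.path.exists(path):
            return False
        # Linear scan of one shard; a sorted file with binary search
        # would be the next step if shards grow large.
        with open(path, "rb") as f:
            while True:
                record = f.read(self.DIGEST_SIZE)
                if len(record) < self.DIGEST_SIZE:
                    return False
                if record == digest:
                    return True

    def add(self, url):
        """Record the URL; return False if it was already stored."""
        if self.contains(url):
            return False
        digest = self._digest(url)
        with open(self._path(digest), "ab") as f:
            f.write(digest)
        return True
```

Calling store.add(url) returns True the first time a URL is seen and False afterwards, which is the only question a duplicate check needs answered. The shard files can be copied to other machines as-is, which is where the "distributable" part comes in, and the maintenance overhead mentioned above shows up as the need to rebuild or compact those files yourself.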

