Archive for March, 2008

Found a small error in checking subscriptions

Tuesday, March 4th, 2008

Today I found a small error in how subscriptions were checked for updates. A list of subscriptions that required updating was selected from the database. Initially no sort order was applied to this query, as it was assumed that all items would be checked. However, as the number of subscriptions has ...

Another update to duplicate detection

Tuesday, March 4th, 2008

So after letting the latest experiment run for a while, it's become apparent that while the new duplicate detection is better than the last setup, we're still not where we need to be. Originally ReadPath used a duplicate detection system based on shingle comparisons of the stories. This system was incredibly effective ...
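
For readers unfamiliar with the term, here is a minimal sketch of what a shingle comparison generally looks like. It is not ReadPath's actual code: the function names, the 3-word shingle size, and the 0.8 threshold are all assumptions chosen for illustration. Each story is broken into overlapping word windows ("shingles"), and two stories are treated as duplicates when the two sets overlap heavily (Jaccard similarity).

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text.

    k=3 is an illustrative choice, not a value taken from the post.
    """
    words = text.lower().split()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0  # two empty stories are not counted as duplicates
    return len(a & b) / len(union)


def is_duplicate(story_a, story_b, k=3, threshold=0.8):
    """Hypothetical rule: flag as duplicates when similarity meets the threshold."""
    return jaccard(shingles(story_a, k), shingles(story_b, k)) >= threshold
```

Lowering the threshold makes the check more aggressive (more stories flagged as duplicates), while raising it makes the check looser.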

Changes to Duplicate detection

Monday, March 3rd, 2008

This weekend I also loosened the requirements for duplicate detection. It seems that we were being a bit too aggressive. I'll let the new rules run for a while and see how they compare.

Impact of URL index

Monday, March 3rd, 2008

As part of the performance changes for the site, I changed the way that URLs are stored and looked up. This had been done in the database, but when ReadPath reached 12 million items stored, the memory required to maintain that index grew to over 3 GB. There are two other ...

Performance Update

Monday, March 3rd, 2008

I spent most of the weekend working on performance updates. Several new systems have been put into place to reduce overall load on the site. ReadPath is now archiving items older than two months to a separate system. For users, there shouldn't be any visible difference, but since the vast ...