Link graph roll back

January 13th, 2010 | by bryan |

ReadPath needs a way to keep track of which items link to other items. To accomplish this, there is code that builds up a link graph between the items. The original implementation was based on Java BDB and worked quite well for a long time. After a while, I had to move it from BDB to MySQL because of some maintainability issues. The current system is based on an InnoDB table where each URL is keyed by a hash of the URL and stores a list of the items that reference it. This system is actually very similar to the inverted index used in a search engine. I’ve contemplated using Lucene as a base, but there were some areas that didn’t match up.
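
The layout described above can be sketched roughly as follows. This is an illustrative sketch, not ReadPath’s actual code; the names (`url_key`, `LinkGraph`, `add_link`, `links_to`) and the choice of MD5 as the hash are assumptions for the example.

```python
import hashlib

def url_key(url: str) -> str:
    """Derive a fixed-width row key from a URL (hash choice is assumed)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()

class LinkGraph:
    """In-memory stand-in for the keyed table: hash(URL) -> referencing item ids."""

    def __init__(self):
        self.table = {}

    def add_link(self, target_url: str, item_id: int) -> None:
        # Append the referencing item to the target URL's list.
        self.table.setdefault(url_key(target_url), []).append(item_id)

    def links_to(self, target_url: str) -> list:
        # Look up everything that references this URL, like an inverted index.
        return self.table.get(url_key(target_url), [])

graph = LinkGraph()
graph.add_link("http://example.com/a", 101)
graph.add_link("http://example.com/a", 102)
print(graph.links_to("http://example.com/a"))  # [101, 102]
```

Hashing the URL keeps the keys fixed-width and evenly distributed, which is the same property that makes it a natural row key in either MySQL or HBase.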

Over the last week I attempted to convert the link graph system from MySQL to HBase. I was having some scalability issues and was constantly having to clean the dataset to remove dead-end nodes with 0 links. Since HBase was working quite well for the content and dictionary systems, I was hopeful that moving from a single MySQL node to an 8-node HBase cluster would give me some room to grow.

I did some initial testing with HBase as a backend and found that for the processing portion of the system, the 8-node HBase cluster (quad core, 8 GB memory, 500 GB disk per node) was approximately equivalent to a single MySQL node with the same hardware. The chart below shows a test run of data processing. The line is the number of items processed per minute. As you can see from the chart, the rate starts out fairly high at ~75,000 / minute. As time progresses, though, the rate quickly drops down to ~5,000 / minute (equivalent to the MySQL backend). As I watched the HBase nodes during this run, I could see that the cache hit rate was rock bottom and that all of the machines had their disk IO maxed out trying to handle the completely random read/write access pattern. So while there was an initial advantage to using HBase when the dataset fit in memory, that advantage is quickly lost.

Even without the advantage of using HBase for the data processing, I wanted to test out the production access pattern which would include lots of clients reading from the table. Maybe HBase could handle this additional read load better than MySQL. So last week I converted the ReadPath code to use the HBase backend and flipped the switch in production. It quickly became apparent that I was going to have to roll the change back.

The chart below shows the average amount of time in milliseconds that it takes to complete a request. The red line indicates when the conversion was made. As you can see, the average time to complete a request jumped from the 5 ms range to several hundred ms.

The next chart shows the number of requests/min that were being made during the same time period. The same amount of work is still trying to get through; it just isn’t moving as quickly. You can see the large spike that occurs after the conversion period where the system, after being restored to the MySQL backend, is catching up again.

I’ve found that HBase works great for certain parts of ReadPath’s infrastructure.

  • It’s ideally suited to the content table, since the vast majority of requests are for items from the last two weeks. This allows HBase’s internal caching and file structures to work quite well. It also makes the entire content table available for Map/Reduce jobs when necessary.
  • The dictionary system also works very well. The processing can be done with a Map/Reduce job that is much faster than previous methods.
  • Creating a static link graph works very well. I wrote a simple Map/Reduce job that can process 250 million content items and output the result to HBase in a couple of hours. This could be very useful for deeper analysis of the content.
  • For a link graph that is updated in real time as content comes in, it falls short. Here the requirement is for very fast random read access. Once the dataset grows larger than what HBase can hold in memory, performance begins to degrade. There is of course the caveat that, if I had more funds to run the site and could power on enough servers to hold the dataset in memory, then it would work as a solution and would be incredibly fast.
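
The static Map/Reduce approach from the list above can be sketched in miniature: map emits a `(target_url, item_id)` pair for every outbound link in a content item, and reduce groups the pairs by URL to produce the per-URL referrer lists that get written out. This is a hypothetical illustration of the pattern, not the actual ReadPath job.

```python
from collections import defaultdict

def map_phase(items):
    """Emit (target_url, item_id) for each outbound link in each item."""
    for item_id, outbound_links in items:
        for url in outbound_links:
            yield url, item_id

def reduce_phase(pairs):
    """Group emitted pairs by URL, producing the static link graph."""
    grouped = defaultdict(list)
    for url, item_id in pairs:
        grouped[url].append(item_id)
    return dict(grouped)

# Tiny stand-in for the content table (real input would be 250M items).
items = [
    (1, ["http://a.example/", "http://b.example/"]),
    (2, ["http://a.example/"]),
]
link_graph = reduce_phase(map_phase(items))
print(link_graph["http://a.example/"])  # [1, 2]
```

Because the reduce output is written in bulk rather than updated record-by-record, this pattern avoids the random-write workload that hurt the real-time version.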

In conclusion, with my current hardware availability, HBase just isn’t a candidate for the link graph. My primary option is to look at partitioning MySQL, splitting the table across servers. That solution will scale linearly with the number of servers. I’ve just found from personal experience on other jobs that it can be a real hassle to get the code working well and to maintain.
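
The partitioning idea amounts to routing each URL key to one of N MySQL servers by hash. A minimal sketch, where the shard count, hash choice, and server names are all assumptions for illustration:

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the example

def shard_for(url: str) -> int:
    """Map a URL deterministically onto one of NUM_SHARDS servers."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Hypothetical DSNs, one per MySQL shard server.
shard_dsns = [f"mysql://linkgraph-{i}.internal/linkgraph" for i in range(NUM_SHARDS)]

url = "http://example.com/a"
print(shard_dsns[shard_for(url)])
```

The hassle mentioned above comes from everything around this one function: re-sharding when a server is added, cross-shard queries, and keeping the routing logic consistent across every code path that touches the table.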

2 Responses to “Link graph roll back”

By stack on Jan 13, 2010 | Reply

    Nice writeup Bryan. Can us hbasers learn more about the cluster state and the load profiling, so we can learn from your experience?

By bryan on Jan 13, 2010 | Reply

    Of course, a couple of others have pinged me over email. I’ll write up something and post it to the mailing list.
