Content update

February 14th, 2010 | by bryan |

Just a quick update. The latest dictionary update job for ReadPath just completed.

835,029 feeds monitored.

299,113,314 content items.

28,914,230,869 words for the dictionary. This works out to 123,036,971 distinct two-word pairs.
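For anyone curious what "distinct two-word pairs" means in practice, here is a minimal single-machine sketch in Python. It is not the actual ReadPath job, which runs on Hadoop. The tokenizer, the function names, and the assumption that a "pair" means two adjacent words in the same content item are all illustrative guesses.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def pair_counts(items):
    # Returns (total word count, Counter of adjacent two-word pairs).
    # Assumes a pair never spans two content items.
    total_words = 0
    pairs = Counter()
    for text in items:
        words = tokenize(text)
        total_words += len(words)
        pairs.update(zip(words, words[1:]))
    return total_words, pairs


items = ["the quick brown fox", "the quick red fox"]
total_words, pairs = pair_counts(items)
print(total_words)   # total words seen
print(len(pairs))    # number of distinct two-word pairs
```

On a cluster the same idea would typically be split into a map step that emits pairs and a reduce step that sums them, but that layout is an assumption here too.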

The entire job ran for just over 7 hours on an 8-node Hadoop/HBase cluster. It read just under 400 GB of data and wrote 462 GB.
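As a rough back-of-the-envelope check on read throughput, the figures above can be plugged into a few lines of Python. The inputs are rounded (400 for "just under 400", 7 hours for "just over 7"), decimal units are assumed (1 GB = 1000 MB), and the function name is made up for this sketch, so treat the result as an approximation only.

```python
def read_throughput_mb_per_s(gb_read, hours, nodes):
    # Returns (aggregate MB/s, per-node MB/s), assuming 1 GB = 1000 MB
    # and that the work was spread evenly across the nodes.
    seconds = hours * 3600
    aggregate = gb_read * 1000 / seconds
    return aggregate, aggregate / nodes


aggregate, per_node = read_throughput_mb_per_s(400, 7, 8)
print(f"{aggregate:.1f} MB/s aggregate, {per_node:.1f} MB/s per node")
```

That comes out to roughly 16 MB/s read across the cluster, or about 2 MB/s per node.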

