Abstract: A method and apparatus for compressing query logs are provided. Multiple levels of user-specifiable compression include character-based compression, token-based compression, and subsumption. An efficient method for performing subsumption is also provided. The compressed query logs are then used to train a statistical process such as a help function for a computer operating system.
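The abstract names three compression levels without defining them, so the following is only an illustrative sketch under stated assumptions, not the claimed method. It assumes that character-based compression means normalizing case, punctuation, and whitespace; that token-based compression means dropping stopwords and ignoring token order; and that subsumption means folding a query into a shorter query whose tokens it fully contains. The stopword list and the names `char_compress`, `token_compress`, `subsume`, and `compress_log` are hypothetical, and the pairwise scan in `subsume` is a naive stand-in rather than the efficient subsumption method the abstract refers to.

```python
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = frozenset({"a", "an", "the", "to", "of", "how", "do", "i", "my"})


def char_compress(query):
    """Assumed character-level step: lowercase, strip punctuation, collapse spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", query.lower())
    return " ".join(cleaned.split())


def token_compress(query):
    """Assumed token-level step: drop stopwords, dedupe, and ignore token order."""
    return tuple(sorted({t for t in query.split() if t not in STOPWORDS}))


def subsume(counts):
    """Assumed subsumption: fold each entry into the shortest kept entry whose
    tokens are all contained in it. Naive O(n^2) scan, for illustration only."""
    kept = {}
    # Visit shorter keys first so more general queries are kept before longer ones.
    for key in sorted(counts, key=lambda k: (len(k), k)):
        tokens = set(key)
        target = next((g for g in kept if set(g) <= tokens), None)
        if target is None:
            kept[key] = counts[key]
        else:
            kept[target] += counts[key]
    return kept


def compress_log(queries, level=3):
    """Compress a list of raw queries into {token tuple: count} at the chosen level."""
    counts = Counter()
    for q in queries:
        text = char_compress(q)
        key = token_compress(text) if level >= 2 else tuple(text.split())
        if key:  # skip queries that normalize to nothing
            counts[key] += 1
    return subsume(counts) if level >= 3 else dict(counts)


log = [
    "How do I print a file?",
    "print file",
    "Print the file to PDF",
    "change wallpaper",
]
print(compress_log(log, level=1))
print(compress_log(log, level=3))
```

In this sketch the `level` argument stands in for the user-specifiable compression level: level 1 applies only the character step, level 2 adds the token step, and level 3 adds subsumption, so higher levels yield fewer distinct entries with larger counts.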