Capitol Words are determined by capturing the full text of the House, Senate and Extensions of Remarks sections of the Congressional Record for every day, dating back to the second session of the 106th Congress (January 20, 2000). The text is retrieved via GPO Access and stored in Sunlight's LOUIS database. Sunlight then runs a query on LOUIS to calculate the most commonly used words for a given day, with some exceptions (described in more detail below). Each afternoon, the daily counts for the previous day are added to the Capitol Words database.
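The daily tally can be illustrated with a minimal Python sketch. It assumes one day's Record arrives as a dictionary mapping section names to their full text (a hypothetical input shape, not LOUIS's actual schema) and tokenizes with a simple lowercase letters-only regex (also an assumption). The word exclusions are left out here, so this shows only which sections are combined and how the "previous day" is picked.

```python
from collections import Counter
from datetime import date, timedelta
import re

# Sections that feed the count. Any other section in the input
# (for example the Daily Digest) is simply ignored.
COUNTED_SECTIONS = ("House", "Senate", "Extensions of Remarks")


def previous_day(today: date) -> date:
    """The afternoon job adds counts for the day before it runs."""
    return today - timedelta(days=1)


def daily_top_words(sections: dict[str, str], n: int = 10) -> list[tuple[str, int]]:
    """Tally lowercase words across the counted sections of one day's Record.

    `sections` maps a section name to its full text; this shape is a
    hypothetical stand-in for however the text is actually stored.
    """
    counts = Counter()
    for name in COUNTED_SECTIONS:
        text = sections.get(name, "")
        # Assumed tokenizer: runs of ASCII letters, lowercased.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)
```

For example, `daily_top_words({"House": "Tax tax budget", "Senate": "tax vote"})` ranks `("tax", 3)` first. Combining sections in a single `Counter` means each word's daily total is simply the sum of its occurrences across the House, Senate and Extensions of Remarks text.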
The word count calculated by Sunlight does not include the Daily Digest section of the Congressional Record, which summarizes the daily activities of Congress. The count also excludes several sets of commonly used words that carry no substantive meaning: words of two letters or fewer, a list of common congressional procedural words compiled by Sunlight, and a list of commonly used 'stop-words'. (Stop-words are terms that search engines and other data indexers commonly ignore so that only valuable content is queried.) Capitol Words' stop-word list is based on the one provided by the Onix text-indexing engine in its full indexing toolkit. The list is dynamic and may be modified as needed.
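The three word-level exclusions can be expressed as a single filter. The Python sketch below is illustrative only: `PROCEDURAL_WORDS` and `STOP_WORDS` are small hypothetical samples, not Sunlight's procedural list or the Onix-derived stop-word list, and the letters-only tokenizer is an assumption.

```python
import re
from collections import Counter

# Hypothetical samples only; the real procedural list was compiled by
# Sunlight and the real stop-word list is based on Onix's list.
PROCEDURAL_WORDS = {"amendment", "yield", "clerk"}
STOP_WORDS = {"the", "and", "that", "with", "for"}


def is_counted(word: str) -> bool:
    """Return True if a lowercase token survives all three exclusions."""
    return (
        len(word) > 2                      # drop words of two letters or fewer
        and word not in PROCEDURAL_WORDS   # drop procedural vocabulary
        and word not in STOP_WORDS         # drop stop-words
    )


def count_substantive_words(text: str) -> Counter:
    """Count only the tokens that pass `is_counted`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if is_counted(t))
```

Keeping the word lists as plain data (sets) rather than hard-coding them into the logic fits the note that the stop-word list is dynamic: entries can be added or removed without touching the filter itself.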

