Thursday, November 02, 2006

Google Patent - Analysed

I've spent a good deal of time over the past two days analyzing the Google patent.

I believe the patent expresses three things:

  1. Factors Google thinks are important and may be in the current algorithm
  2. Factors Google thinks are important and wants to incorporate into the algorithm in the next 3-5 years
  3. Factors Google would like to stake an early claim to, so competitors don't use them.
For reference, I've included the sections from which I drew my conclusions.

Domain Factors

  • Length of domain registration (section 0099)
  • Domains are monitored for changes in expiration (sections 38, 39)
  • Nameserver and WHOIS data are monitored for changes and for valid physical addresses (the same technology used in Google Maps)
  • Name servers and possibly class C networks should have a mix of WHOIS data, registrars, and keyword and non-keyword domains (section 0101)
  • Documents/websites are given a discovery date when they are discovered through any of the following means:
    • external link
    • user-gathered data (sections 1, 2, 3, 4, 38)
  • Websites must have more than one document (section 5)
  • Changes in the weighting of key terms for a domain are monitored (section 50)
  • A change in a domain's topics that doesn't match prior content is an indicator of a change of focus; existing prior links will be discounted (section 0084)
Documents and Pages
  • Documents are compared for changes in the following:
    • frequency (time frame)
    • amount of change
    • (sections 6, 7, 8, 9, 11, 12)
  • The number of new documents (internal?) linked to a document is recorded (sections 9, 13)
  • Changes in the weighting of key terms for the document are recorded (sections 10, 14)
  • Documents are given a staleness (lack of change?) rating (section 19)
  • The rates at which a document's content and its anchor text change are recorded (sections 31, 33)
  • Outbound links to low trust or affiliate websites may be an indicator of low quality (section 0089)
  • Don't change the focus of many documents at once (section 0128)
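
To make the frequency, amount-of-change, and staleness ideas above a bit more concrete, here is a minimal Python sketch. It is not taken from the patent: the function names, the word-level diff, and the "changes per 30 days" unit are all my own assumptions, chosen only to illustrate how such numbers could be derived from dated snapshots of a page.

```python
from datetime import date
from difflib import SequenceMatcher


def change_amount(old: str, new: str) -> float:
    """Fraction of words that differ between two snapshots (0.0 = identical).

    Assumption: a word-level diff is a reasonable stand-in for "amount of change".
    """
    return 1.0 - SequenceMatcher(None, old.split(), new.split()).ratio()


def change_profile(snapshots, today):
    """Summarise how often and how much a document changes.

    snapshots: list of (date, text) tuples sorted by date (hypothetical input format).
    Returns change frequency, mean change size, and days since the last change.
    """
    changes = []
    last_changed = snapshots[0][0]
    for (_, old), (when, new) in zip(snapshots, snapshots[1:]):
        amount = change_amount(old, new)
        if amount > 0:
            changes.append(amount)
            last_changed = when
    # Guard against a zero-day span when only one snapshot exists.
    span_days = max((snapshots[-1][0] - snapshots[0][0]).days, 1)
    return {
        "changes_per_30_days": len(changes) * 30 / span_days,
        "mean_change": sum(changes) / len(changes) if changes else 0.0,
        "days_stale": (today - last_changed).days,
    }
```

A page whose `days_stale` keeps growing while `changes_per_30_days` stays near zero would be "stale" in the sense used above; how such values might actually be weighted is not something the patent spells out.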
Links
  • A link's anchor text and discovery date are recorded (sections 54, 55, 56, 57, 58)
  • Links are given a discovery date and monitored for appearance and disappearance over time (sections 22, 26, 58)
  • Links and anchor text are monitored for growth rates (section 48)
  • Links are monitored for changes in anchor text over a given period of time (sections 27, 30, 54, 55, 56, 57, 58)
  • Links are weighted by the trust or authoritativeness of the linking document, and by the newness or longevity of the link (sections 28, 58, 0074)
  • Link growth of independent peer documents (different class C networks?) is monitored.
  • The rate at which new links to a document appear or disappear is monitored (sections 23, 24)
  • A freshness rating of new links is recorded (section 32)
  • It is determined whether a document has a trend of appearing or disappearing links (section 25)
  • A distribution rating for the age of all links is recorded (section 29)
  • Links that have a long lifespan are more valuable than links that have a shorter lifespan (section 59)
  • Links from stale pages are devalued, whereas links from fresh pages are given a boost (section 60)
  • Link churn is monitored and recorded (sections 61, 62)
  • New websites are not expected to have a large number of links (section 0038)
  • Link growth should remain constant and slow (sections 0069, 0077)
  • Burst link growth may be a strong indicator of search engine spam (section 0077)
  • If a document is stale (not changed) but is still acquiring new links, it will be considered fresh (section 0075)
  • If a document is stale and has no link growth or has a decrease in inbound links, its outbound links will be discounted (section 0080)
  • A spike in links would be acceptable if the document has one or more links from authority documents (section 0110)
  • Anchor text should be varied as much as possible (sections 0120, 0121)
  • The growth of variation in anchor text should remain consistent (sections 0120, 0121)
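
As an illustration of the "burst link growth" idea above, here is a rough Python sketch that buckets link discovery dates by month and flags any month whose count of new links far exceeds the earlier baseline. Everything in it is an assumption on my part: the patent does not give a formula, and the `factor` and `min_history` parameters, the monthly buckets, and the median baseline are hypothetical choices. Months with no new links are simply skipped, which is a simplification.

```python
from collections import Counter
from datetime import date
from statistics import median


def monthly_new_links(discovery_dates):
    """Count newly discovered links per (year, month), in chronological order."""
    counts = Counter((d.year, d.month) for d in discovery_dates)
    return sorted(counts.items())


def find_bursts(discovery_dates, factor=3.0, min_history=3):
    """Return the (year, month) buckets that look like link bursts.

    A month is flagged when its count exceeds `factor` times the median of
    all earlier months. Both thresholds are illustrative, not from the patent.
    """
    series = monthly_new_links(discovery_dates)
    bursts = []
    for i in range(min_history, len(series)):
        baseline = median(count for _, count in series[:i])
        month, count = series[i]
        if count > factor * baseline:
            bursts.append(month)
    return bursts
```

A site gaining two or three links a month and then twenty in a single month would be flagged here. Whether such a spike is then excused by links from authority documents (section 0110) would be a separate check.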
Search Results
  • The volume of searches over time is recorded and monitored for increases (sections 17, 18)
  • Information regarding a document's rankings is recorded and monitored for changes (sections 41, 42, 43)
  • Click-through rates are monitored for changes in seasonality, burst increases, or other traffic spikes (sections 43, 44)
  • Click-through rates are monitored for increasing or decreasing trends (sections 51, 52, 53)
  • Click-through rates are monitored to see if stale or fresh documents are preferred for a search query (sections 20, 21)
  • Click-through rates for documents for a search term are recorded (sections 15, 16, 37, 43)
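
For the click-through trend items above, a simple way to picture "increasing or decreasing trends" is a least-squares slope over a series of per-period click-through rates. The sketch below is only that, a picture: the input format of (impressions, clicks) pairs and the function names are hypothetical, and nothing here is claimed to be how Google computes it.

```python
def ctr_series(periods):
    """Turn (impressions, clicks) pairs into click-through rates.

    Periods with zero impressions are treated as a CTR of 0.0 (an assumption).
    """
    return [clicks / impressions if impressions else 0.0
            for impressions, clicks in periods]


def trend_slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...).

    Positive means the series is trending up, negative means down.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator
```

Feeding in weekly figures for one document and one search term gives a single number whose sign shows the direction of the trend. Seasonality and one-off spikes would need something more careful than a straight line.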
User Data
  • Traffic to a document is recorded and monitored for changes (possibly through the toolbar, or desktop searches of cache and history files) (sections 34, 35)
  • User behavior on websites is monitored and recorded for changes (click-throughs, back button, etc.) (sections 36, 37)
  • User behavior is monitored through bookmarks, cache, favorites, and temp files (possibly through the Google Toolbar or desktop search) (section 46)
  • Bookmarks and favorites are monitored for both additions and deletions (sections 0114, 0115)
  • User behavior for documents is monitored for changes in trends (section 47)
  • The time a user spends on a website may be used to indicate a document's quality or freshness (section 0094)
Miscellaneous
  • Documents that change frequently in ranking may be considered untrustworthy (section 0104)
  • Keywords with little or no change in results should match domains with stable rankings (sections 0105, 0106, 0107)
  • Keywords with high volatility of change should have domains with more volatility (sections 0105, 0106, 0107)
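
One way to read the volatility items above is as a comparison between how much a document's position jumps around and how much the results for that keyword normally move. The Python sketch below uses the standard deviation of ranking positions as the volatility measure. That measure, the `tolerance` parameter, and both function names are my own assumptions for illustration; the patent does not specify them.

```python
from statistics import pstdev


def rank_volatility(positions):
    """Population standard deviation of a document's ranking positions over time."""
    return pstdev(positions) if len(positions) > 1 else 0.0


def volatility_mismatch(doc_positions, query_volatility, tolerance=2.0):
    """True when a document is noticeably more volatile than its keyword.

    query_volatility is the typical volatility of results for the keyword,
    measured the same way. The tolerance of 2.0 positions is a made-up threshold.
    """
    return rank_volatility(doc_positions) > query_volatility + tolerance
```

A document bouncing between positions 1 and 10 for a keyword whose results barely move would be flagged, while the same bouncing on a highly volatile keyword would not.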
Again, what and how much of this is actually in place is open for debate. If you think I interpreted something incorrectly, please let me know. If you have other ideas, let me know; I'd be glad to add them here.
 
