Google's prominence has made it an important source of traffic for most sites. As a result, inclusion in Google's index is an important marketing tool, and sites with higher rankings are the most likely to benefit.
Getting crawled by Google is easy. All you do is submit the top-level URL of your site. There's usually no need to submit individual pages, because Google uses an automated site crawler to find the pages within your site. If you move or delete pages, the crawler will discover this on subsequent visits, and the outdated URLs will be removed from the index.
If you have multiple domains, or a site that lives under several top-level URLs, you'll want to submit each of the top-level URLs. That will ensure that all areas of your site get crawled in a timely fashion.
Occasionally, a site will not get added automatically. This may happen because the site was unavailable or slow when Google tried to index it. Sites that are linked to by other sites, or that are listed in the Yahoo or DMOZ directories, may be indexed more quickly, because these links give Google's crawler multiple ways to find them.
Reasons why pages may not be listed
There are several reasons your pages may not show up, including the way they are generated, their availability, and their content.
According to information on Google's site, Google limits the number of dynamically generated pages that it indexes at any site. Site crawlers can generate a tremendous amount of traffic in a short time, so Google limits the number of dynamic pages it requests to avoid monopolizing a server's resources.
Frames can also cause problems, because it's difficult for an automated system to know what should be indexed: the frame, the individual panes, the entire page, or the alternate content. If your site uses frames, it's very important that your frames include "noframes" content. This is alternate content for browsers that can't handle frames, and it is also used by search engines.
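As a rough illustration, the check for missing "noframes" content can be automated. The sketch below uses Python's standard html.parser module; the class name FrameAudit and the function missing_noframes are made up for this example, and the test it applies (a page uses frames but has no text inside a noframes element) is only an assumption about what counts as a problem, not a description of how any search engine evaluates frames.

```python
from html.parser import HTMLParser


class FrameAudit(HTMLParser):
    """Records whether a page uses frames and whether it has noframes text."""

    def __init__(self):
        super().__init__()
        self.uses_frames = False
        self.in_noframes = False
        self.noframes_text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frameset", "frame"):
            self.uses_frames = True
        elif tag == "noframes":
            self.in_noframes = True

    def handle_endtag(self, tag):
        if tag == "noframes":
            self.in_noframes = False

    def handle_data(self, data):
        # Collect any non-blank content found inside <noframes>.
        if self.in_noframes and data.strip():
            self.noframes_text.append(data.strip())


def missing_noframes(html):
    """Return True if the page uses frames but offers no alternate content."""
    audit = FrameAudit()
    audit.feed(html)
    audit.close()
    return audit.uses_frames and not audit.noframes_text
```

Calling missing_noframes on the source of each framed page would flag the ones that give text-only visitors, including crawlers, nothing to read.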
If your site doesn't provide an internal path to all of your pages, search robots may not find them. Pages may also be missed if they are linked to only by pages that are themselves not in the index for other reasons, such as being dynamically generated. You can avoid this problem by including an index page that provides direct links to pages that have convoluted navigation paths.
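One way to look for pages without an internal path is to treat the site as a graph of links and walk it from the home page, much as a crawler would. The following is a minimal sketch: the function name unreachable_pages and the dictionary format (each page mapped to the list of pages it links to) are assumptions made for this example, and you would need to build that dictionary from your own site.

```python
from collections import deque


def unreachable_pages(links, start):
    """Return pages that cannot be reached by following links from start.

    links maps each page on the site to the list of pages it links to.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # Any known page that was never visited has no internal path to it.
    return sorted(set(links) - seen)


site = {
    "/": ["/about", "/products"],
    "/about": ["/"],
    "/products": [],
    "/old-promo": ["/"],  # nothing links here
}
print(unreachable_pages(site, "/"))
```

In this made-up site, "/old-promo" links out but nothing links to it, so it is reported. Adding it to an index page that is linked from the home page would make it reachable.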
Pages may also be left out if they are not available when search crawlers visit. Many sites do periodic maintenance at off hours, or cycle servers during the night. If your site is not fully redundant, it may be unavailable at these times, which can result in your site being left out of the index. This will normally be corrected the next time the crawler visits your site, but it should be avoided because, in the meantime, users will have difficulty finding your site.
In rare cases, page content can keep your pages out of search results. If Google identifies elements on your pages that it thinks are inappropriate for general audiences, the pages will still be indexed, but they may be filtered out of search results, depending on the searcher's SafeSearch settings.
Pages are also filtered out if they do not meet Google's "quality standards". Google doesn't detail what its quality standards are, but things like page cloaking, invisible text, or other techniques designed to fool search engines can cause pages to be excluded. Pages are also occasionally excluded for copyright reasons, or because the page content is inflammatory. Fortunately, it's rare for any of these problems to affect typical sites.
How pages are ranked
One of Google's most important features for web builders is that it is hard to fool. This can be frustrating for new sites that would like to get high rankings, but it means that pages don't need to be filled with hacks to rank well. It also means that your page is less likely to be lost in search results behind pages that have been hacked together to get high search engine rankings.
Google does not disclose its methods for ranking pages, because of the competitive value of this information. However, Google does explain that a large part of the page ranking process is based on analyzing links that point to individual pages. Instead of trusting the content of a page, Google looks at what others have said about that page. If important, high-quality sites about a given topic point to your site, your site will rank highly for searches on this topic. On the other hand, if your site is a great resource, but nobody links to it, it will rank lower.
Because of this, it's important to remember that your site doesn't exist in a vacuum. Google attempts to view your site within the context of the billions of pages of content available on the Internet.
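To make the idea of link analysis more concrete, here is a simplified sketch of one way to score pages by their incoming links. It is not Google's actual method, which is not disclosed. It is an iterative scheme in the spirit of publicly described link-analysis algorithms such as PageRank, and the function name link_scores, the damping value of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by incoming links (illustrative only, not Google's method).

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new[other] += share
        scores = new
    return scores


web = {
    "hub": ["popular"],
    "blog": ["popular"],
    "orphan": ["popular"],
    "popular": ["hub"],
}
for page, score in sorted(link_scores(web).items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy example, "popular" ends up with the highest score because three pages link to it, while "orphan" stays at the baseline because nothing links to it. A link from a high-scoring page is also worth more than a link from a low-scoring one, which mirrors the point that links from important sites count the most.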
Getting your highest ranking
Google is tough to trick. Even if you could trick it, this would probably be unwise, because your site could end up on the excluded list. This largely frees webmasters from spending time on tricks and page hacks designed to get higher rankings.
Instead, companies should look for opportunities to increase their status within the context of the web. Obviously, good content is important. The best content is not only informative, but also full of words that are common search terms for your topic area. People who may be interested in your product or service are likely to search using common phrases related to your business area, rather than the name of your product.
Incoming links are heavily weighted in current search algorithms because they are difficult to fake. Google considers sites that link to yours, the prominence of the sites that link to yours, and how sites link to yours. The most important links to your site will come from prominent sites related to your topic, and will link to your site using common search phrases.
Because incoming links carry so much weight, companies should devote some of their resources to increasing the number and quality of incoming links. You can analyze what sites link to your site by using the "link:" prefix in your search. Put "link:" before the URL you want to check, and Google will return a list of sites that link to your URL.
Text-based advertisements are important to consider, too. Text is easily indexed, and text ads provide additional incoming links to your site. If your text ads show up on sites that relate directly to your content, they provide additional relevant incoming links.
It's easy to identify good places to advertise. Do a Google search for the topic area or search phrase that you are targeting. The sites that are returned are ones that have already achieved prominence in your topic area, so they are worth considering as advertising locations.
To summarize, the following guidelines will help Google crawl and rank your site:
* Have a clear hierarchy to your site, and use text links - a site index can help.
* Make sure that your site is full of informative text, and that the text uses common search terms.
* Use accurate page titles.
* Avoid pages with hundreds of links - such pages may be penalized.
* View your pages with a text browser, such as Lynx, to see your site the way a search engine does.
* If you use dynamic content or a CMS, make sure that your content can be crawled.
* Avoid hidden text or links.
* Avoid doorway pages, pages loaded with irrelevant words, or multiple copies of the same page.
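A few of the items above (page titles, link counts, and how a page reads as plain text) can be checked with a short script. The sketch below uses Python's standard html.parser module. The names PageAudit and audit_page are made up for this example, and the max_links threshold of 100 is an arbitrary assumption, since no specific limit is published. It is only a rough stand-in for viewing the page in a text browser.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collects the title, link count, and visible text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.link_count = 0
        self.text = []
        self._in_title = False
        self._skip = 0  # greater than zero while inside script or style

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a" and dict(attrs).get("href"):
            self.link_count += 1
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.text.append(data.strip())


def audit_page(html, max_links=100):
    """Summarize a page roughly the way a text-only visitor would see it."""
    audit = PageAudit()
    audit.feed(html)
    audit.close()
    return {
        "title": audit.title.strip(),
        "link_count": audit.link_count,
        "too_many_links": audit.link_count > max_links,
        "text": " ".join(audit.text),
    }
```

Running audit_page on the HTML of each page gives a quick view of whether the title is accurate, whether the link count looks excessive, and whether the text that remains after stripping scripts and styles actually contains your common search terms.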