Search Engine Bots
Search engines rely, for the most part, on automated software agents called spiders, crawlers, robots, or bots. These bots seek out content across the Internet and within individual web pages, and they are a key part of how the search engines operate.
To index the Internet, a search engine needs a tool that can visit websites, navigate them, discern information about them, decide what each website is about, and add that data to its index. This tool must also be able to follow leads, or links, from one website to the next, so that it can keep gathering information and learning about the Internet indefinitely. If it does its job properly, the search engine has a good, valuable database, or index, and will deliver relevant results to a visitor's query.
Unfortunately, the tools that the search engines depend upon to add content to their databases are neither cutting edge nor especially powerful. Search engine robots have very limited functionality, comparable to early web browsers in terms of what they can understand in a web page. From the information that is visible to them, these spiders grab items such as page titles, meta tags and other metadata, and textual content for inclusion in the search engine's index or database.
How Do Search Engine Robots Work?
Think of search engine robots as very simple, automated data retrieval programs that travel the web to find information and links. They only absorb what they can see, and while a picture is worth a thousand words to a person, it's worth zero to a search engine. They can only read and understand text, and then only if it's laid out in a format that is tuned to their needs. Ensuring that they can access and read all the content within a website must be a core part of any search engine optimization strategy.
When a web page is submitted to a search engine, the URL is added to the search engine bot's queue of websites to visit. Even if you don't directly submit a website, or the web pages within it, most robots will find your content if other websites link to it. That's part of a process referred to as building reciprocal links. This is one of the reasons why it is crucial to build the link popularity of a website, and to get links from other topical sites back to yours. It should be part of any website marketing strategy you adopt.
When a search engine bot arrives at a website, it is supposed to check whether you have a robots.txt file. This file is used to tell robots which areas of your site are off-limits to them. Typically these are directories containing files the robot doesn't need to concern itself with. Some bots will ignore the instructions in the file. However, all search engine bots do look for it. Every website should have one, even if it is blank. It's just one of the things that the search engines look for.
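As a rough illustration of the robots.txt check described above, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt rules, the "ExampleBot" user agent and the example.com URLs are all made up for the example. A real bot would download the file from the site rather than read it from a string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
# "ExampleBot" is an assumed bot name, not a real crawler.
allowed_home = is_allowed(parser, "ExampleBot", "https://www.example.com/index.html")
allowed_private = is_allowed(parser, "ExampleBot", "https://www.example.com/private/report.html")
print(allowed_home, allowed_private)
```

With these sample rules the home page is permitted and the page under /private/ is not. A blank robots.txt parsed the same way permits everything, which is why an empty file is harmless.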
Robots store a list of all the links they find on each page they visit, and follow those links through to other websites. The original concept behind the Internet was that everything would organically be linked together, like a giant relationship model. This principle is still a core part of how robots get around.
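The link-following behaviour described above can be sketched as a queue of URLs still to visit plus a set of URLs already seen. This is only an illustration under simplifying assumptions. The pages live in an in-memory dictionary instead of being fetched over the network, the a.example and b.example addresses are invented, and the LinkCollector, extract_links and crawl names are hypothetical helpers rather than part of any search engine's software.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def crawl(start_url, pages, max_pages=100):
    """Breadth-first walk over `pages`, a dict mapping URL to HTML text."""
    frontier = deque([start_url])  # URLs waiting to be visited
    seen = {start_url}             # URLs already queued, to avoid loops
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = pages.get(url)
        if html is None:
            continue  # page unavailable; skip it
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Invented sample pages standing in for real websites.
PAGES = {
    "https://a.example/": '<a href="/about.html">About</a> <a href="https://b.example/">B</a>',
    "https://a.example/about.html": '<a href="/">Home</a>',
    "https://b.example/": '<a href="https://a.example/#top">Back</a>',
}
visit_order = crawl("https://a.example/", PAGES)
print(visit_order)
```

The seen set is what keeps the robot from going in circles when two sites link to each other.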
The smart part behind search engines actually comes in the next step. Compiling all the data that the bots have retrieved is part of building the search engine index, or database. This part of indexing websites and web pages comes from the search engine engineers, who devise the rules and algorithms which are used to evaluate and score the information the search engine bots retrieved. Once the website is added into the search engine database, the information is available for customers who are querying the search engine. When a search engine user enters a query into a search engine, the search engine performs a variety of steps to ensure that it delivers what it estimates to be the best, most relevant response to the question.
How Do The Search Engines Read Your Website?
When the search engine bot visits a website, it reads all the visible text on the web page, the content of the various tags in the source code (title tag, meta tags, Dublin Core tags, comment tags, alt tags, attribute tags, content, etc.), as well as the text within the hyperlinks on the web page. From the content that it extracts, the search engine decides what the website and the web page are about. Many factors are used to figure out what is of value and what matters. Each search engine has its own set of rules, standards and algorithms for evaluating and processing the information. Depending on how the bot was set up by the search engine, different pieces of information are gathered, weighted, indexed and then added to the search engine's database. Manipulation of the keywords within these web page elements forms part of what is known as search engine optimization.
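To make the list of page elements above more concrete, here is a small sketch that pulls the title, meta tags, image alt text, link text and visible text out of a page with Python's standard html.parser module. The PageElements class and the sample page are invented for this example. How real bots parse pages, and how they weight each element, is not public and will differ from this.

```python
from html.parser import HTMLParser


class PageElements(HTMLParser):
    """Gathers the text-bearing elements a simple bot might read."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}            # meta name -> content
        self.alt_texts = []       # alt attributes of images
        self.anchor_texts = []    # text inside hyperlinks
        self.visible_text = []    # text outside <script>, <style>, <title>
        self._in_title = False
        self._in_anchor = False
        self._anchor_parts = []
        self._skip_depth = 0      # inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content
        elif tag == "img":
            if attributes.get("alt"):
                self.alt_texts.append(attributes["alt"])
        elif tag == "a":
            self._in_anchor = True
            self._anchor_parts = []
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._in_anchor:
            self._in_anchor = False
            self.anchor_texts.append(" ".join(self._anchor_parts))
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
            return
        if self._in_anchor:
            self._anchor_parts.append(text)
        self.visible_text.append(text)


# Invented sample page for illustration.
SAMPLE = """<html><head><title>Blue Widgets</title>
<meta name="description" content="Hand-made blue widgets.">
<meta name="keywords" content="widgets, blue">
<script>var ignored = "not visible";</script>
</head><body>
<h1>Our Widgets</h1>
<img src="w.jpg" alt="A blue widget">
<p>Read the <a href="/guide.html">widget buying guide</a>.</p>
</body></html>"""

page = PageElements()
page.feed(SAMPLE)
page.close()
print(page.title, page.meta, page.alt_texts, page.anchor_texts)
```

Note that the image itself contributes nothing here. Only its alt text is captured, which matches the point that bots can only read text.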
After it is added, the information then becomes part of the search engine and directory ranking process. When the search engine visitor submits their query, the search engine digs through its database to give the final listing that is displayed on the results page.
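The idea of storing page content in a database and then digging through it at query time can be illustrated with a toy inverted index, which maps each word to the pages that contain it. This is a deliberately naive sketch. Scoring by simple word counts is an assumption made for the example, the documents and example.com URLs are invented, and real search engines use far more elaborate, proprietary ranking rules.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to a Counter of {url: occurrences of the term}."""
    index = defaultdict(Counter)
    for url, text in documents.items():
        for term in tokenize(text):
            index[term][url] += 1
    return index


def search(index, query):
    """Rank URLs by how often the query terms appear on them."""
    scores = Counter()
    for term in tokenize(query):
        for url, count in index.get(term, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _score in ranked]


# Invented documents standing in for crawled pages.
DOCS = {
    "https://example.com/widgets": "Blue widgets and red widgets for sale",
    "https://example.com/guide": "A guide to choosing widgets",
    "https://example.com/contact": "Contact our sales team",
}
index = build_index(DOCS)
results = search(index, "blue widgets")
print(results)
```

The index is built once, ahead of time, so answering a query only requires looking up a few words rather than re-reading every page.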
The search engine databases update at varying times. Once a website is in the search engine database, the bots will keep visiting it regularly to pick up any changes made to the website's pages and to ensure they have the most current data. The number of times a website is visited depends on how the search engine schedules its visits, which varies from one search engine to another. However, the more active a website, the more often it will get visited. If a website changes frequently, the search engine will send bots by more often. This is also true if the website is extremely popular or heavily trafficked.
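One simple way to picture "more active sites get visited more often" is a revisit interval that shrinks when a page has changed since the last visit and grows when it has not. The sketch below is purely hypothetical. The halving and doubling rule, the one hour and thirty day limits, and the function names are assumptions chosen for illustration, not a description of how any search engine actually schedules its bots.

```python
import hashlib

# Assumed bounds on how often a page is revisited, in hours.
MIN_HOURS = 1.0
MAX_HOURS = 24.0 * 30


def fingerprint(content: str) -> str:
    """Hash the page content so two visits can be compared cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def next_interval(current_hours: float, old_fp: str, new_fp: str) -> float:
    """Halve the wait if the page changed, double it if it did not."""
    if old_fp != new_fp:
        proposed = current_hours / 2
    else:
        proposed = current_hours * 2
    # Keep the result inside the assumed bounds.
    return max(MIN_HOURS, min(MAX_HOURS, proposed))


before = fingerprint("Welcome to our site")
after_change = fingerprint("Welcome to our site. New: spring sale!")

interval_if_changed = next_interval(24.0, before, after_change)
interval_if_same = next_interval(24.0, before, before)
print(interval_if_changed, interval_if_same)
```

Under these assumptions a page that keeps changing drifts toward hourly visits, while a static page drifts toward monthly ones.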
Sometimes bots are unable to access the website they are visiting, for example because the website is down. When this happens, the website may not be re-indexed, and if it happens repeatedly, the website may drop in the rankings.