Search Marketing


How Search Engines Look at Information

Thu Jun 23, 2011 8:41 pm
This is the third in a series of articles that look at how programs are built and how one might begin to think about building a search engine. This article covers how information on the web is laid out and how the major search engines must take into account the context that its creator provides.

At its core, a search engine needs to find information locked up in documents, databases, or web pages. Having found that information, it needs to deliver it to someone in response to a question. Ideally, the question itself provides some context that helps in delivering the proper result.
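The find-then-deliver idea can be sketched with a small inverted index, a common structure that maps each word to the documents containing it. This is only a minimal illustration, not how any particular search engine works; the document ids, the sample text, and the function names `tokenize`, `build_index`, and `search` are all made up for the example.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical documents, invented for illustration.
documents = {
    "page1": "Used car prices and reviews",
    "page2": "How to repair a car engine",
    "page3": "Train schedules for the weekend",
}
index = build_index(documents)
```

Here `search(index, "car")` matches both car pages, while the longer question `search(index, "car engine")` narrows the answer to `page2`. The extra word is exactly the kind of context a question can supply.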

One way to look at it is to imagine that all the data in the world is flat: every bit starts off equal, and the more someone organizes those bits, the more context they carry. If I place something in a PDF document, an HTML document, or an image, it has a different set of organizational context surrounding it in each case.

A web page has meta tags that help make sense of it, hypertext links to and from it that help define it, HTML and CSS that give emphasis to certain parts, and the actual text. That, at least, is how a human could interpret it. Search engine bots are unlikely to interpret the document the same way a human would. That is understandable, since two different humans would likely interpret the layout and context of the same document differently as well.
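As a rough illustration of the signals listed above, the sketch below uses Python's standard `html.parser` module to pull the title, meta tags, outgoing links, and emphasized text out of a page. The class name `ContextExtractor`, the choice of which tags count as emphasis, and the sample page are all assumptions made for this example; a real crawler would need to handle far messier markup.

```python
from html.parser import HTMLParser


class ContextExtractor(HTMLParser):
    """Collect the title, meta tags, links, and emphasized text of a page."""

    # Assumed set of tags treated as "emphasis" for this sketch.
    EMPHASIS_TAGS = {"h1", "h2", "h3", "b", "strong", "em"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.emphasized = []
        self._open_tags = []  # stack of open title/emphasis tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag == "title" or tag in self.EMPHASIS_TAGS:
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text or not self._open_tags:
            return  # ordinary body text carries no extra emphasis
        if self._open_tags[-1] == "title":
            self.title += text
        else:
            self.emphasized.append((self._open_tags[-1], text))


# Hypothetical page, invented for illustration.
page = """
<html><head>
<title>Used Cars for Sale</title>
<meta name="description" content="Listings of used cars">
</head><body>
<h1>Used Cars</h1>
<p>Browse our <strong>certified</strong> listings or read the
<a href="/guide.html">buying guide</a>.</p>
</body></html>
"""
extractor = ContextExtractor()
extractor.feed(page)
```

After feeding the page, `extractor.title`, `extractor.meta`, `extractor.links`, and `extractor.emphasized` hold the context the page's creator supplied, separate from the plain body text.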

PDFs, Word documents, and plain-text documents may or may not have any formatting to provide emphasis, and they tend not to be linked to in the same way as HTML documents. However, both HTML documents and PDFs have URLs, so the domain, or the site itself, may provide some context.

But information is not really flat, since the individuals who create a document inherently provide context. A web site has two different elements: the site as a whole and the individual document. The site as a whole may be made up of thousands of URLs, which may or may not relate to a single concept. You would expect the homepage, or the files in the root directory, to contain the main concept of the whole domain. This does not include subdomains, which can be served from different computers.
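The split between the site and the individual document can be read straight off a URL. The sketch below uses the standard `urllib.parse` module to separate the host from the path and to group URLs by host, so that a subdomain is treated as its own site, following the reasoning above. The helper names, the `example.com` URLs, and the rule for what counts as a root page are assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed paths that count as a site's homepage in this sketch.
ROOT_PATHS = {"/", "/index.html"}


def describe_url(url):
    """Split a URL into the site (host) and the document (path)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return {
        "host": parts.hostname or "",
        "path": path,
        "is_root": path in ROOT_PATHS,
    }


def group_by_host(urls):
    """Group document paths under the host that serves them."""
    sites = defaultdict(list)
    for url in urls:
        info = describe_url(url)
        sites[info["host"]].append(info["path"])
    return dict(sites)


# Hypothetical URLs, invented for illustration.
urls = [
    "http://www.example.com/",
    "http://www.example.com/cars/sedans.html",
    "http://blog.example.com/2011/06/post.html",
]
sites = group_by_host(urls)
```

Grouping by the full host name keeps `www.example.com` and `blog.example.com` apart, which is one simple way to avoid assuming that a subdomain shares the main concept of the parent domain.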

The most basic job of any search engine is interpreting what someone types in. If someone types in “car”, what should come up? If a human were asked the same one-word question, what would they come up with? In conversation, the question would likely be longer and more specific, since few people in the real world would just say “car”. To a certain extent, people have been trained to speak (type) the way search engines want them to, so on the web they type in “car” and other phrases that would make little sense on their own. As this web language has developed, many search engines and site owners have come to understand it and have tailored their results and titles to reflect it.
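One simple way to picture why titles matter for a one-word query like “car” is a score that counts a match in the title for more than a match in the body. The weights below, the `score` function, and the two sample pages are assumptions invented for this sketch; they are not taken from any real search engine.

```python
import re

TITLE_WEIGHT = 3  # assumed weight for a title match
BODY_WEIGHT = 1   # assumed weight for a body match


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, title, body):
    """Count query-word matches, weighting title matches more heavily."""
    query_words = set(tokenize(query))
    title_hits = sum(1 for word in tokenize(title) if word in query_words)
    body_hits = sum(1 for word in tokenize(body) if word in query_words)
    return TITLE_WEIGHT * title_hits + BODY_WEIGHT * body_hits


# Hypothetical pages, invented for illustration.
pages = [
    {"title": "Travel tips", "body": "Rent a car at the airport."},
    {"title": "Car reviews", "body": "We review every new car."},
]
ranked = sorted(
    pages,
    key=lambda page: score("car", page["title"], page["body"]),
    reverse=True,
)
```

With these weights the page titled “Car reviews” ranks first, even though both pages mention the word in their body text.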


© 2019 Christonium LLC