Monday, October 31, 2016

Article Summary for Lecture #10 - Northedge


Google and beyond: information retrieval on the World Wide Web


The invention of the World Wide Web has brought new challenges, but also many great improvements, for information retrieval. The growth of available information has made it impossible for humans to maintain and catalog the vast number of resources out there. Search engines are not bound by the limitations of controlled vocabularies and human indexing. They are also available online 24/7, so users do not have to deal with the limitation of a library’s open hours either.
Search engines function by having a software agent, or computer program, scan and analyze web pages in order to index them. The software agents do this continually, adding more and more pages to the search engine’s index. When someone submits a query, the engine uses this index to quickly retrieve web pages that fit what the user is searching for. Google is known for its gigantic index as well as for what it calls “PageRank”, an algorithm that Google’s founders developed to weed out unimportant web pages that would otherwise clog up a user’s results with “bad” sources at the top of the list. For example, if you Google “Facebook”, Facebook.com will be the first result, with the most relevant and popular pages just below it. As you go through page after page of results, and there will be a lot of them, you come across pages that merely mention “Facebook”, such as a blog or a random organization’s website, which are not sites you would ever be looking for.
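The index-then-retrieve process described above can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not Google’s actual implementation: the function names (`build_index`, `link_scores`, `search`), the sample pages, and the links between them are all hypothetical, and the scoring is a simplified link-based importance measure in the spirit of PageRank.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def link_scores(links, damping=0.85, iterations=50):
    """Simplified link-based importance: pages linked to by others score higher."""
    pages = list(links)
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page with no known outgoing links spreads its score evenly.
            targets = [t for t in outgoing if t in links] or pages
            share = damping * scores[page] / len(targets)
            for t in targets:
                new[t] += share
        scores = new
    return scores


def search(query, index, scores):
    """Return pages containing every query word, most important first."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches, key=lambda url: scores[url], reverse=True)


# Hypothetical sample data: three pages that all mention "facebook".
pages = {
    "facebook.com": "facebook connect with friends",
    "news.example/facebook-story": "news story about facebook",
    "blog.example/post": "my blog mentions facebook once",
}
links = {
    "facebook.com": [],
    "news.example/facebook-story": ["facebook.com"],
    "blog.example/post": ["facebook.com", "news.example/facebook-story"],
}

index = build_index(pages)
scores = link_scores(links)
results = search("facebook", index, scores)
```

In this toy data, `results` lists facebook.com first because both other pages link to it, while the blog that only mentions the word and has no incoming links falls to the bottom.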
One problem with how search engines accumulate keywords or tags for finding results is that website creators can insert metatags that are irrelevant to their site just to appear in more search results and bring traffic to their page. A website for a dog breeder in Colorado should not have the metatag “Chicago Cubs” just because its owner wants to gain attention from the World Series hype. A breast cancer awareness site should not use the tag “election 2016” to land on search results lists just because that is a popular search term right now. Search engines are far more beneficial than harmful, though, and they will only get better from here. They make finding information faster and easier than ever, and they are here to stay.



Reference


Indexer 25:192-195.
