Monday, October 31, 2016

Article Summary for Lecture #10- Northedge


Google and beyond: information retrieval on the World Wide Web


The invention of the World Wide Web has brought new challenges, but also many great improvements, for information retrieval. The growth of available information has made it impossible for humans to maintain and catalog the vast number of resources out there. Search engines are not bound by the limitations of controlled vocabularies and human indexing, and because they are available online 24/7, we are not limited by a library's open hours either.
Search engines function by having a software agent, or computer program, scan and analyze web pages in order to index them. The software agents do this continually, adding more and more pages to the search engine's index. When someone submits a search query, the engine uses this index to quickly retrieve web pages that fit what the user is searching for. Google is known for its gigantic index as well as for what it calls “PageRank”, an algorithm its founders developed to weed out unimportant web pages that would otherwise clog the top of a user's results with “bad” sources. For example, if you Google “Facebook”, Facebook.com would be the first result, with the most relevant and popular pages just below it. As you went through page after page of results, and there would be a lot of them, you would start coming across pages that merely mention “Facebook” but are not sites you would ever be looking for, like a random blog or some organization's website.
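
The article does not go into the math, but to picture the basic idea behind PageRank for myself, here is a tiny Python sketch. The toy link graph, damping factor, and iteration count are my own made-up illustration, not Google's actual implementation.

    # Toy sketch of the idea behind PageRank: pages that important pages link to
    # score higher. The link graph, damping factor, and iteration count are
    # illustrative assumptions only, not Google's real system.

    def pagerank(links, damping=0.85, iterations=50):
        """links maps each page to the list of pages it links out to."""
        pages = list(links)
        rank = {page: 1.0 / len(pages) for page in pages}
        for _ in range(iterations):
            new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
            for page, outlinks in links.items():
                targets = outlinks or pages   # a dangling page spreads its rank evenly
                for target in targets:
                    new_rank[target] += damping * rank[page] / len(targets)
            rank = new_rank
        return rank

    # facebook.com gets links from both other pages, so it comes out on top.
    toy_web = {
        "facebook.com": ["randomblog.example"],
        "randomblog.example": ["facebook.com"],
        "dogbreeder.example": ["facebook.com"],
    }
    for page, score in sorted(pagerank(toy_web).items(), key=lambda item: -item[1]):
        print(f"{page}: {score:.3f}")
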
A problem with how search engines accumulate keywords or tags for retrieval is that website creators can insert metatags that are irrelevant to their site just to show up in more search results and bring traffic to their page. A website for a dog breeder in Colorado should not have the metatag “Chicago Cubs” just because they want to gain attention from the World Series hype. A breast cancer awareness site should not use the tag “election 2016” to put itself on search engine results lists just because that is a popular search term right now. Search engines are far more beneficial than harmful, though, and they will only get better from here. They make finding information faster and easier than ever and are here to stay.
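
As a side note on the metatag problem, here is a rough Python illustration of why stuffed metatags are easy for an engine to discount: keywords that never appear in the page's visible text look suspicious. The sample HTML and the flagging rule are entirely my own invention, not anything described in the article.

    # Rough illustration of why stuffed metatags are easy to discount: flag any
    # meta keyword that never appears in the page's visible text. The sample
    # HTML and the flagging rule are my own invention for this example.
    from html.parser import HTMLParser

    class PageScanner(HTMLParser):
        def __init__(self):
            super().__init__()
            self.meta_keywords = []
            self.visible_text = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name", "").lower() == "keywords":
                self.meta_keywords = [k.strip().lower()
                                      for k in attrs.get("content", "").split(",")]

        def handle_data(self, data):
            self.visible_text.append(data.lower())

    sample_html = """
    <html><head>
      <meta name="keywords" content="dog breeder, puppies, Chicago Cubs">
    </head><body>
      <h1>Colorado dog breeder</h1>
      <p>Healthy puppies raised in Colorado.</p>
    </body></html>
    """

    scanner = PageScanner()
    scanner.feed(sample_html)
    page_text = " ".join(scanner.visible_text)
    for keyword in scanner.meta_keywords:
        verdict = "supported by page text" if keyword in page_text else "possible keyword stuffing"
        print(f"{keyword!r}: {verdict}")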



Reference


Northedge, R. Google and beyond: Information retrieval on the World Wide Web. Indexer 25:192-195.

Monday, October 24, 2016

Article Summary for Lecture #9- Shiri, Revie, & Chowdhury

Thesaurus-Enhanced Search Interfaces
An important issue search systems face is getting users' queries to match the vocabulary in surrogate records. If a term doesn't exactly match, that source will not come up as a search result even if it is exactly what the searcher is looking for. If I am looking for a book on prehistoric reptiles and type in exactly that, I might have gotten better results searching for “dinosaurs” instead. This is a problem catalogers have long faced, and with emerging technologies they have been able to fix it for the most part.

Early thesaurus-enhanced search interfaces were introduced in the 1970s, largely to help fill the gap between what users think they are looking for and what they are really looking for in a catalog. Catalogers realized it would be much easier to provide this tool within the system than to hope that users would one day figure out on their own which search terms give good results and which do not. Without having to exactly match the terms in a catalog system, users are able to get much better search results. The 1980s brought artificial intelligence and expanded thesaurus use within information systems.

Thesaurus-enhanced systems use a mapping technique in which the user's term is linked with terms found in the system's thesaurus, and the results are arranged with the user's term first and the other located terms after it. Some systems even recommend different terms while you are typing your query, or after the search has been submitted, along with the number of results each term would return.

While integrating thesauri into search interfaces has greatly improved how users are able to search, these systems are far from perfect. Going back to the prehistoric reptiles example: after looking up “reptile”, a general-language thesaurus might come back with the definition “a person who is very dishonest” and offer terms such as weasel, cheater, snake, and rascal. The only one that might get me the results I am looking for is snake, as in “prehistoric snake”, but that is still not the dinosaurs I am really after. The problem with using a thesaurus in a search interface is that the term you are searching for may not be used in the way the thesaurus thinks you are using it. Of course, that would be an example of a very ineffective search system, and while thesaurus-enhanced systems may not be flawless, they are helping users and are being improved all the time.
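To make the mapping technique a bit more concrete for myself, here is a minimal Python sketch of the idea: the user's term is looked up in a subject thesaurus and the expanded search lists the user's own term first. The tiny thesaurus is an invented example, not one taken from the article.

    # Minimal sketch of the mapping technique described above: the user's term is
    # looked up in a subject thesaurus and the expanded search lists the user's
    # own term first, followed by the mapped terms. The tiny thesaurus is an
    # invented example.

    SUBJECT_THESAURUS = {
        "prehistoric reptiles": ["dinosaurs", "pterosaurs", "marine reptiles"],
        "dinosaurs": ["prehistoric reptiles", "fossil vertebrates"],
    }

    def expand_query(user_term, thesaurus=SUBJECT_THESAURUS):
        term = user_term.lower()
        related = thesaurus.get(term, [])
        # the user's term always comes first, then the other located terms
        return [term] + [t for t in related if t != term]

    print(expand_query("Prehistoric reptiles"))
    # ['prehistoric reptiles', 'dinosaurs', 'pterosaurs', 'marine reptiles']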

References
Shiri, A.A., Revie, C., & Chowdhury, G. (2002). Thesaurus-enhanced search interfaces. Journal
of Information Science 28:111-122.

Monday, October 17, 2016

Article Summary for Lecture #8- Chan & Hodges

Entering the Millennium: A New Century for LCSH



The Library of Congress Subject Headings (LCSH) have been around since the late nineteenth century. Originally a modified version of a subject headings list published by the American Library Association, the list was chosen by the Library of Congress to begin its transformation to a dictionary-form catalog. The LCSH list started off as a subject access system for the Library of Congress, but over the past century it has evolved into a tool used by libraries throughout our country and around the globe.
The growth and acceptance of the LCSH can be attributed to the fact that the Library of Congress made its cataloging records available to other libraries. LC began distributing its printed cards in 1902, and in 1993 it made its collection of records available online. Being able to share resources electronically made cataloging with the LCSH a breeze and cut down on manual labor, since catalogers did not have to create a record from scratch for every single item their institution acquired. With access to this vast resource, why would catalogers create their own way of cataloging when the work is already done for them by the Library of Congress? The LCSH is one of the largest non-specialized controlled vocabularies in the world. Many libraries and commercial institutions that don't use the LCSH at least use the list as a model for their own systems. The LCSH can be used as is, or it can be modified or translated for use in a variety of specialized settings.
While the LCSH list may not be perfect, many catalogers agree that it is one of the best retrieval tools available today. Due to dependable authority control and a large vocabulary, the LCSH list has a high retrieval recall rate. Its structure is also dynamic in that it can easily be expanded based on an institution's needs.
At the end of the twentieth century, we could see how much the LCSH had changed from its beginnings in the late nineteenth century. With advancing online tools, bibliographic records began using subject headings from multiple schemas. Library users' behavior also changed with the online world: under what is called the principle of least effort, users were no longer willing to do much work to find resources, and when they get their electronic search results, patrons are likely to consider only the first few. Therefore, schemas were combined so that whatever the user put into the search box would have the best chance of finding a match.
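
The article does not spell out a mechanism for this, but one simple way I can picture combining schemas is pooling a record's headings from every schema into a single searchable set. The sample record and schema labels below are made up for illustration, not taken from the article.

    # Illustrative sketch only: pool the subject headings a record carries from
    # several schemas into one searchable set, so a single search box can match
    # any of them. The sample record and schema labels are made up.

    record = {
        "title": "Field Guide to Dinosaurs",
        "subjects": {
            "LCSH": ["Dinosaurs"],
            "Sears": ["Dinosaurs"],
            "local": ["Prehistoric reptiles"],
        },
    }

    def searchable_subjects(rec):
        terms = set()
        for headings in rec["subjects"].values():
            terms.update(h.lower() for h in headings)
        return terms

    query = "prehistoric reptiles"
    if query in searchable_subjects(record):
        print(record["title"], "matches", repr(query))
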
In the future, who knows what will be in store for LCSH. As with anything, the system will have to be adapted to the changing times and technology that is to come. LCSH has come a long way, and I have no doubt that it will continue to evolve and grow to accommodate library needs in the foreseeable future.

Reference
Chan, L., & Hodges, T. (2000). Entering the millennium: A new century for LCSH. Cataloging & Classification Quarterly 29(1/2):225-234.

Monday, October 10, 2016

Article Summary for Lecture #7- Taylor

On the Subject of Subjects


Subject cataloging has apparently been a hot debate topic among librarians for centuries. Some believe that people know what they are looking for and there is no need to have works sorted by subject, while others claim that most people only know that they need a book on a particular subject with no specific title in mind. Studies have shown that searching by subject in online catalogs has decreased over time with patrons instead favoring keyword searches.

Keyword searches are very similar to subject searches, but keywords give you a little more wiggle room with what you're searching for, whereas subject searches require specific terms from a controlled vocabulary. Keywords are somewhat like today's Instagram hashtags, in my opinion. If you were looking for a non-fiction book about zombies and searched zombies as a keyword, you'd get tons of results because those records were tagged with "zombies" as a keyword. If you were to search the same term as a subject, though, you might not get any results, because the subject heading covering zombies may be "Haitian Folklore", and since that wasn't the exact term we searched for, we wouldn't get any results. The problem with keyword searches is that they are typically too broad and can give too many results of varying quality. Keywords may also be taken from words that have multiple meanings and bring up completely irrelevant search results.
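Here is a minimal Python sketch of the difference I mean, with a made-up two-record catalog: a keyword search on "zombies" finds the record, while an exact-match subject search misses it because the controlled heading is "Haitian Folklore".

    # Minimal sketch of the difference described above: a keyword search matches
    # free-text tags on the record, while a subject search requires the exact
    # controlled-vocabulary heading. The two sample records are invented.

    catalog = [
        {"title": "Night of the Walking Dead (nonfiction)",
         "subjects": ["Haitian Folklore"],
         "keywords": ["zombies", "voodoo", "haiti"]},
        {"title": "A Field Guide to Dinosaurs",
         "subjects": ["Dinosaurs"],
         "keywords": ["prehistoric reptiles", "fossils"]},
    ]

    def subject_search(term):
        return [r["title"] for r in catalog
                if term.lower() in (s.lower() for s in r["subjects"])]

    def keyword_search(term):
        return [r["title"] for r in catalog
                if term.lower() in (k.lower() for k in r["keywords"])]

    print(subject_search("zombies"))   # [] -- "zombies" is not the controlled heading
    print(keyword_search("zombies"))   # ['Night of the Walking Dead (nonfiction)']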

Beginning in 1992, to make subject searches easier, the Library of Congress, OCLC, and the Research Libraries Group teamed up to standardize how subjects are cataloged. These groups created what they called a Core Record, which has all kinds of rules on what can be in it and has a code so that people can know what they're reading, similar to a MARC record. For monographs, each Core Record is required to have a classification number recognized by USMARC as well as at least two subject headings taken from an established thesaurus that is also recognized by USMARC.
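The actual Core Record rules are more detailed than this, but the two requirements for monographs mentioned above can be pictured as a simple validation check. The field names and sample values in this Python sketch are my own invention, not real USMARC coding.

    # Rough sketch of the two Core Record requirements mentioned above for
    # monographs: a classification number plus at least two subject headings
    # drawn from an established thesaurus. The field names and sample values
    # are my own invention, not actual USMARC coding.

    def meets_core_requirements(record):
        has_class_number = bool(record.get("classification_number"))
        sourced_headings = [h for h in record.get("subject_headings", [])
                            if h.get("thesaurus")]   # each heading must name its source thesaurus
        return has_class_number and len(sourced_headings) >= 2

    monograph = {
        "classification_number": "GR120 .D38 1985",
        "subject_headings": [
            {"term": "Zombies", "thesaurus": "LCSH"},
            {"term": "Folklore--Haiti", "thesaurus": "LCSH"},
        ],
    }
    print(meets_core_requirements(monograph))  # True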

Those opposed to Core Records say that it makes sense to go through the effort of classifying physical items and grouping them on shelves by subject, but with electronic records, what is the point? Classifying records by subject can allow records to be related to each other and can be used to link terms across thesauri. I would think that this means that if you search "Haitian Folklore" as a subject, you will also come across "Haitian Mythology" because "mythology" and "folklore" are synonyms and the database would use a thesaurus to find these synonyms. With keywords, this would become way too chaotic and give way too many search results.
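To picture that cross-thesaurus linking, here is a small Python sketch of the kind of equivalence table I am imagining; the mapping, records, and titles are all invented for the example, not taken from the article.

    # Sketch of the cross-thesaurus linking guessed at above: an equivalence
    # table lets a subject search on one vocabulary's heading also retrieve
    # records cataloged under the other's. The table and records are invented.

    CROSSWALK = {
        "haitian folklore": {"haitian mythology"},
        "haitian mythology": {"haitian folklore"},
    }

    def equivalent_subjects(term):
        term = term.lower()
        return {term} | CROSSWALK.get(term, set())

    records = [
        {"title": "Island Tales", "subjects": ["Haitian Mythology"]},
        {"title": "Knitting Basics", "subjects": ["Handicraft"]},
    ]

    wanted = equivalent_subjects("Haitian Folklore")
    hits = [r["title"] for r in records
            if wanted & {s.lower() for s in r["subjects"]}]
    print(hits)  # ['Island Tales']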

I personally like keyword searches. In my experience, when using my library's OPAC, a keyword search gives the best results and the items I'm looking for are usually at the top. I would say I use keyword searches 85% of the time, otherwise using author or title searches. I don't think I've ever used a subject search. When I search "sewing" as a keyword, I get all of the results I'm looking for. I can see how keyword searches may be less useful in databases containing journal articles, like EBSCO or PsycINFO, because there are typically more keywords per record than in a library OPAC.

Reference
Taylor, A. (1995). On the subject of subjects. Journal of Academic Librarianship 21:484-491.