<div class='page_container' data-page=1>
The International Workshop on Global Collaboration of Information Schools, WIS2012, was held on 15
November 2012 as part of the International Conference on Asia-Pacific Digital Libraries (ICADL2012), which
took place at the GIS, NTU Convention Centre, from 12-15 November 2012 in Taipei, Taiwan
(www.icadl2012.org). The first half of the Workshop comprised research paper sessions, which were
organized jointly with the Graduate Students Consortium (GSC) of ICADL2012. The second half of the day
consisted of a round table discussion on collaboration among information schools and research paper
presentations by students and young researchers of the information schools community, e.g., CiSAP
(Consortium of information Schools in Asia Pacific) and the iSchools Community.
These proceedings collect the research papers presented at WIS2012. All papers were reviewed
by the WIS2012 committee before they were selected for presentation.
<b>About WIS 2012 </b>
WIS 2012 consists of three parts:
• Graduate Students Forum (ICADL 2012 main conference): Presentations by graduate students
• Roundtable Discussion: Discussion by invited delegates
• Research Forum: Presentations of research papers reviewed by the organizing committee on the basis of quality
Research Forum
The topics of the Research Forum include:
• Digital Libraries, Archives, Curation and Preservation
• Information Access, Discovery and Retrieval
• Metadata and Knowledge Organization
• LIS Education, Professional Practices, and LIS and Society
• Web, Social Networking and Information Management
Chairs
Gobinda Chowdhury (University of Technology Sydney, Australia) and Vilas Wuwongse
(Thammasat University, Thailand)
Organizers
Chern Li Liew (Victoria University of Wellington, New Zealand), Gary Marchionini (University of
North Carolina, USA), Ronald L. Larsen (University of Pittsburgh, USA), Edie Rasmussen
(University of British Columbia, Canada), Emi Ishita (Kyushu University, Japan), Nisachol
Chomnongsri (Suranaree University of Technology, Thailand), Shigeo Sugimoto (University of
Tsukuba, Japan), Shalini Urs (University of Mysore, India), Hsueh-hua Chen (National Taiwan
University, Taiwan), Lampang Manmart (Khon Kaen University, Thailand), Hao-Ren Ke (National
Taiwan Normal University, Taiwan), Schubert Foo (Nanyang Technological University, Singapore),
Chutima Sacchanand (Sukhothai Thammathirat Open University, Thailand), Hideo Joho (University
of Tsukuba, Japan)
Background
Previous workshops
WIS 2010:
WIS 2011:
CiSAP (Consortium of iSchools - Asia Pacific) . edu.sg/cisap/about.htm) was established on
5 December 2008 as a not-for-profit organisation to promote collaboration among iSchools in the Asia Pacific
region. Currently, 24 institutions from 11 different countries have joined CiSAP. Members of the
Consortium are academic institutions for higher education interested and involved in education and research
in the area of 'information'. Being part of CiSAP and being involved in the consortium's activities gives
member institutions an important opportunity for international engagement and for raising the profile of their
information school programmes and research. CiSAP is founded as a voluntary organization and does not
require a membership fee.
Information & Knowledge Management, University of Technology Sydney, NSW, Australia
<b>Data Collection: </b>First, the SCOPUS database was chosen because it is the largest abstract and citation database of
peer-reviewed literature [10]. A search for DL publications (search term “Digital Librar*” in the Keywords field,
date range 1990-2010) returned 7905 DL publication records. Second, the 1015
subtopics from the DL knowledge map (1990-2010) [1] were used as search terms in the Keywords field to
search within the 7905 DL publication records. For each subtopic, all details (publication numbers
by year and the first appearance of the subtopic) were recorded and transferred to Microsoft Excel 2007 for later
<b>Calculating R-Squared values: </b>The R-Squared value is a number ranging from 0 to 1 that reveals how closely the
estimated values for a trend line (a straight line relationship) correspond to a set of actual data: a trend line is most
reliable when its R-Squared value is at or near 1 and least reliable when it is at or near 0. In linear
regression, the trend line is a regression line drawn on a scatter graph and used to fit a predictive model to an
observed data set of <i>y </i>(value on <i>y axis</i>) and <i>x </i>(value on <i>x axis</i>). After developing such a model, if an additional value
of <i>x </i>is then given without its accompanying value of <i>y</i>, the fitted model can be used to make a prediction of the value
of <i>y</i>. In Excel, the R-Squared value is calculated by the equation for the Pearson product moment correlation
coefficient. The formula for R is:
\[ R = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2} \, \sum (y - \bar{y})^{2}}} \]
where \(\bar{x}\) and \(\bar{y}\) are the means of the <i>x</i> and <i>y</i> values. R-Squared returns R², which is the square of this correlation coefficient. In our research, to measure the trends in
the DL research (1990-2010), the R-Squared values were calculated in Excel 2007 based on the degree of association
between variables (variable “Publication” or “Subtopic Number” on <i>y axis</i>; variable “Year” on <i>x axis</i>). The trend lines
showing the DL research trends were classified into 3 types: Increasing Trends (Positive Association), Decreasing
Trends (Negative Association) and Not Identified Trends (No Association).
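The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the authors' Excel workbook: the function names `pearson_r` and `classify_trend` are hypothetical, and deriving the trend label from the sign of R is an assumption, since the text does not say how the direction of a trend was decided.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences.

    Returns None when either sequence has zero variance, because the
    denominator of the formula is then zero and R is undefined.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def classify_trend(years, counts):
    """Return (R squared, trend label) for yearly counts.

    Assumption: the label follows the sign of R (positive association is
    an increasing trend, negative association a decreasing one).
    """
    r = pearson_r(years, counts)
    if r is None:
        return None, "Not Identified Trend"
    if r > 0:
        label = "Increasing Trend"
    elif r < 0:
        label = "Decreasing Trend"
    else:
        label = "Not Identified Trend"
    return r * r, label


# Hypothetical yearly counts for one subtopic (not data from the study).
r_squared, label = classify_trend([2006, 2007, 2008, 2009, 2010], [3, 5, 4, 8, 9])
```

A series with no variation in its counts has an undefined R, which this sketch reports as a "Not Identified Trend" instead of raising an error.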
[Chart 1: Trend in Total Publication Numbers of DL Research (1990-2010)]
[Chart 2: Trend in Total Subtopic Numbers of DL Research (1990-2010)]
Table 1: Publication Numbers vs. R-Square Numbers
Core topic (by publications) | No. of publications | Core topic (by R²) | Trend | R²
#8 Architecture - Infrastructure | 15339 | #7 User Studies | Increasing | 0.92
#19 DL Research & Development | 14210 | #11 Mobile Technology | Increasing | 0.92
#3 Information Organization | 6036 | #14 Virtual Technologies | Increasing | 0.87
#4 Information Retrieval | 5365 | #13 Semantic Web (Web 3.0) | Increasing | 0.84
#1 Digital Collections | 4593 | #2 Digital Preservation | Increasing | 0.84
#16 Digital Library Applications | 3987 | #18 Cultural, Social, Legal, Economic Aspects | Increasing | 0.83
#6 Human - Computer Interaction | 2582 | #16 Digital Library Applications | Increasing | 0.83
#10 Digital Library Services | 2571 | #10 Digital Library Services | Increasing | 0.82
#7 User Studies | 2485 | #9 Knowledge Management | Increasing | 0.82
#2 Digital Preservation | 2141 | #19 DL Research & Development | Increasing | 0.82
#15 Digital Library Management | 1705 | #6 Human - Computer Interaction | Increasing | 0.80
#9 Knowledge Management | [missing] | #3 Information Organization | Increasing | 0.80
#18 Cultural, Social, Legal, Economic Aspects | 1193 | #4 Information Retrieval | Increasing | 0.79
#14 Virtual Technologies | 1105 | [missing] | [missing] | [missing]
#17 Intellectual Property, Privacy, Security | 764 | #12 Social Web (Web 2.0) | Increasing | 0.75
#13 Semantic Web (Web 3.0) | 590 | #5 Access | Increasing | 0.74
#5 Access | 544 | #8 Architecture - Infrastructure | Increasing | 0.69
#11 Mobile Technology | 359 | #1 Digital Collections | Increasing | 0.69
#12 Social Web (Web 2.0) | 298 | #20 Information Literacy | Increasing | 0.57
#20 Information Literacy | [missing] | [missing] | [missing] | 0.54
#21 Digital Library Education | 180 | #21 Digital Library Education | Increasing | 0.13

Table 2: Subtopic Numbers vs. R-Square Numbers
Core topic (by subtopics) | No. of subtopics | Core topic (by R²) | Trend | R²
#8 Architecture - Infrastructure | 144 | #8 Architecture - Infrastructure | Decreasing | 0.38
#3 Information Organization | 141 | [missing] | [missing] | 0.24
#4 Information Retrieval | 78 | [missing] | [missing] | [missing]
#16 Digital Library Applications | 64 | #3 Information Organization | Decreasing | 0.23
#6 Human - Computer Interaction | 61 | #1 Digital Collections | Decreasing | 0.23
#7 User Studies | 59 | #13 Semantic Web (Web 3.0) | Increasing | 0.19
#9 Knowledge Management | 58 | #4 Information Retrieval | Decreasing | 0.18
#15 Digital Library Management | 53 | #9 Knowledge Management | Increasing | 0.18
#19 DL Research & Development | 48 | #19 DL Research & Development | Decreasing | 0.17
#1 Digital Collections | 48 | #11 Mobile Technology | Increasing | 0.12
#2 Digital Preservation | 46 | #5 Access | Decreasing | 0.09
#10 Digital Library Services | 30 | #17 Intellectual Property, Privacy, Security | Decreasing | 0.05
#13 Semantic Web (Web 3.0) | 30 | #10 Digital Library Services | Decreasing | 0.03
[missing] | [missing] | [missing] | [missing] | [missing]
#18 Cultural, Social, Legal, Economic Aspects | 25 | #16 Digital Library Applications | Decreasing | 0.02
#11 Mobile Technology | 22 | #7 User Studies | Increasing | 0.01
#12 Social Web (Web 2.0) | 21 | #6 Human - Computer Interaction | Decreasing | 0.01
#14 Virtual Technologies | 20 | #15 Digital Library Management | Decreasing | 0.01
[missing] | [missing] | [missing] | [missing] | [missing]
#5 Access | 14 | #2 Digital Preservation | Increasing | 0.00
#21 Digital Library Education | 5 | #21 Digital Library Education | Not Identified | #DIV/0!
In Table 1, it can be noted that although Architecture – Infrastructure (15339), DL Research & Development (14210),
Information Organization (6036), Information Retrieval (5365) and Digital Collections (4593) are the top 5 core topics
by publication numbers, they are not the most strongly trending core topics, with R² values of 0.69, 0.82, 0.80, 0.79,
and 0.69 respectively. Conversely, User Studies (2485), Mobile Technology (359), Virtual Technologies (1105),
Semantic Web (Web 3.0) (590), and Digital Preservation (2141) have fewer publications
than the top 5 but the highest R² values: 0.92, 0.92, 0.87, 0.84, and 0.84 respectively. It should be noted that
publication numbers by year only describe how DL research trends developed in the past, while R² values indicate
how reliably the trends can be projected into the future (future predictions of the trends). In other words, calculated from the actual
data of the two variables “Year” and “Publication”, R² reveals how closely the estimated values for a trend line (a
straight line relationship) correspond to the set of actual data.
In Table 2, based on the calculation of the actual data of the two variables “Year” and “Subtopic Number” for the 21 core
topics, there are 7 core topics with increasing trends, 13 with decreasing trends and 1 with a not identified trend.
Although Architecture – Infrastructure, Information Organization, Information Retrieval, Digital Library
Applications, and Human - Computer Interaction were the top 5 core topics by subtopic numbers (144, 141,
78, 64, and 61 respectively), their trend lines were all decreasing, with the following R² values: Architecture –
Infrastructure (0.38); Information Organization (0.23); Information Retrieval (0.18); Digital Library Applications
(0.02); and Human - Computer Interaction (0.01). With regard to top core topics with increasing trends in subtopics
Overall, there are strong increases in the publication numbers of the 21 core topics of DL research (1990-2010), with a total future growth
prediction of R² = 0.836 (very reliable). Although most subtopic numbers are decreasing, the predicted decline
of the subtopics is not reliable, since its R² value is only 0.0383. Most remarkably, some topics show
future growth in the R² values of both publication and subtopic numbers, viz. User Studies, Mobile Technology,
Semantic Web (Web 3.0), Social Web (Web 2.0), Knowledge Management, and Digital Preservation, which are likely to be the
major research interests for DL communities in the future. However, the core topic Digital Library Education, which has the
fewest publications, the fewest subtopics and the lowest R² value, deserves more attention so that it can enhance the activities
of research, education and implementation within the DL domain.
1. Nguyen, H.S., Chowdhury, G.: Digital Library Research (1990-2010): A Knowledge Map of Core Topics
and Subtopics. In: Xing, C., Crestani, F., Rauber, A. (eds.) ICADL 2011, vol. 7008, pp. 367-371.
Springer-Verlag Berlin Heidelberg, Beijing (2011)
Chunsheng Huang<sup>1,2</sup>
<sup>1</sup> School of Information Studies, University of Wisconsin-Milwaukee, Milwaukee, WI, USA
<sup>2</sup> Library, National Chung Hsing University, Taichung, Taiwan
<b>Abstract. </b> Learning style has been identified to be influential in users'
preferences of information searching systems. However, little is known about
how learning styles may have an impact on users’ help-seeking interactions.
This proposal reports preliminary results of a dissertation study investigating
the effects of learning styles on help-seeking behaviors in the digital library
environments. Index of Learning Styles was employed to measure users’
different dimensions of styles. Multiple data collection methods, including
questionnaires, think-aloud protocols, transaction logs, and interviews, were
employed to collect data from 37 participants. The findings show that
participants took different approaches to help-seeking and that
users’ learning styles influenced their
interactions with the help features of digital libraries.
<b>Keywords: </b>Learning styles, help-seeking, digital libraries, interactions.
Users’ information needs are best met by well-designed systems.
However, end users often encounter problems when interacting with
information retrieval (IR) systems, and novice users even more so. The most
common problem reported in previous research is that novice users do not know
how to get started, even though most IR systems contain help mechanisms. Since
digital libraries were developed only during the past decade, most users are unfamiliar with
them. Novice users, who never or rarely use digital libraries, need to learn how to
use new digital libraries by interacting with help features to fulfill their search
needs. However, many research studies have demonstrated that the existing help
Cognitive preferences unconsciously serve as an adaptive control mechanism
between a person's inner needs and the external environment. In learning
activities, individuals’ preferred ways of processing information are called learning
styles. Previous studies have confirmed that learning styles deeply influence
how users process information in their search process. Users with different styles apply
their own search strategies and prefer different system features.
While most learning style theories classify learners into a few groups, the Index of
Learning Styles (ILS) describes learners along more detailed dimensions:
Active/Reflective, Visual/Verbal, Sensory/Intuitive, and Sequential/Global [1].
Several researchers have studied the effects of learning style and its associated
dimensions on users’ reactions to information organization and representation, search
strategy, and search performance [2]. Help-seeking represents a mini information
search process. The factors that influence information seeking and retrieval also
affect users’ help-seeking. Help is defined as assistance or clarification from either an
IR system or a human in the search process when people encounter problems [3].
Although previous research has addressed help-seeking and various
cognitive factors, it has investigated them separately. There are two main limitations
associated with the previous research: 1) most studies focus on how cognitive factors
affect search behaviors, yet less research focuses on the influence on help-seeking; 2)
A user study was designed to address the proposed research questions and
associated hypotheses. Two digital libraries were selected for this study: Library of
Congress Digital Collections and University of Wisconsin Milwaukee Digital
Collections. Both digital libraries provide diversified academic content in various
topics and formats. Most importantly, both digital libraries facilitate information
seeking of novice users with complete and different types of help features. The
context of the study is designed to be in an academic setting with real academic users
and real academic problems. Sixty novice users, including undergraduate and
graduate students, are expected to be recruited for this study. Multiple methods were
employed to systematically collect data, including pre-questionnaires, cognitive
measures, think-aloud protocols, transaction logs, and post-interviews. The mixed
methods design consists of two major components: qualitative illustration followed by
quantitative testing. Results from the two components will be connected and
interpreted to provide a better understanding of novice users’ help-seeking behaviors.
The preliminary findings focus on descriptive and qualitative analysis. Since novice
The findings of this study focus on answering the research questions regarding
what help-seeking approaches users take and how learning styles affect
their corresponding help-seeking interactions with help functions. The preliminary
results of the study showed that participants demonstrated different approaches to
help-seeking to deal with various types of help features, such as Interactive Help,
Visual Help, Overview Help, Step-by-step Help, Channeling Help, and Viewing Help.
Results of this study also showed that learners with different learning styles exhibit
various dimensions of help-seeking interactions when searching information in digital
libraries. In selecting and using help features, active and reflective learners showed
their preferred approaches of engagement. Visual and verbal learners had their
preferred presentation formats of help features. While perceiving help features,
sensing and intuitive learners had different preferences in relation to help content,
structure, and design. Sequential and global learners applied their preferred strategies
to make sense of and understand digital libraries and their functions. The
characteristics of interactions offer practical implications for the design of digital
libraries to support different types of learning styles. In particular, the results
suggested that digital libraries need to support different learning styles by offering
different types of help features, different formats of help, and different organization
and presentation of help content. Further research will continue to quantitatively test
the exploratory results identified by the qualitative analysis.
1. Felder, R. M., & Silverman, L. K.: Learning and teaching styles in engineering education.
Engineering Education 78(7) 674-681 (1988)
2. Ford, N., Wilson, T. D., Foster, A., Ellis, D., & Spink, A.: Information seeking and mediated
searching. Part 4: Cognitive styles in information seeking. JASIST 53(9) 728-735 (2002)
3. Xie, I., & Cool, C.: Understanding help seeking within the context of searching digital
libraries. JASIST 60(3) 477-494 (2009)
Soohyung Joo<sup>1</sup>
<sup>1</sup> School of Information Studies, University of Wisconsin-Milwaukee,
P.O. Box 413, Milwaukee, WI 53201, USA
<b>Abstract. </b>This study aims to investigate in what ways and to what extent digital
library (DL) systems support user-system interactions focusing on search
process. Based on previous interactive information retrieval (IIR) models, a
multi-tiered evaluation framework has been developed to assess system support
for users’ application of search tactics at physical, cognitive and affective
dimensions. In addition, the study plans to investigate how system support
would influence IR outputs and outcomes. This proposal summarizes the
conceptual IIR evaluation framework for DLs, research design and methods,
and some preliminary findings.
In digital library evaluation, major concerns have been usage, service quality,
interface design and usability, and collections in both research areas and operational
DL practices [1]. Less research has been done in the context of IIR evaluation in DL
settings. Interactive Information Retrieval (IIR) evaluation focuses on users’
behaviors and experiences at physical, cognitive and affective levels, and the
interactions that occur between users and systems [2]. This study intends to propose a
process-driven IIR evaluation framework that assesses system support for users’
application of search tactics focusing on search process, not limited to predominant
search results evaluation. The evaluation framework that this study will develop is to
assess in what ways and to what extent a DL system supports different types of search
tactics applied in achieving an information search task. Here are the research
questions (RQs) for this study: 1) What are the types of system supports users need to
apply search tactics during the search process? In what ways does the system support
users’ application of search tactics during the IR process?; 2) To what extent do DLs
support users’ application of search tactics at three hierarchical levels—physical,
cognitive and affective—during the search process?; 3) How does system support
affect IR outputs and outcomes, such as overall satisfaction, usefulness of search
results, knowledge change, and aspectual recall?
The conceptual evaluation framework of this study is based on three aspects of IIR: 1)
user engagement and system support; 2) search tactics in IR process; and 3)
hierarchical levels of interactions.
First, the conceptual framework viewed an IR process as interactions between user
engagement and system support. Both user engagement and system support play
important roles in applying search tactics [3]. To apply search tactics in IR processes,
Second, the evaluation framework of this study assumes that an IR process consists
of users’ application of multiple search tactics. A search tactic refers to a move or
moves, including search choices and actions that users apply to achieve a specific
objective in IR processes [4]. For this study, thirteen types of search tactics were
adopted based on Xie and Joo’s [5] identification of search tactics: Creating search
statement [Creat], Modifying search statement [Mod], Evaluating search results
[EvalR], Evaluating individual items [EvalI], Access Forward and Backward [AccF /
AccB], Exploring [Xplor], and Obtaining [Obt], among others. This study’s evaluation
framework covers a variety of user-system interactions by incorporating thirteen
different types of search tactics into the evaluation practice.
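As a rough illustration of how coded search tactics could be tallied, the sketch below counts tactic codes and consecutive tactic pairs in one search session. It is a hypothetical example rather than the study's analysis procedure: the names `TACTIC_LABELS`, `tally_tactics` and `transition_counts` and the sample session are assumptions, and only the tactic codes named in the text are listed.

```python
from collections import Counter

# Tactic codes named in the text. The framework uses thirteen types;
# the remaining codes are not given here, so they are not listed.
TACTIC_LABELS = {
    "Creat": "Creating search statement",
    "Mod": "Modifying search statement",
    "EvalR": "Evaluating search results",
    "EvalI": "Evaluating individual items",
    "AccF": "Access forward",
    "AccB": "Access backward",
    "Xplor": "Exploring",
    "Obt": "Obtaining",
}


def tally_tactics(session):
    """Count how often each tactic code occurs in one coded session."""
    return Counter(session)


def transition_counts(session):
    """Count consecutive pairs (previous tactic, next tactic) in a session."""
    return Counter(zip(session, session[1:]))


# Hypothetical coded session (a list of tactic codes in order of use).
session = ["Creat", "EvalR", "EvalI", "Mod", "EvalR", "Obt"]
tactic_totals = tally_tactics(session)
tactic_pairs = transition_counts(session)
```

Counting pairs as well as single codes is one simple way to look at tactic patterns; other pattern analyses would fit the same coded input.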
Third, the evaluation framework poses three hierarchical levels in measurement.
Based on Kuhlthau’s [6] ISP model, this study attempts to measure user-system
interactions in DLs at three hierarchical levels: 1) at the physical level, users’ application
of search tactics and corresponding system support will be explored; 2) at the cognitive
level, users’ perceptions of system support and engagement will be measured; and 3)
at the affective level, user satisfaction with system support will be investigated. Specific
evaluation criteria and feasible measures are suggested in this study.
To answer the three research questions and to empirically test the conceptual
evaluation framework, a user study was designed. This study applies the identified
framework to the actual IIR evaluation of the US Library of Congress Digital
Collections (LOCDL). LOCDL represents one of the national-level DLs in academia
For data analysis, both qualitative and quantitative methods are used. Qualitative
analysis, in particular open coding and content analysis, will be applied to identify
different types of user engagements and associated system supports. Quantitative
analysis, including inferential statistical tests, will be employed to analyze search
tactic patterns, to quantify the amount of system support at different hierarchical
levels, and to examine the effects of system support on IR outcomes and outputs.
As of June 20th 2012, thirty-eight subjects had participated in this study. This proposal
summarizes some preliminary findings from the tentative analysis of those
thirty-eight subjects. As to RQ1, initial types of user engagement and system support for
each type of search tactic were identified using open coding. For
example, in Creat tactics, types of user engagement can be “Convert user need to a
search statement,” “Determine search strategies,” “Manipulate search statements” and
others, while corresponding system supports are “Provide interactive search
mechanisms,” “Suggest different search strategies,” and “Offer different search
fields/facets,” among others. For all thirteen types of search tactics, this study will
identify types of user engagements and associated system supports from the
observations of users’ search processes. Moreover, specific system features and
interaction examples will be presented in the final results.
As to RQ2, user engagement was explored by exploring users’ application of
As to RQ3, we found that system support appears to be associated with IR outcomes
and IR outputs. In particular, according to tentative correlation analysis results,
system support for Creat, Mod, EvalR, EvalI, and Xplor tactics was
significantly correlated with knowledge increase, aspectual recall, usefulness of
search results and satisfaction with search results.
This ongoing study will continue to collect data up to at least sixty subjects to
achieve adequate statistical power for multiple regression.
1. Joo, S.; Xie, I.: Evaluation Constructs and Criteria for Digital Libraries: A Document
Analysis. In: Cool, C., Ng, K.B. (Eds). Recent Developments in the Design, Construction
and Evaluation of Digital Libraries. Hershey, PA: IGI Global. (in press)
2. Kelly, D.: Methods for Evaluating Interactive Information Retrieval Systems with Users.
Foundations and Trends in Information Retrieval 3(1-2), 1-224 (2009)
3. Xie, I.: Supporting Ease-of-use and User Control: Desired Features and Structure of Web-based
Online IR Systems. Information Processing and Management 39, 899-922 (2003)
4. Xie, I.; Joo, S.: Factors Affecting the Selection of Search Tactics: Tasks, Knowledge,
Process, and Systems. Information Processing and Management 48(2), 254-270 (2012)
5. Xie, I.; Joo, S.: Transitions in Search Tactics during the Web-based Search Process. Journal
of American Society for Information Science and Technology 61(11), 2188-2205 (2010)
6. Kuhlthau, C.C.: Inside the Search Process: Information Seeking from the User’s Perspective.
Journal of the American Society for Information Science 42, 361–371 (1991)
Ya-Ning Chen and Hao-Ren Ke
Graduate Institute of Library and Information Studies, National Taiwan Normal University,
Taipei, Taiwan
<b>Abstract. </b> This study aims to explore the similarity of information
organization behaviors between users and experts through the lens of mental models.
16 journals, 1,491 articles, 3,978 tags of CiteULike, and 6,717 descriptors of
LISA were selected for analysis between 26 February and 2 March 2011. Four
in-depth research questions presented below were investigated to examine the
similarity of mental models for information organization: 1) correspondence
between social tags and article titles, and descriptors and article titles in
scholarly journals, 2) similarity between social tags and descriptors in scholarly
journal articles, 3) the usage of keyword categories for social tags and
descriptors and similarity between them, and 4) implicit patterns and structures
of used keyword categories embedded in social tags and descriptors.
<b>Keywords: </b>social tags, descriptors, mental models, information organization
With the widespread application of Web 2.0, social networking platforms offer users
social tags and folksonomies to organize personal information resources. These
platforms not only provide an opportunity to study how users organize resources for
personal information management, but also seamlessly aggregate information organization into
collective intelligence. Social tagging differs from
information organization in library and information science (hereafter LIS) in that
information organization is conducted in a top-down manner, whereas social tagging is
grassroots. Therefore, studying behaviors of
information organization has become an emerging issue in bridging the gap between users and experts.
In a study of students’ database query behaviors, users were found to regard keywords as
concepts for retrieving documents from a database [1]. In addition to treating tags as
keywords, taggers also use tags as concepts to organize personal information
resources in the tagging process [2]. From the perspective of information organization,
keywords have been regarded as both concepts and tools for information organization, such
as subject headings, authority files and thesauri.
Mental models are defined as “people’s mental representation of information
objects, information systems, and other information related processes” [3]. In addition
to applications in information retrieval and information systems, mental models are
also adopted in information organization to explore users’ cognition of how
information is organized [4]. Mental models are further employed to examine whether
users and information organization experts have a similar cognitive understanding
of FRBR for metadata description [5-6]. Therefore, if users and information
organization experts share similar mental models, the gap between users and
information organization experts can be narrowed to facilitate more effective
information organization and retrieval.
This study aims to explore the similarities and differences in information
organization practices and behaviors between users and experts in order to provide
suggestions for information organization through the lens of mental models. This study
proposes the following research questions:
• RQ1: What is the correspondence between social tags and article titles, and
descriptors and article titles in scholarly journals?
• RQ2: What is the similarity between social tags and descriptors in scholarly
journal articles?
• RQ3: What is the usage of keyword categories for social tags and descriptors
and similarity between them?
• RQ4: What are the patterns and structures of used keyword categories embedded
in social tags and descriptors, and similarity between them?
This study selected LIS journal articles from CiteULike and LISA as its target
subject. Journals were selected according to the following criteria: relative
prominence in LIS as indexed by the Journal Citation Report and LISA, overall
coverage of both theory and practice, and the authors’ domain knowledge.
This study thus selected social tags of LIS journal articles from CiteULike, and
descriptors from LISA, between 26 February and 2 March 2011. Journals were sorted
in alphabetical order, and each article was given a sequential number.
Each article’s tags were placed adjacent to the article title in another column, and its
descriptors were put next to the tags in a new column. In total, 16 journals, 1,491 articles,
3,978 tags, and 6,717 descriptors were selected for analysis.
Based on the concept of the application profile [7], the study adopted the Tag Category
Model [8] and matching categories [9] as a framework for developing two
classification schemes. One classifies tags and descriptors into keyword categories, and
the other compares the similarity of terms between tags and descriptors. Three
individuals with an LIS background were divided into three groups to undertake
the in-depth analysis for the three parts of this study (RQ1-RQ3), while the authors
coded all three parts. Agreement reached the substantial level of
Cohen’s kappa [10]. Where the LIS
individuals and the authors differed, a meeting was convened to reach agreement and
secure inter-coder reliability. Furthermore, social network analysis (SNA) and frequent pattern (FP)
tree analysis were employed to investigate the implicit patterns and structures of the keyword
categories embedded in social tags and descriptors (RQ4).
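The agreement check mentioned above can be sketched as follows. This is a minimal illustration, assuming two coders have each assigned one category label per item; the function names and the toy labels are hypothetical and are not taken from the study. The interpretation bands follow Landis and Koch [10].

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equally long lists of category labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item set")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement bands."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Toy labels for illustration only (not the study's data).
coder_1 = ["x", "x", "y", "y"]
coder_2 = ["x", "x", "y", "x"]
kappa = cohens_kappa(coder_1, coder_2)
print(kappa, landis_koch_label(kappa))
```

Under this scheme, a "substantial" result corresponds to a kappa between 0.61 and 0.80.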
Of the CiteULike tags, 31.95% were identical to corresponding keywords in article
titles, compared with 18.37% of the LISA descriptors. The frequency curves have an
inverse J shape, showing that both tags and descriptors follow a Zipfian power-law
distribution; the usage of tag and descriptor categories follows a similar distribution.
The top eight tag categories and the top eight descriptor categories each account for
the majority of usage, in line with the 80/20 rule, but their rank orders
differ. The Zipfian curve of descriptor categories is
much steeper than that of tag categories, indicating that descriptor usage
is concentrated in fewer categories than tag usage. The most popular tag category
is category 01 (i.e., title-exact), at 31.95% of usage, whereas
the most popular descriptor category is category 09 (i.e., topic-general), at
41.16%. When related categories are grouped together, the rank order and usage of
categories differ between tags and descriptors.
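The 80/20 comparison above amounts to asking what share of all usage the top-ranked categories cover. A minimal sketch follows, assuming per-category usage counts are available as a dictionary; the function name and the counts are hypothetical and serve only to illustrate the calculation.

```python
def top_k_share(category_counts, k):
    """Return the fraction of all usage covered by the k most-used categories."""
    total = sum(category_counts.values())
    if total == 0:
        raise ValueError("no usage recorded")
    ranked = sorted(category_counts.values(), reverse=True)
    return sum(ranked[:k]) / total


# Hypothetical counts for illustration only (not the study's data).
example_counts = {"cat_a": 50, "cat_b": 30, "cat_c": 10, "cat_d": 6, "cat_e": 4}
print(top_k_share(example_counts, 2))
```

For the same k, a steeper rank-frequency curve generally yields a larger top-k share, which is one way to read the finding that descriptor usage is concentrated in fewer categories.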
When terms were compared between tags and descriptors, non-matches were the
most frequent category, with usage of over half. Partial matches ranked
second, ahead of exact matches. In the SNA, the centrality,
clusters, co-used groups and structurally equivalent roles of keyword
categories differed between tags and descriptors. The path-based rules from the
FP tree analysis suggest that taggers’ collective mental model of keyword selection is
much shallower than that of information organization experts; that is, experts
tend to use more keywords and categories than taggers to represent the concepts
of information objects.
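The exact, partial and non-match comparison can be sketched as below. The matching rules here are an assumption made for illustration (an exact match means identical word sequences after lowercasing, a partial match means at least one shared word); the study's own scheme follows the matching categories in [9] and may define these levels differently.

```python
def normalize(term):
    """Lowercase a term and split it into words, treating hyphens as spaces."""
    return term.lower().replace("-", " ").split()


def match_category(tag, descriptor):
    """Classify one tag/descriptor pair as exact, partial or non-match."""
    tag_words, descriptor_words = normalize(tag), normalize(descriptor)
    if tag_words == descriptor_words:
        return "exact"
    if set(tag_words) & set(descriptor_words):
        return "partial"
    return "non-match"


def best_match(tag, descriptors):
    """Return the strongest match level of a tag against an article's descriptors."""
    strength = {"exact": 0, "partial": 1, "non-match": 2}
    return min(
        (match_category(tag, d) for d in descriptors),
        key=strength.__getitem__,
        default="non-match",
    )


# Illustrative terms only.
print(best_match("tagging", ["Social tagging", "Metadata"]))
```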
As a first step, this study succeeded in developing two classification schemes to
examine the similarity of information organization behaviors between users and
experts. However, a further in-depth study is needed to explore the similarity of their
mental models of keyword association for information organization.
1. Holman, L.: Millennial students’ mental models of search: Implications for academic
librarians and database developers. Journal of Academic Librarianship, 37(1), 19-27. (2011)
2. Smith, G.: Tagging: People-powered metadata for the social web. Berkeley, CA: New
Riders. (2008)
3. Zhang, Y.: Undergraduate students’ mental models of the web as an information retrieval
system. Journal of the American Society for Information Science and Technology, 59(13),
2087-2098. (2008)
4. Ahlstrom, V., Allendoerfer, K.: Information organization for a portal using a card-sorting
5. Pisanski, J., Žumer, M.: Mental models of the bibliographic universe. Part 1: mental models
of descriptions. Journal of Documentation, 66(5), 643-667. (2010)
6. Pisanski, J., Žumer, M.: Mental models of the bibliographic universe. Part 2: Comparison
task and conclusions. Journal of Documentation, 66(5), 668-680. (2010)
7. Heery, R., Patel, M.: Application profiles: mixing and matching metadata schemas. Ariadne,
25. (2000)
8. Heckner, M., Mühlbacher, S., Wolff, C.: Tagging tagging: Analysing user keywords in
scientific bibliography management systems. Journal of Digital Information, 9(2). (2008)
9. Carlyle, A.: Matching LCSH and user vocabulary in the library catalog. Cataloging &
Classification Quarterly, 10(1/2), 37-63. (1989)
10. Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data.
Biometrics, 33(1), 159-174. (1977)
Luke Kien-Weng Tan
Wee Kim Wee School of Communication and Information
Nanyang Technological University, Singapore
A weblog, or blog, is a specialized web site that allows individuals to express their
thoughts, voice their opinions, and share their experiences and ideas. The easy access
and availability of blog sites (e.g., www.blogspot.com) have encouraged web users to
change from consumers to providers of information. Providers of such content exert a
certain level of influence on the receivers, as is evident from blog sites that
affect their readers’ purchase decisions (e.g., www.engadget.com), attitudes and
approaches to life (e.g., www.lifehack.org), political viewpoints (e.g.,
www.huffingtonpost.com), and more. The Merriam-Webster dictionary
(www.merriam-webster.com) defines influence as <i>“the power or capacity of causing an effect in
indirect or intangible ways”</i>. Influence is a characteristic of an individual that defines
the capacity to exert some effect on other individuals [3]. A blogger is influential
if he or she has the capacity to affect the behavior of fellow bloggers. The ability to detect
influence in the blogosphere could be used to identify influential bloggers and the
chain of information flow. With this knowledge, further stimulus could be added to aid the
flow of positive information, or pre-emptive and preventive actions taken to minimize
any negative impact.
Previous studies linked information propagation and influence to blog features,
which are mainly graph-based, such as the number of in-links and out-links [1, 2]. However,
the use of blog features alone to detect influence in the blogosphere may not yield
highly accurate results, because influence is a subjective concept and often
depends on the context of the posting. More recent studies have used sentiment
analysis on links between blogs to detect influence [8, 9]. These studies focused
on the single notion that influence exists in the links between blog posts, and did not study
the influence types and styles of the blog sites in detail. Moreover, influence is a
complex concept that cannot be described using a simple directional quantity derived
from the presence or absence of links.
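For reference, the graph-based features mentioned above can be computed from a plain list of links. The following is a minimal sketch, assuming each link is recorded as a (linking post, linked post) pair; the function name and the post identifiers are hypothetical.

```python
from collections import defaultdict


def link_counts(links):
    """Count in-links and out-links per post.

    links: iterable of (linking_post, linked_post) pairs.
    Returns a dict mapping each post to an (in_links, out_links) tuple.
    """
    in_links = defaultdict(int)
    out_links = defaultdict(int)
    for source, target in links:
        out_links[source] += 1  # the linking post gains an out-link
        in_links[target] += 1   # the linked post gains an in-link
    posts = set(in_links) | set(out_links)
    return {post: (in_links[post], out_links[post]) for post in posts}


# Hypothetical link structure: B and C link to A, and A links to D.
example_links = [("B", "A"), ("C", "A"), ("A", "D")]
print(link_counts(example_links))
```

Counts of this kind capture only the presence of links, which is the limitation the paragraph above points out.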
The aim of this study is to develop an influence detection model to automatically
• What are the blog features that could indicate the influence within the
blogosphere?
• Will automatic influence style analysis help detect influence propagation
within the blogosphere?
To answer the research questions, the following research objectives will be carried
out:
• Determine the blog features that show influence within the blogosphere.
• Establish a linguistic approach to improve sentiment analysis between linked
blog posts.
• Determine the influence styles of blog sites and bloggers.
• Develop an influence detection model to detect the influence in the
blogosphere.
In the study by Agarwal and Liu [2], an influential blogger is defined based
on the number of in-links to the post and the post length. A high authority value,
which refers to a larger number of in-links to the blog postings, may indicate higher
readership. However, higher readership does not necessarily imply influence. For
Sentiment analysis is a type of content analysis which aims to identify opinions,
emotions, and evaluations expressed in natural language [13]. Earlier studies of
sentiment analysis at the sentence or phrase level relied on the notion that an opinion
word associated with an aspect or feature would appear in its vicinity [7]. However,
opinionated text can be written in elaborate styles where sentences have nested
clauses, with the related opinion and subject words hidden in separate clauses.
Methods using word distances [5] or part-of-speech patterns [14] cannot
detect the relationship between such words, as these methods typically assume
that the opinion and aspect words appear within a certain distance of one another.
Moreover, the grammatical relationship between the words has largely been ignored.
Recent sentiment analysis research has focused on the functional relations of words
using typed dependency parsing, which provides a refined analysis on the grammar
automatically learn the typed dependency patterns for sentiment prediction. More
recent studies have used sentiment analysis on links between blogs to detect influence.
Li et al. [9] considered the positive and negative edges of the nodes in their attempt to
detect influence in the social network. Leskovec et al. [8] similarly adapted a
framework of trust and distrust in an attempt to infer the attitude of one user toward
another using the observed positive and negative relations. These studies focused
on the single notion that influence exists in the links between blog posts, and did
not study the influence styles of blog sites in detail. In this study, the influence styles
of the blog sites and bloggers are determined to better describe the influence exerted.
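The limitation of distance-based methods noted earlier in this section can be illustrated with a small sketch. It assumes a simple fixed window of tokens around each opinion word; the function name, the window size, the word lists and the sentences are all hypothetical, and this is not the method proposed in this study.

```python
def pair_by_distance(tokens, opinion_words, aspect_words, window=3):
    """Pair each opinion word with aspect words at most `window` tokens away."""
    pairs = []
    for i, token in enumerate(tokens):
        if token not in opinion_words:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if tokens[j] in aspect_words:
                pairs.append((token, tokens[j]))
    return pairs


opinions = {"great", "terrible"}
aspects = {"screen", "battery"}

# Opinion and aspect words are close together, so the pair is found.
simple = "the screen is great".split()
print(pair_by_distance(simple, opinions, aspects))

# Nested clauses push "battery" and "terrible" far apart, so the pair is missed.
nested = ("the battery which my friend who reviews phones "
          "for a living recommended is terrible").split()
print(pair_by_distance(nested, opinions, aspects))
```

A dependency-based approach would instead follow the grammatical relation between the subject and the opinion word, regardless of how many tokens separate them.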
A blogosphere can contain numerous posts, resulting in a complex network of
influence between the blogs. In our study, the focus is on detecting influence
propagation between a linked blog post and the linking blog post. To understand the
influence propagation characteristics of blog posts, archives from product review
blog sites will be used. An overview of the research methodology is
shown in Figure 1.
[Figure 1 panels: Linked Blog (A) and Linking Blog (B); (1) Blog Features Analysis; (2) Sentiment Analysis: typed dependency rules, sentence-level sentiment polarity prediction rules, semantic analysis; (3) Influence Styles Detection; (4) Influence Propagation Detection]
<b>Fig. 1. Overview of research methodology </b>
In step (1), the blog features that are useful in detecting
influence between the linked blogs will be explored. In step (2), the blog content is
analyzed to predict the sentiments expressed within it, which
takes the context of the blog posts into account. A linguistic approach
that leverages typed dependency rules and further considers the complex
phrase relationships found in a sentence is proposed to improve sentiment analysis
performance. Subsequently, in step (3), blog features analysis and sentiment
predictions will be combined to profile the blog sites and bloggers using influence
styles. The influence styles further describe influence through the engagement,
persuasion, and persona of the blog sites and bloggers. Engagement style refers to the
participation and involvement level of the bloggers towards the blogs. Cialdini and
Goldstein [4] defined persuasion as a process of influence through appeals to reason
or emotion, which we evaluate in the persuasion style analysis. Kelman [6] defined
compliance as an influence process; here it refers to the agreement expressed between
linked blog sites, and the blog site’s persona is measured in terms of compliance in the
persona analysis. The influence styles, together with the relevant blog features, are then
used to detect influence propagation in step (4).
The contribution of this research is a novel approach to detecting
influence propagation within the blogosphere by analyzing the sentiments
expressed in blog posts and the influence styles of blog sites and bloggers. Unlike
previous studies, the proposed approach automatically generates the influence
styles of the blog sites and bloggers, using identified blog features and a
linguistics-based sentiment analysis. In addition, the novel idea of using influence
style as a parameter to detect influence propagation within the blogosphere will be
explored. As influence is a subjective and complex concept, we expect that
describing influence in detail through influence styles will improve influence
detection performance by providing an in-depth analysis of influence.
1. Adar, E., Adamic, L.A.: Tracking Information Epidemics in Blogspace. In: Conference on
Web Intelligence, 207-214 (2005)
2. Agarwal, N., Liu, H.: Blogosphere: Research issues, tools, and applications. In: SIGKDD
Explorations Newsletter, 10(1), 18-31 (2008)
3. Agarwal, N., Liu, H.: Modeling and Data Mining in Blogosphere. Morgan & Claypool,
San Rafael, CA. (2009)
4. Cialdini, R.B., Goldstein, N. J.: Social Influence: Compliance and Conformity. Annual
Review of Psychology, 55, 591-621 (2004)
5. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: International Conference
on Knowledge Discovery and Data Mining, 168-177 (2004)
6. Kelman, H. C.: Compliance, identification, and internalization: Three Processes of attitude
7. Kim, S.M., Hovy, E.: Determining the sentiment of opinions. In: International Conference
on Computational Linguistics ACL, 1367-1373 (2004)
8. Leskovec, J., Huttenlocher, D., Kleinberg, J.: Predicting Positive and Negative Links in
Online Social Networks. In: World Wide Web ACM, 641-650 (2010)
9. Li, H., Bhowmick, S. S., Sun, A.: CASINO: Towards Conformity-Aware Social Influence
Analysis in Online Social Networks. In: Conference on Information and Knowledge
Management ACM, 1007-1012 (2011)
10. Liu, B.: Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer
Berlin Heidelberg, New York, 37-54 (2006)
11. Shaikh, M. A. M., Prendinger, H., Ishizuka, M.: Sentiment Assessment of Text By
Analyzing Linguistic Features And Contextual Valence Assignment. Applications of
Artificial Intelligence, 22(6), 558-601 (2008)
12. Thet, T. T., Na, J.C. and Khoo, C. S. G.: Aspect-Based Sentiment Analysis of Movie
Reviews on Discussion Boards. Journal of Information Science, 36(6), 823-848 (2010)
13. Wiebe, J.: Tracking point of view in narrative. Computational Linguistics, 20(2), 233-287
(1994).
14. Yi, J., Nasukawa, T., Bunescu, R. and Niblack, W.: Sentiment Analyzer: Extracting
Sentiments about a Given Topic using Natural Language Processing Techniques. In:
International Conference on Data Mining, 427-434 (2003).
Sutisa Songleknok<sup>1</sup>, Smarn Loipha<sup>2</sup>,
<sup>2</sup> Associate Professor, Information and Communication Management Program, Faculty of
Humanities and Social Sciences, Khon Kaen University, Thailand
<sup>3</sup> Assistant Professor, Information and Communication Management Program, Faculty of
Humanities and Social Sciences, Khon Kaen University, Thailand
<b>Abstract. </b>This research aims (1) to examine the knowledge management of
indigo dyed cloth community enterprises based on a value chain concept, (2)
to examine factors supporting knowledge management
and community enterprise management in indigo dyed
cloth community enterprises, and (3) to develop a
knowledge management model for indigo dyed cloth community
enterprises based on a value chain concept.
The expected outcome is a knowledge management model,
based on a value chain concept, that
corresponds to the context of the indigo dyed cloth
community enterprises in Sakon Nakhon province. The
model relates knowledge management
processes, business management, and the enabling factors of
knowledge management and community enterprises.
<b>Keywords: </b>Knowledge management, Community
enterprises, Value chain, Indigo dyed cloth
In Thailand, community enterprises are founded by community organizations, which
are started by people in the community. The businesses are run by the community, for
the community, and by the community fund (Petprasert, 1999; Pipatseritham, 2004) to
increase the members’ incomes and to improve living conditions and life quality of
local people, who are the majority of the country (Chommuang and Wasusophapon,
Some community enterprises are currently successful: they
produce quality products that have been granted the five-star OTOP product standard.
Others are less successful, some have disappeared from the market, and some are still
attempting to improve the quality of their products to meet the standard set by the
public authority. Reaching the standard takes a long time because the
community often lacks knowledge of product development. Community enterprises
also encounter many other problems. These include (1) limited business management skills,
resulting in low competitiveness, low growth, and sales that just
cover expenses and production costs (Allen, 1999; Petprasert, 1999); (2) product
problems, including product designs that do not meet
customers’ needs and low-quality, low-standard products (Saenpot, 2003);
(3) product advertising for enterprises that rely on or target outside
markets (Thailand Productivity Institute, 2003); (4) labor problems, i.e., a lack of skilled
production staff, of in-depth training for workers, and of the specific knowledge needed for
product expansion, such as accounting, production development, law, and contracts
(Thailand Productivity Institute, 2003; Pisaisawat, 1996); and (5) lack of access to
necessary information about marketing, raw materials, financial sources, and so on
(Thailand Productivity Institute, 2003). These factors clearly demonstrate problems in
the knowledge management of community enterprises.
Knowledge management is vital for the development of community enterprises
because knowledge is an important factor driving the economy and increasing
competitiveness (Holsapple & Singh, 2001; Nonaka, 1998). Knowledge management reinforces
continuous learning, which is crucial for the survival, maintenance, and sustainability of
an organization’s excellence. With constant and systematic knowledge management,
an organization can use its existing and stored knowledge for relearning and extend
it to create innovations (Drucker, 1993; Wijan, 2003). Community enterprises
should be viewed holistically, from upstream production through midstream to
downstream. To achieve this, a value chain concept should be adopted
because it connects production activities from raw material sourcing and processing
to delivery and customer service. This concept can help increase a
business’s competitiveness and the value of its products in the customer’s eyes.
A review of previous studies on community enterprises
in Thailand found that they mainly focused on six areas. The
first area was how to solve problems related to community enterprise
management by focusing on specific management issues (e.g., recording income and expenses)
(Saenpot, 2003). Another area was solving problems
related to knowledge management in community enterprises through
educational knowledge management, by creating a new curriculum or integrating knowledge
management into an existing curriculum (Kemakorn, 2009; Mekwan, 2006).
Previous studies also examined the knowledge management of successful community
enterprises (Jonjoubsong, 2008; Phabu, 2002; Tinnaluck, 2004, 2005). The last
research area examined community enterprises
by analyzing their major and support business activities
through a value chain concept (The Northeastern Strategic Institute of Khon Kaen
University and Office of the Public Sector Development Commission, 2006). To be
competitive in the market, community enterprises must improve their organization
For the purpose of this study, indigo dyed cloth community enterprises were selected
because their identities reflect the characteristics of
community enterprises. They also use local wisdom, which is tied to their ethnic
group, as the basis for their production.
Natural indigo dyed cloth has gradually disappeared; only eight
countries currently produce it. In Thailand, the major centers of indigo dyed
cloth are in the northeast: Udon Thani, Sakon Nakhon, Mukdahan,
and Chaiyaphum provinces. For this study, indigo dyed cloth producers in Sakon
Nakhon province were selected because their identities reflect the characteristics
of community enterprises. Sakon Nakhon province has recognized the importance of the
wisdom of indigo dyed cloth. It has promoted and preserved this local wisdom and
encouraged people to use it as part of their livelihood. Moreover, Sakon
Nakhon province has promoted indigo dyed clothes as its provincial dress. The
public sector has also encouraged Sakon Nakhon province to become the center of the indigo dyed
cloth cluster. This policy is in line with the World Craft Council of UNESCO, which
recognizes the importance of local arts and crafts and has set a policy to preserve
local arts and handicrafts for the restoration of indigo dyed cloth all over the world
(Kenan Institute, 2006; Thongchern, 2006).
3.1 To examine the knowledge management of indigo dyed cloth community enterprises
based on a value chain concept.
3.2 To examine factors supporting knowledge management and community enterprise
management in indigo dyed cloth community enterprises.
3.3 To develop a knowledge management model for indigo dyed cloth community
enterprises based on a value chain concept.
The cases for this study will be selected purposively. The community enterprises to be selected are
indigo dyed cloth community enterprises whose products are rated as five-star OTOP
products. Those chosen are the Baan Tam Tao Housewife
Farmers Group, Samak-kee Patana sub-district, Arkat-amnuay district, and the Tee Ta natural
indigo dyed group, Nahuabor sub-district, Pannanikom district.
This study concerns the knowledge management of indigo dyed cloth community
enterprises and applies a value chain concept in order to understand the indigo dyed
cloth community enterprises in Sakon Nakhon. The major frameworks of this study are
knowledge management and the value chain. Based on the value chain concept, an indigo
dyed cloth community enterprise is divided into three processes: raw material
supply, production of indigo dyed cloth, and marketing and sales promotion. This
study will carefully examine knowledge management at each value chain stage,
namely 1) the establishment of objectives for knowledge management and types of
knowledge, 2) searching for and provision of knowledge both within and outside the
community, 3) creation of new knowledge by examining phenomena and the knowledge
building of each member, member groups, and community enterprises of other
communities, 4) categorizing and systematic storing of knowledge, 5) accessibility
and retrieval of knowledge by members of the community enterprises, 6) transfer
of knowledge among members, and 7) use and reuse of knowledge, including the
methods by which knowledge is used and reused. To understand this phenomenon,
environmental factors reinforcing success in this business will also be examined.
<b>Fig. 1. The conceptual framework </b>
To develop a knowledge management model for indigo dyed cloth community
enterprises following a value chain concept, the study will draw on the results of researching social
phenomena, the knowledge management procedures of each value chain stage, and the factors
contributing to success in knowledge management and in running indigo dyed cloth
community enterprises. This model is expected to increase
the potential and competitiveness of the community enterprises.
This study will adopt a qualitative approach. The research methodology follows
the objectives, as shown in Table 1.
<b>Table 1. </b>
7.1 Obtain a knowledge management model, based on a value chain concept, that
corresponds to the context of the indigo dyed cloth community enterprises in
Sakon Nakhon province.
7.2 Identify the enabling factors specific to knowledge management and community
business management.
7.3 The model can be applied to other community enterprises in similar contexts.
1. Allen, R.: Community enterprise: Civil society of the economic question. University of
Birmingham, Birmingham (1999)
2. Chommuang, L., Wasusophapon, S.: Thai Lua weaving clothes: Community business for
self sufficiency. Sangsan Publications, Bangkok (2003)
3. Community Development Office in Sakon Nakhon.: One Tambon One Product.
Community Development Office in Sakon Nakhon, Sakon Nakhon (2004)
4. Drucker, P. F.: Post-capitalist society. Harper and Collins, New York (1993)
5. Holsapple, C.W., Singh, M.: The knowledge chain model: Activities for competitiveness.
Expert Systems with Applications, vol 20, pp. 77-98 (2001)
6. Jonjoubsong, L.: An integrated knowledge management model for community enterprises:
A case study of a rural community enterprise in Thailand. Ph.D. Dissertation, School of
Information Management, Faculty of Commerce and Administration, Victoria University of
Wellington, New Zealand (2008)
7. Kemakorn, C.: Knowledge management organization model of Thai community business.
PhD. Dissertation, Graduate school, Chiang Mai University (2009)
8. Kenan Institute.: Final report of cluster mapping to increase the competitiveness level of
the production and service sector. Udomrat Printing and Design, Bangkok (2006)
9. Mekwon, C.: Knowledge management of community business groups in Roi-et. Master
thesis in Social Studies, Graduate School, Khon Kaen University (2006)
10. Nindam, S.: A study of community business performances, shops at Highway Service
Center of Kaophoe, Prachup Kerikhan. Master of Arts thesis in Rural Studies and
Development, Graduate School, Mahidol University (2008)
11. Nonaka, I.: The knowledge-creating Company. In P. F Drucker (Eds.), Harvard business
review on knowledge management. pp. 21-45. Harvard Business Press, Boston (1998)
12. Petprasert, N.: Community business: Potential ways. Research Fund, Bangkok (1999)
13. Phabu, T.: Processes of knowledge transfer in silk weaving: A case study in Baan Tayuak,
Tung Luang sub-district, Suwannaphum district, Roi-et. An Independent Study Report in
Thai Studies, Mahasarakham University (2002)
14. Pipatseritham, K.: OTOP: Layman fighters and community marketing persons. AR
Business Press, Bangkok (2004)
15. Pisaisawat, S.: Production and distribution of Mudmee-Kid Thai silk products: A case
study in Udonthani. Journal of Research for Development, vol 25(89), pp. 23-33 (1996)
16. Porter, M. E.: Competitive advantage: creating and sustaining superior performance: with
a new introduction. The Free Press, New York (1985)
17. Saenpot, K.: Improvement of community business performances: A case study of
handicraft products in Udon Thani. PhD. Dissertation, Graduate School, Khon Kaen University
(2003)
18. SMCE.: Information system of community enterprises. Retrieved on November 11, 2011
from (2004)
19. Thailand Productivity Institute.: Educational report on the assessment of problems and
needs of community enterprises. Niwatporn Printing, Bangkok (2003)
20. The Northeastern Strategic Institute of Khon Kaen University and Office of the Public
Sector Development Commission.: Methods for value chain analysis: Jasmine rice for
import. Khon Kaen (2006)
21. Tinnaluck, Y.: Knowledge creation and sustainable development: A collaborative
process between Thai local wisdom and modern sciences. Ph.D. Dissertation,
Department of Public Communication of Science and Technology, University of Poitiers,
France (2005)
22. Tinnaluck, Y.: Modern science and native knowledge : Collaborative process that opens
new perspective for PCST. QUARK, vol 32, pp. 70-74 (2004)
23. Thongchern, P.: Reviving of indigo: An action research on the restoration of indigo dyed
cloth of local Southern areas. Research Project and Area Development 5, Research Fund,
Songkla (2006)
Jan Askhoj, Shigeo Sugimoto, and Mitsuharu Nagamori
Graduate School of Library, Information and Media Studies, University of Tsukuba, 1-2 Kasuga,
Tsukuba, Ibaraki 305-8550, Japan
{sugimoto,nagamori}@slis.tsukuba.ac.jp
This paper presents a domain ontology for cloud archives, based in part on the
PREMIS Editorial Committee ontology for the PREMIS Data Dictionary. Our
ontology’s design is based on a layered model of cloud computing where lower
layers provide shared services to higher layers, resulting in the creation of generic
Submission Information Packages with PREMIS preservation metadata.
An ontology is designed to define a common vocabulary for cloud archives, and
define the roles and responsibilities for data creation and transfer, including the
registration of cloud-based content creation systems. We define the classes, object
properties, data properties and annotations necessary to describe the agents, objects,
events and rights that comprise a cloud archive.
We evaluated the ontology with a prototype system, using real-world examples of
cloud systems, digital objects and metadata. We found that the ontology was able
to describe the chosen components successfully, and that it improved metadata
interoperability between content creating applications and the services providing
Keywords: Archives; Metadata; Preservation; Cloud Computing; PREMIS; OWL
In recent years there has been a huge growth in the use of cloud computing for digital content
(Leavitt, 2010). With this move to cloud computing, organisations are gaining a number of
benefits, such as economies of scale, reduction of capital expenditure, on-demand scalability and
so on (Armbrust, 2010). However, the outsourcing of hardware, software and data storage to one
or more third parties makes it more difficult to guarantee the long term preservation of archival
content. Traditional models of archiving, such as the OAIS Model (CCSDS 2002), where data is
sent in packages from a producer to an archive in complete control of the technological
infrastructure, do not address the specific characteristics of cloud computing. We have previously
identified four areas of cloud computing that differ from the OAIS approach (Askhoj, 2010;
Askhoj, 2011).
To accommodate these differences, the entities of a cloud archive and their roles and
dependencies must be formally defined, and the metadata necessary for preservation of Digital
Objects must be identified, captured and stored.
<i><b>2.1 A layered domain model of cloud computing</b></i>
Our model is a layered model that builds on the way cloud computing can be divided into
different Service Models, such as PaaS and SaaS. Dividing the model into layers along these
lines keeps it aligned with current cloud service delivery models.
As a starting point, we are working with the assumption that the cloud enables the sharing of
We have used the National Institute of Standards and Technology (NIST) October 2011 definition
of cloud computing (Mell, 2009). NIST not only defines the essential characteristics of the cloud,
but also presents the three basic models of service delivery: Software as a Service (SaaS),
Platform as a Service (PaaS) and Infrastructure as a Service (IaaS).
Figure 1. Conceptual Model of a Cloud Archive divided into layers
1. The PaaS layer (Layer 1) provides Cloud Storage, a trusted, long-term repository for simple
bit-streams. These are primitive units that make no sense as information outside the
context of a system that can read and represent them. The bit-streams can be parts of the
2. The SaaS layer (Layer 2) holds the Creating Applications (i.e. applications for the
creation of Digital Objects used by a Producer) that represent PaaS layer bit-streams to
content users/creators as Digital Objects with associated metadata.
3. Digital Objects need accompanying metadata before they can be ingested into an archive
(in other words, they need to be turned into Submission Information Packages). This
metadata is needed to ensure long term preservation (for example information about
provenance and structure), and is different from the metadata added in layer two which is
specific to the Creating Application, such as descriptive metadata (Dublin Core, MODS or
similar). The Preservation Layer (Layer 3) creates SIPs that can be accessed by archive
systems.
4.The Interaction Layer (Layer 4) is where agents (users or systems) access cloud systems
to create, manage or archive Digital Objects using a browser or dedicated archive systems.
Archive systems in the Interaction Layer ingest Submission Information Packages
produced by the Preservation Layer.
The arrows on the right side of Figure 1 show the development of information as it moves
between the layers. Information becomes progressively more complex, from the simple
bit-streams in the PaaS layer to the complete Information Packages in the Interaction layer. It should
be noted that it is possible for a higher layer to interact with a layer one or more layers down (in
effect “skipping a layer”). An example of this is the Preservation Service allocating storage in the
PaaS layer.
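The layering described above can be sketched as a small data structure. This is an illustrative sketch only: the names `Layer`, `PRODUCES` and `can_call` are hypothetical and are not taken from the ontology. It encodes two points from the text, namely that each layer produces a more complex information type than the one below it, and that a higher layer may use any lower layer, including one it is not adjacent to.

```python
from enum import IntEnum


class Layer(IntEnum):
    """The four layers of the conceptual model, numbered as in Figure 1."""
    PAAS = 1          # Cloud Storage for bit-streams
    SAAS = 2          # Creating Applications
    PRESERVATION = 3  # Preservation Service, builds SIPs
    INTERACTION = 4   # agents and archive systems


# Information type associated with each layer (labels are illustrative).
PRODUCES = {
    Layer.PAAS: "bit-stream",
    Layer.SAAS: "Digital Object with original metadata",
    Layer.PRESERVATION: "Submission Information Package",
    Layer.INTERACTION: "complete Information Package",
}


def can_call(caller: Layer, callee: Layer) -> bool:
    """A layer may use any layer below it, so skipping a layer is allowed."""
    return callee < caller


# The Preservation Service allocating storage skips the SaaS layer.
print(can_call(Layer.PRESERVATION, Layer.PAAS))  # True
print(can_call(Layer.PAAS, Layer.SAAS))          # False
```

Using an integer-valued enumeration keeps the "higher layer" comparison trivial; a real system would more likely express the same constraint in the ontology itself.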
<b>3. </b>
In a cloud environment, functionality in one or more of the layers from 1 to 3 may lie outside the
control of the archiving organisation. It therefore becomes extremely important to describe the
types of data produced and received by each layer. Without such information, it becomes
impossible to abstract functionality, as there are no guarantees that the necessary data will be
produced in the right format. We therefore defined a domain ontology for use in the design of a
cloud archive system, as outlined in the conceptual model.
<i><b>4.1 A Model Preservation System for Ontology Design</b></i>
Figure 2 illustrates the information flow from Digital Object to Archive. In our model, the
Archive System and Creating Application share a common storage platform. As storage needs to
be reliable and long term, this is allocated by the Preservation Service. The Preservation Service
serves as an abstraction layer between the Creating Application and the Archive. An Archive
needs Digital Objects to be accessible long after the organisations that created them have
disappeared.
In order to get access to storage and submit to the archive, Creating Applications need to register
using a registration template originating from the Preservation Service. The registration is used to
record information about the Digital Objects produced, any associated metadata schemas and the
Creating Application itself. This information is needed for preservation purposes, and can be
thought of as Static Preservation Metadata. We use the word static, because the preservation
related properties of the systems creating Digital Objects are expected to remain relatively
consistent, changing only in case of major version upgrades or added functionality.
Once registration is complete, the Preservation Service allocates storage space for the Creating
Application to save Digital Objects and for an Archive System to access these objects. This
information is passed on to the Creating Application as a Registration Response containing the
Storage URI, Path and Access Keys. The Creating Application can now submit Digital Objects to
the Allocated Cloud Storage. Along with the Digital Objects, the following is saved: Original
Metadata from the Creating Application and any Metadata about the Digital Objects required by
The Static and Dynamic Metadata cover a large part of the information necessary for preservation
purposes. In OAIS terminology, the two types combined deliver the Preservation
Description Information and Representation Information necessary to create Submission
Information Packages. These are types of information that the Preservation Service cannot
generate automatically without input from the Producer (OCLC, 2002).
Once submission is complete, the Creating Application notifies the Preservation Service about
the submission. The Preservation Service creates a Submission Information Package for the
archive system containing a URI to the Digital Object and Preservation Metadata created from
the Static and Dynamic Preservation Metadata; event information automatically generated as
part of the submission process; information resulting from analysis of the Digital Objects
themselves; and any information extracted from the Original Metadata.
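The assembly step can be illustrated with a minimal sketch. It assumes that every metadata source is a flat key/value mapping and that later sources win when keys clash; neither assumption comes from the paper. The class name, function name, example URI and the value `ExampleCMS` are hypothetical. Results of object analysis are treated here as part of the extracted metadata for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class SubmissionInformationPackage:
    object_uri: str
    preservation_metadata: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


def build_sip(object_uri, static_md, dynamic_md, events, extracted_md=None):
    """Merge the metadata sources named in the text into one package.

    Later sources win on key clashes (static, then extracted, then
    dynamic); this precedence is an assumption made for the sketch.
    """
    merged = {**static_md, **(extracted_md or {}), **dynamic_md}
    return SubmissionInformationPackage(object_uri, merged, list(events))


sip = build_sip(
    "https://storage.example.org/bucket/report.pdf",  # hypothetical URI
    static_md={"creatingApplicationName": "ExampleCMS"},
    dynamic_md={"originalName": "report.pdf"},
    events=["submission"],
)
print(sip.preservation_metadata["originalName"])  # report.pdf
```

Note that the package carries a URI to the Digital Object rather than the object itself, as described above.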
The functional entities Archive System, Preservation Service, Creating Application and Storage
have been used as classes for our ontology, along with the information types produced, such as
Information Package and Registration Request. We have used the information flow between the
functional entities in the system to define the properties associated with the classes. These are
shown in Figure 2 as the arrows between entities and as the contents of the different information
types indicated by a line.
Figure 2. Preservation System
The classes related to preservation metadata for Digital Objects have been taken directly from the
PREMIS Editorial Committee OWL ontology draft. These are part of the PREMIS data dictionary,
and need to be included (Gartner R, 2004).
We have used the entities from the PREMIS data model as super-classes (Agents, Events, Objects
and Rights). These entities not only provide a convenient way to group classes, they can also be
used to express class inference. For example, RightsGranted is a sub-class of Rights. The classes
and sub-classes in the ontology are not intended to express property inheritance.
Using the PREMIS data model entities to group classes has the benefit of providing a second
level of semantics, by incorporating relationship information from the PREMIS data model. For
example, Agents are related to Objects, via either Events or Rights.
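Class inference over the grouping super-classes can be sketched without an OWL reasoner. The example below is an assumption-laden illustration: only the statement that RightsGranted is a sub-class of Rights comes from the text, while the root name `PremisEntity` and the helper names are hypothetical.

```python
# Subclass assertions. Only RightsGranted -> Rights is taken from the text;
# the common root "PremisEntity" is a hypothetical name for this sketch.
SUBCLASS_OF = {
    "RightsGranted": "Rights",
    "Rights": "PremisEntity",
    "Agents": "PremisEntity",
    "Events": "PremisEntity",
    "Objects": "PremisEntity",
}


def superclasses(cls):
    """Return all ancestors of cls by following subclass links upwards."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(cls, ancestor):
    """True if cls is the ancestor itself or one of its descendants."""
    return cls == ancestor or ancestor in superclasses(cls)


print(superclasses("RightsGranted"))    # ['Rights', 'PremisEntity']
print(is_a("RightsGranted", "Rights"))  # True
print(is_a("Agents", "Rights"))         # False
```

The sketch infers class membership only; in line with the text, it says nothing about property inheritance.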
<b>5.2 Class Extensions and Annotations </b>
The PREMIS Editorial Committee ontology has been extended by a number of other metadata
schemas, namely FOAF, SKOS Core, PRONOM, ORE and Dublin Core. We have decided not to
use these for the purposes of this paper. This is partly because we have no current plans to use
these schemas and partly to keep our own ontology as simple as possible.
The terms in the PREMIS data dictionary have annotations relating to their usage, such as
Definition, Rationale, Creation/Maintenance Notes and Usage Notes (Woodyard-Robinson, 2007).
We have added the annotation Layer to our classes. Layer is used to define where in the Model a
class is located, and makes it possible to assign responsibility for the functionality in a class to an
entity in a Layer.
<i><b>5.3 </b><b>Object and Data Property Aspects</b></i>
Whereas classes are used to capture information about individuals and groups of individuals,
Object Properties connect individuals, and Data Properties connect literals and individuals (W3C,
2009). Using these, we can show the information flow in our conceptual model.
For Object Properties (properties where the value is an individual) we have included the
following annotations: 1. definition, 2. the property domains and ranges and domain/range
relationship (functional or inverse functional) 3. if the property is mandatory or not, 4. if the
property is repeatable or not, and 5. other comments such as See Also and Usage Notes.
For Data Properties (properties where the value is a literal), we have included the same
information as above. However, as Data Properties are used for literals, we have included an
annotation for Origin. Origin is used to define which entity in the Layered Model generates the
Data Property literal. For example, <i>contentLocationType </i>is generated by the Preservation Service.
<i><b>5.4 </b><b>Using OWL as a Domain Description Language.</b></i>
We chose OWL (Web Ontology Language) to describe our domain. OWL offers better semantic
expression and greater machine interpretability than RDF, and is therefore
well suited to our purposes (McGuinness, 2004). Furthermore, an OWL ontology for the
PREMIS Data Dictionary was announced on October 18, 2011. This ontology is not finalized at
the time of writing, but the groundwork in defining the PREMIS semantics in OWL has been
completed. The newly drafted standard is available for comment from the PREMIS Editorial
Committee, and forms the basis of the ontology (LoC, 2011).
<i><b>5.5 </b><b>Extensibility</b></i>
One of the main reasons for designing an ontology in OWL is cross domain interoperability. By
having a well-defined common vocabulary, individuals from different domains can be linked
according to their semantics. OWL already has three constructs to do this: owl:sameAs,
owl:differentFrom and owl:AllDifferent. We have come to the conclusion that these constructs are
not enough to express the relationship between individuals in different PREMIS implementations.
Good examples of this are PREMIS entities that are defined by locally controlled vocabularies.
The entity may be the same, but due to differences in vocabulary use, using owl:sameAs may
give rise to problems when exchanging data. We have chosen to use the Simple Knowledge
Organization System (SKOS) mapping properties to link individuals (Miles, 2005).
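The difference between the two kinds of link can be shown by emitting both as N-Triples. This is a sketch under stated assumptions: the text does not say which SKOS mapping property was used, so `skos:closeMatch` is chosen here purely for illustration, and the two individual URIs and the function name `link` are hypothetical.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
OWL = "http://www.w3.org/2002/07/owl#"


def link(subject, obj, exact=False):
    """Return one N-Triples line linking two individuals.

    owl:sameAs asserts full identity, whereas skos:closeMatch only
    states that two concepts are similar enough to be used
    interchangeably in some applications.
    """
    predicate = OWL + "sameAs" if exact else SKOS + "closeMatch"
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical individuals from two locally controlled vocabularies.
a = "http://archive-a.example.org/vocab/eventType/ingest"
b = "http://archive-b.example.org/vocab/eventType/ingestion"
print(link(a, b))
print(link(a, b, exact=True))
```

Keeping the weaker mapping as the default reflects the concern above: a consumer that merges data on `owl:sameAs` would treat the two vocabulary terms as one and the same individual.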
<b>6. Evaluation of the ontology using a case scenario </b>
We have defined a case scenario, using existing cloud components to show how the ontology can
be implemented. Our case scenario is very similar to the model preservation system from Figure 2.
It contains the same main entities and information flow. Each entity is an individual from a class
in the ontology, with the functionality of the individual explained in the class definition.
Individuals are linked to one or more layers, using the Layer annotation. The individuals
<i><b>6.1 </b><b>Registration process</b></i>
Based on the class description from our ontology, Preservation Service is responsible for ensuring
the validity and completeness of preservation metadata to create archive packages. As OWL does
not specify any syntactic constraints, the preservation service provides an XML Schema
registration template, to be populated by the owning organisation of the Creating Application
(Drupal). Here, the class RegistrationResponse is used to define what data properties are related
to the registration, and how the registration is related to other classes, such as Event outcome. The
registered data can be automatically extracted using XPath and imported into the Preservation
Service. Any errors or omissions in the XML Schema result in a negative registration.
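The extraction step might look like the following sketch. The registration document, its element and attribute names, and the sample values are all hypothetical, since the actual template is not reproduced here. The code uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module; it does not validate against an XML Schema, which would need an additional library.

```python
import xml.etree.ElementTree as ET

# Hypothetical registration document; element names are illustrative only.
REGISTRATION_XML = """
<registration>
  <creatingApplication>
    <name>Drupal</name>
  </creatingApplication>
  <property name="signatureMethod" provision="static">RSA-SHA1</property>
  <property name="originalName" provision="dynamic"/>
</registration>
"""


def extract_registration(xml_text):
    """Pull registered values out of the XML with ElementTree's XPath subset."""
    root = ET.fromstring(xml_text)
    name = root.findtext("./creatingApplication/name")
    if name is None:
        # An omission leads to a negative registration.
        raise ValueError("negative registration: application name missing")
    static = {
        p.get("name"): p.text
        for p in root.findall("./property[@provision='static']")
    }
    dynamic = [
        p.get("name")
        for p in root.findall("./property[@provision='dynamic']")
    ]
    return {"application": name, "static": static, "dynamic": dynamic}


result = extract_registration(REGISTRATION_XML)
print(result["application"], result["static"], result["dynamic"])
```

Static values are stored at registration time, while dynamic properties are only declared by name and supplied later with each Digital Object.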
The registered data gives the preservation service the ability to validate the metadata provided by
the Creating Application. This is done by ensuring that all Mandatory Data Properties with the
origin Business System are either preregistered (static information such as <i>signatureMethod</i>) or
designated as provided at time of creation (dynamic information such as <i>originalName</i>).
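The validation rule in the paragraph above can be expressed compactly. In this sketch the `DataProperty` record and the function name are hypothetical, and the small `ONTOLOGY` list is an illustrative subset: the property names appear in the text, but which of them are mandatory is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProperty:
    name: str
    origin: str      # entity that generates the literal
    mandatory: bool


def missing_properties(properties, preregistered, provided_at_creation):
    """List mandatory Business System properties covered by neither route."""
    covered = set(preregistered) | set(provided_at_creation)
    return [
        p.name
        for p in properties
        if p.mandatory and p.origin == "Business System" and p.name not in covered
    ]


# Illustrative subset; the mandatory flags are assumptions.
ONTOLOGY = [
    DataProperty("signatureMethod", "Business System", True),
    DataProperty("originalName", "Business System", True),
    DataProperty("contentLocationType", "Preservation Service", True),
]

print(missing_properties(ONTOLOGY, {"signatureMethod"}, {"originalName"}))  # []
print(missing_properties(ONTOLOGY, {"signatureMethod"}, set()))  # ['originalName']
```

Properties whose origin is the Preservation Service, such as contentLocationType, are ignored by the check because the Creating Application is not responsible for them.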
If the provided data meets the requirements, an XML response is sent back to the Creating
Application from the Preservation Service, containing URI, path and access keys for the shared
Cloud Storage (Amazon S3).
<i><b>6.2 </b><b>Conversion into Generic Submission Information Package.</b></i>
Once registration is complete, the Creating Application can save digital contents to the dedicated
Cloud Storage. Digital contents consist of three parts: the Digital Objects in an agreed format;
original metadata such as Dublin Core or MODS, and any preservation metadata not provided
during registration (dynamic metadata). Once saved, the Preservation Service provides read
access to these objects for the Creating Application and the archive system (DSpace).
Another benefit in the ontology lies in the linking of metadata from different creating applications
to one authoritative schema. As long as the creating applications are registered with the correct
metadata linking to the data properties in the ontology, complete Submission Information
Packages can be created from applications with different metadata schema.
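A simple way to picture this linking is a per-application crosswalk onto the shared property names. The sketch below is hypothetical throughout: the application identifiers, the `CROSSWALKS` table and the chosen pairings are assumptions for illustration. The target names are PREMIS semantic units, but the text does not state which source elements were actually mapped to them.

```python
# Hypothetical crosswalks registered by two Creating Applications. The
# target names are PREMIS semantic units; the pairings are assumptions.
CROSSWALKS = {
    "app-dc": {
        "dc:identifier": "objectIdentifierValue",
        "dc:format": "formatName",
    },
    "app-mods": {
        "mods:identifier": "objectIdentifierValue",
        "mods:internetMediaType": "formatName",
    },
}


def to_authoritative(app_id, record):
    """Rename an application's fields to the shared ontology properties."""
    mapping = CROSSWALKS[app_id]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


a = to_authoritative(
    "app-dc", {"dc:identifier": "obj-001", "dc:format": "application/pdf"}
)
b = to_authoritative(
    "app-mods",
    {"mods:identifier": "obj-001", "mods:internetMediaType": "application/pdf"},
)
print(a == b)  # True
```

Fields without a registered mapping are dropped silently in this sketch; a real service would more plausibly report them during registration.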
<b>7 Discussion </b>
Our major criterion for evaluation has been whether the ontology can be applied to real world
data. We have evaluated the ontology by using values from existing cloud system components
and data from a PREMIS version 2.1 Sample Record from LoC. The components used were
Amazon S3 for Cloud Storage, two instances of Amazon EC2 with Ubuntu Linux 10.10 as SaaS
Platform and Preservation Service platform and Drupal as Creating Application.
We found that the ontology was descriptive enough to create a generic XML package with
PREMIS metadata, including cloud specific entities such as platform descriptions. We were able
to map instances to cloud layers, and to assign them to ontology classes. Using the OWL Object
Properties we were also able to show relationships between entities: for example, which Agent is
responsible for the creation/transfer of which Object. Finally, the SKOS properties allowed us to
link a number of elements from the DC Metadata Element Set to PREMIS.
Based on the discussion above, we believe that the test system we built using our ontology meets
the requirements presented in the introduction of this paper; it is possible for a Producer and an
preregistered metadata to automatically create SIPs in a way that eases the burden of metadata
provision for Producers. By assigning origin and layer information to each term in the ontology, it
is possible to assign responsibility for the metadata to specific layers and entities.
Whereas the ontology is complete in its current version (subject to modification after further
tests), the system we have built for testing purposes is still not mature and relies on a number of
functions being carried out by hand. Once this is complete, the next step will be to integrate the
cloud components and perform a re-evaluation of the ontology using a larger set of test data.
<b>8 Conclusion </b>
In this paper, we have presented an OWL ontology for cloud archive systems built on the
PREMIS Editorial Committee ontology combined with a layered model of cloud computing. We
believe that the strength of the ontology lies in the fact that it not only describes a metadata model
for Submission Information Packages, but also for the entities contributing to these packages. We
believe this to be a benefit in a cloud system with multiple Creating Applications such as the one
described in the paper. One reason for this is that a system with a large number of Creating
Applications increases the chance that not all of them will be able to supply submission packages
in the right format (this is also a problem for many non-cloud based systems). Another reason is that
when different system entities share computing resources, such as storage, having a single model
to describe these resources increases consistency. Furthermore, without a common vocabulary
and information model, it is difficult to describe the different cloud entities that contribute to the
creation of Information Packages in a manner consistent for preservation purposes.
We used the ontology to describe a number of cloud system components, such as platform,
storage and creating application together with a PREMIS version 2.1 Sample Record. In our
model system, we found that the ontology was able to describe the chosen components
successfully, and that it allowed some metadata interoperability between content creating
applications and the preservation service. So far, our model system has provided a proof-of-
concept by showing an example information flow between system entities. In future, we plan to
create an integrated system that implements a storage controller to allow better abstraction of the
Cloud Storage and a registration framework.
<b>References </b>
Armbrust M, et al. (2010) A view of cloud computing. Communications of the ACM. Volume 53
Issue 4, April 2010, Pages 50-58. ACM New York.
Askhoej J, and Sugimoto S (2010) A Model for the Provision of Preservation Metadata as a
Service. In Taipei, Taiwan: CiSAP.
Accessed 15 Mar. 2012.
Askhoj J, Sugimoto S, Nagamori M (2011) Preserving Records in the Cloud. Records
Management Journal 21 (3). 175–187. doi:10.1108/09565691111186858.
CCSDS Secretariat (2002) Reference Model for an Open Archival Information System (OAIS).
Blue Book. Issue 1. The Consultative Committee for Space Data Systems.
Gartner R (2004) PREMIS—Preservation Metadata Implementation Strategies Update 2: Core
Elements for Metadata to Support Digital Preservation. RLG Diginews 8. 6. Article 3.
Leavitt N (2010) IEEE Xplore - Is Cloud Computing Really Ready for Prime Time? Computer 42.
1.January. 15 – 20.
LoC (2011) PREMIS Data Dictionary for Preservation Metadata Version 2.1. PREMIS Editorial
Committee. www.loc.gov/standards/premis/v2/premis-2-1.pdf. Accessed 15 Mar. 2012.
McGuinness DL, and Van Harmelen F (2004) OWL Web Ontology Language Overview. W3C
Recommendation 10.
Accessed 15 Mar. 2012.
Mell, P., and T. Grance (2009) The NIST Definition of Cloud Computing. National Institute of
Standards and Technology 53. 6.
Miles A, Matthews B, Wilson M, Brickley D (2005) Skos Core: Simple Knowledge Organisation
for the Web. In International Conference on Dublin Core and Metadata Applications, pp–3.
dcpapers.dublincore.org/index.php/pubs/article/view/798. Accessed 15 Mar. 2012.
The Library of Congress (2011) PREMIS OWL Ontology Now Available.
Accessed 15 Mar.
The OCLC/RLG Working Group on Preservation Metadata (2002) Preservation Metadata and the
OAIS Information Model - A Metadata Framework to Support the Preservation of Digital Objects.
Dublin, Ohio.
Accessed 18 Aug. 2012.
W3C (2009) OWL 2 Web Ontology Language Document Overview. W3C OWL Working Group.
Accessed 15 Mar. 2012.
Wetteroth, D (2001) OSI reference model for telecommunications. McGraw-Hill Professional.
Woodyard-Robinson, D (2007) Implementing the PREMIS Data Dictionary: a Survey of
<b>Wirapong Chansanam1<sub>, Dr. Kulthida Tuamsuk</sub>2<sub>, Dr. Kanyarat Kwiecien</sub>3</b>
<b>1</b><sub>PhD Candidate in Information Studies, Faculty of Humanities and Social Sciences, </sub>
Khon Kaen University, Thailand
<b>2</b><sub>Associate Professor, Information and Communication Management Program, Faculty of Humanities </sub>
and Social Sciences, Khon Kaen University, Thailand
<b>3</b><sub>Lecturer, Information and Communication Management Program, Faculty of Humanities </sub>
and Social Sciences, Khon Kaen University, Thailand
The Greater Mekong Sub-region (GMS) is an economic region bound together by the Mekong River.
The region covers 2.6 million square kilometers and has a population of 326 million people. The
GMS comprises Cambodia, the People’s Republic of China (only Yunnan province and Guangxi
province), the Lao People’s Democratic Republic, Myanmar, Thailand, and Viet Nam. In 1992, with the
assistance of the Asian Development Bank (ADB), the six countries entered into a program of
sub-regional economic cooperation designed to enhance economic relations among them. The program
and its projects are supported by contributions from the ADB and other sources. The first
priority projects in the region include transportation, energy, telecommunication, environment, human
resource development, tourism, trading, and investment of both the private and agricultural sectors.
During the past decades, the concept of cultural heritage has changed considerably, partly
because of the instruments developed by UNESCO. Cultural heritage is no longer
limited to collections of objects kept for memory. It also includes traditions and practices related to
living that are passed down from ancestors to later generations. Amid globalization and the world’s
delicate situation, intangible cultural heritage is a vital factor in preserving cultural diversity.
Attempting to understand the intangible cultural heritage of people from different cultures improves
communication across cultures and, at the same time, encourages people to respect each other’s
way of life. The significance of intangible cultural heritage lies not simply in its being heritage,
but in the knowledge and skills that are continuously passed down from one generation to
the next. The social and economic value of transmitting this knowledge reveals the relations
between sub-groups and major social groups within a culture. This practice is remarkably important for
developing countries because it leads to the development of human resources (UNESCO, 2011).
Currently, many forms of intangible culture are rapidly disappearing. This may be due to
changes in society and culture, the development of large-scale industry, increasing tourism, migration
of rural people to big cities, and a changing environment. In this changing context, both the
practice and the transmission of intangible culture have been considerably
affected. Announcing and registering intangible cultural heritage is an important measure for
raising people’s awareness of its unique value. It is also a means to honor ancestors’
knowledge and wisdom and to promote the cultural dignity and identity of all groups living across the country.
It can also create understanding and acceptance of cultural diversity, leading to creative
preservation and development that are both systematic and sustainable (Department of Cultural
Promotion, 2012).
Most inventories of registered intangible cultural heritage include a classification by knowledge
domain. One starting point is the set of domains
listed in Article 2.2 of the convention concerning the protection of intangible cultural heritage:
oral traditions and expressions, including language as a vehicle of intangible
cultural heritage; performing arts; social practices, rituals and festive events; knowledge and practices
concerning nature and the universe; and traditional craftsmanship. These domains
do not cover all existing content, and any classification system is merely a tool to better
organize the data in the list of registered items (UNESCO, 2011a).
However, collection and classification should not be limited to a single scheme, such as the well-known
Library of Congress Classification or Dewey Decimal Classification, which are widely used on the
Internet. For web technology, distributed frameworks (e.g., the Warwick Framework by Lagoze, Lynch,
and Daniel (1996)) have been proposed. This framework stresses the importance and necessity of
metadata standards for interoperation. The Resource Description Framework (RDF) may therefore
need to be considered when designing a working framework. RDF is a metalanguage for building
and using metadata on the web, and is one of the building blocks of the semantic web, a project of the
World Wide Web Consortium (W3C). Under the concept of ontology, information and knowledge are
knowledge-based (Fishwick and Miller, 2004), mainly in the format of Extensible Markup
Language (XML). Grigoris and Frank (2004) concluded that an ontology is a way to
describe views of a domain of interest; in other words, it is the specification of a
conceptualization. An ontology is a knowledge-based structure for a specific domain
that captures shared concepts and understanding and can be used to categorize documents and
data in that domain. Diversity and accuracy have been major points of
criticism, in particular in relation to Geographic Information Systems (GIS). Ontology
research is therefore conducted to achieve the objectives of GIS from multiple perspectives (Schuurman,
2006; Schuurman and Leszczynski, 2006; Kitchen and Dodge, 2007).
Therefore, to develop an integrated semantic web technology and a GIS for managing GMS’s
intangible cultural knowledge through the use of shared information under the ontology-based
approach, the structure of knowledge data will be constructed. Definitions of similar knowledge sets
from different institutions’ knowledge sources must be similar and related, and also they must have the
same meanings. This will be a knowledge base for data integration and linking of various data
sources. As a consequence, the integrated semantic web technology and the GIS can be presented. The
1. To examine knowledge domains of GMS’s intangible cultures
2. To construct an ontology of intangible cultures and an integrated semantic web by using the
Geographic Information System
3. To construct the GMS’s intangible cultural knowledge base systems by integrated semantic
web technology and geographic information system
This study will adopt a Research and Development (R&D) approach. The details of the research
methodology are shown in the following table.
<b>Objectives </b> <b>Method </b> <b>Tools/Procedures </b> <b>Population/Sample </b> <b>Outcomes </b>
Objective 1: To examine the knowledge
Population/Sample: documents in GMS’s cultures; experts in GMS’s cultures
Outcomes: knowledge domains
Objective 2: To construct an ontology of intangible cultures and an integrated semantic web by using the Geographic Information System
Method: qualitative research
Population/Sample: experts in GMS’s cultures; ontology experts
Outcomes: an ontology of GMS’s intangible cultures
Objective 3: To construct the GMS’s intangible cultural knowledge base systems by integrated semantic web technology and geographic information system
Population/Sample: ontology experts; experts in GMS’s cultures
<b>Conceptual framework </b>
<i><b>Figure 1 </b></i>Research conceptual framework
1. Obtain the knowledge domains of intangible cultures in the GMS, procedures to manage the
domains, and methods to apply appropriate modern technology in knowledge management through
research methodology. Academics in cultural anthropology can compare the obtained domains
against other cultures and can also refine and extend them. Information specialists
can publicize the domains found so far correctly and completely.
2. Obtain methods to improve an information technology system by applying a cultural
anthropology approach, through consultation with experts and interdisciplinary integration of
information technology development using ontologies, semantic web technology, and
geographic information systems. The obtained methods will serve as guidelines for interdisciplinary
research that integrates knowledge from different disciplines.
3. Obtain the form of ontologies and semantic web for intangible cultures, which will be used
as basic information and methods for modern knowledge management and which will be presented in
a geographic information system that supports more accurate semantic searching and faithfully
represents the knowledge of the intangible culture domains for information science researchers.
Akerkar, R., & Sajja, P. (2010). Knowledge-Based Systems. Jones and Bartlett Publishers.
Asian Development Bank. (2011). Great Mekong Sub-region. Retrieved December 11, 2011, from
Berners-Lee, T., Hall, W., Hendler, J., Shadbolt, N., & Weitzner, D. J. (2006). Creating a Science of
the Web. <i>Science, 313</i>(5788), 769-771.
Burrough, P. A., & McDonnell, R. A. (1992). Principles of Geographical Information Systems,
Oxford University Press : New York.
Department of Cultural Promotion. (2011). Intangible Cultural Heritage. Retrieved December 13,
2011, from
Dublin Core Metadata Initiative.(2011). Dublin Core Metadata Element Set, Version 1.1: Reference
Description.Retrieved December 19, 2011, from
19990702.htm
Fishwick, P. A., & Miller, J. A. (2004, 5-8 Dec. 2004). <i>Ontologies for modeling and simulation: issues</i>
<i>and approaches. </i>Paper presented at the Simulation Conference, 2004. Proceedings of the 2004
Winter.
Kitchen, R.,& Dodge, M.(2007). Rethinking maps. <i>Progress in Human Geography</i>, 31(3), 331-344.
Kules, B., & Shneiderman, B. (2008). Users can change their web search tactics: Design guidelines for
categorized overviews. <i>Inf. Process. Manage., 44</i>(2), 463-484.
Lagoze, C., Lynch, C.,& Daniel, R. Jr.(1996).The Warwick Framework: A container architecture for
aggregating sets of metadata. Cornell Computer Science Technical Report TR96-1593.
Retrieved December 29, 2011, from
Lassila , O.,& Swick , R. R.(2004).Resource Description Framework (RDF) RDF/XML Syntax
Specification (Revised) W3C. Retrieved December 15,2011, from
Guarino, N. (1998). <i>Formal Ontology in Information Systems</i>. Proceedings of FOIS’98, Trento, Italy, 6-
8 June 1998.
Schuurman, N. (2006). Formalization Matters: Critical GIS and Ontology Research. [doi:
10.1111/j.1467-8306.2006.00513.x]. <i>Annals of the Association of American Geographers, 96</i>(4),
726-739.
Schuurman, N.,& Leszczynski, A. (2006). Ontology-Based Metadata. <i>Transactions in GIS, 10</i>(5), 709-
726.
UNESCO. (2011a). Drawing up inventories. Retrieved December 19, 2011, from
_. (2011b). What is Intangible Cultural Heritage?. Retrieved December 19, 2011, from
Uschold, M.,& King, M. (1995). Towards a Methodology for Building Ontologies. Retrieved
December 29, 2011, from
Uschold, M,& Jasper, R. (1999). A Framework for Understanding and Classifying Ontology
Applications. Retrieved December 29, 2011, from
W3C. (2011). Technology & Society Domain Activity. Retrieved December 29, 2011, from
Weber, R., & Kaplan, R. (2003). <i>Knowledge-based knowledge management</i>. In Innovations in
Knowledge Engineering, International Series on Advanced Intelligence. Volume 4, July 2003,
pp. 151-172. Adelaide:Advanced Knowledge International .
Faculty of Humanities and Social Sciences, Khon Kaen University, Thailand.
Email:
2<sub>Assistant Professor, Information and Communication Management, </sub>
Faculty of Humanities and Social Sciences, Khon Kaen University, Thailand.
Email:
Currently, the world is becoming a knowledge-based society, or a
knowledge-based economy, in which success and competitiveness
are driven by knowledge. Organizations have to adapt to
change (Marquard, 2002). To become transformational
leaders and gain competitive advantage, they need knowledge, especially
knowledge in the form of intellectual property, which can be valuable and add value
(Kaplan & Norton, 2003). At present, the volume of information and
knowledge has doubled, and this affects an organization’s ability to
access information and gain knowledge. Hence, organizations that are
successful and able to increase their competitiveness often ensure that their
workers have information management skills and knowledge. As a result,
organizations need information professionals who have
knowledge and an understanding of effective information management within
the organization. This enables organizations to be successful and competitive.
Presently, the role and working style of the information profession
Social Networking and Social Computing have caused changes to
communication. They also play an important role in daily life. Currently,
communications equipment, that can be accessed anytime and anywhere, also
supports multimedia. These changes will affect personal interactions in the
future, which can be accessed anytime and anywhere (Ministry of Information
and Communication Technology, 2008). Therefore, the information professional
needs to adjust his/her interaction style with customers.
The information profession is facing changes in technology, including
innovations in digital form such as electronic publishing,
Web 2.0, Library 2.0, Really Simple Syndication (RSS), blogs,
wikis, Short Message Service (SMS), podcasting, mashups, tagging,
folksonomies, Open Source Software (OSS), and Open Access (OA). These have
changed the roles of information professionals (Nonthacumjane, 2011). Therefore, the information
professional needs to have knowledge and skills related to current information
technology.
There have been changes in Management. In the 21st century, the
administration has downsized the organization, has adapted the organizational
structure to flat organization, has increased or empowered manpower, has
re-adjusted the role of the administration, and has implemented a proactive
administration. Moreover, competition among government sectors has also
Information itself has also changed in several ways. Information explosion:
the volume of media has grown and information formats have shifted from paper
to multimedia (Raina, 2000). Information organization: FRBR, the Semantic Web,
RDF, SPARQL, and metadata schemas are now in use (Nonthacumjane, 2011).
Information services: the emphasis is on quality of service, content provision,
customer orientation, and proactive service. Information behavior of users:
users access the Internet more than they use libraries. Therefore, information
professionals need to acquire new knowledge and adapt the way they work.
framework development project, which specifies that every curriculum needs to
be developed at the program level according to its qualification framework in
order to ensure the quality of the graduates it produces. Moreover, to meet the
needs of the current labor market, it is necessary to gain knowledge and
develop skills for the information profession, because similar programs pose a
competitive threat. Given our changing world and these threats, it is important
to develop a Thai qualification framework for the information profession.
Professional autonomy could be developed, and the framework could become an
assurance tool for
Global changes in communication, technology, management, and information,
together with other threats, affect instruction in information profession courses.
Therefore, it is necessary to develop educational standards and improve the
curriculum in fields related to the information profession.
1) To synthesize the related literature and Bachelor’s Degree curricula
relating to the information profession in Thailand and foreign countries.
2) To study the need for a Qualification Framework for the information
profession.
3) To develop the Qualification Framework for the information profession.
| <b>Objective</b> | <b>Method</b> | <b>Sources of Information</b> | <b>Outcome</b> |
| 1. To synthesize Bachelor’s Degree curriculum in fields related to the information profession | Content Analysis | Bachelor’s Degree curriculum in fields related to the information profession in Thailand and abroad | Current conditions of Bachelor’s Degree curriculum in fields related to the information profession |
| 2. To study and analyze the needs of stakeholders in the information profession at the Bachelor’s Degree level in Thailand | In-depth Interview | Some of the stakeholder groups | Qualification framework needs of stakeholders in the information profession |
| | Questionnaire | 5 stakeholder groups: teachers in information professions; practitioners in information professions; employers; students in curricula related to information professions; alumni of curricula related to information professions | |
| 3. To develop the Thai Qualification Framework for the information profession | Drafting the Thai Qualification Framework for the information profession | Results of Objective 1 and Objective 2 | Thai Qualification Framework |
1)
2)
3)
4) The Thai Qualifications Framework for the information profession
can be applied by universities in ASEAN that offer curricula related to the
information profession.
1. Abels E., Jones R., Latham J., Magnoni D., Gard J. (2003). <b>Competencies for Infor- </b>
<b>mation Professional of the 21st Century. </b>Retrieved May 15, 2011, from
Competencies2003_revised.pdf
2. ALA’s Presidential Task Force. (2008). <b>ALA’s Core Competences of Librarianship.</b>
Retrieved Jan 5, 2011, from
ALA_core_Competences_June_6_2008.pdf
3. Ashcroft, L. (2004). Developing competencies, critical analysis and personal transferable
skills in future information professionals [Electronic version]. <b>Library Review, 53</b>(2),
82-88.
4. Canadian Library Association. (n.d.). <b>Competency profile of information management </b>
<b>specialists in archives, libraries and records management : A comprehensive cross- </b>
<b>sectoral competency analysis. </b>Retrieved Jan 5, 2011, from
5. European Council of Information Associations. (2004). <b>EUROGUIDE LIS Volume 1</b>
<b>Competencies and aptitudes for European information professionals. </b>Retrieved June
15, 2012, from
6. Gordon B. Davis et al. (1996). IS '97 Model curriculum and guidelines for undergraduate
degree programs in information systems [Electronic version]. <b>ACM SIGMIS Database</b>,
<b>28</b>(1)<b>, </b>1-63
7. International Federation of Library Association and Institutions [IFLA]. (2000).
<b>Guidelines for professional library/information educational program 2000.</b>Retrieved
January 5, 2011, from
8. J. Daniel Couger et al. (1994). IS'95: Guideline for Undergraduate IS Curriculum
[Electronic version]. <b>MIS Quarterly</b>, <b>19</b>(3), 341-359.
9. John T. Gorgone et al. (2002). <b>IS 2002 Model Curriculum and Guidelines for</b>
<b>Undergraduate Degree Programs in Information Systems. </b>Retrieved November 11,
10. Kaplan R.S., & Norton, D.P. (2003). <b>Strategy Maps: Converting Intangible Assets into </b>
<b>Tangible Outcomes. </b>Massachusetts : Harvard Business School Press.
11. Malaysian Qualification Agency. (2002). <b>Programme Standards for Library & </b>
<b>Information Science. </b>Retrieved Nov 11, 2010, from
en/garispanduan_sperpustakaanmaklumat.cfm
12. Marquardt M.J. (2002). <b>Building the Learning Organization: Mastering the 5 Elements
for Corporate Learning. </b>California : Davies-Black Publishing.
13. Ministry of Information Communication and Technology. (2010). <b>ICT 2020 Conceptual </b>
<b>Framework </b>Retrieved Jan 10, 2012, from
20100912_ict2020_NICT_v1_2.pdf
14. Nonthacumjane Pussadee. (2011). Key skills and competencies of a new generation of
LIS professionals. <b>International Federation of Library Associations and Institutions - </b>
<b>IFLA, 37</b>(4), 280-288.
15. Office Public Sector Development Commission. (2009). <b>Development of preliminary </b>
<b>model of the public sector : high performance organization. </b>Retrieved Nov 5, 2012,
from
HighPerformanceOrganize.pdf
16. Raina Roshan Lal. (2000). <b>Competency Development among Librarians and </b>
<b>Information Professionals. </b>Paper of XIX IASLIC Seminar. [India] : Bhopal.
17. Sooksan Sompid. (n.d.). <b>Administration and Management : the 21st Century. (in </b>
<b>Thai). </b>Retrieved June 4, 2012, from library.uru.ac.th/article/htmlfile/manage21.pdf
18. Zins Chaim. (2007). Knowledge map of Information science [Electronic ver-
Rushdi Shams and Robert E. Mercer
Department of Computer Science
<i><b>Abstract. Text Denoising </b></i>is a tool that reduces texts to their content-
rich parts. It has been reported as an effective tool which improves
biomedical relation mining as well as supervised keyphrase indexers for
digital libraries. The idea behind text denoising is that the complex-
ity of a sentence plays an important role for it being the content-rich
part of the text. Therefore, the core measure of text denoising is a well-
known readability formula called Fog Index (FI). However, the effect
of using other readability formulas is yet to be explored. In this paper
we plug in four other readability formulas—FRES, SMOG, FORCAST,
and FKRI—with text denoising and report their performance on min-
ing relations from a corpus of 24 biomedical texts. Experimental results
show that FI outperforms all other formulas in terms of meaningful re-
lation extraction. The results also show that besides FI, formulas like
SMOG index and FKRI can be used as core measures of text denoising
for biomedical relation mining.
<b>Keywords: </b>Information Extraction, Information Retrieval, Text Denoising, Text
Readability, Relation Mining.
[Figure 1: flowchart. Box labels: Biomedical Text; Apply Fog Index; Sentences Ranked by Readability Score; Apply Threshold; Filtered Sentences; Apply Association Matrix; Ranked Connected Concepts; Apply PPV and Sensitivity; Re-ranked]
Fig. 1: Text denoising and related concept extraction method described by Shams
and Mercer [16]
data, the indexers induced better classifiers and achieved better F-score than
their benchmarks. The authors concluded that low-readable sentences of a text
are content-rich when extracting relations or keyphrases.
To assess readability, FI considers two core measures, namely sentence length
and complex words. Currently, over 50 formulas have been proposed [1] as read-
ability measures. They consider different features involved in readability like
paragraph length, white spaces, use of headings, monosyllabic words, choice of
sample size, and proper nouns [7]. Among these formulas, four are considered not
only as yardsticks but also close to the popularity of FI—Flesch Reading Ease
Score (FRES), SMOG Index, FORCAST Index, and Flesch-Kincaid Readability
Index (FKRI). Like FI, the two formulas provided by Flesch use the sentence
length but consider word length as their second core measure. However, the
Flesch formulas use different weighting factors leading them to correlate almost
inversely with FI. On the other hand, SMOG index is similar to FI except that
it operates on some specific samples of the text. FORCAST, unlike most other
formulas, uses only one vocabulary element—monosyllabic words—making it
useful for texts without complete sentences. Despite the difference in their work-
ing principles, the formulas can estimate the difficulty of style; their intention is
not to rate the content, organization, format, imagery or quality of readers [10].
<i>Epilepsy-GABA </i>[15]. Experimental results show that FI outperforms the other four read-
ability formulas as the core measure by extracting more meaningful relations.
The results also demonstrate that among the four formulas, SMOG index and
The organization of the paper is as follows. In the next section, we describe
text denoising as well as the readability formulas used in this research. Following
that, Section 3 describes the methodology. Section 4 shows the experimental
findings. Section 5 draws the conclusion of the paper.
In this section, we briefly describe the text denoising technique that extracts
more content-rich sentences from full biomedical texts based on their FI scores.
This is followed by an overview of the four readability formulas, FRES, SMOG,
FORCAST, and FKRI. A detailed description of FI can be found in [16].
<b>2.1 Text Denoising</b>
One key aspect of biomedical papers is that they contain hidden or explicit re-
lations, especially among drugs, chemicals, diseases, genes and proteins. Most
of the proposed automated relation miners attempt to extract these relations
from paper abstracts because they are easier to access and they are believed to
contain biomedical content information. However, it is unlikely that abstracts
will contain all important relations because they are at best the concise sum-
maries of texts. For this reason, a number of biomedical ontologies like OMIM
(Online Mendelian Inheritance in Man) and GO (Gene Ontology) use human
annotators to extract relations from full texts. This is a time-consuming as well
as error-prone procedure. To overcome these shortcomings, Shams and Mercer
[16] have proposed a method, <i>Text Denoising</i>, that identifies those sentences in
a text, called the denoised text, where content information, such as biomedical
relations, is more likely to occur. The rest of the text is called the noise text. The
authors suggested that the describing of biomedical relations lengthens sentences
To evaluate Text Denoising, the method was applied on a dataset of 24 full
texts that describe four related pairs of disease and chemical components. The
method extracted pairs of biomedical concepts from the denoised part of the
dataset of which about 75% are reported as related by the Unified Medical
Language System’s (UMLS) semantic relation network. It was also noted that
the noise text did not contain any related biomedical entities of interest. These
experimental findings supported the hypothesis of the authors that sentences
that are difficult to read have the content information of the full text.
<b>2.2 Overview of Readability Formulas</b>
A brief description of the readability formulas is as follows (in historical order).
<b>Flesch Reading Ease Score (FRES) </b>Considered as one of the oldest and
most accurate readability indexes, FRES was developed to advocate a return to
the phonics [5]. The formula uses two core measures—average sentence length
and word length. It was originally developed to assess the grade-level of a reader.
Its use now extends to questionnaire formulation in the US Department of De-
fense and medical form content assessment. Mathematically, FRES can be writ-
ten as Eq. 1.
\[ \mathrm{FRES} = 206.835 - 1.015 \times \left( \frac{\text{Total Words}}{\text{Total Sentences}} \right) - 84.6 \times \left( \frac{\text{Total Syllables}}{\text{Total Words}} \right) \tag{1} \]
The FRES score spans the range 0 to 100, where scores between 90 and 100 are
considered easily understandable by an average 5th grader and scores between
0 and 30 are considered easily understandable by university graduates.
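As a minimal sketch of Eq. 1, the function below computes FRES from raw counts. The function and parameter names are illustrative assumptions, and the counts are taken as inputs because the way syllables are counted is not specified here.

```python
def flesch_reading_ease(total_words: int, total_sentences: int,
                        total_syllables: int) -> float:
    """Flesch Reading Ease Score (Eq. 1) computed from raw counts.

    How words, sentences and syllables are counted is left to the caller;
    the names used here are illustrative, not taken from the paper.
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")
    avg_sentence_length = total_words / total_sentences
    avg_syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


if __name__ == "__main__":
    # Hypothetical passage: 100 words, 5 sentences, 150 syllables.
    print(round(flesch_reading_ease(100, 5, 150), 3))
```

A higher value means an easier text, so a denoising step built on FRES would keep the lowest-scoring sentences.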
<b>SMOG Index </b>SMOG index, when first published, was anticipated as a proper
substitute for FI due to its accuracy and ease of use [12]. A recent study claims
that SMOG index should be the preferred formula when evaluating medical
materials [4]. The formula for SMOG index counts the complex words (i.e.,
words that are polysyllabic) in three 10-sentence samples from documents of <i>n </i>
sentences, takes the square root of the count scaled by 30/<i>n</i>, multiplies the
result by 1.043, and then adds 3.1291 (Eq. 2).
\[ \mathrm{SMOG\ Index} = 1.043 \times \sqrt{\text{Complex Words} \times \frac{30}{n}} + 3.1291 \tag{2} \]
The meaning of SMOG index is similar to FI—the index indicates the year
of education required by the reader to understand the sentence. For example,
a passage with a SMOG index of 12 means that to understand it, the reader
should have 12 years of academic education.
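The SMOG computation can be sketched in the same way. The function and parameter names are assumptions, and the additive constant is taken as 3.1291, the value in McLaughlin's published formula.

```python
import math


def smog_index(complex_words: int, sentences: int) -> float:
    """SMOG index (Eq. 2) from the number of polysyllabic (complex) words
    counted in `sentences` sampled sentences.

    The count is scaled to a 30-sentence sample before the square root.
    Names are illustrative; 3.1291 is the constant in McLaughlin's formula.
    """
    if sentences <= 0:
        raise ValueError("sentences must be positive")
    if complex_words < 0:
        raise ValueError("complex_words must be non-negative")
    return 1.043 * math.sqrt(complex_words * 30 / sentences) + 3.1291


if __name__ == "__main__":
    # Hypothetical sample: 36 complex words in 30 sentences.
    print(round(smog_index(36, 30), 4))
```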
<b>FORCAST Index </b>The FORCAST index was originally formulated to assess
yet significant vocabulary element—the count of simple words (i.e., monosyl-
labic words). Due to its relative ease of use, the index was applied to write
understandable publications by the U.S. Air Force. Eq. 3 is used to give the
FORCAST index of any document.
\[ \mathrm{FORCAST\ Index} = 20 - \frac{N}{10} \tag{3} \]
where <i>N </i>is the number of monosyllabic words in a 150-word sample of the
<b>Flesch-Kincaid Readability Index (FKRI) </b>A second instalment of a read-
ability index proposed by Flesch and further investigated and modified by Kin-
caid [9] eventually took the form of Eq. 4.
\[ \mathrm{FKRI} = 0.39 \times \left( \frac{\text{Total Words}}{\text{Total Sentences}} \right) + 11.8 \times \left( \frac{\text{Total Syllables}}{\text{Total Words}} \right) - 15.59 \tag{4} \]
The score, like the other indexes, corresponds to a grade level. However, it cor-
relates inversely with FRES due to different weighting factors. For example, a
FKRI score of 10.1 would indicate that the text is anticipated to be understand-
able by any student studying in grade 10. Conversely, in the case of FRES, this
score would indicate a low-readable text. Another key difference between
them is that FKRI has a theoretical lowest possible grade-level score of
−3.40, which is reached when every sentence consists of a single one-syllable
word; very few real-life passages are like this.
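Eq. 3 and Eq. 4 can be sketched together. The function and parameter names are illustrative assumptions; the second function also reproduces the theoretical minimum of −3.40 mentioned above.

```python
def forcast_index(monosyllabic_words: int) -> float:
    """FORCAST index (Eq. 3); the argument is N, the number of
    monosyllabic words in a 150-word sample."""
    if not 0 <= monosyllabic_words <= 150:
        raise ValueError("N must lie between 0 and 150")
    return 20 - monosyllabic_words / 10


def flesch_kincaid_index(total_words: int, total_sentences: int,
                         total_syllables: int) -> float:
    """Flesch-Kincaid Readability Index (Eq. 4) from raw counts."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("total_words and total_sentences must be positive")
    return (0.39 * (total_words / total_sentences)
            + 11.8 * (total_syllables / total_words)
            - 15.59)


if __name__ == "__main__":
    # Hypothetical 150-word sample with 100 monosyllabic words.
    print(forcast_index(100))
    # One sentence made of one one-syllable word gives the lower bound.
    print(round(flesch_kincaid_index(1, 1, 1), 2))
```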
In this section, we describe the dataset of 24 full texts, the experimental proce-
dures and performance evaluation measures that we used in our experiment.
<b>3.1 The Dataset</b>
FI was successfully established as a core measure for text denoising in [16]. The
trial used 24 biomedical texts as a test dataset divided into four sets. Each set
describes one pair of concepts related with an explicit <i>disease-chemical compo- </i>
<i>nent </i>relation reported by Perez-Iratxeta <i>et al. </i>[15]. The pairs of concepts are
<i>Ischemia-Glutamate, Ataxia-Dehydrogenase, Hypogonadism-Gonadotropin, </i>and
[Figure 2: flowchart. Box labels: Texts on <i>Disease-Chemical Component</i>; Text Denoising with Four Readability Formulas; Concepts Extracted using FRES; Concepts Extracted using SMOG; Concepts Extracted using FORCAST; Concepts Extracted using FKRI]
Fig. 2: Experimental Procedure
<b>– </b>The texts were randomly collected from the PubMed paper repository
and can be described by the four concept pairs mentioned above.
<b>– </b>Texts have been preprocessed. For example, several sections of the texts like
title, affiliations, tables, figures, acknowledgments, and references have been
removed.
<b>– </b>Document size in terms of number of words varies.
Several annotation tasks for the dataset are still being carried out. However,
for this experiment, we only needed the pre-processed full texts.
<b>3.2 Procedure</b>
The experimental procedure is shown in Figure 2. In our experiment, we followed
the procedure described by Shams and Mercer [16] as shown in Figure 1, except
that we used four different readability formulas other than the FI. The four
However, among these selected pairs of concepts, some lack representative-
ness (i.e., they do not hold any relation according to UMLS semantic relation
network). These are called <i>noisy </i>pairs of concepts and needed to be removed.
Because we randomly collected texts, we observed that it is possible for the pairs
to never co-occur in a sentence, which indicates that our dataset is imbalanced.
So, we used the equally weighted harmonic mean of the PPV and sensitivity of
the pairs of concepts provided by FI to evaluate their representativeness, as it is
a suitable evaluation metric for imbalanced datasets [6].
PPV (positive predictive value) is the proportion of correctly predicted relations and sensitivity is the
proportion of relevant relations that are identified by our method. To measure
these values, we considered the number of sentences extracted by the formulas
which is the total number of results returned by the tool (<i>R</i>) that comprises
the number of True Positives (<i>TP </i>) and False Positives (<i>FP </i>). Then, we took
each pair in our co-occurrence frequency matrix and developed a second set of
sentences that contain both the concepts. The number of sentences in this set
is the number of results that should have been returned by our system (<i>S </i>) and
<i>TP</i>. Afterwards, <i>FP </i>is obtained by subtracting <i>TP </i>from <i>R </i>and <i>FN </i>is obtained
by subtracting <i>TP </i>from <i>S</i>. So, the PPV of every pair of connected concepts is
<i>TP</i>/(<i>TP</i>+<i>FP</i>) and the sensitivity of every pair of connected concepts is <i>TP</i>/(<i>TP</i>+<i>FN</i>). Eq.
5 is then used to determine the equally weighted harmonic mean for the given
pair of concepts. In this way, we measured this mean for every pair of concepts
in our co-occurrence matrix.
\[ \text{Harmonic Mean of PPV and Sensitivity} = 2 \times \left( \frac{\mathrm{PPV} \times \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}} \right) \tag{5} \]
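The bookkeeping described above (derive <i>FP</i> and <i>FN</i> from <i>R</i>, <i>S</i> and <i>TP</i>, then apply Eq. 5) can be sketched as follows. The function name, argument names and the example counts are hypothetical and are not taken from the dataset.

```python
from typing import Tuple


def ppv_sensitivity_hmean(returned: int, expected: int,
                          true_positives: int) -> Tuple[float, float, float]:
    """Return (PPV, sensitivity, harmonic mean) for one pair of concepts.

    returned       -- R, the number of sentences returned by the tool
    expected       -- S, the number of sentences that should have been returned
    true_positives -- TP
    """
    false_positives = returned - true_positives   # FP = R - TP
    false_negatives = expected - true_positives   # FN = S - TP
    if min(true_positives, false_positives, false_negatives) < 0:
        raise ValueError("TP must not exceed R or S and must be non-negative")
    ppv = true_positives / returned if returned else 0.0
    sensitivity = true_positives / expected if expected else 0.0
    if ppv + sensitivity == 0:
        return ppv, sensitivity, 0.0
    hmean = 2 * (ppv * sensitivity) / (ppv + sensitivity)   # Eq. 5
    return ppv, sensitivity, hmean


if __name__ == "__main__":
    # Hypothetical counts: 12 sentences returned, 15 expected, 7 in common.
    ppv, sens, hmean = ppv_sensitivity_hmean(12, 15, 7)
    print(f"PPV={ppv:.4f} sensitivity={sens:.4f} harmonic mean={hmean:.4f}")
```

The tables in this paper appear to report the harmonic mean as a percentage, so the returned fraction would be multiplied by 100 for comparison.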
These pairs are then re-ranked based on the harmonic mean of their PPV and sensitivity.
From this re-ranked list, the top 10 pairs of concepts were considered as the related
concepts of the texts. As we have these 10 pairs of concepts per set of texts by
each of the formulas, we divided them into two groups— (i) the first group con-
tained the pairs of concepts that were reported to be related by UMLS semantic
relation network; (ii) the second group was composed of pairs of concepts that
do not have any semantic relation. The more pairs of concepts extracted using
a readability formula in the first group, the better its performance is.
<b>3.3 Evaluation Measures</b>
Our evaluation of the readability formulas is twofold. First, we are interested
in knowing the number of meaningful pairs of concepts extracted using each of
the formulas. The concepts in Tables 1–4 are divided into two segments. The
upper segment of the table contains the first group of concepts while the lower
Related Concepts
FI SMOG FKRI FRES FORCAST
Rank Harmonic Rank Harmonic Rank Harmonic Rank Harmonic Rank Harmonic
Mean Mean Mean Mean Mean
Ischemia-Glutamate 1 51.85 1 51.85 1 51.85 2 39.13 1 48.15
Levels-Glutamate 3 41.66 3 41.66 3 41.66 3 37.50
Glutamate-Neurons 4 39.02 4 39.02 4 39.02
10Min-Ischemia 5 37.50 5 37.50 5 37.50 1 48.14 6 29.16
Glutamate-CA4 6 35.89
Increase-Glutamate 7 32.55 6 32.55 6 32.55 4 32.55
Ischemia-5Min 9 31.57 8 31.57 5 31.57
Ischemia-DG 4 39.02
Glutamate-Microdialysis 3 37.50
Neurons-Ischemia 8 22.22
CA1-Ischemia 9 15.78
Glutamate-Release 6 27.77
Levels-Ischemia 2 43.47 2 43.47 2 43.47 2 39.93
10Min-Glutamate 8 31.81 7 31.81 7 31.81 4 32.55 8 27.27
Glutamate-5Min 9 31.57 8 31.57 5 31.57
Ischemia-Release 6 32.55 10 15.00
Experiment-Ischemia 9 27.77 7 27.77
Glutamate-Experiment 9 27.77 7 27.77
10Min-Release 5 31.57
Increase-Ischemia 7 23.80
Table 1: Relations extracted using the readability formulas from the papers on
Ischemia and Glutamate
Related Concepts
FI SMOG FKRI FRES FORCAST
Rank Harmonic Rank Harmonic Rank Harmonic Rank Harmonic Rank Harmonic
Mean Mean Mean Mean Mean
Friedreich-Ataxia 1 59.25 1 66.66 1 59.25 1 59.25 1 51.85
PDHC-Ataxia 2 56.00 2 56.00 2 56.00 3 48.00 3 39.99
Activity-Friedreich 3 43.47 6 43.47 3 43.47 7 34.78 2 43.47
Patients-Ataxia 3 43.47 3 52.17 3 43.47 2 52.17 5 34.78
Activity-Ataxia 3 43.47 6 43.47 3 43.47 7 34.78 2 43.47
PDHC-Friedreich 3 43.47 6 43.47 3 43.47 7 34.78 5 34.78
Patients-Friedreich 6 36.36 5 45.45 5 45.45
Activity-PDHC 6 37.03 4 37.03
Preparations-Ataxia 4 40.00 7 40.00 4 40.00 6 40.00
Preparations-Friedreich 4 40.00 7 40.00 4 40.00 6 40.00
Pyruvate-Ataxia 5 38.09 4 47.61 5 38.09 4 47.61 6 28.57
Siblings-Ataxia 7 22.22
Disease-Pyruvate 8 21.05
Table 2: Relations extracted using the readability formulas from the papers on
Ataxia and Dehydrogenase
segment of the table has concepts that belong to the second group (see Section
3.2).
Related Concepts
FI SMOG FKRI FRES FORCAST
Rank Harmonic Rank Harmonic Rank Harmonic Rank Harmonic Rank Harmonic
Mean Mean Mean Mean Mean
AAS-Treatment 1 29.41 1 29.41 1 32.35 1 23.52 1 29.41
AAS-Testosterone 3 18.46 6 15.38 3 18.46 3 24.61
Gonadotropin-Treatment 4 18.18 2 18.18 4 18.18 2 21.21 8 12.12
Testosterone-Treatment 5 14.92 8 14.92 5 17.91 3 17.91 9 11.94
Levels-Testosterone 6 14.49 3 17.39 7 14.49 10 11.59
AAS-Conditions 7 12.90
Treatment-HCG 7 12.90 9 12.90 8 12.90 6 12.90
Treatment-Therapy 7 12.90 5 16.12 6 16.12 4 16.12 6 12.90
Gonadotropin-Testosterone 8 14.92 9 11.94 5 14.92
Clomiphene-Citrate 4 24.32
Tamoxifen-Citrate 5 20.28
Use-AAS 2 21.62 4 16.21 2 18.91 8 10.81 2 27.02
Replacement-Therapy 7 12.90
AAS-Conditions 8 12.90
Testosterone-Production 7 15.15
Function-Testosterone 7 12.30
Therapy-Gonadotropin 9 10.00
Function-Testosterone 10 9.67
Therapy-Testosterone 7 12.69
Table 3: Relations extracted using the readability formulas from the papers on
Hypogonadism and Gonadotropin
against a gold standard. As FI has already been shown to be an effective measure for
text denoising, we considered the performance of FI as our gold standard. We
considered the related concepts extracted using FI as the <i>positives</i>, examined
the <i>true positives</i>, <i>false positives </i>and <i>false negatives </i>of a given formula, and
calculated its <i>precision </i>and <i>recall</i>. In addition, we calculated both the micro and
macro averages of precision and recall, and hence the F-Score, of every formula
(Table 6). We calculated the micro average because the number of sentences
differs considerably from one set to the other, and the macro average to see how the
formulas performed across all sets [11]. To calculate the micro average, the <i>true
positives</i>, <i>false positives </i>and <i>false negatives </i>were first summed across all sets,
and the statistics were computed from these totals. The macro average, on the other
hand, was calculated by first computing the precision and recall for each set and
then averaging them over all sets.
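The difference between the two averaging schemes can be illustrated with a short sketch. The function names and the counts are hypothetical. Computing the macro F-Score from the averaged precision and recall, rather than averaging per-set F-Scores, is an assumption about the convention used.

```python
from typing import List, Sequence, Tuple

Counts = Tuple[int, int, int]  # (true positives, false positives, false negatives)


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F-score for one set of counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


def micro_average(sets: Sequence[Counts]) -> Tuple[float, float, float]:
    """Sum the counts over all sets first, then compute the statistics."""
    tp = sum(s[0] for s in sets)
    fp = sum(s[1] for s in sets)
    fn = sum(s[2] for s in sets)
    return precision_recall_f(tp, fp, fn)


def macro_average(sets: Sequence[Counts]) -> Tuple[float, float, float]:
    """Compute precision and recall per set first, then average them."""
    if not sets:
        raise ValueError("at least one set is required")
    per_set: List[Tuple[float, float, float]] = [precision_recall_f(*s) for s in sets]
    precision = sum(p for p, _, _ in per_set) / len(per_set)
    recall = sum(r for _, r, _ in per_set) / len(per_set)
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


if __name__ == "__main__":
    toy_sets = [(8, 2, 0), (1, 1, 3)]  # made-up counts for two sets
    print("micro:", micro_average(toy_sets))
    print("macro:", macro_average(toy_sets))
```

With these toy counts the large first set dominates the micro average, while the macro average weights both sets equally.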
Related Concepts
FI SMOG FKRI FRES FORCAST
Rank Harmonic Rank Harmonic Rank Harmonic Rank Harmonic Rank Harmonic
Mean Mean Mean Mean Mean
Inhibition-GABA 1 26.08 1 28.16 1 34.78 1 34.78 1 23.91
GABA-Synapse 2 20.25 2 20.25 2 22.78 2 20.25
Neurons-Synapse 3 14.17 4 14.70 3 14.70 5 11.76
Inhibition-Hippocampus 4 12.30 4 10.66
Neurons-GABA 6 8.00 3 16.00 6 10.66
Properties-GABA 7 6.45 7 9.67 5 9.67 7 9.67
Cl-Gradient 9 3.33
Inhibition-Dentate Gyrus 6 9.37
Synapse-Change 5 9.37 8 9.37 7 9.37 4 12.50
GABA-Change 7 6.45 7 9.67 9 6.45
GABA-Number 8 6.34 5 12.69 6 9.52 8 9.52
Synapse-Number 8 6.55
Neuron-Input 9 6.45
Animal-Models 6 11.26 3 14.08
Neurons-Inhibition 7 9.67
GABA-Alteration 3 18.18
Study-Tissue 9 9.37
Study-Inhibition 10 8.95
Rat-Inhibition 2 20.51
Slices-Inhibition 4 12.50
Animal-Rat 5 12.30
Cortex-slices 7 9.09
Epilepsy-Rat 8 8.95
Inhibition-Kindling 8 8.95
Number-Tissue 9 6.45
Table 4: Relations extracted using the readability formulas from the papers on
Epilepsy and GABA
SMOG performed marginally better than FRES and FORCAST as most of its
meaningful relations had low harmonic mean. It is noteworthy that the ranks
and harmonic means of the relations for the first three formulas were somewhat
similar to each other, which suggests that they extracted almost the same sentences.
In Table 2, the relations extracted using the formulas from the papers on
Ataxia and Dehydrogenase are displayed. It is surprising that all of the formulas
extracted exactly seven meaningful relations. In this case, the performance of
FI and FKRI was almost identical. On the other hand, both SMOG and FRES
Readability Formula Precision Recall F-Score
SMOG 100.00 100.00 100.00
FKRI 85.71 85.71 85.71
FRES 100.00 100.00 100.00
FORCAST 85.71 85.71 85.71
Readability Formula Precision Recall F-Score
SMOG 100.00 71.43 83.33
FKRI 100.00 71.43 83.33
FRES 100.00 71.43 83.33
FORCAST 50.00 14.29 22.22
Readability Formula Precision Recall F-Score
SMOG 96.88 82.60 89.16
FKRI 89.73 82.60 86.01
FRES 80.83 65.63 72.44
FORCAST 77.88 61.61 68.72
Readability Formula Precision Recall F-Score
SMOG 100.00 71.43 83.33
FKRI 85.71 85.71 85.71
FRES 40.00 28.57 33.33
FORCAST 100.00 71.43 83.33
(a) (b)
Readability Formula Precision Recall F-Score
SMOG 87.50 87.50 87.50
FKRI 87.50 87.50 87.50
FRES 83.33 62.50 71.43
FORCAST 75.00 75.00 75.00
(c) (d)
Table 5: Performance of the four readability formulas for the papers on (a)
Ischemia and Glutamate, (b) Ataxia and Dehydrogenase, (c) Hypogonadism
Readability Formula Precision Recall F-Score
SMOG 95.83 82.14 88.46
FKRI 88.89 82.76 85.71
FRES 82.61 65.52 73.08
FORCAST 81.82 62.07 70.59
(a) (b)
Table 6: Average precision, recall and F-Score of the formulas with (a) micro-
average and (b) macro-average methods
they are semantically related. The performance of FRES is the poorest among
the five as it extracted four pairs of concepts without semantic relations.
Table 4 shows the related concepts extracted using the formulas from the
papers on Epilepsy and GABA. FI outperformed others by extracting seven
semantically related concepts. FKRI, SMOG, and FRES extracted five related
concepts each but the performance of FKRI is better than the other two. Between
SMOG and FRES, more of the low-ranked pairs of concepts extracted using the
former lack meaning than those extracted using the latter. On the other hand,
FORCAST performed very poorly in this case, extracting only two semantically related pairs.
Table 5 displays the performance of the four readability formulas—FKRI,
SMOG, FRES, and FORCAST—against the gold standard. From the table, it
precision and recall but their significantly poor recalls cost them lower F-Scores.
Table 6 (b) shows similar results except that both SMOG index
and FKRI achieved the best recall. From this analysis, it can be said that
SMOG index and FKRI both perform similarly to FI and thus can be used
to reduce textual noise and extract related biomedical concepts.
While FI has been used in text denoising to make it a meaningful relations
extraction tool for biomedical texts, we reported the performance of four other
readability formulas, namely FKRI, SMOG, FRES, and FORCAST on this task.
We applied the formulas to the sentences of 24 biomedical texts, ordered the sentences
according to their reading difficulty, and extracted frequently co-occurring concepts
from the 30% of sentences that are least readable. These concepts were then
re-ranked according to the harmonic mean of their PPV and sensitivity. A comparison
of the results shows that FI outperformed the other formulas by extracting
more meaningful relations according to UMLS semantic relation network. We
also analyzed the performance of the formulas considering the performance of
FI as a gold standard. It shows that SMOG index achieved the best F-Score
followed by FKRI while FRES and FORCAST performed poorly. It can also be
noted that SMOG index, like FI, uses the core measure of <i>complex words </i>and
its performance is the best compared to the gold standard which reveals the fact
that the measure of complex word fits best for text denoising and biomedical
relation extraction.
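The ranking and filtering step summarised above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name, the `keep_fraction` and `higher_is_harder` parameters, the rounding rule, and the word-count scorer in the example are all hypothetical.

```python
import math
from typing import Callable, List, Sequence


def denoise(sentences: Sequence[str],
            score: Callable[[str], float],
            keep_fraction: float = 0.3,
            higher_is_harder: bool = True) -> List[str]:
    """Rank sentences by a readability score and keep the hardest ones.

    score            -- any readability formula applied to one sentence
    keep_fraction    -- share of sentences kept as the denoised text
    higher_is_harder -- True for grade-level indexes such as FI, SMOG or FKRI;
                        False for FRES, where a higher score means easier text
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(sentences, key=score, reverse=higher_is_harder)
    # Rounding up is an assumption made for this sketch.
    keep = math.ceil(len(ranked) * keep_fraction)
    return ranked[:keep]


if __name__ == "__main__":
    # Placeholder scorer: sentence length in words (not a readability formula).
    toy = ["a b c d e f", "a", "a b", "a b c"]
    print(denoise(toy, score=lambda s: len(s.split())))
```

Passing the scorer as a function keeps the filtering step independent of the chosen formula, which mirrors how the four formulas are swapped in for FI in this experiment.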
As we found at least two competitive measures for text denoising other than
FI for relation mining, their performance on training data reduction for keyphrase
indexers is of great interest. This task is left as future work.
This work was partially funded through a Natural Sciences and Engineering
Research Council of Canada (NSERC) Discovery Grant to Robert E. Mercer.
1. J. Bogert. In defense of the fog index. Business Communication Quarterly, 48:9–12,
1985.
2. J. S. Caylor, T. G. Sticht, L. C. Fox, and J. P. Ford. Methodologies for determining
reading requirements of military occupational specialities. Technical Report 73-5,
Human Resources Research Organization, Alexandria, VA, 1973.
3. T. M. Duffy and P. Kabance. Testing a readable writing approach to text revision.
<i>Journal of Educational Psychology, 74:733–48, 1982. </i>
5. R. Flesch. A new readability yardstick. <i>Journal of Applied Psychology, 32:221–33, </i>
1948.
6. O. Frunza and D. Inkpen. Extraction of disease-treatment semantic relations from
biomedical sentences. In Proceedings of the 2010 Workshop on Biomedical Natu-
<i>ral Language Processing, BioNLP ’10, pages 91–98, Stroudsburg, PA, USA, 2010. </i>
Association for Computational Linguistics.
7. E. Fry. A readability formula that saves time. Journal of Reading, 11:512–16 cont.
8. O. S. Goh, C. C. Fung, A. Depickere, and K. W. Wong. Using Gunning-Fog index
to assess instant messages readability from ECAs. In Proceedings of the Third
<i>International Conference on Natural Computation, volume 5 of ICNC ’07, pages </i>
480–486, Washington, DC, USA, 2007.
9. J. P. Kincaid, R. P. Fishburne, R. L. Rogers, and B. S. Chissom. Derivation of new
readability formulas (automated readability index, fog count, and flesch reading
ease formula) for navy enlisted personnel. Research Branch Report 8-75, Chief of
Naval Technical Writing: Naval Air Station Memphis, 1975.
10. K. Koenke. Another practical note on readability formulas. <i>Journal of Reading, </i>
15:205, 1971.
11. C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing.
Cambridge, MA: MIT Press, 1999.
12. G. H. McLaughlin. SMOG grading – a new readability formula. <i>Journal of Reading, </i>
12(8):639–46, 1969.
13. O. Medelyan. <i>Human-competitive automatic topic indexing. PhD thesis, University </i>
of Waikato, New Zealand, 2009.
14. O. Medelyan and I. Witten. Domain-independent automatic keyphrase
indexing with small training sets. <i>Journal of the American Society for Information </i>
<i>Science and Technology (JASIST), 59(7):1026–1040, 2008. </i>
15. C. Perez-Iratxeta, P. Bork, and M. Andrade. Literature and genome data min-
ing for prioritizing disease-associated genes. In F. Eisenhaber, editor,
<i>Discovering Biomolecular Mechanisms with Computational Biology, Molecular </i>
Biology Intelligence Unit, pages 74–81. Springer, 2006.
16. R. Shams and R. E. Mercer. Extracting connected concepts from biomedical texts
using fog index. <i>Procedia - Social and Behavioral Sciences, 27:70–76, 2011. </i>
17. R. Shams and R. E. Mercer. Improving supervised keyphrase indexer classification
of keyphrases with text denoising. In <i>14th International Conference on
Asia-Pacific Digital Libraries (ICADL 2012)</i>, Taipei, Taiwan, 2012.
18. R. Shams and R. E. Mercer. Investigating keyphrase indexing with text denoising.
In <i>Proceedings of the 11th ACM/IEEE-CS Joint Conference on Digital Libraries </i>
<i>(JCDL 2012), Washington DC, USA, 2012. </i>
Unchasa Seenuankaew
Ph.D. Candidate in Information Studies Program.
Faculty of Humanities and Social Sciences, Khon Kaen University, Thailand.
Chollabhat Vongprasert
Assistant Professor, Information and Communication Management Program,
Faculty of Humanities and Social Sciences, Khon Kaen University, Thailand.
<b>Abstract</b>
In the 21st century, the world is stepping closer to a knowledge-based society where
information, knowledge, and information technology are used in development, as
opposed to traditional manpower resources. Similar changes in Thailand cannot be
information culture is most often linked to information literacy and information
behavior (Gendina, 2004). For the purpose of this study, information culture is
defined as the attitudes, beliefs and behavior towards information ownership,
information seeking and information use. To truly step into the information society,
developing countries need to adopt holistic approaches that are designed to cultivate
a modern information culture, and to make incremental social institutional changes,
in addition to technological innovations (Zheng, 2005).
More than half of all Thai people are agriculturalists. Indeed, agriculture is closely
related to Thai culture and the lifestyle of Thai people (Ministry of Agriculture and
Cooperatives, 2012). Agriculture continues to play an important role in strengthening
proach. The research outcomes are intended to provide a theoretical conclusion
regarding the information behavior of farmers in Thailand. Grounded theory explains
the understanding, thoughts, and beliefs surrounding a phenomenon from the viewpoint
of the people who experience it. It then conceptualizes the data from that
phenomenon in order to find the connections among concepts and thereby reach a
theoretical conclusion about the phenomenon to be explained (Hawanon et al., 2003).
In the same way, Glaser & Strauss (1976) explain that creating grounded theory means
creating a theoretical explanation directly from data. The methodology rests on the
belief that, to understand human behavior and how people live together, one needs to
understand the process by which people give meaning to the things around them,
because human thought and action are grounded in those meanings. This methodology
focuses on studying social phenomena by understanding them, turning information into
concepts, and finding connections among concepts to reach a theoretical conclusion
about a social phenomenon. Therefore, the researchers were interested in studying
farmers’ information behavior using grounded theory. The results of this study will
be used to develop a theory to explain the information behavior of Thai rice farmers:
how they access information, how they manage changes in information, and how the
government manages farmers’ ability to access information.
This research aims to study and understand the phenomenon of Thai farmers’
information behavior. The research objectives are as follows:
2.1 To study information behavior consisting of the information need, information
seeking, and information use of Thai farmers.
2.2 To study the enablers that support the information behavior of Thai farmers.
2.3 To develop an information behavior model of Thai farmers by using the grounded
theory approach.
The research method that was selected for this study is grounded theory. The aim of
this research method is building theory, not testing theory. Rather than begin a
study with a preconceived theory that needs to be proven, the researcher begins with a
general area of study and allows the theory to emerge from the data.
3.1 <b>The Study Area </b>
two provincial agricultural research officers, this is the most productive area in the
province.
3.2 <b>Entering research site </b>
Entering the research site is an important step. It is necessary to ensure that the data
collection process is complete and that true information, well understood by both key
informants and the researchers, is obtained. Interview questions have been prepared
3.3 <b>Data Collection and Analysis </b>
<i>Data Collection.</i>
Data will be collected through in-depth interviews. Interview guidelines were
developed from the research objectives and the conceptual framework of the study.
These guidelines help to verify whether the interview questions elicit the answers
called for by the objectives. The results were then used to improve the questions so
that they yield the information stated in the objectives of the study. With regard
to the selection of key informants, I will need to study the phenomenon first.
However, I have an approximate idea of the key informants I will need. There are
three groups of farmers: subsistence, semi-subsistence, and purely commercial. The
groups include farmers who own their land as well as farmers who rent it.
<i>Data Analysis.</i>
My data analysis consists of the following steps. First, after interviewing the
first key informant, I will transcribe the recorded interview word for word. I will
use this transcript to ensure that the question guidelines are accurate. Second, the
data collected from interviewing the key informants, which will be recorded on tape,
will be used to describe the behaviors of each case in detail. The description of
each case must give a complete picture of the behaviors that respond to different
situations. Third, I will read and analyze the data many times in order to clearly
understand the information relating to different behaviors
4.1 New knowledge in the form of a theoretical conclusion about farmers’ information
behavior in the context of Thai society. Moreover, the information behavior model
developed can be used to explain Thai farmers’ information behavior. Finally, the
new knowledge will be useful for library and information science.
4.2 The results of the study can be used as a guideline to develop information services
and reduce the gap in information access provided for Thai farmers.
4.3 The findings of this research will provide an alternative method of information
management for the public sector, as well as other related sectors. Furthermore, this
new methodology will more effectively and efficiently respond to the information
behavior and needs of farmers, thereby helping to stimulate and enhance the econom-
ic and social development of Thailand.
<b>Fig. 1. </b>Conceptual Framework
1. Curry, A., & Moore, C.: Assessing Information Culture: An Exploratory Model.
<b>International Journal of Information Management, 23</b>(2), 91-110 (2003)
2. Gendina, N.: Information Culture in the Information Society: the View from Russia. In:
<b>Proceeding the International Conference UNESCO between Two Phases of the </b>
<b>World Summit on the Information Society. </b>Retrieved November 2012, from
(2004)
3. Glaser, B. and Strauss, A.: <b>The Discovery of Grounded Theory: Strategies for
Qualitative Research</b>. London: Weidenfeld & Nicolson (1976)
4. King, D.G. and Palmour, V.E.: “How Needs are Generated; What We Have Found out
about them.” <b>In The Nationwide Provision and Use of Information : ASLIB, IIS, LA Joint </b>
<b>Conference</b>, 15-19 Sept 1980 Sheffield Proceeding, 68-79. London : Library Association
(1981)
5. Kuhlthau, C. C.: Inside the search process : Information seeking from the user's perspec-
tive. <b>Journal of the American Society for Information Science, 42</b>(5), 361-371 (1991)
6. Leckie, J., Pettigrew, E. & Sylvain, C.: Modeling the information seeking of professionals:
a general model derived from research on engineers, health care professionals and lawyers.
<b>Library Quarterly 66</b>(2), 161-193 (1996)
7. Marcella, Rita and Baxter, Graeme.: The Information needs and the information seeking
behavior of a national sample of the population in The United Kingdom
8. Ministry of Agriculture and Cooperatives.: <b>Hand out for Agriculture Development Plan </b>
<b>during the Eleventh National Economic and Social Development Plan (2012-2016). </b>(in
Thai). Retrieved June 11, 2012, from docu-
ment_plan/01.PDF.
9. Hawanon, N. et al.: <b>The Accordance between grounded theory and empirical
indicators in building Community Empowerment Index. </b>(in Thai). Bangkok: PhD thesis
Program in Development Education, Graduate school, Srinakharinwirot University (2003)
10. National Information Center, Office of Permanent Secretary Ministry of Commerce.
<b>Thailand’s 15 essential export goods during 2006-2010 </b>(Jan.-June). (in Thai).
Retrieved July 2010, from
11. Riyaz, Aminath.: <b>The Information culture of the Maldives: An exploratory Study of </b>
<b>Information provision and Access in a Small Island Developing State. </b>Retrieved Au-
gust 3, 2012, from tin. edu.au/R/?func=dbin-
12. Cheejang, S.: Higher Education and Knowledge-based Society. (in Thai).
<b>Journal of Dusit Thani College, 2</b>(2), 19-41 (2008)
13. Spink, A., & Cole, C.: Human Information Behavior: Integrating Diverse Approaches and
Information Use. <b>Journal of the American Society for Information Science and </b>
<b>Technology, 57</b>(1), 25–35 (2006)
14. Wilson, T.D.: Human Information Behaviour. <b>Informing Science 3</b>(2), 49-55 (2000)
15. Zheng, Y.: Information Culture and Development: Chinese experience of e-health. In:
Proceedings of the 38th Hawaii International Conference on System Sciences.
Retrieved July 2012, from