Introduction
Scientific literature has long been one of the main vehicles of knowledge dissemination, and the number of published articles continues to grow. Helping users to efficiently locate relevant and high-quality scientific information is a major focus of the information retrieval and digital library communities (Avancini et al., 2007; Bollacker et al., 2000; Hwang et al., 2003, 2010). Numerous recommender systems employ various methods to filter information in a systematic and transparent manner, so that the personalised information recommended to the user is pertinent to his or her interests (Huang et al., 2004; Jannach et al., 2010; Ricci et al., 2010). Recommender systems using these approaches have been shown to greatly help users locate information of interest (Huang et al., 2004; Jannach et al., 2010; Ricci et al., 2010). Free digital libraries on the internet allow users to search or browse articles without having to identify themselves. Even with proprietary digital libraries, it is difficult to track the long-term browsing behaviour of individual users, since many subscribers browse under site subscriptions.

Proposed Approaches
When a user visits the article database, he or she can specify a particular task profile by including one or more articles of interest.
Alternatively, a user’s task profile can be implicitly derived from his or her recent browsing and access logs. The relevance calculation step applies information retrieval, common citation analysis, and coauthor relationship analysis techniques to the collected articles. The resulting relevance data are used in the rank vector calculation step to generate a rank vector for each article (Rong Lin, 2003).
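To make two of the relevance techniques named above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that common citation relevance is measured as the Jaccard overlap of two reference lists, that coauthor closeness is the number of coauthor links separating two authors, and that a candidate is scored against a task profile by averaging. All function names and the sample data are hypothetical.

```python
from collections import deque


def common_citation_relevance(refs_a, refs_b):
    """Jaccard overlap of two reference lists (an assumed measure)."""
    a, b = set(refs_a), set(refs_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def coauthor_distance(coauthors, source, target):
    """Number of coauthor links between two authors, found by breadth-first
    search. Returns None when the authors are not connected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        author, dist = queue.popleft()
        for neighbour in coauthors.get(author, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def profile_relevance(profile, candidate, references):
    """Average common-citation relevance between a candidate article and
    every article in the task profile (averaging is an assumption)."""
    scores = [
        common_citation_relevance(references[p], references[candidate])
        for p in profile
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical sample data: article -> reference list, author -> coauthors.
references = {
    "A": ["r1", "r2", "r3"],
    "B": ["r2", "r3", "r4"],
    "C": ["r9"],
}
coauthors = {"ann": ["bob"], "bob": ["ann", "cy"], "cy": ["bob"]}

print(profile_relevance(["A"], "B", references))
print(coauthor_distance(coauthors, "ann", "cy"))
```

In this sketch a task profile is simply a list of article identifiers, whether the user chose them explicitly or they were derived from access logs. How the individual relevance scores are combined into a rank vector is not specified here and is left out.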
Rank vector calculation module
Articles that cite one another can be defined as an article citation network in which each node represents an article and each directed edge represents a citation of that article by another article (Shiou Yang, 2003). As noted by various researchers, an article citation network can form the basis of scholarly assessment (Bollen et al., 2006; Chen et al., 2007; Ma et al., 2008; Palacios-Huerta and Volij, 2004; Walker et al., 2007; Yan and Ding, 2010). Citation is also a means of avoiding plagiarism.
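The article citation network described above can be represented as a directed graph built from each article's reference list. The sketch below is illustrative only: the function name, the dictionary-based representation, and the sample identifiers are assumptions, and the citation count it derives is a simple indicator, not the rank vector itself.

```python
def build_citation_network(reference_lists):
    """Build a directed citation network from reference lists.

    An edge u -> v means that article u cites article v.
    Returns two adjacency maps: cites[u] and cited_by[v].
    """
    cites = {}
    cited_by = {}
    for article, refs in reference_lists.items():
        # Register the citing article as a node even if it cites nothing.
        cites.setdefault(article, set())
        cited_by.setdefault(article, set())
        for ref in refs:
            if ref == article:
                continue  # ignore self-references
            cites[article].add(ref)
            cites.setdefault(ref, set())
            cited_by.setdefault(ref, set()).add(article)
    return cites, cited_by


# Hypothetical sample: p1 cites p2 and p3, p2 cites p3.
reference_lists = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": []}
cites, cited_by = build_citation_network(reference_lists)

# Number of incoming citations per article.
citation_counts = {article: len(cited_by[article]) for article in cites}
print(citation_counts)
```

Keeping both the outgoing and the incoming adjacency maps makes it cheap to ask either "what does this article cite?" or "who cites this article?", which is usually what ranking methods over citation networks need.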
Relevance calculation module
The relevance calculation module uses textual, citation, and author information pertaining to articles to determine the relevance between articles (Rong Lin, 2003). Two articles that refer to similar articles will often discuss the same concepts, methods, or experiments in a field, and hence are likely to be topically related (Bollacker et al., 2000). In addition to content and citations, the identity of the authors also conveys important information about an article. An article may be of particular interest to the user if its authors are professionally close to the authors of the articles included in the target task profile. Therefore, we also estimated the relatedness of articles by applying a coauthor relationship analysis technique (Hwang et al., 2010).

Evaluation
Six categories of information were identified and stored for each article: title, author(s), keywords, abstract, reference list, and full text (in PDF/PS format). An article citation network was constructed consisting of 396,036 nodes representing all of the included articles and 4,003,924 edges representing the citations (references) associated with these articles. A coauthorship network was also constructed, consisting of 175,637 nodes representing all the article authors and 3,512,742 edges representing the coauthor relationships between authors. The relevance calculation procedures described earlier were applied to the articles.

Conclusion
The proposed approaches combined information retrieval, common citation analysis, and coauthor relationship analysis techniques to find relevant and high-quality articles. This work could be extended in several directions (Shiou Yang, 2003).
First, the present study collected only articles published between 2000 and 2006. A longer collection period might make it possible to observe the ageing effects of citation networks. Second, the present study evaluated only the performance of the proposed approaches in terms of the recommendations made. It would be useful to evaluate user assessments of the system in order to further improve our recommendation strategy.
Finally, the present study integrated textual, citation, and author information pertaining to articles to make recommendations.