What is a scholarly source? Examples, resources and more
By Laurie Davies
At a glance
- A scholarly source discusses research in a particular academic, clinical or scientific field.
- Using scholarly sources lends credibility, authority and impartiality to your research.
- A process called peer review is considered the gold standard in scholarly or academic sources.
- At University of Phoenix, the Research Center Enterprise helms multidisciplinary research in areas such as leadership, workplace diversity and other real-world issues.
Whether you’re working on a college paper, a corporate annual report or a blog post, your credibility can hinge on the sources you choose to research and substantiate your claims. There’s a big jump from a Twitter thread to a scholarly source.
What makes a source scholarly? Read on to learn how to tell if a source is scholarly. Plus, learn how to find these sources, discover why you’d use them, and hear from academics who have written them.
What is a scholarly journal?
A scholarly journal — also sometimes called a scholarly source or an academic journal — presents and discusses research in a particular academic, clinical or scientific field.
Examples of scholarly sources are:
- Conference presentations
- Video lectures
“When I think of scholarly material, I think it’s essentially written by scholars for scholars,” says Shawn Boone, EdD, associate dean of research at the College of Doctoral Studies at University of Phoenix (UOPX).
There you have it! Scholarly sources defined.
But wait. Finding trusted and quality sources can be intimidating. Don’t worry. A University of Phoenix faculty member who writes scholarly articles offers hacks for how students and non-scholars can make journals work for them.
First, however, another definition is needed.
What is a peer-reviewed source?
Often scholarly journals are peer-reviewed. A peer-reviewed source is one that’s been vetted (reviewed) by other experts (peers) in the field.
Peer-reviewed journals are also sometimes called refereed journals. In this case the “referees” are reviewers who are tasked with filtering out poor quality, flawed methodology and a lack of rigor.
According to Wiley, a publisher of peer-reviewed journals, the peer review process is designed to assess the validity, quality and originality of articles for publication.
Boone, who has both published scholarly articles and served as a peer reviewer, looks for these criteria when he’s reviewing:
- Rigor in design strategy
- Continuity of theory
- Absence of confirmation bias
- Writing quality
The process of peer review is not without criticism, namely that peer reviewers sometimes reject innovative ideas, thus potentially leading to conformity of thought. Plus, in the case of something new like COVID-19, researchers are tasked with building the plane while they’re flying it — conducting research on a phenomenon about which little is known.
Despite flaws, peer-reviewed publications are widely considered the gold standard among scholarly sources.
Examples of peer-reviewed sources are:
- Journal of Leadership Studies
- The Journal of Higher Education
- Journal of Educational Supervision
- JAMA (Journal of the American Medical Association)
- The New England Journal of Medicine
Why use scholarly sources?
Credibility: If you’re a student writing a research paper, scholarly sources help establish credibility.
Authority: A scholarly source can lend more authority than a news report or book. While a journalist or author might interview experts, a scholarly source is written by an expert.
Impartiality: A scholarly source offers findings that have been authenticated and should be free of confirmation bias.
This latter point is critical, says Rodney Luster, PhD, a widely published researcher, a regular contributor to Psychology Today, and chair of the Center for Leadership Studies and Organizational Research at UOPX.
“We’re all passionate about the things we want to write about,” Luster says. “If we’re not careful, confirmation bias — interpreting new findings as confirmation of our beliefs — can creep in.”
True scholarly sources don’t allow this to happen.
How to use scholarly sources
So, maybe you’re convinced. Scholarly sources are the way to go next time you’ve got a research-based project to submit.
But how in the world do you cite them? After all, if you’re like most people, terms like regression analysis, research methodology and theoretical constructs are enough to make the eyes glaze over.
Luster has good news. Three basic components of scholarly research may offer the takeaways you’ll need to effectively (and intelligently!) cite scholarly sources:
- The title. Often the major finding or idea is expressed here.
- The abstract. A summary of the research, an abstract conveys the starting point, what researchers were looking for and what they concluded.
- The conclusion. The researchers explain what they found, perhaps even telling the industry what needs to happen (e.g., action or more research).
How to tell if a source is scholarly
If you’re wondering how to tell if a source is scholarly, these characteristics are shared by scholarly references:
- The source informs or reports on research or ideas (rather than attempting to sway opinion or entice the reader to purchase a product).
- Authors are clearly identified, and they have authority or expertise in their field.
- Sources are always cited, usually in an extensive bibliography.
- Methodology is outlined.
It’s important to note that not all journals are scholarly. Some are “predatory,” meaning they require authors to “pay to play” — they charge a fee for authors to have their research published. Avoid these. You can spot them by looking for the publication’s submission requirements.
(Note: “Pay to play” is different from an “open-access” article, for which the author pays a fee so that the article is accessible to the public rather than by subscription only.)
Most scholarly sources offer clues about their validity. Look for these criteria:
- The masthead or journal description says “peer-reviewed.”
- Journals request three copies of submissions (likely to go to peer reviewers).
- Researchers in that field write the articles.
- References are clearly listed in a bibliography.
- Journal articles generally follow this format: abstract, literature review, methodology, results, conclusion, references.
- There’s no advertising.
Examples of scholarly sources
With scholarly source websites, it’s easier now than ever before to find the research you need to support your project.
Google Scholar is a powerful resource for finding scholarly sources in your area of interest. Enter “headaches,” and 824,000 articles will appear in 0.03 seconds. (That actually kind of triggers a headache, doesn’t it?)
If you’re a student looking to write a well-informed paper sourced by experts, other tools can help. Here are some ideas:
- Check the bibliographies of books or articles in your area of interest.
- Search digital libraries and publishers, such as JSTOR, ProQuest, Emerald and Wiley.
- Check the University of Phoenix Research Hub, which lists peer-reviewed journals and publishers in education.
- Explore links to a growing body of research produced by UOPX scholars from the Center for Leadership Studies and Organizational Research, the Center for Educational and Instructional Technology and the Center for Workplace Diversity and Inclusion.
Frequently asked questions about academic sources
What is a scholarly source?
A scholarly source presents and discusses research in a particular academic, clinical or scientific field. It does not attempt to persuade to an opinion, and it does not encourage readers to purchase a product.
A scholarly journal publishes scholarship related to a particular field (e.g., medicine) or academic discipline (e.g., leadership studies). Peer-reviewed scholarly journals provide extra scrutiny of articles for quality and validity.
Is .org a scholarly source?
No. Websites ending in .org may be credible, but .org sites generally belong to nonprofit entities with a specific mission. They might lead you to scholarly sources if they cite studies with a list of authors.
Is NPR a scholarly source?
No. NPR and other news agencies report the news, sometimes with bias. They may interview experts, but a true scholarly source will be written by an expert.
How do I use scholarly sources?
Scholarly sources are generally written for other scholars, but don’t let that deter you from mining them and citing them. The abstract and conclusion sections may lend solid information to your project.
University of Phoenix offers a workshop called Dissertation to Publication for students interested in publishing their doctoral dissertation in a peer-reviewed journal.
The Number of Scholarly Documents on the Public Web
Madian Khabsa, C. Lee Giles
Affiliations: Computer Science and Engineering, The Pennsylvania State University, University Park, Pennsylvania, United States of America; Information Sciences and Technology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
Published: May 9, 2014
The number of scholarly documents available on the web is estimated using capture/recapture methods by studying the coverage of two major academic search engines: Google Scholar and Microsoft Academic Search. Our estimates show that at least 114 million English-language scholarly documents are accessible on the web, of which Google Scholar has nearly 100 million. Of these, we estimate that at least 27 million (24%) are freely available since they do not require a subscription or payment of any kind. In addition, at a finer scale, we also estimate the number of scholarly documents on the web for fifteen fields: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary, as defined by Microsoft Academic Search. In addition, we show that among these fields the percentage of documents defined as freely available varies significantly, i.e., from 12 to 50%.
Citation: Khabsa M, Giles CL (2014) The Number of Scholarly Documents on the Public Web. PLoS ONE 9(5): e93949. https://doi.org/10.1371/journal.pone.0093949
Editor: Ren Zhang, Wayne State University, United States of America
Received: October 7, 2013; Accepted: March 10, 2014; Published: May 9, 2014
Copyright: © 2014 Khabsa, Giles. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was partially funded by the National Science Foundation, grants 0958143, 1348712, and 1143921. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. There has been no additional external funding received for this study.
Competing interests: The authors have declared that no competing interests exist.
Many researchers and academics are concerned about the extent to which academic and scientific documents are available on the web, as well as their ability to access them. For convenience, we will refer to all academic and scientific documents as “scholarly”. By scholarly documents, we mean journal and conference papers, dissertations and master's theses, books, technical reports and working papers. Patents are excluded.
The web has become a standard resource for such documents because individual authors, academic and research publishers, and repositories have made their documents available online, with some open to the public and others limited to subscribers.
Numerous databases and search engines such as Google Scholar and CiteSeer track scholarly documents and thus facilitate research. However, the coverage of some of these search engines and databases is unknown. An important question that a scholar or researcher might ask is whether a single search engine or database is sufficient to obtain comprehensive results in a particular field. For example, Web of Science reported that as of January 2013 it comprises more than 49.4 million records [1], and Microsoft Academic Search (MAS) stated that it covers 48.7 million documents [2]. However, the size of Google Scholar is unknown despite studies that have tried to determine the extent to which Scholar's citations overlap with those of other citation indices. Relatively smaller digital libraries and databases, such as CiteSeer and PubMed, tend to focus on documents from certain fields, most of which are also indexed by large search engines such as Google Scholar and MAS. Bjork et al. estimated the number of published papers in 2006 to be roughly 1.35 million, whereas a similar estimate for 2011 put the number at 1.8 million. But despite the availability of per-year estimates, researchers have yet to provide an estimate of the total number of published scholarly documents.
Estimating the number of scholarly documents available on the web is quite different from estimating the size of the web itself, and thus presents different challenges. Studies that offer estimates of the size of the web, such as Lawrence and Giles, Bharat and Broder, or Dobra and Fienberg [10], cannot be used to estimate the number of scholarly documents on the web for many reasons. For example, search engines are no longer receptive to automated requests for fear of denial-of-service attacks or reverse engineering of their ranking function. Checking that a document indexed by search engine A is also available in the index of search engine B is nontrivial. To estimate the size of the web, one strategy would be to check whether a particular URL is available in both engines. However, in the case of scholarly documents the search engines might not have obtained their copies from the same location, since the same document might be available at different URLs. Therefore, it is necessary to explore the content of the document and not just the location from which it was obtained. Even when a search engine returns the location of a certain document, it could be that the publisher offers full access to subscribers only and has a limit on the number of downloads allowed per day, thus making automated methods impractical. Finally, many publishers restrict access for many web crawlers.
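The need to compare documents by content rather than by URL can be illustrated with a small sketch. The text does not say how documents were matched across the two engines, so the title-normalization heuristic, the function names `normalize_title` and `same_document`, and the two sample records below are all assumptions made for illustration only.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Reduce a title to a lowercase, punctuation-free key (illustrative heuristic)."""
    # Decompose accented characters, then drop the accents ("Björk" -> "Bjork").
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    return cleaned.strip()


def same_document(record_a: dict, record_b: dict) -> bool:
    """Treat two records as one document when their normalized titles match.

    URLs are deliberately ignored: the same paper may sit on a publisher
    site in one index and in a repository in the other.
    """
    return normalize_title(record_a["title"]) == normalize_title(record_b["title"])


# Hypothetical records for one paper as two engines might list it.
engine_a = {
    "title": "The PageRank Citation Ranking: Bringing Order to the Web",
    "url": "http://example.org/publisher/pagerank.pdf",
}
engine_b = {
    "title": "The pagerank citation ranking - bringing order to the web.",
    "url": "http://example.edu/repository/1999-66",
}

print(same_document(engine_a, engine_b))  # True
```

Title matching alone would confuse distinct papers that share a title, so a practical matcher would probably also compare authors or publication year.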
Estimating the Number of Scholarly Documents on the Web
To estimate the number of scholarly documents on the web, we use the relative size of two major academic search engines: Google Scholar (Scholar) and Microsoft Academic Search (MAS). We note that our estimates are limited to English documents only. We used the option offered by Google Scholar of filtering results by language, whereas for MAS we ran a language detection algorithm on the title of each document. Only those identified as English were used. Our approach can be described as follows. Assuming that each academic search engine samples the web independently for papers, each index contains a subset of the available documents. Next, we considered each search engine to be a random capture of the document population at a certain time. Using the intersection of these two captures, we estimate the entire size of the population. However, since obtaining the database of both academic search engines was not feasible, we approximated the overlap by randomly sampling from each search engine and then determining the size of the overlap in the random sample.
The simplest approach for sampling from two search engines is to send queries to each and then measure the overlap of the results. This approach was used by Lawrence and Giles and by Bharat and Broder. However, it is known to suffer from many biases and statistical dependencies. To mitigate the effect of bias and dependence and to obtain a selection that was as random as possible, we sampled from each academic search engine with the following methodology: if we choose a random paper p that is in the database of an academic search engine, then the set of papers S that cite p is a random collection from this search engine. If we collect the set of papers citing p from both Google Scholar and MAS, then the overlap between these two sets is an estimate of the overlap between the two search engines.
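The capture/recapture idea described above can be sketched with the textbook Lincoln-Petersen estimator. This is a minimal illustration under stated assumptions, not necessarily the exact computation used in the study: the function names and the document IDs are hypothetical, and the sets stand in for the papers citing one query paper p in each engine.

```python
def overlap_fraction(sample_a: set, sample_b: set) -> float:
    """Fraction of the items sampled from engine A that engine B also returned."""
    if not sample_a:
        raise ValueError("sample_a must not be empty")
    return len(sample_a & sample_b) / len(sample_a)


def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Textbook capture/recapture estimate of a population size.

    n_first  -- number of items in the first capture
    n_second -- number of items in the second capture
    n_both   -- number of items that appear in both captures
    """
    if n_both <= 0:
        raise ValueError("the captures must overlap to form an estimate")
    return n_first * n_second / n_both


# Hypothetical citing-paper IDs for one query paper p (not data from the study).
citing_in_scholar = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
citing_in_mas = {"d1", "d2", "d3", "d4", "d9"}

both = len(citing_in_scholar & citing_in_mas)
estimate = lincoln_petersen(len(citing_in_scholar), len(citing_in_mas), both)
print(both, estimate)  # 4 10.0

# Share of the MAS sample that Scholar also holds: a coverage estimate for Scholar.
print(overlap_fraction(citing_in_mas, citing_in_scholar))  # 0.8
```

In this toy case the two views agree: Scholar holds 8 documents and covers an estimated 80% of the population, which again implies about 10 documents in total.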
This method provides a good estimate of the coverage of each search engine because when an academic search engine builds its database by indexing a new document, it has no knowledge of the incoming citations to this document. Therefore, the search engine has to obtain all the available manuscripts and analyze them in order to determine whether there are any citations to a target paper. References can be extracted from the document itself, and the search engine can then try to obtain a copy of each referenced item. Incoming citations, in contrast, are not embedded in a document. Hence, to build a complete citation network, it is necessary for a search engine to obtain all the available scholarly documents. The more documents the search engine obtains, the larger its citation network.
Based on the methodology described, we chose 10 documents from each of the fifteen fields specified by Microsoft Academic Search: Agricultural Science, Arts and Humanities, Biology, Chemistry, Computer Science, Economics and Business, Engineering, Environmental Sciences, Geosciences, Material Science, Mathematics, Medicine, Physics, Social Sciences, and Multidisciplinary. The papers used as queries, for which we retrieved the collection of incoming citations, were randomly chosen from the most cited documents in each field. Special care was taken in choosing documents because search engines impose a limit on the maximum number of retrievable results. Therefore, the chosen documents each had fewer than 1,000 citations in Scholar and likewise fewer than 1,000 citations in MAS.
We argue that this estimate is a lower bound of the number of scholarly documents on the web because the likelihood that a document is in an academic search engine, given that it was found in another academic search engine, is larger than the likelihood that any given document is indexed by an academic search engine. Although we designed our experiments to mitigate any possible statistical dependence by relying on citations instead of query results, the experiments do introduce a bias against documents with more than 1,000 citations. Search engines impose a restriction on the number of retrievable results for all types of queries, unless an Application Programming Interface (API) is provided. Hence, any study based on sampling from a search engine, regardless of the approach, would encounter this bias. For our study it is relevant to note that Google Scholar at this time does not provide an API.
Using the statistics calculated above, we estimated Google Scholar to have 99.3 million documents, which is approximately 87% of the total number of scholarly documents found on the web. This percentage is close to the 86% reported by Norris, Oppenheim and Rowland when they tested the coverage of Google and Google Scholar for finding Open Access documents. With this estimated size, Google Scholar is more than twice as large as the nearest alternative, as MAS and Web of Science are both reported to have fewer than 50 million records. However, we estimate that Scholar fails to index 13% of all web accessible documents. This implies that it is necessary to search across multiple search engines in order to retrieve a comprehensive list of results. The relative size of each database/search engine is depicted in Figure 2.
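The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers stated in the text (99.3 million, 87%, and the 49.4 and 48.7 million record counts reported for Web of Science and MAS); the function name `implied_total` is an assumption introduced for illustration.

```python
def implied_total(indexed_millions: float, share: float) -> float:
    """Total population implied by an index size and its estimated share."""
    return indexed_millions / share


scholar_millions = 99.3        # Google Scholar estimate quoted in the text
scholar_share = 0.87           # Scholar's estimated share of all documents
web_of_science_millions = 49.4 # record count reported by Web of Science
mas_millions = 48.7            # document count reported by MAS

total = implied_total(scholar_millions, scholar_share)
print(round(total, 1))                                       # 114.1
print(round(total - scholar_millions, 1))                    # 14.8
print(round(scholar_millions / web_of_science_millions, 2))  # 2.01
print(round(scholar_millions / mas_millions, 2))             # 2.04
```

The implied total of about 114 million matches the headline estimate, roughly 15 million documents fall outside Scholar, and both size ratios are just above two, consistent with the "more than twice as large" statement.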
Figure 2 note: the Total and Google Scholar values are estimates.
Field Level Analysis
In addition to computing statistics about the total number of scholarly documents on the web, we can reinterpret the experiments at the field scale, making it possible to obtain estimates of the size of each of the fifteen scholarly fields defined in MAS. To obtain these estimates, we assumed that a paper and its citations belonged to the same field. Though this assumption does not always hold, we took it to be a good approximation of the number of citations within a discipline. We also noted that some papers can be classified into multiple fields, especially closely related ones, e.g., engineering and mathematics. Nevertheless, as the number of citations grew for a given paper, we anticipated that more papers from the same field would cite it.
Using the classification provided by MAS, and the number of papers reported in each field, we used the 10 queries in the experiments for each field to compute the overlap between Scholar and MAS in that particular field. Table 1 reports the estimate of the total number of available documents using the procedure described above (method #1 in the table).
Another interesting estimate is the percentage of scholarly documents on the web that are freely available, i.e., can be accessed without paying a fee or needing a subscription. We used Google Scholar to estimate this percentage because Scholar provides a direct link to the publicly available document next to each search result where a link is available. Note that there is no easy way to distinguish between publisher's links and public links in MAS. As our estimate found that Scholar contains only 87% of the available scholarly documents on the web, our estimate of the percentage of public documents is limited to the coverage of Scholar. However, this is still a good indicator of the relative availability of publicly available documents. To estimate the percentage of publicly available documents for each field, we randomly sampled 100 documents from MAS belonging to each field such that each document had at least one citation. We imposed this citation requirement to filter out items collected by MAS that were not real scholarly documents (such items are rare, but they do exist). Then, each of the 100 documents was searched on Google Scholar to establish whether the document was freely available on any site. The percentage of freely available documents for each field is reported in Table 2. In the last two columns, we multiply the estimate of the percentage of freely available documents by the size estimate of the field in Table 1 (method #1), resulting in the total number of freely available documents in that field.
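The per-field procedure described above (draw a uniform sample of 100 cited documents, count how many have a free copy, scale by the field size) can be sketched as follows. All inputs are hypothetical placeholders rather than values from Table 1 or Table 2, and the function names are assumptions made for illustration.

```python
import random


def uniform_sample(document_ids: list, size: int, seed: int = 0) -> list:
    """Draw a uniform sample without replacement from a list of document IDs."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(document_ids, size)


def estimate_free_documents(field_size: int, free_in_sample: int, sample_size: int) -> float:
    """Scale the free share observed in the sample up to the whole field."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return field_size * free_in_sample / sample_size


# Hypothetical field: IDs of documents that have at least one citation.
cited_ids = list(range(5000))
sample = uniform_sample(cited_ids, 100)
print(len(sample), len(set(sample)))  # 100 100

# Hypothetical outcome: 30 of the 100 sampled documents had a free copy,
# in a field estimated to hold 2,000,000 documents.
print(estimate_free_documents(2_000_000, 30, 100))  # 600000.0
```

With only 100 documents per field, each percentage carries a sampling error of several points, so small differences between fields should be read with caution.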
It would be interesting, however, to determine the quality of these freely available documents. It is also worth pointing out that this estimate of 24% for the percentage of publicly accessible scholarly documents is a bit higher than the 15–20% of documents estimated to be self-archived.
Note here that our sampling is uniform, because we retrieved the document IDs of all the documents in each given field from MAS, then uniformly chose 100 that conformed to the citation sampling restriction. To the best of our knowledge, this is the only uniform sampling method for estimating the percentage of freely available scholarly documents. The numbers reported in Table 2 differ from other recent estimates of the number of documents available on the web as open access, e.g., Bjork et al. We believe this difference arises from the sources from which they sampled. For the other recent estimate, researchers considered only journals over a period of one year, whereas our definition of scholarly documents is not limited to journals and our sampling was cumulative, i.e., not limited to any time period. Compared to other kinds of publications, journal publications are more likely to be indexed by databases such as Web of Science. However, other documents such as conference proceedings and technical reports, though influential, may not be indexed by Web of Science. As an example, the famous PageRank paper [17], which presents the seminal algorithm for Google ranking, was published as a technical report. Therefore, Web of Science does not index it.
In summary, the lower bound estimate of the number of scholarly documents, published in English, available on the web is roughly 114 million, of which Google Scholar covers nearly 87%, approximately 100 million documents. Therefore, it would be useful for researchers to consider as a standard practice querying multiple databases and academic search engines in order to gain the most comprehensive result for their query. Also, we estimate that almost 1 in 4 web accessible scholarly documents are freely and publicly available. Our estimates for specific academic fields differ significantly, such that some fields have a four times greater percentage of freely available documents than others.
Appendix providing size estimates using other methods.
We gratefully acknowledge comments and suggestions from D.R. Hunter, E.A. Fox, L. Rokach, and the reviewers.
Conceived and designed the experiments: CLG MK. Performed the experiments: CLG MK. Analyzed the data: CLG MK. Contributed reagents/materials/analysis tools: CLG MK. Wrote the paper: CLG MK.
- 1. Web of Science fact page. Available: http://wokinfo.com/realfacts/qualityandquantity/.
- 2. Based on the statistics reported at the homepage of Microsoft Academic Search as of January 10, 2013. Available: http://academic.research.microsoft.com.
- 10. Dobra A, Fienberg SE (2004) How large is the world wide web. Web Dynamics: 23–44.
- 13. Hogg R, Tanis E (2010) Probability and Statistical Inference. Pearson/Prentice Hall.
- 17. Page L, Brin S, Motwani R, Winograd T (1999) The pagerank citation ranking: Bringing order to the web. Technical Report 1999-66, Stanford InfoLab.