Medical Library

Finding Statistics & Data Sets

Finding Statistics in the Literature

Statistics and information about data sets appear throughout the biomedical literature. In MEDLINE, two subheadings are often applied to articles containing statistics. For articles about diseases or conditions, use the subheading EP - Epidemiology. For everything else, use the subheading SN - Statistics and Numerical Data. To find articles in which large databases were used, search on the MeSH term "Databases" along with your subject.
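The subheading rule above can be sketched as a small query builder. This is only an illustration: it assumes PubMed-style search syntax (a subject/subheading pair tagged with [MeSH Terms]), the function names are hypothetical, and other MEDLINE interfaces such as Ovid use a different syntax. The "Databases" heading is taken as written in this guide; the current MeSH vocabulary may use a different heading.

```python
def statistics_query(subject, is_disease):
    """Pair a MeSH subject with the statistics subheading suggested in this guide.

    Diseases/conditions get EP (epidemiology); everything else gets
    SN (statistics and numerical data). PubMed-style syntax is assumed.
    """
    subheading = "epidemiology" if is_disease else "statistics and numerical data"
    return f'"{subject}/{subheading}"[MeSH Terms]'


def database_query(subject):
    """Combine a subject with the MeSH term "Databases" as named in this guide."""
    return f'"{subject}"[MeSH Terms] AND "Databases"[MeSH Terms]'


if __name__ == "__main__":
    # Example subjects chosen only for illustration.
    print(statistics_query("Asthma", is_disease=True))
    print(statistics_query("Hospitals", is_disease=False))
    print(database_query("Asthma"))
```

The resulting strings can be pasted into the PubMed search box; check the search details shown by PubMed to confirm the terms were interpreted as intended.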

Once you know the source of the data, you can search the web to see whether the data set has been updated or expanded. Many statistical publications are updated yearly.

Internet Search and Manipulation Hints

Two questions to consider before beginning your statistical search:

  • Who cares about or has a mandate to study the topic?
  • Who has the resources and staff to collect data in this topic area?

Knowing the answers to these questions will help direct you to appropriate resources.

Look for information about the:

  • File Format (HTML, PDF, Excel, text, etc.)
  • Dates of Data (not the same as the publication date of the document or page)
  • Sources of Data
  • Contact Person
  • Suggested Citation
  • Availability of Documentation
  • Data Use Limitations
  • Anything Special About the Data

Many agencies publish their statistics in PDF files that require Adobe Acrobat Reader. If you are searching the web in general for statistics, be sure to use a search engine that indexes PDFs. Google (www.google.com) and AllTheWeb (www.alltheweb.com) are two examples.

It is difficult to extract data from tables in PDF format if you do not have the complete Adobe Acrobat program. Tools designed for screen-reading browsers, such as the Adobe PDF Conversion Form (http://www.adobe.com/products/acrobat/access_onlinetools.html), may be able to translate PDFs into HTML or plain text for you.
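Once a PDF table has been translated to plain text, its columns usually survive only as runs of spaces. A minimal sketch of recovering rows and columns from such text follows, assuming columns are separated by two or more spaces and that no cell is empty. The function name and the sample table are invented for illustration and are not real statistics.

```python
import re


def parse_text_table(text):
    """Split whitespace-aligned table text into a list of rows.

    Assumes columns are separated by two or more spaces, so a single
    space inside a cell (for example "North East") is preserved.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        rows.append(re.split(r"\s{2,}", line))
    return rows


# Made-up sample text standing in for a converted PDF table.
SAMPLE = """\
Region        Year    Count
North East    2005    120
South         2005    95
"""

if __name__ == "__main__":
    for row in parse_text_table(SAMPLE):
        print(row)
```

Each row comes back as a list of strings, which can then be pasted into a spreadsheet or written out as CSV. Tables with empty cells or single-space column gaps need manual checking.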

Statistical Resources and Publications

Projects Presenting Spatial Data (Geographic Information Systems)

Projects Consolidating Data Sets

  • Reported Volume for Selected Procedures Performed in New York State Licensed Hospitals and Ambulatory Surgery Centers - Center for Medical Consumers
    http://www.medicalconsumers.org/#Main_Index
    Data derived from New York State SPARCS/Ambulatory Surgery databases.

WCMC/NYP/CU Data and Statistical Expertise

Web Directories of Data Sets

  • Directory of Health and Human Services Data Resources
    http://aspe.hhs.gov/datacncl/DataDir/index.shtml
    Compilation of collection systems sponsored by the U.S. Department of Health and Human Services (HHS). Databases from continuing departmental data projects or program administrative and evaluation activities that met the criterion of broad utility were included. Such data projects and systems included recurring surveys and disease registries either maintained or sponsored by HHS. Databases from one-time studies or data collections were also included when the data may have broad interest.
  • Health Services & Sciences Research Resources (HSRR)
    http://www.nlm.nih.gov/nichsr/hsrr_search/
    HSRR is a searchable database of information about datasets and instruments/indices employed in Health Services Research, Behavioral and Social Sciences and Public Health with links to PubMed.
  • Health and Medical Care Archive - Robert Wood Johnson Foundation
    http://www.icpsr.umich.edu/HMCA/
    Foundation-sponsored data sets held at the Inter-University Consortium for Political and Social Research (ICPSR). More data sets are available at http://www.icpsr.umich.edu/. Cornell is a member.

Data Sets

Tools and Software for Data Acquisition and Analysis

  • Epi Info / Epi Map
    http://www.cdc.gov/epiinfo/
    Epi Info and Epi Map are public domain software designed to provide for easy database construction, data entry, and analysis with epidemiologic statistics, maps, and graphs.
  • DataFerrett
    http://www.cdc.gov/nchs/datawh/ferret/ferret.htm
    DataFerrett, a collaborative effort between the National Center for Health Statistics and the Bureau of the Census, is a data mining and extraction tool. It allows you to select a databasket full of variables, recode those variables as needed, and then develop and customize tables and charts. DataFerrett helps you locate and retrieve the data you need across the Internet to your desktop or system, regardless of where the data resides.
  • SAS (Statistical Analysis System) is loaded on PCs in the Library Computer Room
  • Free Statistical Analysis Tools - Compiled by David Lane, Rice University
    http://davidmlane.com/hyperstat/Statistical_analyses.html

Background Reading & Additional Training in Finding and Using Data

Updated: September 25, 2007
© Weill Cornell Medical College