Finding Statistics & Data Sets
Finding Statistics in the Literature
Statistics and information about data sets appear in the biomedical literature. In
MEDLINE, two subheadings are often applied to articles containing statistics. For
articles about diseases or conditions, use the subheading EP - Epidemiology. For everything
else, use the subheading SN - Statistics and Numerical Data. To find articles based on
large databases, search the MeSH term "Databases" together with your subject.
Once you know the source of the data, search the web to see whether the data
set has been updated or expanded; many statistical publications are updated yearly.
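If you run these MEDLINE searches programmatically, the subheading rule above can be written as PubMed query strings. The following is a minimal Python sketch, assuming PubMed's `"Heading/subheading"[MeSH]` search syntax and NCBI's E-utilities `esearch.fcgi` endpoint. The function names are hypothetical, and the code only builds strings. It does not send any request.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (assumed; this sketch never contacts it).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def medline_stats_query(mesh_heading: str, is_condition: bool) -> str:
    """Hypothetical helper: pair a MeSH heading with the subheading suggested
    in the text (Epidemiology for diseases/conditions, otherwise
    Statistics and Numerical Data)."""
    subheading = "epidemiology" if is_condition else "statistics and numerical data"
    return f'"{mesh_heading}/{subheading}"[MeSH]'

def esearch_url(term: str) -> str:
    """Hypothetical helper: build, but do not fetch, an esearch URL for a PubMed query."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term})

if __name__ == "__main__":
    query = medline_stats_query("Asthma", is_condition=True)
    print(query)  # "Asthma/epidemiology"[MeSH]
    print(esearch_url(query))
```

"Asthma" is only an example topic. Check headings in the MeSH Browser before relying on the results.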
Internet Search and Manipulation Hints
Two questions to consider before beginning your statistical search:
- Who cares about or has a mandate to study the topic?
- Who has the resources and staff to collect data in this topic area?
Knowing the answers to these questions will help direct you to appropriate resources.
Look for information about the data's:
- File Format (HTML, PDF, Excel, text, etc.)
- Dates of Data (not the same as the publication date of the document or page)
- Sources of Data
- Contact Person
- Suggested Citation
- Availability of Documentation
- Data Use Limitations
- Special Features (anything unusual about the data)
Statistical Resources and Publications
Projects Presenting Spatial Data (Geographic Information Systems)
Projects Consolidating Data Sets
WCMC/NYP/CU Data and Statistical Expertise
Web Directories of Data Sets
- Directory of Health and Human Services Data Resources - Compilation of data collection
systems sponsored by the U.S. Department of Health and Human Services (HHS). It includes
databases from continuing departmental data projects and from program administrative and
evaluation activities that met the criterion of broad utility, such as recurring surveys
and disease registries maintained or sponsored by HHS. Databases from one-time studies or
data collections were also included when the data were likely to be of broad interest.
- Health Services & Sciences Research Resources (HSRR) - HSRR is a searchable database of
information about data sets and instruments/indices used in health services research, the
behavioral and social sciences, and public health, with links to PubMed.
- Health and Medical Care Archive - Robert Wood Johnson Foundation - Data sets sponsored
by the Robert Wood Johnson Foundation and archived at the Inter-University Consortium for
Political and Social Research (ICPSR). More ICPSR data are available at
http://www.icpsr.umich.edu/; Cornell is a member.
Data Sets
Tools and Software for Data Acquisition and Analysis
- Epi Info / Epi Map - Epi Info and Epi Map are public-domain programs designed for easy
database construction, data entry, and analysis with epidemiologic statistics, maps, and
graphs.
- DataFerrett - DataFerrett, a collaborative effort between the National Center for Health
Statistics and the Bureau of the Census, is a data mining and extraction tool. It lets you
select a "databasket" of variables, recode those variables as needed, and then develop and
customize tables and charts. DataFerrett helps you locate and retrieve the data you need
across the Internet to your desktop or system, regardless of where the data reside.
- SAS (Statistical Analysis System) is installed on the PCs in the Library Computer Room.
- Free Statistical Analysis Tools - Compiled by David Lane, Rice University
Background Reading & Additional Training in Finding and Using Data
Last Updated: September 25, 2007