The University of Arizona

ISTA 370:  Research Methods for the Information Age
Spring 2012


Instructor
Description
Schedule
Grading
Project and Assignments
Resources
Readings
Policies


Instructor

Prof. Paul Cohen (Director, SISTA)

Course Description


When: Monday, Wednesday, 10:30-11:45am, Gould-Simpson 906
Open to: All undergraduates who have the prerequisites.  Grad students are welcome to take a more challenging version of the course as independent study.
Prerequisites: ISTA 116 or a similar introductory statistics course, or permission of the instructor. 
Text:  Paul R. Cohen, Empirical Methods for Artificial Intelligence, MIT Press.
Requirements:  Term project, homeworks, exams, class participation

Have you ever heard someone say, "they have done experiments that show...," and wondered who "they" are, how the experiments were designed, what the experiments actually show, and why you should believe them?

The Information Age runs on data, so you should know how data are gathered,  processed, visualized,  tested and modeled.  Most scientific fields teach their students these empirical methods (although, strangely, the information sciences generally do not) but the understanding of data should not be limited to scientists, any more than the understanding of art should be limited to artists. To be an informed citizen of the Information Age, you will need to understand the methods by which data are transformed into evidence and knowledge.

The best way to learn these methods is to do some empirical research yourself.  The course requires a term project in addition to exercises and learning from lectures and the text.

Schedule


The five main parts of this course cover the nature of empirical research; data and exploratory data analysis; experiment design, including pitfalls and practicalities; classical and computer-intensive hypothesis testing, confidence and effect size; and how to build empirical models.  As this is the first time the course has been offered, it doesn't make much sense to assign strict dates to these topics.  However, the following dates are fixed:

January 16:  Martin Luther King Day, no classes
January 25:  Project Description Documents due
February 1:  Teams must be formed
February 13: Teams' written and oral presentations of protocols
February 29: Midterm Exam
March 5: Teams' oral presentations of pilot results
March 10 - 18:  Spring Break
March 19: Teams' written report on pilot results (or first-look results)
April 25: Teams' oral presentations of final projects
May 2: Teams' oral presentations of final projects
May 2: Last day of class
May 9: Final Exam 10:30am-12:30pm and teams' written presentations of final projects

Readings

The text is Empirical Methods for Artificial Intelligence, MIT Press, 1995.  It is available at the UA bookstore.  You will be responsible for the following sections:

The nature of research :  Chapter 1 through Section 1.4
Exploratory data analysis : Chapter 2 except Sections 2.2 and 2.6.
Experiment design : Chapter 3 except Section 3.2.2,  and you can quickly skim Sections 3.2.4 and 3.3.
Hypothesis testing : Chapter 4 through Section 4.5.  Chapter 5 through Section 5.3.
Confidence, power and effect size:  Sections 4.6 - 4.9
Modeling and performance assessment:  Chapter 6 through Section 6.3. Sections 6.9, 6.10.
Modeling and interactions between factors: Chapter 7 through Section 7.5. Sections 8.2 - 8.7. 


Grades and Grading


You will get three kinds of exposure to research methods in this class, so you'll be evaluated on each:

  1. Assigned readings from the text, evaluated in a midterm exam and a final exam worth 15 points each.
  2. Assignments to give you practice.  There will be five assignments, each worth a maximum of four points.
  3. A multipart term project, worth 40 points.
In addition, class participation will be worth 10 points.  Class participation includes participating in experiments, either for this class or for related SISTA research projects. Attendance at two SISTA colloquia will be worth 5 points, total, of extra credit. 

Grades will be assigned according to this scale:  A: 90 points or more, B: 80 - 89 points, C: 68 - 79 points, D: 56 - 67 points, E: 55 points or less.
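To see how the point values add up, here is a small sketch in R.  The point caps come from the list above; the function names and the example scores are hypothetical, and treating each grade boundary as a lower cutoff (so that a fractional total such as 89.5 is a B) is an assumption, not a stated policy.

```r
# Sketch only: point caps are from the syllabus; names and scores are made up.
total_points <- function(midterm, final, assignments, project,
                         participation, extra_credit = 0) {
  stopifnot(midterm <= 15, final <= 15,
            length(assignments) <= 5, all(assignments <= 4),
            project <= 40, participation <= 10,
            extra_credit <= 5)   # colloquium extra credit is capped at 5
  midterm + final + sum(assignments) + project + participation + extra_credit
}

# Assumption: each boundary is a lower cutoff for its letter grade.
letter_grade <- function(points) {
  if (points >= 90) "A"
  else if (points >= 80) "B"
  else if (points >= 68) "C"
  else if (points >= 56) "D"
  else "E"
}

# Hypothetical student: 13 + 12 + 17 + 34 + 9 = 85 points
pts <- total_points(midterm = 13, final = 12,
                    assignments = c(4, 3, 4, 2, 4),
                    project = 34, participation = 9)
letter_grade(pts)       # "B"
letter_grade(pts + 5)   # "A", with the full colloquium extra credit
```

The regular components sum to 100 points (15 + 15 + 20 + 40 + 10), so the extra credit is the only way to exceed that total.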

Exams

The exams will focus on the assigned readings.  This course is not designed to make you memorize dozens of formulae, so the exams will be open-book and open-notes, and will consist primarily of short answer and problem-solving questions. The use of calculators or other electronic devices is not permitted on exams and will not be necessary.

Students are expected to take the exams at the announced exam times. We give make-up exams only in extreme circumstances. The instructor decides whether a circumstance is "extreme."

Assignments

The purpose of assignments is to give you practice with empirical methods.  All the assignments will be posted on this page.  In general, assignments are due one week after they are posted.  They should be emailed to the instructor. Late assignments lose one point each day they are late, so after four days they lose all value.
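As a rough illustration of the late penalty, the following R sketch applies one point per day to an assignment's score.  The function name is hypothetical, and the floor at zero is an assumption (the policy says only that late work eventually loses all value).

```r
# Hypothetical helper: score remaining after the one-point-per-day penalty.
# Assumption: a score never drops below zero.
late_score <- function(raw_score, days_late) {
  pmax(0, raw_score - days_late)
}

# A perfect four-point assignment handed in 0 to 4 days late
late_score(4, 0:4)   # 4 3 2 1 0
```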

The Project

Each student is required to participate in an empirical research project.  The organization of the project, including how teams will be formed, is described here.

Some hints about how to think of a good project are here.

SISTA Colloquia

The  SISTA Colloquium Series features researchers from all over campus talking about computational aspects of their various disciplines.  Last term we heard from biologists, psychologists, anthropologists, geographers, and many others.  To get credit for attending the colloquium, fill out a card as you leave. 

Resources


Lecture slides.  PDFs of the lecture slides are here.  In general, the correspondence between PDFs and lectures will not be perfect, as it is impossible to predict how much or how little material we will cover in each lecture.

The Project.  Some hints about how to think of a good project are here.

R Resources.  All of the lectures, most of the assignments, and the project will require you to use an open-source statistical package called R.  R has been adopted by SISTA as its standard for courses that have statistical content, so if you are a SISTA student, you will have to use R for some of your courses.  For this reason, and because the formats of results given by other statistics packages are so variable, we will not accept assignments in which statistical computations are required unless the work is done in R.

We  recommend a free IDE for R called RStudio.  It takes moments to install and will save you hours of struggle!

If you don't already know R, then you can learn enough to get started by downloading RStudio and working through the examples in the first 28 pages of W. J. Owen's R Guide.  William King's introductory R web site is straightforward and well-written.  A more complete treatment is Kerns's Introduction to Probability and Statistics Using R.  Also strongly recommended are the Lab documents developed by Colin Dawson and Derek Green for ISTA 116.  Keep in mind that R documentation is available through the RStudio IDE, and that much of that documentation is written in terms of data sets that R loads automatically (if not, just load the package called datasets), so it is relatively easy to follow examples.
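For a first taste of what this looks like, here is a minimal session that uses one of the data sets shipped with R.  The choice of mtcars and of these particular summaries is ours for illustration; it is not taken from any of the guides above.

```r
# A first look at a built-in data set (mtcars is an illustrative choice).
library(datasets)        # normally attached already; loading again is harmless

str(mtcars)              # 32 cars, 11 numeric variables
summary(mtcars$mpg)      # five-number summary plus the mean of miles per gallon

# Exploratory plots: one distribution, then the same variable split by group
hist(mtcars$mpg, main = "Fuel economy", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders", ylab = "Miles per gallon")

# A two-sample t test comparing automatic (am = 0) and manual (am = 1) cars
mpg_test <- t.test(mpg ~ am, data = mtcars)
mpg_test$conf.int        # confidence interval for the difference in means
```

Typing help("mtcars") at the R prompt (or using the Help pane in RStudio) shows the documentation for the data set, including what each variable means.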

Data Sets.  All data sets for the course are either packaged with R or can be found here.

Sweave.  All the lecture slides are developed in Sweave, a framework that integrates R and LaTeX code into a single file.  Basically, a Sweave file runs R code and integrates the results into a LaTeX document to be formatted by the LaTeX engine.  Sweave is a terrific way to generate homework and project reports, and to help assure the reproducibility of research, but it is not required.  Students who already know LaTeX and want to try Sweave should check out the Sweave web site and the short Sweave Manual.  Also, the ISTA 116 Lab notes are developed in Sweave, so you can see many examples of .Rnw files -- the ones that combine R and LaTeX -- there.

Miscellaneous Class Policies

Academic Integrity (a.k.a., Cheating):

The university rules governing cheating can be found at:

• The University of Arizona Code of Academic Integrity: http://deanofstudents.arizona.edu/codeofacademicintegrity

• The Arizona Board of Regents list of Prohibited Conduct: https://azregents.asu.edu/rrc/Policy%20Manual/5-303-Prohibited%20Conduct.pdf

• The Arizona Board of Regents Student Code of Conduct: https://azregents.asu.edu/rrc/Policy%20Manual/5-308-Student%20Code%20of%20Conduct.pdf

Miscellaneous University Policies: