
| Statistical Research Design and Analysis (Stat4013/7009)
Copyright information, course outline and references
University of the Witwatersrand, School of Statistics and Actuarial Science
Course notes for STAT4013/STAT7009 Statistical Research Design and Analysis, or "Using statistical methods for those who hate stats"
JS Galpin, 2007
Note: This material may be freely used provided it is referenced as the intellectual property of the University of the Witwatersrand and as having been developed by JS Galpin.
Statistical Research Design and Analysis (STAT 4013/7009): Course aims and format for 2007
Note: This course is offered at both honours and masters level. It consists of weekly lectures and assignments, an exam, and a major project / long essay. The same set of notes serves as a reference for both groups, as these notes are intended to assist students in their future studies and work. The exams for the two levels are different, with masters level students being asked more in-depth questions and expected to show more insight into the methods. They are also expected to produce a project with higher level statistical components than those of honours students. The project should ideally be the stats component of the student's honours project or masters research report / long essay.
AIMS
The aim of this course is to introduce participants to the statistical way of thinking, and to provide sufficient background to statistical terminology and procedures that many research projects may be tackled without recourse to expert statisticians. On completion of the course, participants should be able to:
In view of the last objective, several sections are included in the notes that sketch the background for a term often used in the literature, without giving details of the method, or examples of its application. Thus several methods currently more prevalent in research articles than in textbooks are included. Although some of the participants attending this course may have completed previous statistical courses, many have not. For this reason, no statistical knowledge is assumed, and all concepts will be introduced. There will be no attempt at derivation of procedures or formulae. These may be found in the list of recommended books. The course is essentially a practical course, aimed at issues such as
rather than the nuts and bolts of how the techniques function, or the statistical reasoning of why they should work. In line with this, the weekly group assignments are aimed at implementation and interpretation of material covered in lectures.
COURSE FORMAT
The course comprises a double lecture and an optional tutorial period each week, together with a weekly hand-in group assignment. Students will also be expected to individually complete a major assignment describing the design and analysis of a study of interest to them. This is usually their major project / research report etc for the course for which they are enrolled. The marks for the course will consist of 50% from the May examination, 10% from the class assignments, and 40% from the major assignment.
(a) Lectures
The course will be run with two parallel groups: a Monday class aimed at SEBS and other CLM students, and a Wednesday class aimed at students from APES and other faculties and occasional students (main group). Although both groups will cover certain core sections, other sections differ, and the examples and order in which the material is covered may also differ. All students are welcome to attend any lectures. Lectures for the Business Economics group will be held on Mondays from 14h15-16h00, and those for the main group will be held on Wednesdays from 15h15-17h00, in CB248. Lectures will start on 12/14 February and continue until 21/23 May, with recap lectures on Thursday 31 May (9h30-13h00, main group) and Monday 4 June (9h30-13h00, CLM students).
(b) Weekly assignments and tutorial period
After each double lecture, students will be expected to complete an assignment relating to the material covered in class that week. These may be completed by groups of up to 5 students working together. A tutorial session will be offered on Thursdays between 08h00-13h15, in the Math Sciences Laboratories on the 2nd floor of Senate House.
Assignments generally take a maximum of 2 hours to complete, and you may come at any time during this period. The length of the tutorial session is to allow students to fit this in between other lectures and labs, and also to allow students and tutors more time for discussion of statistical problems arising in their research, as well as discussion of problems with course work and assignments. The assignments may be handed in during the lab session or at CB141. Deadlines are 14h00 on the Friday for CLM students, and 12h00 on the following Monday for other students. There is no penalty for early hand-in! Completion of the class assignments will require the use of a computer package. There are a number of statistical packages available - you will need to be able to handle multiple regression and analysis of variance as a minimum requirement. Some spreadsheet programs, such as Excel, can handle some of the statistics required, and may be used for those purposes, but these packages cannot in general handle all required methods, such as residual plots and the diagnostics required for multiple regression. Packages used in some departments are SAS, Stata, Statistica, SPSS and Systat. The package used in the lab will be R (a freeware package), and instructions on the use of R will be given. The package is downloadable from http://www.r-project.org or from the Stats and Actuarial Science website. In order to access the package, see option R notes. The hints for each week's assignment will be put on the website on the Wednesday, and handed out at the tut on the Thursday.
(c) Exam
The examination for the course will be a 3 hour OPEN BOOK examination, held on 6 June 2007 at 14h00. The examination will be aimed at examining understanding of principles, and of what procedure to use when, and why. Interpretation of computer output (similar to that in these notes and in the weekly assignments) will be expected.
Very few calculations will be required in the examination - these will be stated in the recap sessions on Thursday 31 May (9h30-13h00, main group) and Monday 4 June (9h30-13h00, CLM students). These sessions will provide an overview of the work, handle questions on the material, and talk through a previous exam paper. Not all material is applicable to the different groups of students. Examinable sections will be noted during lectures, and in the recap sessions. The exam will consist of 3-4 sections. The first will cover the material common to all students. The other sections will cover material given to subgroups only, and students may attempt only ONE of these sections.
(d) Project / Major Assignment
Students will also be expected to individually complete a major assignment describing the design and analysis of a study of interest to them. This is usually their major project / research report etc for the course for which they are enrolled. The hand-in date varies by school, with SEBS and APES setting their own deadlines. For other students, the 1st hand-in date will be 20 September 2007. Students handing in by this date will be allowed to make corrections to their projects. The second hand-in date will be 5 November 2007. Although an extension may be granted, this will be under the condition that students accept that the projects may not be marked before April 2008, or even later. The assignment should describe a research study in which the student is involved, covering all stages from problem definition, data collection and analysis of the data to synthesis of the results. It is hoped that this will be one of the projects to be done for your degree; otherwise, it should be a project concerning an issue of interest to you. For honours and masters students, the statistical section of your major project would be ideal. For students whose honours projects do not involve statistics, other data will have to be used - please discuss this with your lecturers. The write-up should:
It should be written in such a way that a non-expert in your research area can understand what the problem is, what you have done, and why. Remember that marks can only be given for what is written and motivated, not for what you assume the examiner will deduce or what you thought. The report should not be a historical review of what has been done, but should be aimed at outlining the problem and the questions being asked, motivating and discussing the methods of data collection and analysis used, and drawing conclusions from the data. You will need to include enough printouts so that I can see if you have interpreted them correctly. You do not need to include your data. Try not to copy and paste descriptions of multiple analyses. For example, if you are going to do identical tests for 20 variables, describe the 1st one in detail, then do a summary table for all 20. Please try to set the font size of tables so that I can read the numbers, and that they fit on 1 line. For example: use
Do NOT forget to put your name, student number, contact details and A TITLE on the front page.
(e) Notes
The notes cover the material included in the course in more detail than is possible in the lectures. Not all sections are of interest to every student. Some sections are not of interest to you now, but may be of interest in later years. Details of the calculation of the formulae for some specific data sets are included in the notes, partly to illustrate the procedures, and partly for those students who are unhappy with blindly trusting the computer. Although almost all computations needed for this course are to be done by computer, the worked examples in the text are also there to serve as a reference for later use when a package may not always be available. FORMULAE ARE NOT EXAMINABLE. YOU DO NOT HAVE TO UNDERSTAND THE FORMULAE TO PASS THE COURSE. They will seldom be used in lectures. Further details on the material covered may be found in the list of recommended books. These notes are downloadable from this page under STAT4013/7009. Notes are updated every year, so that you can access revised sections of the notes should you wish.
(f) Lecturer and contact details for consultation
Course lecturer: Prof J Galpin, contact via Mrs Maud Manuel on 717-6277, CB152, or jacky@galpin.co.za. You are welcome to make an appointment to see me through Mrs Manuel; as I am Head of School, I am not able to give fixed consultation hours.
LIST OF RECOMMENDED BOOKS
General, introductory:
Anderson, TW and Finn, JD. (1996) The new statistical analysis of data. Springer.
Brown, BW and Hollander, M. (1977) Statistics: A Biomedical introduction. Wiley.
Campbell, RC. (1989) Statistics for biologists. 3rd edition. Cambridge University Press.
D'Agostino, RB, Sullivan, LM and Beiser, AS. (2006) Introductory Applied Biostatistics. Duxbury.
Dunn, OJ. (1977) Basic Statistics. A primer for the Biomedical Sciences. Wiley.
Maths Green. Sampling design and statistical methods for environmental biologists.
Iversen, GR and Gergen, M. (1997) Statistics: the conceptual approach. Springer.
Ludwig, JA and Reynolds, RF. (1988) Statistical ecology: a primer on methods and computing. Wiley.
Moore, DS. (1991) Statistics: Concepts and controversies. WH Freeman.
Moore, DS and McCabe. (1993) Introduction to the practice of statistics. WH Freeman.
Malhotra, NK. (1993) Marketing Research: an applied orientation. Prentice Hall.
Pagano, M and Gauvreau, K. (1993) Principles of Biostatistics. Duxbury.
Rosner, B. (2006) Fundamentals of Biostatistics. 6th edition. Duxbury.
Sokal and Rohlf. (1992) Biometry. WH Freeman.
Woolson, I. (1987) Statistical methods for the analysis of biomedical data. Wiley.
Zar. Biostatistical analysis.
General, more advanced:
Bethea, et al. Statistical methods for engineers and scientists. Marcel Dekker.
Fleiss, JL, Levin, B and Paik, MC. (2003) Statistical methods for rates and proportions. 3rd edition. Wiley.
Fleiss, JL. (1986) The design and analysis of clinical experiments. Wiley.
Keeping, ES. (1962) Introduction to statistical inference. van Nostrand.
Sample size calculations:
Cohen, J. (1977) Statistical power analysis for the behavioural sciences. Academic Press.
Odeh and Fox. Sample size choice. Marcel Dekker.
Sampling:
Cochran, WG. (1977) Sampling techniques. Wiley.
Kish, L. (1965) Survey sampling. Wiley.
Reliability and validity:
Anastasi, A and Urbina, S. (1997) Psychological Testing. Prentice Hall.
Nonparametric statistics:
Conover, WJ. (1971) Practical Nonparametric Statistics. Second edition. Wiley.
Edgington, ED. (1995) Randomization tests. 3rd edition. Marcel Dekker.
Experimental design:
Box, GEP, Hunter, WG and Hunter, JS. (1978) Statistics for Experimenters. Wiley.
Cornell, JA. (1990) Experiments with mixtures. Second edition. Wiley.
Jones, B and Kenward, MG. (1989) Design and analysis of cross-over trials. Chapman and Hall.
Mason, RL, Gunst, RF and Hess, JL. (1989) Statistical design and analysis of experiments. Wiley.
Montgomery, DC. (1986) Design and analysis of experiments. Second edition. Wiley.
Neter, J, Kutner, MH, Nachtsheim, CJ and Wasserman, W. (1996) Applied linear statistical models. Irwin.
Regression, introductory:
Chatterjee, S and Price, B. (1977) Regression Analysis by example. Wiley.
Montgomery, DC and Peck, EA. (1992) Introduction to linear regression analysis. Second edition. Wiley.
Regression, more advanced:
Bates, DM and Watts, DG. (1988) Nonlinear regression and its applications. Wiley.
Belsley, DA, Kuh, E and Welsch, RE. (1980) Regression diagnostics: identifying influential data and sources of collinearity. Wiley.
Draper, NR and Smith, H. (1966) Applied regression analysis. Wiley.
Hamilton, LC. (1992) Regression with graphics: a second course in applied statistics. Duxbury.
Neter, J, Kutner, MH, Nachtsheim, CJ and Wasserman, W. (1996) Applied linear statistical models. Irwin.
Seber, GAF and Wild, CJ. (1989) Nonlinear regression. Wiley.
Categorical data:
Greenacre, MJ. (1984) Theory and Applications of correspondence analysis. Academic Press.
Hosmer, DW and Lemeshow, S. (1989) Applied logistic regression. Wiley.
Upton, GJG. (1978) The analysis of cross-tabulated data. Wiley.
Categorical data, advanced:
Christensen, R. (1990) Log-linear models. Springer-Verlag.
Fienberg, S. (1980) The analysis of cross-classified categorical data. The MIT Press.
Freeman, DH. (1987) Applied categorical data analysis. Wiley.
Survival analysis, introductory:
Lee, ET. (1980) Statistical methods for survival data analysis. Lifetime Learning Publications.
Armitage, P. (1977)
Survival analysis, more advanced:
Elandt-Johnson and Johnson. (1980) Survival models and data analysis. Wiley.
Miller, RG. (1981) Survival analysis.
Gross, AJ and Clark, VA. (1975) Survival distributions, reliability and applications in the medical sciences. Wiley.
Klein, JP and Moeschberger. (1997) Survival analysis: techniques for censored and truncated data. Springer.
Multivariate analysis, introductory:
Chatfield, C and Collins, AJ. (1980) Introduction to Multivariate Analysis. Chapman and Hall.
Manly, BFJ. (1986) Multivariate statistical methods: a primer. Chapman and Hall.
Multivariate analysis, more advanced:
Johnson, RA and Wichern, DW. (1988) Applied Multivariate Statistical Analysis. Second edition. Prentice Hall.
Krzanowski, WJ. (1988) Principles of multivariate analysis. Oxford.
Seber, GAF. (1984) Multivariate observations. Wiley.
Repeated measures, advanced:
Crowder, MJ and Hand, DJ. (1990) Analysis of repeated measures. Chapman and Hall.
Jones, RH. (1993) Longitudinal data with serial correlation: A State-space approach. Chapman and Hall.
Time series:
Brockwell, PJ and Davis, RA. (1996) Introduction to time series and forecasting. Springer.
Box, GEP and Jenkins, GM. (1976) Time Series analysis: forecasting and control. Holden Day.
Chatfield, C. (1984) The analysis of time series: an introduction. Third edition. Chapman and Hall.
Chatfield, C. (1996) The analysis of time series: an introduction. Fifth edition. Chapman and Hall.
Diggle, PJ. (1990) Time series: a biostatistical introduction. Oxford Science Publications.
Kendall, M and Ord, JC. (1990) Time Series. Third edition. Oxford.
Makridakis, S, Wheelwright, SC and McGee, VE. (1983) Forecasting methods and applications. Second edition. Wiley.
Geostatistics:
Haining, R. (1990) Spatial data analysis in the social and environmental sciences. Cambridge.
Isaaks, EH and Srivastava, RM. (1989) An introduction to applied Geostatistics. Oxford.
Tables:
Most introductory statistics books contain tables.
Hald, A. (1952) Statistical Tables and Formulas. Wiley.
Stoker, DJ. (1977) Statistical Tables. Academica.
TABLE OF CONTENTS
|