Modern Statistical Computing in R

Universidad Pompeu Fabra

Course Description

  • Course Name

    Modern Statistical Computing in R

  • Host University

    Universidad Pompeu Fabra

  • Location

    Barcelona, Spain

  • Area of Study

    Computer Info Systems

  • Language Level

    Taught In English

  • Prerequisites

    A basic course in statistics.

Hours & Credits

  • Contact Hours

    45

  • Recommended U.S. Semester Credits

    3

  • Recommended U.S. Quarter Units

    4

  • Overview

    Language of Instruction: English
    Professor: Albert Satorra
    Professor's Contact and Office Hours: Albert Satorra (albert.satorra@upf.edu)
    Course Contact Hours: 45
    Recommended Credit: 5 ECTS credits
    Course Prerequisites: A basic course in statistics
    Language Requirements: None
     
    Course Description:
    In recent years, R has become the leading software tool for statistical computing
    and graphics, and its capabilities are greatly extended by numerous packages
    contributed by users. Most of the computing in the leading applied statistical journals
    is done in R, and R is used almost exclusively in some leading-edge applications,
    such as genetics and data mining. The purpose of this course is to lay a foundation
    for the full and creative use of R, the statistical language for computing and
    graphics.
     
    Much of the statistical methodology implemented in software packages is used as a
    black box. This suits users who are not interested in the details of the methods,
    but the result is often a second-rate application: the implementation, even if of
    high quality, is often designed for a different context; small details in the setting
    of options are ignored or misunderstood; and finding one's way through output
    formatted for general interest is difficult.
     
    The course will introduce students to the syntax and inner workings of R so that they
    become proficient in everyday computational tasks with datasets of all kinds and
    skilled in applying elementary statistical methods, with an emphasis on (initial) data
    exploration and simple graphics. The course will also focus on opportunities to
    enhance the learning experience in other statistics courses by illustrating and
    applying basic statistical concepts in R.
     
    Learning Objectives:
    At the end of the course, students will have learned:
    - to use a fundamental tool for computing in the practice of quantitative
    analytical methods (the "paper-and-pencil" tool of the 21st century), one that works for
    small jobs (like a pocket calculator) as well as for big jobs (complex statistical data
    analysis);
    - programming, data handling, transformations, subsetting, exploratory data
    analysis, probability distributions and simulations, regression and linear models,
    summarising data, handling large data sets, and effective graphics;
    - modern simulation-based concepts of statistics, and how to write a report of a
    quantitative analysis.
     
    Course Workload
    The course combines lectures, discussions, hands-on practice on laptops, and
    tutoring. Students should be prepared to read between 50 and 150 pages per week.
     
    Methods of Instruction:
    The course includes both lectures and field studies. Two-hour classroom sessions are
    normally divided into a one-hour lecture and one hour of computing practice. Students
    are required to bring their own laptops.
     
    Method of Assessment
    Class participation counts for 15%. Homework and a mini project count for 20% (the
    equivalent of the Midterm Exam); their main function is to prepare students for the
    main project (65%). This project will involve some computing in R and the submission
    of a report of up to 6 typed pages (not counting appendices). Students will choose
    their own projects (subject to the instructor's approval) and will give a brief oral
    presentation at the end of the course (the equivalent of the Final Exam).
    Class Participation: 15%
    Midterm Exam: 20%
    Final Exam: 65%
     
    1. General introduction to computing
    Using R as a calculator
    Numbers, words and logicals; missing values (NA)
    Vectors and their attributes (names, length, type)
    System- and user-defined objects
    Accessing data: data in the system (data()) and data outside the system
    (read.table, scan)
     
    2. First steps in graphics
    The basics of R syntax
    The R workspace
    Matrices and lists
    Subsetting
    System-defined functions; the help system
    Errors and warnings; coherence of the workspace
     
    3a. Data input and output; interface with other software packages
    Writing your own code; R script
    Good programming practice
    R syntax - further steps
    Parentheses and brackets
     
    3b. Exploratory data analysis
    Range, summary, mean, variance, median, sd, histogram, box plot, scatterplot
     
    4. Probability distributions. Simulations
    Random number generation; distributions; the practice of simulation
     
    5. Apply-type functions
    Compiling and applying functions
    Documentation
    Conditional statements
    Loops and iterations
     
    6. Statistical functions in R
    Statistical inference, contingency tables, chi-square goodness of fit, regression,
    linear models, advanced modeling methods
     
    7. Graphics; beyond the basics
    Graphics and tables
    Working with larger datasets
    Principles of exploratory data analysis (big data analysis)
     
    8. Dataframes in R
    Defining your own classes and operations
    Models and methods in R
    Customising the user's environment
    UPF Study Abroad Program 2016
     
    Required Readings: Handout material will be posted on the web as the course
    progresses.
     
    Recommended bibliography:
    Students are encouraged to consult the following sources on their own.
    Dalgaard, P. (2002). Introductory Statistics with R. Springer.
    Dennis, B. (2013). The R Student Companion. Taylor & Francis Group.
    Matloff, N. (2011). The Art of R Programming: A Tour of Statistical Software Design. No Starch Press.
    Pollock, P. H. (2014). An R Companion to Political Analysis. CQ Press.
    Chihara, L. and Hesterberg, T. (2011). Mathematical Statistics with Resampling and R. Wiley.
    Lander, J. P. (2014). R for Everyone: Advanced Analytics and Graphics. Addison-Wesley Data & Analytics Series.

Course Disclaimer

Please note that there are no beginning-level Spanish courses offered in this program.

Courses and course hours of instruction are subject to change.