TA Fall2019 STAT 204

Graduate core course, UCSC, Statistics, 2019

STAT 204 Introduction to Data Analysis

Here are relevant resources for STAT 204.

Use R Markdown

1.1 Helpful Guide

R Markdown: The Definitive Guide

1.2 Use R Markdown with a template

Often, we want to pass a template to LaTeX that satisfies the style guide of a specific journal. Among these, the ASA two-column template is a department favorite. As a long-time advocate for R Markdown, I will illustrate how to incorporate the ASA template into an R Markdown file, so that you can take advantage of the reproducibility and simple syntax of RMD while still checking all the boxes for style requirements.

Step 1: acquire the template

Download the asaproc.cls file into the directory where the .rmd file is located.

Step 2: prepare a raw.rmd file

In this step, we focus on generating output in the form of figures, tables, and summary statistics in an intermediate file. The PDF generated by this file is for debugging purposes only. Use the following header, R global options code chunk, and section syntax to create the skeleton of the final paper, with all results in place. The R global options save the figures created in this file under the directory Figs/. A .tex file will also be generated, since the keep_tex argument in the header is set to true.

---
geometry: "margin=0.8in"
output: 
  pdf_document:
    keep_tex: true   
documentclass: asaproc
bibliography: "your_path_to_bib/yourbib.bib"
header-includes:
    - \usepackage{setspace}
    - \singlespacing
    - \usepackage{bm}
---

```{r global_options, include=FALSE}
knitr::opts_chunk$set(fig.width = 12, fig.height = 8, fig.path = 'Figs/',
                      echo = FALSE, warning = FALSE, message = FALSE)
```

\title{This is an example}

\author{<your name>$^1$\\
UCSC$^1$}
 
\maketitle

\section*{Abstract}

[@box2011bayesian]

\begin{keywords}
Bayesian hierarchical model
\end{keywords}
\section*{1. Introduction}
\subsection*{1.1 Description}
\subsection*{1.2 EDA}
\end{document}

One important thing to bear in mind is that the section numbers have to be entered manually, e.g., \section*{1. Introduction}, for them to show up correctly with the template. This makes the method slightly inconvenient, but in return the figures are placed inline, at the correct proportions, in the two-column layout.

Step 3: save the raw.tex file as final.tex and make some changes

Comment out the following section in the final.tex file to make the title show up properly:

% % Create subtitle command for use in maketitle
% \newcommand{\subtitle}[1]{
%   \posttitle{
%     \begin{center}\large#1\end{center}
%     }
% }

% \setlength{\droptitle}{-2em}
%   \title{}
%   \pretitle{\vspace{\droptitle}}
%   \posttitle{}
%   \author{}
%   \preauthor{}\postauthor{}
%   \date{}
%   \predate{}\postdate{}

Then add your equations, analysis, and conclusion. You will notice that all figures and tables are handled by auto-generated code, which should not be changed.

Step 4: compile final.tex file to get your final.pdf file.
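
Steps 2 to 4 can also be driven from R. The following is only a sketch: the file names raw.rmd, raw.tex, and final.tex follow the steps above, the helper name `start_final_tex` is hypothetical, and it assumes a working LaTeX installation with asaproc.cls in the working directory. The helper refuses to overwrite an existing final.tex, so re-knitting raw.rmd does not wipe out the hand edits from Step 3.

```r
# Sketch of Steps 2-4 as a driver script.
# `start_final_tex` is a hypothetical helper name, not part of any package.

start_final_tex <- function(raw = "raw.tex", final = "final.tex") {
  # Do not overwrite final.tex: it holds the manual edits from Step 3.
  if (file.exists(final)) return(invisible(FALSE))
  invisible(file.copy(raw, final))
}

if (file.exists("raw.rmd")) {
  # Step 2: knit the intermediate file; keep_tex: true leaves raw.tex behind.
  rmarkdown::render("raw.rmd")

  # Step 3: start final.tex from raw.tex, then edit final.tex by hand.
  start_final_tex()

  # Step 4: once final.tex has been edited, compile it, for example with
  # tools::texi2pdf("final.tex", clean = TRUE)
}
```

Compiling from the editor or the command line works just as well for Step 4; the point is only that final.tex is created once and edited by hand afterwards.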

That is it! With these 4 steps, you can write a proper ASA-format journal article using RMD. I find this approach extremely helpful for take-home exams with a 24-48 hour time limit.

Use lm in R

$R^2$ computed by lm when there is no intercept

When there is no intercept, R uses a different formula to compute $R^2$:

\[R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i y_i^2}\]

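Omitting the intercept increases the numerator of the second term (the residual sum of squares), but when the mean of $y$ is far from zero the denominator increases by much more: it changes from $\sum_i (y_i - \bar{y})^2$ to $\sum_i y_i^2 = \sum_i (y_i - \bar{y})^2 + n\bar{y}^2$. The result is an artificially inflated $R^2$. Such an $R^2$ should not be used to compare models with and without an intercept.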
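
A small simulation makes the effect concrete. This is a minimal sketch on made-up data (the seed, sample size, and coefficients are arbitrary assumptions chosen so that the true intercept is far from zero); it fits the same data with and without an intercept and reproduces both $R^2$ values by hand.

```r
# Simulated data (all values are illustrative assumptions).
set.seed(204)
n <- 200
x <- runif(n, 0, 10)
y <- 20 + 0.2 * x + rnorm(n)   # true intercept is far from zero

fit_int   <- lm(y ~ x)         # with intercept
fit_noint <- lm(y ~ x - 1)     # intercept omitted

r2_int   <- summary(fit_int)$r.squared
r2_noint <- summary(fit_noint)$r.squared

# Residual sums of squares: the numerator of the second term.
sse_int   <- sum(resid(fit_int)^2)
sse_noint <- sum(resid(fit_noint)^2)

# Reproduce both R^2 values by hand; only the denominator differs.
r2_int_manual   <- 1 - sse_int   / sum((y - mean(y))^2)  # centered
r2_noint_manual <- 1 - sse_noint / sum(y^2)              # uncentered

print(c(sse_with_intercept = sse_int, sse_no_intercept = sse_noint))
print(c(r2_with_intercept  = r2_int,  r2_no_intercept  = r2_noint))
```

With data like these, the no-intercept model should report the larger $R^2$ even though its residual sum of squares is also larger, which is why the two values are not comparable.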

Here is a more detailed [explanation](https://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-increases-r2-in-linear-mo).