Simple Git Guide

This post is based primarily on the great Git tutorial at git-scm.com.

1. Types of files

The files in your working directory may be in different states:

• tracked files are files that were in your last snapshot (they are already preserved in your Git repository),
• untracked files are all the others, i.e. files that were not in your last commit and are not in your staging area (not ready to commit).

Starting a new git repository

This post briefly describes how to create a Git repository.

Schedule R Code for Windows

This post explains how to schedule your code to run monthly, weekly or daily, which is very useful for data monitoring, scraping or automatic reports. On Mac and Linux, cron is usually used for this purpose. Windows users can work with Task Scheduler, using the command line or special R features.

Principal Component Analysis in R

The full information on the theory of principal component analysis may be found here. This article is about practice in R. It covers the main steps in data preprocessing, compares R results with theoretical calculations, and shows how to analyze principal components and use them for dimensionality reduction. The last section is devoted to modelling with principal components and comparing the results to LDA.

Principal Component Analysis

Principal component analysis (PCA) is an unsupervised method that generates components from a large set of variables available in a data set. Each component is a combination of features, chosen to capture as much information in the data as possible. Or, in Wikipedia's words:

A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
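
The definition above can be illustrated with a minimal NumPy sketch. The toy data, the variable names and the choice of an eigendecomposition of the covariance matrix are assumptions made for this example only; it is one common way to compute principal components, not necessarily the approach taken in the full post.

```python
import numpy as np

# Hypothetical toy data: 200 samples, 3 features, the first two strongly correlated.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

# Centre each feature, then compute the covariance matrix of the features.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# eigh is intended for symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]  # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centred data onto the principal directions.
scores = Xc @ eigvecs

# Share of the total variance carried by each component.
explained = eigvals / eigvals.sum()
print(explained)
```

The columns of `scores` are the principal components: their covariance matrix is diagonal, which is what "linearly uncorrelated" means in the quoted definition.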

Eigenvectors and eigenvalues

Spectral theorem:

If A is symmetric, then A is orthogonally diagonalizable and has only real eigenvalues. In other words, there exist real numbers λ_1, …, λ_n (the eigenvalues) and orthogonal, non-zero vectors v_1, …, v_n (the eigenvectors) such that for each i = 1,2,…,n:

$A v_i = \lambda_i v_i$.
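
As a quick numerical check of the theorem, the sketch below uses `np.linalg.eigh` on a small symmetric matrix. The matrix `A` is a hypothetical example picked so that its eigenvalues come out as 1, 2 and 4.

```python
import numpy as np

# Hypothetical symmetric matrix chosen for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh assumes a symmetric (Hermitian) input; the columns of V are the eigenvectors.
eigvals, V = np.linalg.eigh(A)

# Check A v_i = lambda_i v_i for every eigenpair.
pairs_ok = all(
    np.allclose(A @ V[:, i], eigvals[i] * V[:, i])
    for i in range(A.shape[0])
)

# The eigenvectors are mutually orthogonal (here also unit length).
orthonormal = np.allclose(V.T @ V, np.eye(3))

# Orthogonal diagonalization: A = V diag(lambda) V^T.
reconstructed = V @ np.diag(eigvals) @ V.T

print(eigvals)
print(pairs_ok, orthonormal, np.allclose(reconstructed, A))
```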

Matrices in NumPy

NumPy: Matrices

import numpy as np
a = np.array([[1, 2, 3], [2, 5, 6], [6, 7, 4]])

b = np.eye(5)
c = np.ones((7, 5))
d = np.zeros((7, 5)) # tuple as argument!

v = np.arange(0, 24, 2) # start, stop, step as arguments
d = v.reshape((3, 4)) # reshape sets the matrix shape: 12 elements -> 3 rows, 4 columns

Indexing:

print(d[2, 1]) # print an element
print(d[[1, 0], [2, 3]]) # print two elements: d[1, 2] and d[0, 3]
print(d[1, :]) # print a row
print(d[:, 3]) # print a column
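
To make the indexing rules concrete, here is a self-contained sketch that rebuilds the same 3 x 4 matrix and records what each expression selects. The names `elem`, `pair`, `row` and `col` are introduced here only for illustration.

```python
import numpy as np

# Same matrix as above: even numbers 0..22 arranged in 3 rows and 4 columns.
d = np.arange(0, 24, 2).reshape((3, 4))
# d is:
# [[ 0  2  4  6]
#  [ 8 10 12 14]
#  [16 18 20 22]]

elem = d[2, 1]                 # row 2, column 1 -> 18
pair = d[[1, 0], [2, 3]]       # pairs the index lists: d[1, 2] and d[0, 3] -> [12, 6]
row = d[1, :]                  # whole row 1 -> [8, 10, 12, 14]
col = d[:, 3]                  # whole column 3 -> [6, 14, 22]

print(elem, pair, row, col)
```

Note that indexing with two lists pairs the indices up element by element; it does not select a sub-block of rows and columns.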

Multiplication:

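b = np.eye(3) # 3x3 identity, so the shapes match a (3x3)
r1 = np.dot(a, b) # matrix product
r2 = a.dot(b) # the same product, written as a method call

r = a * b # element-wise (coordinate-wise) multiplication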
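
The difference between the two kinds of multiplication is easiest to see on small matrices. The 2 x 2 matrices below are hypothetical examples, and the `@` operator is shown as an extra alternative to `np.dot` that the text above does not mention.

```python
import numpy as np

# Hypothetical 2 x 2 matrices for illustration.
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

matrix_product = np.dot(a, b)   # rows of a times columns of b -> [[2, 1], [4, 3]]
same_product = a @ b            # the @ operator gives the same matrix product
elementwise = a * b             # multiplies matching entries -> [[0, 2], [3, 0]]

print(matrix_product)
print(elementwise)
```

For a matrix product the inner dimensions must agree (columns of `a` equal rows of `b`), while element-wise multiplication needs shapes that are equal or broadcastable.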

Matrix operations

# Transposing:
b = np.transpose(a)
c = a.T
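
A short sketch of what the two transposing forms return, using a hypothetical non-square matrix so the change of shape is visible:

```python
import numpy as np

# Hypothetical 2 x 3 matrix for illustration.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

t1 = np.transpose(a)   # function form
t2 = a.T               # attribute form

same = np.array_equal(t1, t2)        # both forms give the same result
shape = t2.shape                     # (3, 2): rows and columns are swapped
is_view = np.shares_memory(a, t2)    # the transpose is a view on a's data, not a copy

print(t2)
print(same, shape, is_view)
```

Because the transpose is a view, writing into `t2` also changes `a`; call `a.T.copy()` when an independent array is needed.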