Simple Git Guide

This post is based primarily on the great Git tutorial at git-scm.com.

1. Types of files

The files in your working directory may be in one of two states:

  • tracked files are files that were in your last snapshot (they are already stored in your Git repository),
  • untracked files are all the others, i.e. files that were not in your last commit and are not in your staging area (so they are not ready to be committed).
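As a rough illustration of the distinction, the sketch below sorts the lines of `git status --porcelain` output into untracked paths and tracked paths with changes. In that format each line starts with a two-character status code followed by a space and the path, and `??` marks an untracked file. The function name `split_status` and the sample text are hypothetical, made up for this example; tracked files without changes do not appear in the output at all.

```python
def split_status(porcelain_output):
    """Split `git status --porcelain` lines into untracked and changed tracked paths."""
    untracked, tracked_changed = [], []
    for line in porcelain_output.splitlines():
        if not line.strip():
            continue  # skip empty lines
        code, path = line[:2], line[3:]  # two status characters, a space, then the path
        if code == "??":
            untracked.append(path)       # not in the last commit, not staged
        elif code != "!!":               # "!!" marks ignored files, skipped here
            tracked_changed.append(path)
    return untracked, tracked_changed


# Hypothetical sample output, not taken from a real repository
sample = "?? notes.txt\n M README.md\nA  src/app.py\n"
untracked, tracked_changed = split_status(sample)
print(untracked)
print(tracked_changed)
```

In a real repository the input text could come from running `git status --porcelain` yourself and pasting or piping the result in.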

Principal Component Analysis in R

The full theory of principal component analysis may be found here. This article is about practice in R. It covers the main steps in data preprocessing, compares R results with theoretical calculations, and shows how to analyze principal components and use them for dimensionality reduction. The last section is devoted to modelling with principal components and comparing the result to LDA.

Principal Component Analysis

Principal component analysis (PCA) is an unsupervised method of generating components from a large set of variables available in a data set. The components are combinations of features that capture as much information in the data as possible. Or, in the Wikipedia way:

A statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
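The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full PCA implementation: it assumes synthetic data (two correlated variables generated from a fixed random seed), centers the data, and uses the eigenvectors of the covariance matrix as the orthogonal transformation. All variable names are chosen for this example only.

```python
import numpy as np

# Synthetic data (an assumption for this sketch): two correlated variables
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.5, size=200)])

# Center each variable, then compute the covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)

# Eigenvectors of the symmetric covariance matrix form an orthogonal basis
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # sort by variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centered data onto the new basis: these are the principal components
scores = Xc @ eigvecs
score_cov = np.cov(scores, rowvar=False)     # off-diagonal entries are ~0
explained_ratio = eigvals / eigvals.sum()    # share of variance per component

print(np.round(score_cov, 6))
print(np.round(explained_ratio, 3))
```

The covariance matrix of `scores` is diagonal up to rounding error, which is what "linearly uncorrelated" means here, and `explained_ratio` shows how much of the total variance each component captures.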

Matrices in NumPy

NumPy: Matrices

import numpy as np
a = np.array([[1, 2, 3], [2, 5, 6], [6, 7, 4]])

b = np.eye(3) # identity matrix; 3x3 so that its shape is compatible with a
c = np.ones((7, 5))
d = np.zeros((7, 5)) # tuple as argument!

v = np.arange(0, 24, 2) # start, stop, step as arguments
d = v.reshape((3, 4)) # reshape the 12 values into a 3x4 matrix (this replaces the zeros matrix d)

Indexing:

print(d[2, 1]) # print a single element (row 2, column 1)
print(d[[1, 0], [2, 3]]) # print two elements: d[1, 2] and d[0, 3]
print(d[1, :]) # print a row
print(d[:, 3]) # print a column
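To make the indexing rules concrete, here is a small self-contained sketch on the same 3x4 matrix. The point worth noticing is that two index lists are paired element by element, while slices select a rectangular block. The names `pair` and `block` are chosen for this example only.

```python
import numpy as np

d = np.arange(0, 24, 2).reshape((3, 4))
# d is:
# [[ 0  2  4  6]
#  [ 8 10 12 14]
#  [16 18 20 22]]

pair = d[[1, 0], [2, 3]]   # index lists are paired: d[1, 2] and d[0, 3]
block = d[0:2, 1:3]        # slices select rows 0-1 and columns 1-2

print(pair.tolist())       # [12, 6]
print(block.tolist())      # [[2, 4], [10, 12]]
print(d[:, 3].tolist())    # last column: [6, 14, 22]
```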

Multiplication:

r1 = np.dot(a, b)
r2 = a.dot(b)

r = a * b # element-wise multiplication (not a matrix product)
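The difference between the two kinds of multiplication is easiest to see on small matrices. The sketch below uses two 2x2 matrices made up for this example; `@` is the matrix-multiplication operator and gives the same result as `np.dot` for 2-D arrays.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[0, 1], [1, 0]])

matrix_product = p @ q     # rows of p times columns of q
elementwise = p * q        # each entry of p times the matching entry of q

print(matrix_product.tolist())   # [[2, 1], [4, 3]]
print(elementwise.tolist())      # [[0, 2], [3, 0]]
```

For the matrix product the number of columns of the first operand must equal the number of rows of the second. The element-wise product needs equal (or broadcastable) shapes.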

Matrix operations

# Transposing:
b = np.transpose(a)
c = a.T
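A short sketch of what transposing does to shapes and entries, using a 2x3 matrix made up for this example. It also shows that `.T` returns a view of the same data rather than a copy, so writing through the transpose changes the original array.

```python
import numpy as np

m = np.arange(6).reshape((2, 3))   # [[0, 1, 2], [3, 4, 5]]
t = m.T                            # shape (3, 2); t[i, j] equals m[j, i]

print(m.shape, t.shape)            # (2, 3) (3, 2)
print(t.tolist())                  # [[0, 3], [1, 4], [2, 5]]

t[0, 1] = 99                       # t is a view, so this writes to m[1, 0]
print(m[1, 0])                     # 99
```

If an independent array is needed, `m.T.copy()` makes one.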