A server is a computer or device on a network that manages network resources. Essentially a server is a collection of linked computers that have processors (CPUs or GPUs), memory (RAM) and disk storage space, like hard drives.
Servers are used for bioinformatics because data sets can be very large, and processing them can take a long time and/or a large amount of RAM.
Servers generally run a Linux operating system, rather than Microsoft Windows or macOS. This is because Linux is more stable, more secure and has methods to process and distribute large numbers of tasks to different processors.
We will use a server based in York, called York Advanced Research Computing Cluster, or YARCC for short.
NOTE: You only need to use the VPN if you are working off campus.
If you are on campus you can skip to either the Logging into the server from a PC or Logging into the server from a Mac sections.
To work remotely on YARCC, you need to log in.
If you are working off campus, you will need to use the York VPN application PulseSecure, regardless of whether you are logging in from a PC or a Mac. The download and installation instructions are provided here.
When you open PulseSecure, you should have no connections added, as shown below:
Click on the plus to add a connection. To connect to York you will need to fill in the VPN settings for the Name: and Server URL:, as below:
Then click Connect, and enter your username and your password to log into the VPN.
If you connect and the information is correct, you should have the following screen, with a green tick, and a disconnect option available.
Now that you are connected to the York VPN, you can connect to the YARCC compute cluster. This cluster contains software and data that we will use for the remainder of the course.
Logging in from a PC is slightly different from logging in from a Mac.
If you are using a Windows computer, you will need to install PuTTY and configure it to connect through the PulseSecure VPN you just set up. PuTTY should already be installed on the University PCs. When you run PuTTY you should have the following screen.
You will need to fill out the host name and make sure it is connecting through SSH, as shown below:
Because this is the first time you are connecting to the server with your login, PuTTY will show an authentication screen. Select Yes to authenticate.
Once this has connected you should have a terminal screen open which will then ask you to again fill in your login details. Note: When you are typing in your password, on the screen it will look like nothing is being typed in! Do not worry about this. If you log in successfully you should see the following two windows:
If you are logging in from a Mac, this will be much easier as the PuTTY software is not needed.
You will still need to log in using the VPN.
Once this is connected, open a window of the Terminal app. Then, log in using the ssh (secure shell) command as follows (replacing username with your email id):
ssh username@biollogin.york.ac.uk
When you press Enter this will ask for your password. If your username and password are both entered correctly, you will be logged in! Note: When you are typing in your password, on the screen it will look like nothing is being typed in! Do not worry about this.
YARCC runs on a Linux operating system. To do the bioinformatics processing, you will need to use Linux a little.
If you are familiar with Linux and merely want a reminder, get a Linux cheat sheet here. If not, read on.
In the same way that Windows PCs have folders where you store your files, Linux systems have directories. They are essentially the same thing: a place to organise files and programs.
Linux often uses a command line (text-based) interface rather than a graphical interface like Windows, so directories are referred to with text.
When you first log into a Linux server you are directed to your home directory. On YARCC your home directory will be:
/home/userfs/t/username
Note that username is replaced with your username (like kd684, etc). So everyone has their own unique home directory.
In Linux you are always ‘working from’ a specific directory. You can think of this as where you are in the file system. You are always somewhere!
To find out what your working directory is at any point, use this command:
pwd
Directories can contain files and other directories, just as houses contain rooms, rooms contain items and boxes, and boxes contain tins and jars.
Directories are nested.
As you work in Linux it is important to keep track of where you are in relation to your directories. The image below shows how directories might be organised. The something directory isn’t too useful, but the data directory tells you what is in there. (You’ll find out about bam files later.)
Directories and files have a path, which is the list of directories that you need to pass through to reach that location. The path of your home directory on the YARCC Linux system is something like: /home/userfs/t/tmpq1234.
To change which directory you are ‘in’ use this command:
cd data
This will take you to the data directory (if it exists).
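As a minimal sketch of how cd and pwd work together, the commands below move into a data directory first with a relative path and then with an absolute path. The scratch variable and the mktemp command are assumptions added for illustration: they create a throwaway practice directory so that nothing in your home directory is touched.

```shell
# Create a throwaway practice directory (mktemp picks the location).
scratch=$(mktemp -d)
cd "$scratch"

# Make a data directory, then move into it using a relative path.
mkdir data
cd data
pwd                   # prints the scratch path followed by /data

# The same directory can be reached from anywhere using its absolute path.
cd /
cd "$scratch/data"
here=$(pwd)
echo "$here"          # the same location as before
```

A relative path such as data only works from the directory that contains it, whereas an absolute path (one that starts with /) works from any working directory.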
It is important to understand the concepts of directories and paths.
Discuss with your other students or a demonstrator before you move on.
This set of Linux commands will get you started. We suggest that you type each of these commands into your Linux system in order. Do not copy and paste the text from this web page.
First, choose a building name, a room name, and a word for a box or container in any language. Note these down somewhere.
Then log into the server (if you haven’t already).
Then check where you are, with:
pwd
mkdir building
cd building
mkdir room
cd room
pwd
touch box.1
touch box.2
touch smallbox.1
ls
Most commands in Linux have optional ‘flags’, which are extra letters, preceded by a dash, added after the command. Flags allow you to run variations of the command. Some of these flags can be very useful.
man command
(replacing command with something like ls, touch, cd etc)
ls -lrt
Give this a try.
Note: listing files sorted by time is a quick way of finding the files that you have most recently created.
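To see what each letter in ls -lrt contributes, it can help to try the flags one at a time. The sketch below assumes a throwaway directory and two hypothetical files, older.txt and newer.txt, whose timestamps are set explicitly with touch -t so that the ordering is predictable.

```shell
# Work in a throwaway directory so the listing is predictable.
scratch=$(mktemp -d)
cd "$scratch"

# Create two files with different modification times.
# touch -t sets an explicit timestamp (format: YYYYMMDDhhmm).
touch -t 202001010900 older.txt
touch -t 202101010900 newer.txt

ls -l       # long format: permissions, owner, size, date and name
ls -t       # sorted by time, newest first: newer.txt then older.txt
ls -rt      # -r reverses the order, so the newest file comes last
ls -lrt     # all three flags combined
```

With -lrt the most recently changed file is always at the bottom of the listing, just above your prompt.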
cd ../
This is how cd ../ changes your working directory.
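The same idea in text form, as a small sketch: the commands below build a building/room pair inside a throwaway directory (the scratch variable and the -p flag of mkdir, which creates both levels at once, are additions for illustration) and then step up one level.

```shell
# Build building/room inside a throwaway directory.
scratch=$(mktemp -d)
mkdir -p "$scratch/building/room"

cd "$scratch/building/room"
pwd            # the path ends in /building/room

cd ../         # .. means the directory one level up (the parent)
pwd            # the path now ends in /building
parent=$(pwd)
```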
NOTE: Files can have any name in Linux. File names and commands are case sensitive, so the command to change directory, cd, will not work if you type Cd. Be careful with dots .. and spaces in Linux - they matter!
Make a copy of a file called myfile. The new copy is called myfile2.
cp myfile myfile2
If your working directory is kitchroom, you can copy a file called myfile from your working directory to the fridge like so:
cp myfile fridge/
Remove a file called this.
rm this
Show (or print out to the screen) all of a file called this.
cat this
Warning: some of the files we will work with are very large, so using cat can take a long time. To stop a command that is running, press Ctrl+C.
Show the first ten lines of a file called this.
head this
Show the last ten lines of a file called this.
tail this
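The file commands above can be tried safely on a small practice file. This is only a sketch: the file names numbers.txt and copy.txt are made up, seq is used to generate 100 numbered lines, and the -n flag (not covered above) sets how many lines head or tail should show.

```shell
# Work in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"

# seq prints the numbers 1 to 100, one per line; > saves them in a file.
seq 1 100 > numbers.txt

head numbers.txt           # first ten lines: 1 to 10
tail numbers.txt           # last ten lines: 91 to 100
head -n 3 numbers.txt      # -n chooses the number of lines: 1, 2, 3

cp numbers.txt copy.txt    # copy.txt now has the same contents
rm numbers.txt             # numbers.txt is gone, copy.txt remains
ls                         # prints copy.txt
```

For very large files, head and tail are usually a better first look than cat, because they return immediately however big the file is.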
One of the most powerful parts of file handling in Linux is its use of wildcards. These allow you to specify groups of files to move or copy, and are useful in many other situations.
There are three main wildcards in Linux:
An asterisk (*) - matches zero or more occurrences of any character.
A question mark (?) - matches a single occurrence of any character.
Bracketed characters ([ ]) - match any single one of the characters enclosed in the square brackets.
For example, to list only the files in your room directory that start with box, do this:
Go back to your home directory. You can do this by either typing out the full path, or by using the ~ shortcut:
cd ~
List files that start with box:
ls ~/building/room/box*
To list files that end with 1:
ls ~/building/room/*1
To list files that start with b and end with a . followed by any single character:
ls ~/building/room/b*.?
To list files that start with anything, end with anything but contain the word box:
ls ~/building/room/*box*
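The examples above use * and ? but not the bracket wildcard. A short sketch of how brackets behave is given below. It assumes a throwaway copy of the room directory with one extra file, box.3, added for illustration.

```shell
# Recreate the room directory in a throwaway location.
scratch=$(mktemp -d)
mkdir -p "$scratch/building/room"
cd "$scratch/building/room"
touch box.1 box.2 box.3 smallbox.1

ls box.[12]       # either 1 or 2 after the dot: box.1 and box.2
ls box.[1-3]      # a range of characters: box.1, box.2 and box.3
ls *box.[!1]      # ! means "not these characters": box.2 and box.3
```

Each pair of brackets stands for exactly one character, so box.[12] does not match a file called box.12.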
You will probably find the nano text editor easiest to use. vi is another editor, but takes some getting used to.
nano people.txt
This will create the file people.txt and open it in the nano text editor. Fill this file with a list of names by typing these in:
Boris Johnson
Borat
Rachel Johnson
Jo Johnson
Dominic Cummings
To save the file and exit nano, press Ctrl+X, followed by Y and then Enter.
cat people.txt
wc -l people.txt
grep Bo people.txt
grep -v Johnson people.txt
grep -c Bo people.txt
grep -vc Bo people.txt
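As a sketch of what these commands report, the block below rebuilds people.txt in a throwaway directory (using printf rather than nano, purely so the example runs on its own) and notes the result of each command in a comment.

```shell
# Recreate people.txt in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"
printf 'Boris Johnson\nBorat\nRachel Johnson\nJo Johnson\nDominic Cummings\n' > people.txt

wc -l people.txt              # counts the lines in the file: 5
grep Bo people.txt            # lines containing "Bo": Boris Johnson, Borat
grep -v Johnson people.txt    # -v inverts the match: Borat, Dominic Cummings
grep -c Bo people.txt         # -c counts the matching lines: 2
grep -vc Bo people.txt        # both flags: counts lines without "Bo": 3
```

Note that grep is case sensitive, so searching for bo instead of Bo would find nothing in this file.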
Another powerful part of Linux systems is the ability to ‘pipe’ or redirect the output of one program directly into another program (using the | symbol), or into a file (using the > symbol). Pipes work like an assembly line.
grep -v Bo people.txt | wc -l
Note that you finish the grep command, add a pipe symbol (|), then use the wc command.
grep -v Bo people.txt > not_Bo_people
Note that you finish the grep command, add a redirect symbol ( > ), then specify a file name.
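Pipes can be chained, and the end of a chain can be redirected into a file. The sketch below shows both, again on a throwaway copy of people.txt. The output file name other_johnsons.txt is made up for the example.

```shell
# Recreate people.txt in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"
printf 'Boris Johnson\nBorat\nRachel Johnson\nJo Johnson\nDominic Cummings\n' > people.txt

# Two stages: keep the Johnson lines, then count them.
grep Johnson people.txt | wc -l                  # 3

# Three stages: keep Johnson lines, drop those containing "Bo", then count.
grep Johnson people.txt | grep -v Bo | wc -l     # 2

# Redirect the end of a pipe into a file, then look at the file.
grep Johnson people.txt | grep -v Bo > other_johnsons.txt
cat other_johnsons.txt                           # Rachel Johnson, Jo Johnson
```

Be aware that > replaces the contents of the target file if it already exists, so choose the output file name with care.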
DISCUSS WITH OTHER STUDENTS
Quiz each other about what these commands mean:
cp this here/
rm *.vcf
mkdir something
rm ~/building/*/*.?
cp ~/building/room/*ox* ~/building/
And one command you should not do!
rm *.*
Why not?
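One way to explore this question safely is to preview what a wildcard matches before you delete anything. In the sketch below, echo prints the file names that the shell substitutes for the pattern. The file names and the throwaway directory are made up for the example.

```shell
# Set up some practice files in a throwaway directory.
scratch=$(mktemp -d)
cd "$scratch"
touch notes.txt results.vcf sample.bam README

# echo shows what the pattern expands to, without deleting anything.
echo *.vcf      # results.vcf
echo *.*        # notes.txt results.vcf sample.bam (every name containing a dot)

# Only once the preview looks right, run the real command.
rm *.vcf
ls              # results.vcf is gone, the other three files remain
```

Files removed with rm do not go to a recycle bin, so a quick preview with echo or ls is a cheap habit to get into.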
This should be all you need for Linux at the moment.
Examples of all the commands you will need (and more) are in this cheat sheet.
The File Commands and Shortcuts will be the most useful for you now.