Lecturer: Prof. Hrvoje Jasak
This course teaches you how to be an efficient programmer and a valued member of a team: organisation, programming and “software management skills”. I will be teaching within the UNIX OS environment, with an “old-fashioned” setup: shell, editor, debug tool, self-contained libraries, compiled code.
Lectures:
- Your computer infrastructure
– know your computer: what is available
– CPU and memory hierarchy bandwidth: level 1 cache 700 GiB/s, level 2 cache 200 GB/s, main memory 10 GB/s, disk (storage device) 100-200 MB/s (2 GB/s max). Adding complexity slows down the memory hierarchy
– data access speed and considerations: what can I afford to do? Latency and bandwidth are two metrics associated with caches
– fetching data from memory in pages
– filing system layout: a file system separates the data on the drive into individual pieces, the files. It also stores and retrieves the data associated with files: filenames, permissions, size, access time and other attributes. Journaled and non-journaled file systems, RAID. Filing systems as disk, tape, transactional database. File locking and concurrency control. HPC and distributed filing systems. Memory controller hub, known as the northbridge
– realistic layout of a network and communication speed. Bandwidth of processors: Intel Xeon 3200: 12800 MB/s; PCI-Express link: 16 GT/s (gigatransfers per second)
– parallel computing; shared memory, distributed memory architecture
– distributed memory communication speed: Infiniband. Adapter latency 0.5 µs, throughput 120-200 Gb/s
– GPUs and vector computers; vectorisation, pipelining
- UNIX Working Environment
– computer science is about humans: computers handle the data in a much more efficient manner
– operating system commands: shell environment, startup files. Automate and customise your shell (as you use it a lot!)
– env, $SHELL, uname
– redirection and paths: executable and library paths.
– home directory: ls, pwd, touch, rm, mkdir, rmdir, cd, cp, more, cat, head, tail
– editors: vi, emacs; any others?
– file permissions, manipulating files, organisation of the filing system
– program files, headers, footers, tar archives, backup and related issues
– find, grep (locate)
– moving files around: scp, ssh, tar, gzip
– man pages
– alias commands
– how to organise your files? Who is this for? organisation of own directories
– file headers and footers, file templates
– Backup Your Work
- Executable and environment
– compiled and interpreted code. What are compilation, linking and execution?
– interface of a code with the operating system: main(). Static and dynamic library binding
– source code, object code, static library, shared library, compile and run-time binding
– what does a compiler actually do?
– heap, stack, function pointer, allocation
– segmentation violation errors
– compile-time errors vs run-time errors
– why should I fix a warning?
– strongly typed vs weakly typed languages
- Optimisation, pre-fetch, cache hit rate
– optimisation: why optimisation matters, what to expect from the compiler/optimizer/linker
– guidelines on optimisation: small loops and big loops
– function calls and if-statements in loops, optimisation bottle-necks
- Practical programming 1
– software development cycle and extended cycle
– how do I start? Hello World; for a complex project
– CPP pre-processor
– #include, #ifdef and compilation instructions
– file templates: Project, copyright, author, doc contents. Corrupted files
– naming conventions: underscores, capital letters; trailing underscores
– enforcing conventions and code review
– organising control statements
– KISS means Keep It Simple and Stupid
– interfaces and assumptions in code design
– documentation and comments
– local data, function arguments (in and out), class data, static data, global data. Lifetime of data, naming functions and data
– what is debugging and how to start?
- Data types and Computer arithmetic
– Data Types
– fundamental data types: int, float, char, void
– memory layout, pointers, function pointers, void* type
– const and non-const access, pointer, reference
– data layout and typing; check data types
– initialising the data
– Computer arithmetic
– IEEE standard
– floating point exception
– overflow, underflow, variable ranges
– built-in functions
– what is a NaN?
– what to check and where?
- Container types
– single allocated block of memory: fetch pages and general memory access
– singly linked lists, doubly linked lists
– hash tables
– user input via dictionaries: (XML format?)
– sparse matrix format
– array bounds checking
- Functions, classes, variable names
– strongly typed languages: what has a compiler ever done for me?
– global data, data encapsulation
– what is a class: data encapsulation
– getting data in and out of a function: union, class, memory layout
– public, private, protected
– function and variable names; telephone test on code clarity
– global variables; singletons; enumerations, complex data types
– abstract base class
– run-time selection
– templating: Generic programming
- Practical Programming
– Build system: why use a build system? make/cmake, home-brew, compilation flags; debug (-g) and optimisation (-O) flags
– Version control system: git, why version control, “I will examine your project via git log”
– Trapping errors at run-time: put in debug checks
– Garbage collection: cost and benefits
– Programming Design Patterns: lazy evaluation, factory mechanism
– Defensive programming: what to check and how? overhead in checking: what can I afford?
– Inside the C++ object model: how does C++ actually work, private, protected, public and static function, virtual function table, virtual destructor, pointers and references, const- and non-const references, generic programming, inline functions
– Efficiency of programming and execution: n-squared algorithms
– Software re-use, external libraries
– Agile programming
– Testing and documentation: unit testing, start by writing the test, input data and output data, class/function level documentation: algorithms; references
- High Performance Computing: programming aspects, distributed memory systems
– HPC: local computation vs communication, load balancing
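As an illustration of the "local computation vs communication" item, the bandwidth and latency figures quoted in the lecture list can be combined into a rough cost estimate. The sketch below assumes the simple latency-plus-bandwidth model (time = latency + size / bandwidth), which is a common first approximation and not necessarily the model used in the lectures. The function and constant names are hypothetical, and the lower quoted Infiniband throughput (120 Gb/s) is used.

```python
# Sketch of the latency-bandwidth model: time = latency + size / bandwidth.
# The constants are the figures quoted in the lecture list; the function
# names are hypothetical and chosen for illustration only.

INFINIBAND_LATENCY_S = 0.5e-6        # adapter latency, 0.5 microseconds
INFINIBAND_BYTES_PER_S = 120e9 / 8   # 120 Gb/s (lower quoted figure) in bytes/s
MEMORY_BYTES_PER_S = 10e9            # main memory, 10 GB/s


def transfer_time(n_bytes, latency_s, bytes_per_s):
    """Estimated time in seconds to move n_bytes over a link."""
    return latency_s + n_bytes / bytes_per_s


def break_even_bytes(latency_s, bytes_per_s):
    """Message size at which latency and streaming time are equal."""
    return latency_s * bytes_per_s


if __name__ == "__main__":
    # Compare one double, a small buffer and a large buffer.
    for n_bytes in (8, 8 * 1024, 8 * 1024 ** 2):
        network = transfer_time(n_bytes, INFINIBAND_LATENCY_S, INFINIBAND_BYTES_PER_S)
        local = n_bytes / MEMORY_BYTES_PER_S
        print(f"{n_bytes:>10d} bytes: network {network:.3e} s, local memory {local:.3e} s")
    size = break_even_bytes(INFINIBAND_LATENCY_S, INFINIBAND_BYTES_PER_S)
    print(f"break-even message size: {size:.0f} bytes")
```

Under these assumed figures, messages smaller than about 7.5 kB are dominated by latency and not by bandwidth, which is one argument for sending fewer, larger messages.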
Practical Sessions:
Demonstrations of programming skills and practical programming are offered as part of lecture delivery.
Prerequisites:
Programming skills in a compiled and interpreted environment: C++ and Python. Familiarity with a software development environment under UNIX or Windows.
Recommended reading / references:
- Scopatz, A. and Huff, K.: Effective Computation in Physics: Field Guide to Research with Python, O’Reilly, 2016
- Stroustrup, B.: The C++ Programming Language, Addison-Wesley, 2013
- Gamma, E., Helm, R., Johnson, R. and Vlissides, J.: Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, 1995