Writing Scripts

A script is a file containing a sequence of commands to be executed. In Python, this is just a simple .py file: my_script.py, hello_world.py, and so on. For our purposes, Python scripts will be run in a shell.

Before we begin, there’s an important clarification to make about the terms script, module, package, and library.

The two lowest levels are scripts and modules. They may appear similar, but generally speaking, scripts are meant to do something, while modules mainly hold functions, classes, and variables that are meant to be imported into other files. If we are executing a .py file directly, we classify it as a script; otherwise, it is a module.
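To make the distinction concrete, here is a minimal sketch of a single file that can act as either a script or a module. The file name hello_world.py and the function greet are only illustrative.

```python
# hello_world.py (illustrative file name)

def greet(name):
    """Reusable piece: another file could import this function."""
    return f"Hello, {name}!"


def main():
    # The "do something" part that makes this file useful as a script.
    print(greet("world"))


if __name__ == "__main__":
    # __name__ equals "__main__" only when the file is executed directly.
    # When the file is imported, this block is skipped.
    main()
```

Running python hello_world.py executes main(), while import hello_world from another file only makes greet and main available without printing anything.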

Packages are collections of modules that fulfill similar purposes. A package is a directory that traditionally contains an __init__.py file, which distinguishes it from an ordinary directory of .py files. You can access sub-packages with the "." structure, as in xml.etree. The __init__.py structure prevents importing from lower-level packages, so modules must be acquired from

Libraries are similar in structure to packages, and the two are often used interchangeably. A frequent generalization is that libraries are collections of packages, which are themselves collections of modules.

Don’t worry if this is confusing. The distinction mostly concerns how files are organized, since Python represents imported modules, packages, and libraries all as objects of type module. Your main takeaway should be that libraries are meant to be imported for their contents, while scripts are meant to be run directly.
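As a quick check of the claim that everything imported is a module object, the sketch below inspects the standard-library xml package mentioned earlier. Using the __path__ attribute to tell packages from plain modules is a common convention, shown here only as an illustration.

```python
import types

import xml                           # top-level package
import xml.etree                     # sub-package, reached with "."
import xml.etree.ElementTree as ET   # a plain module inside the sub-package

for obj in (xml, xml.etree, ET):
    is_module_type = isinstance(obj, types.ModuleType)
    # Packages carry a __path__ attribute; plain modules do not.
    is_package = hasattr(obj, "__path__")
    print(obj.__name__, "| module type:", is_module_type, "| package:", is_package)
```

All three objects report the module type, and only the first two are reported as packages.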


Shells & shebangs

As a brief review, a shell is a program that runs inside a terminal (also called a command line) and connects the user to their files and operating system without a GUI. The shell interprets the commands you type, written in a shell language such as bash, and passes them on to a UNIX-like operating system. While a shell is needed to execute scripts this way, a deep understanding of shells isn’t required to write your scripts.

A shebang is the first line of a script. It takes a form like #!/bin/bash and indicates the interpreter used to execute the script. bash is the default shell on many Linux systems.

Without a shebang in your script, you would execute your file in a command line as follows:

python ~/my_script.py

If we left out python and the script had no shebang, the shell would typically try to interpret the file itself as a shell script; this would fail, because Python code is not valid shell syntax.

Shebangs are useful because they remove the need for python or python3 at the start of a command: once the file is marked executable (for example, with chmod +x my_script.py), the operating system automatically runs it with the interpreter indicated by the shebang.

The standard Python script shebangs are:

  • #!/usr/bin/python

  • #!/usr/bin/env python

The first uses the interpreter located at /usr/bin/python, typically the system default. The second uses whichever python is found first on your PATH, which is the same interpreter you get if you just type python. Importantly, this means you can activate your own interpreter with a virtual environment or conda environment, then have scripts pick it up through the #!/usr/bin/env python shebang.
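To see which interpreter a shebang actually selected, a script can report on itself. The following is a minimal sketch; the helper name interpreter_info is made up for illustration, and on systems that only provide python3 the shebang would need to name python3 instead.

```python
#!/usr/bin/env python
import sys


def interpreter_info():
    """Collect a few facts about the interpreter running this script."""
    return {
        "executable": sys.executable,       # path of the running interpreter
        "version": sys.version.split()[0],  # version string, e.g. major.minor.patch
        "prefix": sys.prefix,               # root of the active environment
    }


for key, value in interpreter_info().items():
    print(f"{key}: {value}")
```

Running the script inside and outside an activated environment should show different executable and prefix values.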


Package-proofing

When you want a script to run with a certain set of Python packages present in a given environment, it may not be convenient for you to run a command like conda activate my_environment prior to executing the script, especially if you have many packages. In these situations, you can simply use your shebang to point to the interpreter associated with my_environment.

For example, the interpreter used with our Fall 2020 semester environment is at /class/datamine/apps/python/f2020-s2021/env/bin/python. If we wanted our script to run using all of the packages we’ve installed into our environment, we can simply use this shebang: #!/class/datamine/apps/python/f2020-s2021/env/bin/python.
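If you want a script to confirm that the interpreter chosen by its shebang can really see the packages it needs, one option is to check for them at startup. This is only a sketch: the helper missing_packages and the package names in required are placeholders, and the check is intended for top-level package names.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names the running interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Placeholder list; replace with the packages your script depends on.
required = ["json", "csv"]

print("interpreter:", sys.executable)
print("missing packages:", missing_packages(required))
```

An empty list means every listed package is importable from the current environment.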


Arguments

Arguments are the values passed to the script. For example, in the following command, -i, special word, and my_file.txt are arguments being passed to the command grep.

grep -i 'special word' my_file.txt

This same structure can be applied to Python scripts:

$HOME/my_script.py -i 'okay, sounds good!'

Which arguments the command line accepts depends on how you write your script. To access and use arguments in your Python script, you can use the sys module. For example, take the following script:

import sys

def main():
    print(sys.argv)

if __name__ == '__main__':
    main()

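We can then run our example execution from before and get output like the following. The shell expands $HOME before the script starts, so the first element is the full path to the script (/home/user stands in for your home directory here):

$HOME/my_script.py -i 'okay, sounds good!'
['/home/user/my_script.py', '-i', 'okay, sounds good!']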

As the output shows, sys.argv is a list of the arguments as strings, and because it’s a list, you can get the n-th argument with sys.argv[n]. Keep in mind that sys.argv[0] is simply the script name as it was invoked.
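The indexing described above can be sketched as follows. The helper split_argv is hypothetical, and a hard-coded list shaped like sys.argv is used so the example behaves the same no matter how it is launched.

```python
import sys


def split_argv(argv):
    """Separate the script name (index 0) from the arguments that follow it."""
    script_name = argv[0]
    arguments = argv[1:]
    return script_name, arguments


# A stand-in for sys.argv; in a real script you would pass sys.argv itself.
sample_argv = ["my_script.py", "-i", "okay, sounds good!"]

script_name, arguments = split_argv(sample_argv)
print("script name:", script_name)
print("first argument:", arguments[0])
print("all arguments:", arguments)

# The real list is always available too.
print("arguments passed to this process:", len(sys.argv) - 1)
```

Note that the quoted phrase arrives as a single list element, because the shell groups quoted text into one argument before Python ever sees it.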

As mentioned, the functionality of arguments depends heavily on how you write your script and order your arguments. If we were to write our grep example in a different order:

grep my_file.txt 'my_pattern'

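This execution would not do what we intended, because grep treats its first non-option argument as the pattern and everything after it as files: it would search a file named my_pattern for the pattern my_file.txt. Programming your own script is much easier when you enforce the position of your arguments.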
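One simple way to enforce argument positions in your own script is to check the length of the argument list and unpack it in a fixed order. The sketch below assumes a grep-like script that expects exactly a pattern followed by a file; the function parse_positional and the usage message are illustrative.

```python
def parse_positional(argv):
    """Return (pattern, filename) from an argv-style list.

    Raises ValueError with a usage message if the count is wrong.
    """
    script_name = argv[0] if argv else "script"
    if len(argv) != 3:
        raise ValueError(f"usage: {script_name} PATTERN FILE")
    # Position decides meaning: the first argument is the pattern, the second the file.
    _, pattern, filename = argv
    return pattern, filename


# A stand-in for sys.argv; in a real script you would pass sys.argv itself.
pattern, filename = parse_positional(["my_script.py", "special word", "my_file.txt"])
print("pattern:", pattern)
print("file:", filename)
```

In a real script, you might catch the ValueError, print its message, and exit with a non-zero status.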


Options/Flags

Arguments can also be optional; these are often called flags or options. Many flags have both a short form beginning with "-" and a long form beginning with "--" (in grep, -i is short for --ignore-case). A simple flag acts as a boolean, true when included and false when omitted, while other options take a further, non-boolean value.

Another difference between the forms is how a value is attached. With the short form, the value is typically separated from the option by a space (grep -f my_file.txt), while the long form typically uses an equals sign (grep --file=my_file.txt).
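For your own Python scripts, the standard-library argparse module handles both forms for you. The following is a minimal sketch that mirrors the two grep options discussed above; the parser description and help text are made up, and the script is hypothetical rather than a reimplementation of grep.

```python
import argparse


def build_parser():
    """Build a parser with one boolean flag and one option that takes a value."""
    parser = argparse.ArgumentParser(description="Hypothetical grep-like script")
    # Boolean flag: True when present, False when omitted.
    parser.add_argument("-i", "--ignore-case", action="store_true",
                        help="ignore case when matching")
    # Option with a value: None when omitted.
    parser.add_argument("-f", "--file", default=None,
                        help="file to read patterns from")
    return parser


parser = build_parser()

# Lists stand in for sys.argv[1:]; calling parse_args() with no list reads the real arguments.
short_form = parser.parse_args(["-i", "-f", "my_file.txt"])
long_form = parser.parse_args(["--ignore-case", "--file=my_file.txt"])
no_options = parser.parse_args([])

print(short_form.ignore_case, short_form.file)
print(long_form.ignore_case, long_form.file)
print(no_options.ignore_case, no_options.file)
```

Both forms produce the same parsed values, and argparse also accepts the long form with a space (--file my_file.txt).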