NOTES ON AWK


1 GENERAL SYNTAX

1.1 Call

Awk is a shell command. It can be executed on a file:

awk '<pattern1> { <action1> } ... <patternN> { <actionN> }' <optional-declarations> <file>

or with a redirection:

<unix-command> | awk '<pattern1> { <action1> } ... <patternN> { <actionN> }' <optional-declarations> 

1.2 Data

Data can thus be either a text file, or the output of a shell command. These data are splitted in a series of records (by default, the lines of the file). The records are splitted in fields (by default, the words).

1.3 Actions

Actions are instructions to apply to data depending on the conditions given by <pattern>.

The default action is print, i.e. <pattern><pattern> { print }.

1.4 Conditions

Conditions are boolean instructions selecting the data on which the actions will apply. They can also be BEGIN or END instructions.

By default, all the data are considered.

1.5 Declarations

Optional variable declarations can be added at the end (e.g. change the value of FS, etc.).


2 VARIABLES AND FUNCTIONS

2.1 Identifiers

$i
→ identify the ith field of the record (1≤i≤N).
$0
→ identify the whole record.

2.2 Predefined Variables

RS (Record Separator)
→ delimiter to split the data into different records. By default, it is \n.
ORS (Output Record Separator)
→ similar to RS, but for outputs. By default, it is \n.
NR (Number of Records)
→ number of current lines, by default (1≤NR≤N). It is 0 for an empty file.
FS (Field Separator)
→ delimiter to split the records into fields, during the reading. By default,it is SPC.
OFS (Output Field Separator)
→ similar to FS, but for outputs. By default, it is SPC.
NF (Number of Fields)
→ number of fields in the current record (1≤NF≤N). It is 0 for a blank line.

2.3 Booleans

0
→ false.
1 or non zero
→ true.
&&
→ and.
||
→ or.
!
→ not.
+<var>
→ true if <var> is a numerical variable.

2.4 Predefined Functions

print(<val1>, ... , <valN>)
→ write <val1> to <valN>, which can be the value of a variable, a field or a string between quotes.
printf("<format>",<var>)
→ where the format can be a combination of strings and instructions:
%d
→ integer, free format;
%<n>d
→ integer, on <n> columns;
%f
→ real, free format;
%<n>.<d>f
→ real with <d> decimals, on <n> columns;
%s
→ string, free format;
%<n>s
→ string, on <n> columns.
toupper(<var>)
→ convert <var> in upper case.
substr(<var>,<first-char>,<length>)
→ extract from the string <var> the characters between <first-char> and <first-char>+<length>.
split(<varIN>,<varOUT>,<sep>)
→ split the string <varIN> in a table <varOUT> according to the separator <sep>.
length(<var>)
→ number of characters in <var>.

2.5 Operations

Simple operations can be done on variables. In particular:

Initialization
<var>=<val> (by default, a variable is initialized to 0);
Incrementation
<var>++;
Arbitrary Incrementation
<var>+=<incr>.

2.6 Tables

Tables are associative. They work as a dict in Python: ARR[<var>] where <var> is not necessarily an index, but can be a field.


3 ALGORITHMIC

3.1 Instructions at the Beginning or the End of a Read

BEGIN { <action> }
→ perform <action> before the reading. This can used to initialize variables.
END { < action> }
→ perform <action> after writing.

3.2 Structure

Iterative schemes
{ for (<arr1> in <arr2>) <action> }.
Decision schemes
{ if (<cond1>) { <action1> } else if (<cond2>) { <action2> } else { <action3> }.

BACK

Author: F. Galliano
Last update: 05 janv. 2024