AWK Tutorial – Day 2

In the previous session, we saw a few AWK one-liners that help with simple tasks. In this session, we will look at the structure of an AWK program and the features available in the language.

Structure of a simple AWK program:

awk '<pattern> { <action> }' <input-file>

The pattern is a condition that is evaluated for every line of the input. It can be an arithmetic, string, or regex comparison. If the condition is satisfied, AWK executes the action. An action can be empty, a single command, or a set of commands.

If neither a pattern nor an action is specified, AWK does nothing.
$ awk ' ' /etc/passwd

If only a pattern is specified, without an action, the default action (printing the whole line) is executed for every matching line. The empty regex // matches every line.
$ awk '//' /etc/passwd
$ awk '/bash/' /etc/passwd

If only an action is specified, without a pattern, the action is executed for every input line.
$ awk -F: '{print $1}' /etc/passwd
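
A pattern and an action can also be combined, so that the action runs only for the matching lines. The sketch below uses a small made-up sample in /etc/passwd format (the account names are hypothetical) instead of the real file, so that the result is predictable.

```shell
# Hypothetical sample in /etc/passwd format (not real accounts).
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# The pattern /bash/ selects lines; the action prints the first field.
bash_users=$(printf '%s\n' "$sample" | awk -F: '/bash/ { print $1 }')
printf '%s\n' "$bash_users"
```

With this sample, only the root and alice lines match the pattern, so only those two names are printed.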

Structure of the formal AWK program:

# Comment Line
BEGIN { action }
pattern1 { action }
patternN { action }
END { action }

AWK provides two special patterns, BEGIN and END. The BEGIN action runs before any line is read and is typically used for initialization. The END action runs after the last line is read and is typically used to print final results.

# ex1.awk
BEGIN {
  print "BEGIN"
  cnt = 0
}
{
  cnt++
}
END {
  print "END"
  print "Total no of lines = " cnt
}

In this example, we have the BEGIN and END patterns plus an action with no pattern, which counts the number of lines in the input file.
$ awk -f ex1.awk input.txt
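
To try the line counter without creating ex1.awk or input.txt, the same program can be passed inline. This is a minimal sketch; the three input lines are made up for illustration.

```shell
# Feed three made-up lines to an inline copy of the ex1.awk logic.
result=$(printf 'one\ntwo\nthree\n' | awk '
BEGIN { print "BEGIN"; cnt = 0 }    # runs once, before any input is read
      { cnt++ }                     # no pattern: runs for every input line
END   { print "END"; print "Total no of lines = " cnt }')
printf '%s\n' "$result"
```

The BEGIN and END messages are printed once each, and the final line reports a count of 3.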

# ex2.awk
BEGIN {
  print "BEGIN"
  c1 = c2 = 0
}
/bash/ {
  c1++
}
/nologin/ {
  c2++
}
END {
  print "Sub with Bash Shell: " c1
  print "Sub without any shell: " c2
}

In ex2.awk, apart from the standard BEGIN and END patterns, we have two more patterns. One matches the word "bash" in the input line and the other matches the word "nologin".
$ awk -f ex2.awk /etc/passwd

Variables:
A variable can store a number or a string, and the kind of value it holds can change over the life of a program. A variable name must start with a letter or an underscore and can be followed by any sequence of letters, digits, and underscores. Names are case sensitive. A new variable is automatically initialized to the empty string, which is treated as zero when used as a number.
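
The following sketch illustrates the two points above: an unset variable behaves as an empty string or as zero depending on how it is used, and a variable can switch from holding a number to holding a string. The variable names x and v are arbitrary.

```shell
vars=$(awk 'BEGIN {
    print "[" x "]"      # unset variable used as a string: empty
    print x + 0          # unset variable used as a number: 0
    v = 42               # v holds a number
    v = v " apples"      # now v holds a string
    print v
}')
printf '%s\n' "$vars"
```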

AWK supports two kinds of variables: built-in and user-defined. User-defined variables are created by the user to store values in the program. AWK also maintains a set of built-in variables, each with a specific purpose; their names are in capital letters. The built-in variables fall into three groups. The first group holds information that AWK maintains for you; changing these values can cause unwanted effects, so you should not change them. The second group controls how AWK behaves, so you can change these values as needed. The third group is the positional variables ($0, $1, $2, ...), which refer to the whole input line and its fields. You can assign new values to them, for example $1 = "" or $2 = "HI".
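
As a small sketch of assigning to a positional variable: when a field is changed, AWK rebuilds $0 from the fields, joining them with the output field separator OFS. The input line "a b c" is made up.

```shell
# Assigning to a field rebuilds the whole line; NF is unchanged.
changed=$(echo "a b c" | awk '{ $2 = "HI"; print $0; print NF }')
printf '%s\n' "$changed"

# With OFS set to "-", the rebuilt line is joined with dashes.
dashed=$(echo "a b c" | awk 'BEGIN { OFS = "-" } { $2 = "HI"; print $0 }')
printf '%s\n' "$dashed"
```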

For the complete list of built-in variables, please refer to the AWK man page.

Built-in variables:
ARGV, ARGC: Command-line argument list and the number of arguments
$ awk 'BEGIN { for (i = 0; i < ARGC; i++)  print ARGV[i] }' file1 file2 file3

NF: Number of fields (columns) in the current record.
$ echo "11 22 33 44 55" | awk '{print NF}'

NR: Number of records read so far. In the END action, it is the total number of records in the input.
$ awk 'END {print NR}' /etc/passwd

FS: Input Field separator
It is the symbol used to split the line into fields. The default value is " ", a string consisting of a single space. As a special exception, this value means that any sequence of spaces, tabs, and/or newlines is a single separator. It also causes spaces, tabs, and newlines at the beginning and end of a record to be ignored.

OFS: Output Field Separator
It is the symbol printed between the fields in the output of the print command. The default value is " ".

RS: Input Record Separator
It is the symbol used to separate the records. The default value is "\n".

ORS: Output Record Separator
It is the symbol output at the end of every print statement. The default value is "\n".
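
The four separators described above can be tried with a short sketch like the one below. The input strings and the separator characters ("-", ";", "|") are arbitrary choices for illustration.

```shell
# FS and OFS: split on ":" when reading, join with "-" when printing.
fields=$(echo "a:b:c" | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }')
printf '%s\n' "$fields"

# RS and ORS: read ";"-separated records, end each printed record with "|".
records=$(printf 'x;y;z' | awk 'BEGIN { RS = ";"; ORS = "|" } { print }')
printf '%s\n' "$records"
```

Note that OFS is only used where print receives comma-separated arguments (or when $0 is rebuilt), and ORS is appended after every record printed, including the last one.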

FILENAME: Name of the input file currently being processed

IGNORECASE (GAWK only): Set this variable to a non-zero value to make regex matching and string comparisons case-insensitive.

Array:

The AWK programming language supports one-dimensional arrays. Arrays are associative: each element is an index (name) and value pair. The order of the elements is irrelevant, and elements can be added or deleted anywhere in the array. An index can be a number (positive or negative), a string, or a mix of both.

Example of a valid array: A[1] = 0, A["two"] = "TWO", A[-3] = -3, A[0] = "Zero"

Delete an element with index "two": delete A["two"]

Delete the array: delete A
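
A minimal sketch that builds the example array above, counts its elements with a for-in loop, and deletes one element. The order in which for-in visits the indices is unspecified, so the sketch only counts them.

```shell
arr=$(awk 'BEGIN {
    A[1] = 0; A["two"] = "TWO"; A[-3] = -3; A[0] = "Zero"
    n = 0
    for (k in A) n++            # count elements; visiting order is unspecified
    print n
    delete A["two"]             # remove a single element
    if ("two" in A) print "present"; else print "deleted"
    print A[-3]                 # negative indices work like any other index
}')
printf '%s\n' "$arr"
```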

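AWK supports only one-dimensional arrays. However, AWK stores A[5,6] = 56 as A[5 SUBSEP 6] = 56, that is, under a single string index made by joining the subscripts with the value of SUBSEP. If SUBSEP were "@", this would be the same as A["5@6"] = 56. The default value of SUBSEP is the non-printing character "\034", and we can change it by assigning to SUBSEP. Using this trick, we can emulate multi-dimensional arrays.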
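
The emulation can be sketched as follows. The combined index is split back into its parts with split() and SUBSEP, and the (i, j) in A form tests for membership.

```shell
multi=$(awk 'BEGIN {
    A[5, 6] = 56                       # stored under the index 5 SUBSEP 6
    for (k in A) {
        split(k, part, SUBSEP)         # recover the two subscripts
        print part[1], part[2], A[k]
    }
    if ((5, 6) in A) print "found"     # membership test with two subscripts
}')
printf '%s\n' "$multi"
```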

Arithmetic Operator:
+ (Add), - (Sub), * (Multi), / (Div), % (Remainder), ++ (Auto Incr), -- (Auto Decr)
+=, -=, *=, /=, %=

String Operator:
<space> Concatenation

Relational Operator:
They can be used for both numbers and strings. In string comparisons, lowercase letters are greater than uppercase letters (in ASCII order).
==, !=, >, >=, <, <=
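
The concatenation and relational operators can be tried with the sketch below. It sets LC_ALL=C as an assumption, so that string comparison follows plain ASCII order regardless of the system locale. Note that arithmetic binds more tightly than concatenation, and that two string constants are compared as strings even when they look like numbers.

```shell
concat=$(awk 'BEGIN { s = "a" 1 + 2; print s }')   # 1 + 2 is evaluated first
printf '%s\n' "$concat"

compare=$(LC_ALL=C awk 'BEGIN {
    a = ("a" > "B")       # string comparison: lowercase sorts after uppercase
    b = (10 > 9)          # numeric comparison
    c = ("10" > "9")      # string comparison: "1" sorts before "9"
    print a, b, c
}')
printf '%s\n' "$compare"
```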

Regular Expression Operator:
~ (Matches), !~ (Doesn’t Match)
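
Unlike a bare /regex/ pattern, which is tested against the whole line, the ~ and !~ operators test a specific field or variable. A minimal sketch, using a made-up sample in /etc/passwd format (the account names are hypothetical):

```shell
# Hypothetical sample in /etc/passwd format (not real accounts).
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Field 7 is the login shell; match it against a regex.
bash_logins=$(printf '%s\n' "$sample" | awk -F: '$7 ~ /bash$/ { print $1 }')
other_logins=$(printf '%s\n' "$sample" | awk -F: '$7 !~ /bash$/ { print $1 }')
printf '%s\n' "$bash_logins" "$other_logins"
```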

Boolean Operator:
&& (And), || (Or), ! (Not)

Bitwise Operator:

AWK has no bitwise operators, but GAWK provides bitwise functions: and(v1, v2), or(v1, v2), xor(v1, v2), compl(v), lshift(v, c), rshift(v, c).

Control Statements:

  1. if ( conditional ) statement [ else statement ]
  2. while ( conditional ) statement
  3. do { statement } while (conditional);
  4. for ( expression ; conditional ; expression ) statement
  5. for ( variable in array ) statement
  6. switch ( expression ) { case value-or-regex: statement ... default: statement } (GAWK only)
  7. break, continue, next, exit.
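
A few of the control statements listed above can be combined in a short sketch like this one. It avoids switch, which is a GAWK extension, so that it runs on any AWK. The input numbers are made up.

```shell
ctl=$(printf '3\n-1\n4\n' | awk '
$1 < 0 { next }                          # next: skip records with negative values
{
    if ($1 % 2 == 0)
        kind = "even"
    else
        kind = "odd"

    f = 1
    for (i = 2; i <= $1; i++)            # for loop: factorial of the value
        f *= i

    n = $1; bits = 0
    while (n > 0) {                      # while loop: count binary digits
        n = int(n / 2)
        bits++
    }
    print $1, kind, f, bits
}')
printf '%s\n' "$ctl"
```

The record -1 is skipped by next, so only the lines for 3 and 4 are printed.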

In the next session, we will see a few examples to understand the features listed in this session.
