Friday, July 26, 2013

AWK PROGRAMMING LANGUAGE



Named after its authors Aho, Weinberger and Kernighan, awk, until the advent of Perl, was the most powerful utility for text manipulation.

Syntax:
           
            awk options 'selection_criteria {action}' file(s)

The selection_criteria filters the input and selects the lines for the action component to act upon.

Examples:

1.         $ awk '/director/ { print }' emplist

Checks for the pattern 'director' and prints the entire line(s) that match. If selection_criteria is missing, the action applies to all the lines. If action is missing, the entire line is printed. Either of the two is optional (but not both), and the whole program must be enclosed within a pair of single (not double) quotes so that the shell does not interpret characters such as $.

The following formats are equivalent:

            $ awk '/director/' emplist
            $ awk '/director/ {print}' emplist
            $ awk '/director/ {print $0}' emplist

awk uses the special parameter $0 to indicate the entire line. It also identifies the fields as $1, $2, $3, …

2.         $ awk '/director/ { print $1, $2 }' emplist

Unlike other Unix filters, awk uses a contiguous sequence of spaces and tabs as a single delimiter. If the delimiter is anything else, we have to specify it explicitly with the -F option.

3.         $ awk -F"|" '/director/ { print $1, $2 }' emplist
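To see the effect of -F without the real emplist, the sketch below feeds awk three made-up pipe-delimited records on standard input. The record contents and the names records and result are assumptions for illustration only.

```shell
# Hypothetical records standing in for emplist: id|name|designation
records='1001|a. sharma|director
1002|b. gupta|manager
1003|c. rao|director'

# -F"|" makes awk split each line on the pipe character instead of whitespace.
result=$(printf '%s\n' "$records" | awk -F"|" '/director/ { print $1, $2 }')
printf '%s\n' "$result"
```

Without -F"|", each sample line would be split on the spaces inside the names instead, so $1 and $2 would not be the id and name.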

Line addressing is allowed in awk with the help of the built-in variable NR. The following prints lines 3 to 6.

4.         $ awk -F"|" 'NR==3, NR==6 { print NR, $1, $2, $3 }' emplist
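A small sketch of the same range pattern, run on seven made-up lines so the selected range is easy to check. The input and the name result are assumptions for illustration.

```shell
# Seven hypothetical two-field records; only their line numbers matter here.
result=$(printf 'a|1\nb|2\nc|3\nd|4\ne|5\nf|6\ng|7\n' |
    awk -F"|" 'NR==3, NR==6 { print NR, $1 }')
printf '%s\n' "$result"
```

The range starts at the line where NR==3 is true and ends at the line where NR==6 is true, both inclusive.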

A C-like printf statement is available in awk to format the output.

5.         $ awk -F"|" '/director/ { printf "%3d %-20s %d\n", NR, $1, $2 }' emplist

Every print or printf statement can be separately redirected with the > and | symbols. However, make sure that the filename or command that follows these symbols is enclosed within double quotes.

6.         $ awk -F"|" '/director/ { print $1, $2 | "sort" }' emplist
7.         $ awk -F"|" '/director/ { print $1, $2 > "abc" }' emplist
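As a rough illustration of piping print output to a command, the sketch below sends the matching names to sort -r from inside awk. The sample records and the names records and result are made up for this example.

```shell
# Hypothetical records standing in for emplist: id|name|designation
records='1001|a. sharma|director
1002|b. gupta|manager
1003|c. rao|director'

# The command after | is a double-quoted string inside the awk program.
# awk starts "sort -r" once, feeds it every printed line, and closes it at exit.
result=$(printf '%s\n' "$records" | awk -F"|" '/director/ { print $2 | "sort -r" }')
printf '%s\n' "$result"
```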

Every expression in awk is interpreted either as a string or a number, and awk makes the necessary conversion according to context. awk allows user-defined variables to be used without declaring them. Variable names are case sensitive.

Ex: x="sun"; y="com"
    print x y       gives               suncom

    x="5"; y=6
    print x+y       gives               11
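The conversions above can be tried directly in a BEGIN block, which needs no input file. This is a minimal sketch; the variable names a, b and result are chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    x = "sun"; y = "com"
    print x y          # juxtaposition concatenates the two strings
    a = "5"; b = 6
    print a + b        # + forces numeric context, so the string "5" becomes 5
    print a b          # concatenation forces string context, so 6 becomes "6"
}')
printf '%s\n' "$result"
```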

Logical operators: || (or), && (and), ! (not)

8.         $ awk '$3=="director" || $3=="chairman" { print }' emplist

Regular Expression Operators: ~ (match), !~ (no match)

9.         $ awk '$3 ~ /^a/ { print }' emplist

Number Comparison: >, >=, <, <=, ==, !=

Arithmetic Operators: +, -, *, /, %

10.        $ awk '$3 > 2000 { printf "%d\n", $2*0.5 }' emplist

Built-in Variables:
NR              cumulative no. of lines read
FS              input field separator
OFS             output field separator
NF              no. of fields in the current line
FILENAME        current input file
ARGC            no. of command line arguments
ARGV            list of arguments
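A short sketch of a few of these variables in use, on two made-up lines. FS and OFS are set in a BEGIN block here rather than with -F, and the input and the name result are assumptions for illustration.

```shell
# Two hypothetical records with different numbers of fields.
result=$(printf 'a|b|c\nd|e\n' |
    awk 'BEGIN { FS = "|"; OFS = "-" } { print NR, NF, $1 }')
printf '%s\n' "$result"
```

The commas in print are replaced by OFS on output, so each line shows the line number, the field count and the first field joined by hyphens.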

An awk program can be put in a file, and we can ask awk to read the program from that file with the -f option and execute it on the input file. Here the program file is pattern.awk.

$ awk -f pattern.awk emplist

BEGIN & END Sections: BEGIN performs its action once, before the first line of input is read, and END performs its action once, after the last line has been processed. The syntax is BEGIN {action} and END {action}.

$ awk 'BEGIN {print "welcome"} /director/ {print} END {print "Bye"}' emplist
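A slightly longer sketch of the same structure, counting the matching lines and reporting the count in END. The sample records, the counter n and the names records and result are assumptions for illustration.

```shell
# Hypothetical records standing in for emplist: id|name|designation
records='1001|a. sharma|director
1002|b. gupta|manager
1003|c. rao|director'

result=$(printf '%s\n' "$records" | awk -F"|" '
BEGIN      { print "welcome" }                        # once, before any input
/director/ { n++; print $2 }                          # for every matching line
END        { print n " director(s)"; print "Bye" }    # once, after the last line
')
printf '%s\n' "$result"
```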

awk reads standard input when filename is omitted.

Arrays:
1.         An array is considered declared the moment it is used
2.         Array elements are initialized to zero or empty string unless initialized explicitly
3.         Arrays expand automatically
4.         The index can be anything, even a string

Ex:

$ awk 'BEGIN { print "REPORT" } /director/ { tot[1]=tot[1]+$6 } END { print tot[1] }' emplist

Associative Arrays:

awk does not treat array indexes as integers. The arrays are associative: the information is held as key-value pairs, and the index is the key, which is saved internally as a string. When we set an array element using mon[1]="mon", awk converts the number 1 to a string. There is no specified order in which the array elements are stored. The built-in ENVIRON array, indexed by environment variable name, is one such associative array.

$ nawk 'BEGIN {print "HOME" "=" ENVIRON["HOME"]} /director/' emplist
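A common use of string indexes is counting occurrences of a field value. The sketch below counts made-up records per designation. Because the traversal order of the array is unspecified, the output is passed through sort. The sample data and the names records, count and result are assumptions for illustration.

```shell
# Hypothetical records standing in for emplist: id|name|designation
records='1001|a. sharma|director
1002|b. gupta|manager
1003|c. rao|director'

result=$(printf '%s\n' "$records" | awk -F"|" '
{ count[$3]++ }                              # the designation string is the key
END { for (d in count) print d, count[d] }   # visit every key, in no fixed order
' | sort)
printf '%s\n' "$result"
```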

Functions:
int(x)                  returns the integer value of x
sqrt(x)                 returns the square root of x
length                  returns the length of the complete line
length(x)               returns the length of x
substr(string,m,n)      returns n characters of string, starting at position m
index(s1,s2)            returns the position of s2 in s1
split(string,array,ch)  splits the string into an array using ch as the delimiter
system("cmd")           executes an operating system command and returns its exit status
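The string and numeric functions can be tried in a BEGIN block. This is a minimal sketch on a made-up date string; the names s, part, n and result are chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    s = "2013-07-26"
    print length(s)             # number of characters in s
    print substr(s, 6, 2)       # 2 characters starting at position 6
    print index(s, "-")         # position of the first "-" in s
    n = split(s, part, "-")     # fills part[1], part[2], part[3]; n is the count
    print n, part[1]
    print int(7.9), sqrt(16)    # int truncates toward zero
}')
printf '%s\n' "$result"
```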

CONTROL FLOW:
if (condition) { statements }
if (condition ) { statements } else {statements}

for (k=1;k<=10;k++)
{
            statements
}
for (k in array)
{
            statements
}

while (condition)
{
            statements
}
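The constructs above can be combined as in the sketch below, which sums the even and odd numbers from 1 to 5 and then counts k back down with a while loop. The variable names even, odd and result are chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    for (k = 1; k <= 5; k++) {
        if (k % 2 == 0) { even += k } else { odd += k }
    }
    # k is 6 when the for loop ends; step it down until it reaches 3.
    while (k > 3) { k-- }
    print even, odd, k
}')
printf '%s\n' "$result"
```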
