
Understanding AWK: A Beginner’s Guide with Examples

AWK is a powerful programming language designed for text processing and typically used as a data extraction and reporting tool. Named after its creators (Aho, Weinberger, and Kernighan), AWK is a versatile tool that can handle complex text-processing tasks with ease. In this post, we’ll explore three AWK commands, ranging from easy to advanced, using specific file contents and showing the output of each command. Let’s dive in!

Example 1: Easy - Extracting Specific Columns

Target File: ‘data.txt’

Let's say the file contains the following lines:
Name   Age   Occupation
Alice     30   Engineer
Bob      25   Designer
Carol    27    Teacher

Awk Command

awk '{print $1, $3}' data.txt

Output

Name Occupation
Alice Engineer
Bob Designer
Carol Teacher

Explanation

  • awk : The command invokes the AWK programming language.
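  • '{ }' : The single quotes stop the shell from interpreting the program, and the curly braces enclose the AWK action that runs on each line.
  • print $1, $3 : print is an AWK statement that writes its arguments to the output. $1 and $3 refer to the first and third fields (columns) of the current line; in general, $ followed by a number selects that field. The comma tells AWK to separate the two values with a single space in the output.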
  • data.txt : The name of the input file to be processed.

This simple command uses AWK to print the first and third columns of data.txt. The {print $1, $3} part of the command tells AWK to display the first and third fields (columns) for each line of the file. This is a great way to quickly extract specific information from a structured text file.
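To try this yourself, here is a minimal sketch that recreates data.txt in a scratch directory and runs the same command. The scratch directory and the variable names (workdir, columns, field_counts) are assumptions made for this sketch, not part of the original example. It also prints NF, AWK's built-in count of fields on each line, to show that the uneven spacing in the file does not matter: by default AWK splits fields on any run of spaces or tabs.

```shell
#!/bin/sh
# Scratch directory so the sketch does not overwrite any real data.txt (an assumption).
workdir=$(mktemp -d)

# Recreate the sample file, keeping the uneven spacing from the example.
cat > "$workdir/data.txt" <<'EOF'
Name   Age   Occupation
Alice     30   Engineer
Bob      25   Designer
Carol    27    Teacher
EOF

# Print the first and third fields of every line.
columns=$(awk '{print $1, $3}' "$workdir/data.txt")
echo "$columns"

# NF holds the number of fields AWK found on the current line.
field_counts=$(awk '{print NF}' "$workdir/data.txt")
echo "$field_counts"

# Clean up the scratch directory.
rm -r "$workdir"
```

Every line reports three fields, even though the number of spaces between the columns differs from line to line.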

Example 2: Medium - Summing Values in a Column

Target File: ‘expenses.txt’

The above file contains the following lines:
Item        Cost
Groceries    100
Utilities    150
Rent      1200
Transport    80
Entertain    200

Awk Command

awk 'NR > 1 {sum += $2} END {print "Total Expense:", sum}' expenses.txt

Output

Total Expense: 1730

Explanation

  • awk : The command invokes the AWK programming language.
  • NR > 1 : NR is a built-in AWK variable that represents the current record (line) number. This condition ensures that the action is only performed for the lines where NR is greater than 1, effectively skipping the header row.
  • {sum += $2} : Within the curly braces, sum += $2 adds the value in the second column ($2) to the variable sum for each line.
  • END {print "Total Expense:", sum} : The END block is executed after all input lines have been processed. It prints the total of the values in the second column.
  • expenses.txt : The name of the input file to be processed.

This command sums up the values in the second column of expenses.txt. The NR > 1 condition skips the header row. The sum += $2 part adds each value in the second column to the sum variable. Finally, END {print "Total Expense:", sum} prints the total expense after processing all lines.
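The same pattern extends naturally to other totals. The sketch below recreates expenses.txt in a scratch directory, runs the command from this example, and then runs a variant that also counts the data rows and prints their average. The variant, the counter n, and the shell variable names are illustrative additions rather than part of the original example.

```shell
#!/bin/sh
# Scratch directory so the sketch does not overwrite any real expenses.txt (an assumption).
workdir=$(mktemp -d)

cat > "$workdir/expenses.txt" <<'EOF'
Item        Cost
Groceries    100
Utilities    150
Rent      1200
Transport    80
Entertain    200
EOF

# The command from the example: skip the header, add up column 2.
total=$(awk 'NR > 1 {sum += $2} END {print "Total Expense:", sum}' "$workdir/expenses.txt")
echo "$total"

# Variant: count the data rows in n as well, then print the average.
# The n > 0 guard avoids dividing by zero on a file with only a header.
summary=$(awk 'NR > 1 {sum += $2; n++}
               END {if (n > 0) print "Items:", n, "Average:", sum / n}' "$workdir/expenses.txt")
echo "$summary"

rm -r "$workdir"
```

Skipping the header matters more for the count than for the sum: AWK would treat the word Cost as 0 when adding, but it would still count the header as a row.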

Example 3: Hard/Cool - Finding the Longest Line in a File

Target File: ‘paragraphs.txt’

Let's say the file contains the following lines, one sentence per line:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vestibulum consectetur nunc sit amet risus varius, vel facilisis velit tincidunt.
Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.
In quis luctus libero.

Awk Command

awk '{ if (length($0) > max) { max = length($0); longest = $0 } } END { print "Longest line:", longest }' paragraphs.txt

Output

Longest line: Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.

Explanation

  • awk : The command invokes the AWK programming language.
  • { if (length($0) > max) { max = length($0); longest = $0 } } : Within the curly braces, this block of code is executed for each input line.
    • length($0) : The length function returns the length of the current line ($0 represents the entire line).
    • if (length($0) > max) : This condition checks if the length of the current line is greater than the current value of max. max is unset on the first line, and AWK treats an unset variable as 0 in a numeric comparison.
    • max = length($0) : If the condition is true, max is updated to the length of the current line.
    • longest = $0 : The longest variable is updated to store the current line.
  • END { print "Longest line:", longest } : The END block is executed after all input lines have been processed. It prints the longest line found.
  • paragraphs.txt : The name of the input file to be processed.

This advanced command finds the longest line in paragraphs.txt. The length($0) function calculates the length of each line. The if (length($0) > max) condition checks if the current line is longer than the previously recorded maximum length. If it is, the max variable is updated to the current line's length, and the longest variable stores the current line. Because the comparison is strict, the first of several equally long lines is the one that is kept. After processing all lines, the END { print "Longest line:", longest } block prints the longest line.
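Here is a small sketch for experimenting with this command. It assumes paragraphs.txt holds one sentence per line, which is what the output shown above implies, and it writes the file to a scratch directory. The second command is an illustrative variant, not part of the original example: it uses the pattern form (condition { action }) instead of an explicit if, and records NR so it can report which line number was the longest.

```shell
#!/bin/sh
# Scratch directory so the sketch does not overwrite any real paragraphs.txt (an assumption).
workdir=$(mktemp -d)

# Assumed layout: one sentence per line.
cat > "$workdir/paragraphs.txt" <<'EOF'
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vestibulum consectetur nunc sit amet risus varius, vel facilisis velit tincidunt.
Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.
In quis luctus libero.
EOF

# The command from the example.
longest=$(awk '{ if (length($0) > max) { max = length($0); longest = $0 } }
               END { print "Longest line:", longest }' "$workdir/paragraphs.txt")
echo "$longest"

# Variant: same comparison written as a pattern, remembering the line number.
longest_nr=$(awk 'length($0) > max { max = length($0); nr = NR }
                  END { print nr }' "$workdir/paragraphs.txt")
echo "Longest line number: $longest_nr"

rm -r "$workdir"
```

Both forms do the same work; the pattern form is simply the more compact way to write "run this action only when the condition holds".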

Conclusion:

AWK is a powerful and flexible tool for text processing that can handle a wide range of tasks. From extracting specific columns to summing values and finding the longest line, AWK commands can be simple or complex, depending on your needs. These examples provide a glimpse into what AWK can do, but there’s much more to explore. Dive into AWK, and you’ll find it an invaluable tool in your text-processing toolkit!