Awk
Understanding AWK: A Beginner’s Guide with Examples
AWK is a powerful programming language designed for text processing and typically used as a data extraction and reporting tool. Named after its creators (Aho, Weinberger, and Kernighan), AWK is a versatile tool that can handle complex text-processing tasks with ease. In this post, we’ll explore three AWK commands, ranging from easy to advanced, using specific file contents and showing the output of each command. Let’s dive in!
Example 1: Easy - Extracting Specific Columns
Target File: ‘data.txt’
Let’s say the file contains the following lines:
Name Age Occupation
Alice 30 Engineer
Bob 25 Designer
Carol 27 Teacher
Awk Command
awk '{print $1, $3}' data.txt
Output
Name Occupation
Alice Engineer
Bob Designer
Carol Teacher
Explanation
- awk : The command invokes the AWK programming language.
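- '{ }' : The single quotes keep the shell from interpreting the AWK program, and the curly braces enclose the action to run on each input line.
- print $1, $3 : print is an AWK statement that outputs text, and $1 and $3 represent the first and third fields (columns) of the current line, respectively. Each $ followed by a number refers to a specific field. By default, fields are separated by whitespace, and the comma in the print statement inserts a single space between the values in the output.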
- data.txt : The name of the input file to be processed.
This simple command uses AWK to print the first and third columns of data.txt. The {print $1, $3} part of the command tells AWK to display the first and third fields (columns) for each line of the file. This is a great way to quickly extract specific information from a structured text file.
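To try the command yourself, the sketch below recreates data.txt with a here-document and runs it, along with two variations that show what the comma in print $1, $3 does. The second variation uses a BEGIN block, which runs once before any input is read, to set OFS (the output field separator); the comma-separated output is only an illustrative choice, not part of the original example.

```shell
# Recreate the sample data.txt from the example above.
cat > data.txt <<'EOF'
Name Age Occupation
Alice 30 Engineer
Bob 25 Designer
Carol 27 Teacher
EOF

# The original command: the comma inserts OFS, a single space by default.
awk '{print $1, $3}' data.txt

# Without the comma, the two fields are concatenated with no separator
# (e.g. "AliceEngineer").
awk '{print $1 $3}' data.txt

# Setting OFS in a BEGIN block changes the separator, here to a comma
# (e.g. "Alice,Engineer").
awk 'BEGIN { OFS = "," } {print $1, $3}' data.txt
```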
Example 2: Medium - Summing Values in a Column
Target File: ‘expenses.txt’
The file contains the following lines:
Item Cost
Groceries 100
Utilities 150
Rent 1200
Transport 80
Entertain 200
Awk Command
awk 'NR > 1 {sum += $2} END {print "Total Expense:", sum}' expenses.txt
Output
Total Expense: 1730
Explanation
- awk : Invokes the AWK programming language.
- NR > 1 : NR is a built-in AWK variable that represents the current record (line) number. This condition ensures that the action is only performed for the lines where NR is greater than 1, effectively skipping the header row.
- {sum += $2} : Within the curly braces, sum += $2 adds the value in the second column ($2) to the variable sum for each line. AWK variables need no declaration; sum starts out unset, which counts as 0 in arithmetic.
- END {print "Total Expense:", sum} : The END block is executed after all input lines have been processed; it prints the total of the values in the second column.
- expenses.txt : The name of the input file to be processed.
This command sums up the values in the second column of expenses.txt. The NR > 1 condition skips the header row. The sum += $2 part adds each value in the second column to the sum variable. Finally, END {print "Total Expense:", sum} prints the total expense after processing all lines.
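The same pattern extends naturally to other summary statistics. The sketch below recreates expenses.txt, counts the data rows alongside the running sum, and prints an average in the END block. The count variable and the "Average Expense" line are illustrative additions, not part of the original command.

```shell
# Recreate the sample expenses.txt from the example above.
cat > expenses.txt <<'EOF'
Item Cost
Groceries 100
Utilities 150
Rent 1200
Transport 80
Entertain 200
EOF

# Skip the header, accumulate the total and count the data rows.
# In END, "sum + 0" forces a numeric 0 if there were no data rows,
# and the average is printed only when at least one row was seen.
awk 'NR > 1 { sum += $2; count++ }
     END {
         print "Total Expense:", sum + 0
         if (count > 0) print "Average Expense:", sum / count
     }' expenses.txt
```

With the sample file this should print a total of 1730 and an average of 346. The sum + 0 guard matters for a file that holds only the header row: an unset AWK variable prints as an empty string, so print sum alone would leave the total blank.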
Example 3: Hard/Cool - Finding the Longest Line in a File
Target File: ‘paragraphs.txt’
The file contains the following lines:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vestibulum consectetur nunc sit amet risus varius, vel facilisis velit tincidunt.
Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.
In quis luctus libero.
Awk Command
awk '{ if (length($0) > max) { max = length($0); longest = $0 } } END { print "Longest line:", longest }' paragraphs.txt
Output
Longest line: Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.
Explanation
- awk : The command invokes the AWK programming language.
- { if (length($0) > max) { max = length($0); longest = $0 } } : Within the curly braces, this block of code is executed for each input line.
- length($0) : The length function returns the length of the current line ($0 represents the entire line).
- if (length($0) > max) : This condition checks if the length of the current line is greater than the current value of max.
- max = length($0) : If the condition is true, max is updated to the length of the current line.
- longest = $0 : The longest variable is updated to store the current line.
- END { print "Longest line:", longest } : The END block is executed after all input lines have been processed. It prints the longest line found.
- paragraphs.txt : The name of the input file to be processed.
This advanced command finds the longest line in paragraphs.txt. The length($0) function calculates the length of each line. The if (length($0) > max) condition checks if the current line is longer than the previously recorded maximum length; on the first line, max has not been set yet, so AWK treats it as 0. If the condition is true, the max variable is updated to the current line’s length, and the longest variable stores the current line. After processing all lines, the END { print "Longest line:", longest } block prints the longest line.
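The same logic can also be written with a pattern in place of the if statement, a common AWK idiom: the action runs only for lines where the pattern is true. The sketch below recreates paragraphs.txt and additionally records the line number (NR) and the length of the winning line. The lineno variable and the output wording are illustrative choices, not part of the original command.

```shell
# Recreate the sample paragraphs.txt from the example above.
cat > paragraphs.txt <<'EOF'
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vestibulum consectetur nunc sit amet risus varius, vel facilisis velit tincidunt.
Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas.
In quis luctus libero.
EOF

# The pattern "length($0) > max" replaces the if statement: the action
# runs only when the current line is longer than any line seen so far.
# NR is saved in lineno so END can report which line was the longest.
awk 'length($0) > max { max = length($0); longest = $0; lineno = NR }
     END { print "Line " lineno " (" max " characters): " longest }' paragraphs.txt
```

For the sample file this should report line 3 at 93 characters. Because the comparison is a strict >, a later line of equal length does not replace an earlier one, so ties go to the first longest line.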
Conclusion:
AWK is a powerful and flexible tool for text processing that can handle a wide range of tasks. From extracting specific columns to summing values and finding the longest line, AWK commands can be simple or complex, depending on your needs. These examples provide a glimpse into what AWK can do, but there’s much more to explore. Dive into AWK, and you’ll find it an invaluable tool in your text-processing toolkit!