Metspitzer wrote:
Here is the original file. Give it a shot. I think it would have
been easier for me to just flip the lines one at a time than for you
to go through this. I wanted the easy way out. You have done more
work than I.
Thanks
http://www.filedropper.com/health
Program modified to handle the new input formats. There are at least five
variants of the date field format.
Part of my purpose in writing a script is to show how much effort it
takes to fix up the date field format and make the computer understand
it. Procedural languages, such as the one I'm using, are "brittle".
Note that, when you take the shorthand like this...
   03/2010 Major event
   03/22 Minor event at day 22 of month
and invert it, it doesn't make quite as much sense, and I don't know how
to make it look better in that case. You may need to do some more edits
to the inverted file, like maybe "03/22/2010" for the minor.
   03/22 Minor event at day 22 of month
   03/2010 Major event
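One possible way to do that extra edit automatically is sketched below. It is not part of metzsort.txt; the function name expand_shorthand and the file names in the usage comment are made up for illustration. It assumes the file is still in its original order, so that a month/day entry can borrow the year from the most recent dated line above it. Two-digit years are passed along as typed.

```shell
#!/bin/sh
# Hypothetical helper: rewrite "MM/DD" date fields as "MM/DD/YYYY",
# using the year of the most recent dated line above.
expand_shorthand() {
    awk '
    {
        n = split($1, p, "/")
        if (n == 1) {
            year = p[1] + 0                      # e.g. 1965
        } else if (n == 2 && p[2] + 0 >= 1000) {
            year = p[2] + 0                      # e.g. 03/2010
        } else if (n == 3) {
            year = p[3] + 0                      # e.g. 6/22/1965
        } else if (n == 2 && year != "") {
            # e.g. 03/22: append the remembered year to the date field only,
            # leaving the rest of the line (and its spacing) untouched.
            sub(/^[ \t]*[^ \t]+/, "&/" year)
        }
        print
    }'
}

# Usage (file names are placeholders):
#   expand_shorthand < input.txt > expanded.txt
```

Running this before the sort means the inverted file carries "03/22/2010" instead of a bare "03/22".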
************************ Begin file "metzsort.txt" **********************
# Dependencies: gawk version 3.1.2 or later from gnuwin32. (Currently 3.1.6)
#
#               http://gnuwin32.sourceforge.net/packages/gawk.htm
#
# Syntax:  gawk -f metzsort.txt ascending  input.txt > output.txt
#          gawk -f metzsort.txt descending input.txt > output.txt
#           ^                     ^          ^
#           |                     |          |
#          ARGV[0]              ARGV[1]    ARGV[2]
#
#          gawk -f metzsort.txt input.txt > output.txt    <--- will be ascending
#           ^                     ^
#           |                     |
#          ARGV[0]              ARGV[1]
#
# (Sorts based on date and text. Input sample comes next)
#
# 9/1975 Broken right hand                                <--- Format 1
# 6/22/1965 Broken left Collarbone                        <--- Format 2
# 3/1966 Broken jaw
# 1965 Broken Collarbone                                  <--- Format 3
# 03/2010 Need to keep "year" in a history buffer...
# 03/22 As the next entry to it assumes the same year     <--- Format 4
# 3/22/12 percutaneous                                    <--- Format 5
#
# Need to convert the date field into year_month_day, and append it to the left
# of the user input line.
#
# <- date -> <--------------- line ---------------->
# 1975_09_00 Broken right hand
# 1965_06_22 Broken left Collarbone
# 1966_03_00 Broken jaw
#
# Then, sort the date array, use the indices to print out the result
#
BEGIN {   # this clause runs, before the program eats any data...
    count = 1
    descending = 0
    if (ARGV[1] == "descending") {
        delete ARGV[1]
        descending = 1
    }
    if (ARGV[1] == "ascending") {
        delete ARGV[1]
    }
    # Note: Your input file name cannot be "ascending" or "descending" !!!
}
{
    numfields = split( $1, dateparts, "/" )
    switch (numfields) {
    case 1:   # this is a Format 3
        tempdate = sprintf("%04d_00_00", dateparts[1]+0)
        break
    case 2:
        if ( dateparts[2] < 1000 ) {   # this is a Format 4
            tempdate = sprintf("%04d_%02d_%02d", oldyear, dateparts[1]+0, dateparts[2]+0)
        } else {                       # this is a Format 1
            tempdate = sprintf("%04d_%02d_00", dateparts[2]+0, dateparts[1]+0)
            oldyear = dateparts[2]+0   # for Format 4
        }
        break
    case 3:   # this is a Format 2
        # Add fixups for format 5, a short year field. Not for centenarians...
        # Crappy code follows.
        add = 0   # reset, so a value left over from an earlier line cannot leak in
                  # (short years from 20 to 50 are ambiguous and stay as typed)
        if (dateparts[3] < 20)   { add = 2000 }   # 2000 up to 2019
        if (dateparts[3] > 50)   { add = 1900 }   # 1951 to 1999
        if (dateparts[3] > 1000) { add = 0 }      # Proper four digit date ?
        tempdate = sprintf("%04d_%02d_%02d", dateparts[3]+add, dateparts[1]+0, dateparts[2]+0)
        oldyear = dateparts[3]+add   # for Format 4
        break
    default:
        print "Unexpected date field at line " NR > "/dev/stderr"
        exit
    }
    for (j=2; j<=NF; j++) {   # 1975_09_00Brokenrighthand
        tempdate = tempdate $j
    }
    if ( tempdate in date ) {
        print "Warn: Identical line detected at line " NR > "/dev/stderr"
        count--
    }
    # print tempdate   # Debug: Uncomment this line, to check the thing we're sorting on...
    date[ tempdate ] = $0   # Associative array holds the original user lines
    count++
}
END {
    # the count variable, is now "one past the end"
    # This built-in function, sorts by the index field, which is "tempdate"
    asorti(date,datesort)
    if ( descending == 0 ) {
        for( j=1; j<count; j++) {   # then let's print ascending
            print date[ datesort[ j ] ]
        }
    } else {   # descending is 1
        for( j=count-1; j>=1; j-- ) {   # then let's print descending
            print date[ datesort[ j ] ]
        }
    }
}
************************ End file "metzsort.txt" **********************
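To see why the year_month_day key sorts correctly as plain text, the key-building step can be tried on its own. The sketch below is a cut-down, hypothetical stand-in for that step, not the script itself: the name date_keys is made up, and it only covers Formats 1 to 3 (no year carry-over, no two-digit years). Because every part of the key is zero-padded to a fixed width, an ordinary text sort puts the lines in date order.

```shell
#!/bin/sh
# Hypothetical stand-alone version of the key-building step: replaces the
# date field with a zero-padded year_month_day key (Formats 1 to 3 only).
date_keys() {
    awk '
    {
        n = split($1, p, "/")
        if (n == 1)      key = sprintf("%04d_00_00", p[1])               # 1965
        else if (n == 2) key = sprintf("%04d_%02d_00", p[2], p[1])       # 9/1975
        else if (n == 3) key = sprintf("%04d_%02d_%02d", p[3], p[1], p[2])  # 6/22/1965
        else             next                     # skip anything unexpected
        $1 = key                                  # swap the date field for the key
        print
    }'
}

# Usage (file name is a placeholder):
#   date_keys < input.txt | LC_ALL=C sort
```

With the three sample lines from the script's header comments, this gives 1965_06_22, 1966_03_00 and 1975_09_00 in that order after sorting.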
Output at pastebin, in descending order. (As per Seth's example)
http://pastebin.com/nTx3zwUP
Paul