Option 1:
cat file1name file2name | sort | uniq > outputfilename
This merges both files and sorts the result in ascending order, keeping each line once.
To sort in descending order, add the -r option to sort:
cat file1name file2name | sort -r | uniq > outputfilename
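Option 2 (keep only the lines that occur exactly once across both files, i.e. the lines in which the files differ):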
sort file1name file2name | uniq -u > diffLines
How to find duplicates in two lists?
sort file1name file2name | uniq -d > duplicates
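A small worked example of the three pipelines above. The files a.txt and b.txt and their contents are made up for illustration:

```shell
# Work in a throwaway directory; a.txt and b.txt are hypothetical sample files.
tmp=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$tmp/a.txt"
printf 'fig\nkiwi\napple\n' > "$tmp/b.txt"

# Union of both files, sorted, each line once (sort -u behaves like sort | uniq).
merged=$(sort -u "$tmp/a.txt" "$tmp/b.txt")

# Lines that occur exactly once across both files.
only_once=$(sort "$tmp/a.txt" "$tmp/b.txt" | uniq -u)

# Lines that occur more than once across both files.
dups=$(sort "$tmp/a.txt" "$tmp/b.txt" | uniq -d)

printf 'merged:\n%s\nonly once:\n%s\nduplicates:\n%s\n' "$merged" "$only_once" "$dups"
rm -r "$tmp"
```

Note that uniq -u and uniq -d only tell the two files apart correctly if neither file contains the same line twice.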
How to compare two files line by line?
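comm -1 -2 <(sort first.txt) <(sort second.txt)
this prints only the lines common to both files; use -2 -3 for the lines found only in first.txt and -1 -3 for the lines found only in second.txt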
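A sketch of the three comm variants on two made-up files. It sorts into temporary files instead of using <(...), so it also runs in shells without process substitution:

```shell
# first.txt and second.txt are hypothetical sample files.
tmp=$(mktemp -d)
printf 'b\na\nc\n' > "$tmp/first.txt"
printf 'c\nd\nb\n' > "$tmp/second.txt"

# comm expects sorted input.
sort "$tmp/first.txt"  > "$tmp/first.sorted"
sort "$tmp/second.txt" > "$tmp/second.sorted"

common=$(comm -12 "$tmp/first.sorted" "$tmp/second.sorted")      # in both files
only_first=$(comm -23 "$tmp/first.sorted" "$tmp/second.sorted")  # only in first.txt
only_second=$(comm -13 "$tmp/first.sorted" "$tmp/second.sorted") # only in second.txt

printf 'common: %s\n' $common
echo "only in first: $only_first, only in second: $only_second"
rm -r "$tmp"
```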
Count the number of lines in a file and get only the number?
wc -l < myfile.txt
or, to store it in a variable:
count=$(wc -l < myfile.txt)
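A minimal sketch of getting a clean number, assuming a made-up three-line file. Reading the file through stdin keeps wc from printing the file name, and tr removes the padding that some wc implementations (e.g. on Mac OS X) put in front of the number:

```shell
# myfile.txt here is a hypothetical sample file.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/myfile.txt"

# No file name in the output because wc reads from stdin.
count=$(wc -l < "$tmp/myfile.txt" | tr -d ' ')

echo "lines: $count"
rm -r "$tmp"
```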
String manipulation, matching, etc.
sed substitute command: match a regular expression and replace it by something else
changing "<www" in the "mappingbased_properties_en.nt" file to "<http://www" in the "mappingbased_properties_en_fixed.nt" file:
sed 's/<www/<http:\/\/www/' mappingbased_properties_en.nt > mappingbased_properties_en_fixed.nt
more: String matching in Perl
Replace all slashes by commas in a file
sed -e 's/\//,/g' results.csv > results-all.csv
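When the pattern or the replacement contains slashes, sed accepts another delimiter, which avoids the \/ escapes. A sketch on made-up input (the example.org line is only an illustration):

```shell
# Use | as the delimiter so that / needs no escaping.
fixed=$(printf '%s\n' '<www.example.org/a> <p> "x" .' | sed 's|<www|<http://www|')
echo "$fixed"

# Same idea for replacing every slash by a comma.
commas=$(printf '1/2/3\n' | sed 's|/|,|g')
echo "$commas"
```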
How to split a file into several files of fixed length?
split -l 1000 file.nt
this splits file.nt into pieces of 1000 lines each (named xaa, xab, ... by default)
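A sketch of split on a generated 2500-line file. The prefix part_ is an arbitrary choice for this example; without it the pieces would be called xaa, xab, ...:

```shell
# Generate a hypothetical 2500-line input file.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 2500; i++) print i }' > "$tmp/file.nt"

# Split into pieces of at most 1000 lines: part_aa, part_ab, part_ac.
split -l 1000 "$tmp/file.nt" "$tmp/part_"

pieces=$(ls "$tmp" | grep -c '^part_')
last_len=$(wc -l < "$tmp/part_ac" | tr -d ' ')
echo "$pieces pieces, the last one has $last_len lines"
rm -r "$tmp"
```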
How to add a character (or a string) at the end of each line of a file?
add ">" at the end of each line ($ is a regex for end of the line):
sed 's/$/>/' myfile
This will not modify the file. To modify the file add option -i (on Mac OS X, sed needs an explicit backup suffix, e.g. -i ''):
sed -i 's/$/>/' myfile
OR
save it to another file:
sed 's/$/>/' myfile > anotherfile
How to add a character (or a string) at the beginning of each line of a file?
add "<" at the beginning of each line (^ is a regex for the beginning of the line):
sed 's/^/</' myfile
This will not modify the file. To modify the file add option -i (on Mac OS X, sed needs an explicit backup suffix, e.g. -i ''):
sed -i 's/^/</' myfile
OR
save it to another file:
sed 's/^/</' myfile > anotherfile
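Both substitutions can be combined in one sed call. A sketch that wraps every line in angle brackets, using made-up example.org lines as input:

```shell
# Two -e expressions: first prepend "<", then append ">".
wrapped=$(printf 'http://example.org/a\nhttp://example.org/b\n' \
  | sed -e 's/^/</' -e 's/$/>/')
echo "$wrapped"
```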
How to rename a set of files or add an additional extension?
for i in *.*; do mv "$i" "$i.n3"; done
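The loop above appends .n3 to the existing name. To replace an extension instead, the old one can be stripped with the ${i%...} parameter expansion. A sketch, assuming made-up .txt files in a temporary directory:

```shell
# a.txt and b.txt are hypothetical sample files.
tmp=$(mktemp -d)
touch "$tmp/a.txt" "$tmp/b.txt"

# ${i%.txt} is the name without its trailing .txt
for i in "$tmp"/*.txt; do
  mv "$i" "${i%.txt}.n3"
done

renamed=$(ls "$tmp")
echo "$renamed"
rm -r "$tmp"
```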
How to add a fixed header at the top of each file in your dir?
merge content of file1 with all other files in your dir (e.g. add a fixed header with file1 content at the top of each file in your dir):
for i in *.n3; do cat file1 "$i" > "$i.withhead.n3"; done
Convert 7-bit ASCII representations to UTF-8 Unicode
for i in *.n3; do ascii2uni "$i" > "$i.utf8.n3"; done
Convert the two middle quotes of four double quotes to single quotes?
e.g. if you have a string with ...."..."..."..".....
and need to convert it to:
...."...'...'..".....
do this (a single quote inside the single-quoted sed script has to be written as '\'', otherwise the shell ends the script there):
for i in *.n3; do sed 's/\(".*\)"\(.*\)"\(.*"\)/\1'\''\2'\''\3/g' "$i" > "$i.quotesfixed.n3"; done
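A sketch of this substitution on a single made-up line, with the single quotes in the replacement written as '\'' so that they reach sed instead of being consumed by the shell:

```shell
# Input with exactly four double quotes; the two middle ones become single quotes.
fixed=$(printf '%s\n' 'x "a"b"c" y' \
  | sed 's/\(".*\)"\(.*\)"\(.*"\)/\1'\''\2'\''\3/')
echo "$fixed"
```

The pattern relies on the line containing exactly four double quotes. With more quotes the greedy .* groups will pick different ones.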
Escape backslashes
this replaces every backslash with a space:
for i in *.n3; do sed 's/\\/ /g' "$i" > "$i.bfixed.n3"; done
Remove the middle quote of three double quotes?
e.g. if you have a string with ...."..."..".....
and need to convert it to:
....".....".....
i.e. remove the middle one (the command below replaces it with a space):
for i in *.n3; do sed 's/\([\"].*\)[\"]\(.*[\"]\)/\1 \2/g' "$i" > "$i.3qfixed.n3"; done
How many lines are there in each file?
count the number of lines in each file with extension .txt in the specified dir:
find /thepathtothedir -maxdepth 1 -name "*.txt" -print0 | xargs -0 -n 1 wc -l
to put that list into a file use:
find . -maxdepth 1 -name "*.txt" -print0 | xargs -0 -n 1 wc -l > lines.txt
Count the files with 0 lines:
grep "^0 " lines.txt | wc -l
or with 10 lines (note the space after 10, otherwise files with 100, 1000, ... lines would match too):
grep "^10 " lines.txt | wc -l
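An alternative sketch that compares the first column numerically with awk. This does not depend on the exact spacing of the wc output. The contents of lines.txt below are made up:

```shell
# Hypothetical lines.txt as produced by the find | xargs wc -l command.
tmp=$(mktemp -d)
printf '0 ./a.txt\n10 ./b.txt\n100 ./c.txt\n0 ./d.txt\n' > "$tmp/lines.txt"

# Count the entries whose first field is exactly 0 or exactly 10.
zero=$(awk '$1 == 0 { n++ } END { print n + 0 }' "$tmp/lines.txt")
ten=$(awk '$1 == 10 { n++ } END { print n + 0 }' "$tmp/lines.txt")

echo "empty files: $zero, files with 10 lines: $ten"
rm -r "$tmp"
```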
Sed tutorial
http://www.tutorialspoint.com/unix/unix-regular-expressions.htm
Awk tricks
Input file: expected.constraints in the form:
name Person:0.9 Organisation:0.1
awk '{print $1,$2}' expected.constraints|tr ':' ' '|sort -k 3 -rn|column -t|less
Output:
name  Person  0.9
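A sketch of the same pipeline on a few made-up rows, without the column -t and less stages so that the result can be captured. It keeps the first label of each row and sorts by its score, highest first:

```shell
# Hypothetical expected.constraints with three entries.
tmp=$(mktemp -d)
printf '%s\n' \
  'alice Person:0.9 Organisation:0.1' \
  'acme Organisation:0.7 Person:0.3' \
  'bob Person:0.95 Organisation:0.05' > "$tmp/expected.constraints"

# Keep name and first label, split label and score, sort by the score (field 3).
ranked=$(awk '{print $1,$2}' "$tmp/expected.constraints" | tr ':' ' ' | sort -k 3 -rn)

echo "$ranked"
rm -r "$tmp"
```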
How to Read CSV or convert it to TSV using bash on Mac OS X?
It is very simple to read a CSV file using AWK such as:
awk -F "\"*,\"*" '{print $1 "\t" $2 "\t" $3}' test.csv
where your test.csv file looks like this:
first,second,third
1,2,3
4,5,6
7,8,9
If instead as input you get only the first line:
first,second,third
it is very likely that you need to fix your line endings as they might be CR.
To check your line endings do:
file test.csv
If what you get is:
test.csv: ASCII text, with CR line terminators
you will need to remove CR line terminators.
You can use the dos2unix package for that. If you don't have it installed, you can get it using brew (without sudo):
brew install dos2unix
Finally, you can remove them as follows:
mac2unix test.csv
mac2unix: converting file test.csv to Unix format...
Now run your awk script again, and it will work.
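If installing dos2unix is not an option, tr can do the same conversion. A sketch, assuming a made-up CSV file that uses only CR as line terminator (for CRLF files this would produce empty lines; there tr -d '\r' is the usual choice):

```shell
# Hypothetical CSV with CR-only line endings.
tmp=$(mktemp -d)
printf 'first,second,third\r1,2,3\r4,5,6\r' > "$tmp/cr.csv"

# Turn every CR into a newline.
tr '\r' '\n' < "$tmp/cr.csv" > "$tmp/lf.csv"

rows=$(wc -l < "$tmp/lf.csv" | tr -d ' ')
second=$(awk -F "\"*,\"*" 'NR == 2 { print $1 "\t" $2 "\t" $3 }' "$tmp/lf.csv")

echo "rows: $rows"
echo "$second"
rm -r "$tmp"
```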
---
awk -F'\t' 'BEGIN{OFS="\t"}{print 0,0,"somelabel",$2,$3}' input.tsv > output.tsv
--
awk -F'\t' '{print $3}' input.tsv |cut -d' ' -f 3-
Assume the third (tab-separated) column has the format
0 label some text with words
The command above will print 'some text with words'.
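A sketch of the two TSV commands above on made-up single-line inputs (the values id1, foo, bar and x are placeholders):

```shell
# Prepend two zero columns and a fixed label, keep columns 2 and 3.
relabelled=$(printf 'id1\tfoo\tbar\n' \
  | awk -F'\t' 'BEGIN{OFS="\t"}{print 0,0,"somelabel",$2,$3}')

# Take the third tab-separated column, then drop its first two space-separated words.
text=$(printf 'id1\tx\t0 label some text with words\n' \
  | awk -F'\t' '{print $3}' | cut -d' ' -f 3-)

echo "$relabelled"
echo "$text"
```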
How to remove trailing and leading spaces from a string in awk?
Use tr -d '[:blank:]' (note that this also removes any spaces and tabs inside the string)
for example, in a file that reads
TOPIC#: CARS
the command below will extract CARS without any spaces:
topicid=$(grep "TOPIC#:" "$i" | awk -F':' '{print $2}' | tr -d '[:blank:]')
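When the value itself may contain spaces, the trimming can be done inside awk with gsub, which removes only the leading and trailing blanks. A sketch on a made-up line, compared with the tr variant:

```shell
line='TOPIC#:  NEW CARS  '

# Strip blanks only at the start and end of the second field.
trimmed=$(printf '%s\n' "$line" \
  | awk -F':' '{ gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }')

# For comparison: tr deletes every blank, including the one inside the value.
squeezed=$(printf '%s\n' "$line" | awk -F':' '{print $2}' | tr -d '[:blank:]')

echo "[$trimmed] [$squeezed]"
```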