October 2006 - Posts

nothing fancy here, we're just using a regex to replace every \r (carriage return) with nothing. this is a different way of doing the same thing dennis already posted here. =P

using System.IO;
using System.Text.RegularExpressions;

namespace StripCR
{
    class Program
    {
        static void Main(string[] args)
        {
            nocr(args[0]);
        }
        // reads the whole file, strips every \r, then writes it back over the original
        public static void nocr(string f)
        {
            string t;
            using (StreamReader r = new StreamReader(f))
                t = Regex.Replace(r.ReadToEnd(), "\r", "");
            using (StreamWriter w = new StreamWriter(f))
                w.Write(t);
        }
    }
}




and because i know you want to know, here's the ruby code too. =P
note: i haven't tested this, but it should at the very least put you in the right direction haha.

file = ARGV[0]
# binary mode on both ends, so windows doesn't translate line endings and put the \r back
fc = File.binread(file).gsub(/\r/,"")
File.open(file,'wb'){|f|
  f.print(fc)
}
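here's a minimal sketch of the same idea wrapped in a method, in case you want to call it from other code instead of the command line. strip_cr is a made-up name for this example, and the sketch assumes ruby 1.9.3 or later for File.binread and File.binwrite. String#delete does the same job as the regex here.

```ruby
# strip_cr is a hypothetical helper name, not something from the original post.
# Removes every carriage return from the file at path, in place.
def strip_cr(path)
  # binread/binwrite skip newline translation, so the bytes on disk
  # are exactly what we read and exactly what we write
  content = File.binread(path)
  File.binwrite(path, content.delete("\r"))
end
```

to get the same behavior as the script above, call it as strip_cr(ARGV[0]).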

This is essentially the same thing as the C# version, just in ruby. Again, note that you can change the URL regular expression to match whatever you want to find in the email. In this one I also split the file in two after the header, searching only the header for from/to/subject and only the body for urls, so it's a slightly different approach. The point is really just to show how simple it is to set up the logic and where you can configure your regular expressions to tweak your results. As with most ruby code I write, there is probably a way to do this entire thing in one to three lines. I don't know that way, but there probably is one. =P Also posted here.

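class Email

  def initialize(p)
    @name = p
    fc = IO.read(p)
    # the header is everything up to the first blank line
    hdr = fc.split(/^\s*$/)[0].to_s
    # the body is whatever comes after it. String#sub with a string pattern
    # is a literal match, not a regex, so slice by length instead
    body = fc[hdr.size..-1].to_s
    # to_a[0] is the whole matched line; to_s.strip turns a missing header
    # into '' and drops a trailing \r
    @from = /^From: (.+$)/.match(hdr).to_a[0].to_s.strip
    @to = /^To: (.+$)/.match(hdr).to_a[0].to_s.strip
    @subject = /^Subject: (.+$)/.match(hdr).to_a[0].to_s.strip
    # scan picks up every match in the body, match would stop at the first one
    @urls = body.scan(/https?:\/\/.{1,}[\/]/).join(' ')
  end

  def show
    puts @name + "\n\t" +
      @to + "\n\t" +
      @from + "\n\t" +
      @subject + "\n\t" +
      @urls
  end
end

Dir['*.eml'].each{|p|
  e = Email.new(p)
  e.show
}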
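The header/body split described above can also be sketched on a plain string, which makes it easy to try out regex tweaks without any .eml files lying around. This is a minimal sketch and not the code from the post: parse_email, PATTERNS, URL_PATTERN and the sample message are all invented for the example, and it assumes the header ends at the first blank line.

```ruby
# PATTERNS, URL_PATTERN and parse_email are hypothetical names for this sketch.
# Tweak the regexes here to change what gets pulled out.
PATTERNS = {
  from:    /^From: (.+)$/,
  to:      /^To: (.+)$/,
  subject: /^Subject: (.+)$/
}
URL_PATTERN = %r{https?://[^\s/]+/}

def parse_email(text)
  # drop carriage returns so "\r\n" line endings don't leak into the values,
  # then split once at the first blank line
  hdr, body = text.delete("\r").split("\n\n", 2)
  result = {}
  PATTERNS.each do |name, re|
    m = re.match(hdr.to_s)
    # capture group 1 is the value after "Name: "; nil if the header is missing
    result[name] = m && m[1]
  end
  # only the body is searched for urls; scan returns every match as an array
  result[:urls] = body.to_s.scan(URL_PATTERN)
  result
end

# made-up sample message
sample = "From: a@example.com\r\nTo: b@example.com\r\nSubject: hi\r\n\r\n" \
         "see http://example.com/page and https://example.org/\r\n"
info = parse_email(sample)
puts info[:from]
puts info[:urls].join(' ')
```

Keeping the patterns in one hash means adding another header (Date, Cc, whatever) is a one-line change. Anchoring each pattern with ^ also keeps "To:" from matching inside something like "Reply-To:".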

here's a short code snippet showing how to parse the from/to/subject from an eml file, as well as any urls located in the message. the regex for urls isn't perfect. there are a million ways to do url regex, so pick your poison from the web. this one is just an example. reposted from here.
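using System;
using System.IO;
using System.Text.RegularExpressions;

namespace parse.eml
{
    class Email
    {
        string _path, _to, _from, _subject, _urls;

        public Email(string path)
        {
            _path = path;
            // File.ReadAllText closes the file when it's done reading
            string fc = File.ReadAllText(path);
            _from = Header(fc, "From");
            _to = Header(fc, "To");
            _subject = Header(fc, "Subject");
            _urls = string.Empty;
            // collect every url in the message, separated by spaces
            foreach (Match m in Regex.Matches(fc, @"https?://([a-zA-Z\.]+)/"))
            {
                _urls += m.ToString() + ' ';
            }
        }

        // returns the whole "Name: value" line, or "" if the header is missing.
        // ^ with Multiline keeps "To:" from matching inside "Reply-To:"
        static string Header(string fc, string name)
        {
            Match m = Regex.Match(fc, "^" + name + ": (.+)", RegexOptions.Multiline);
            return m.Success ? m.Value.TrimEnd('\r') : string.Empty;
        }

        public void show()
        {
            Console.WriteLine(
                "{0}\n\t{1}\n\t{2}\n\t{3}\n\t{4}",
                _path, _to, _from, _subject, _urls);
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // parse and print every .eml file in the current directory
            foreach (string f in Directory.GetFiles(".", "*.eml"))
            {
                Email e = new Email(f);
                e.show();
            }
        }
    }
}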
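
On the point that the url regex isn't perfect: a quick way to see what a pattern actually grabs is to run a few of them side by side on the same text. Here's a small sketch in ruby. The sample text and the all_matches helper are made up for the example, and the third pattern is just one more option to compare against, not a recommendation from the posts.

```ruby
# made-up sample text with two urls on one line
text = "go to http://example.com/a/b.html or https://sub.example-site.org/x today"

greedy  = /https?:\/\/.{1,}[\/]/        # the pattern from the ruby post
letters = /https?:\/\/([a-zA-Z\.]+)\//  # the pattern from the c# post
loose   = /https?:\/\/[^\s<>"]+/        # hypothetical alternative: run up to whitespace

# all_matches is a hypothetical helper. String#scan returns capture groups
# when the pattern has them, so grab the whole match from inside the block.
def all_matches(text, re)
  found = []
  text.scan(re) { found << Regexp.last_match(0) }
  found
end

[greedy, letters, loose].each do |re|
  puts re.source
  all_matches(text, re).each { |m| puts "\t" + m }
end
```

On this sample the greedy pattern swallows everything from the first url through the last slash on the line as a single match, the letters-only pattern finds the first host but misses the hyphenated one, and the loose pattern returns both urls with their paths. Which of those is "right" depends on what you're trying to pull out of the email.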