September 14, 2011, 08:40 PM
Keywork

Code:
#! /usr/bin/perl -w

##########
#
# KeyworK
#
# Script for DT dtscript.pl
#
##########

use strict;
use warnings;

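my $file = shift;
defined $file or die "Usage: $0 input.txt\n";
# Three-argument open, and stop with a message if the file cannot be read
open (INPUT, '<', $file) or die "Cannot open $file: $!\n";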

## FORMAT
## Author,Content,Date,Retweets,Links,Link#

# Input
# epaulnet: Nanaimo Daily News: #Qaddafi backed efforts against al-#Qaeda: DND reports‎ http://t.co/iTVFkqh #Libya 
# 04/23/2011 retweet

# Output
# epaulnet,Nanaimo Daily News: #Qaddafi backed efforts against al-#Qaeda: DND reports? http://t.co/iTVFkqh #Libya ,04/23/2011,0,http://t.co/iTVFkqh,1

my ($author, $content, $date, $retweets, @links, $length);
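# Results go to a file named "output" in the current directory
open (OUTPUT, '>', 'output') or die "Cannot open output: $!\n";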
print OUTPUT "Author,Content,Date,Retweets,Links,Link# \n";
while (<INPUT>){
    chomp;
    if (/^(.*?): (.*)/){ # Main line: "author: content"
        # Only use the captures when the match actually succeeded
        ($author,$content) = ($1,$2);
        # Any space-separated word containing "http" counts as a link
        @links = grep { /http/ } split / /,$content;
        $length = scalar(@links);
    } elsif (/\d\d\/\d\d\/\d\d\d\d .*/){ # Secondary line: "date retweets"
        ($date, $retweets) = split / /,$_;
        # A bare "retweet" (or a missing count) is recorded as 0
        $retweets = 0 if !defined $retweets || $retweets eq "retweet";
        print OUTPUT "$author,$content,$date,$retweets,@links,$length \n";
    }
}
close (INPUT);
close (OUTPUT) or die "Cannot close output: $!\n";
I need to go to bed, but I believe there is a bug when the content field contains a quotation mark, though I may be wrong. If you need any help running it, let me know!
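
If the quotation mark does turn out to be a problem (a comma inside the content would shift the columns in the same way), one possible fix is to quote fields before printing them. This is only a rough sketch and not part of the script above: csv_field is a made-up helper name, the sample row is invented, and it assumes whatever reads the file follows the usual CSV convention of wrapping a field in double quotes and doubling any quotes inside it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# csv_field is a hypothetical helper, not part of the original script.
# It quotes a field only when it contains a comma, a double quote,
# or a line break, and doubles any embedded double quotes.
sub csv_field {
    my ($field) = @_;
    $field = '' unless defined $field;
    if ($field =~ /[",\r\n]/) {
        $field =~ s/"/""/g;
        return qq{"$field"};
    }
    return $field;
}

# Sample values are invented for illustration only.
my @row = ('someuser', 'He said "hi", then left', '04/23/2011', 0, '', 0);
print join(',', map { csv_field($_) } @row), "\n";
# prints: someuser,"He said ""hi"", then left",04/23/2011,0,,0
```

In the script, the print line would then build its output from csv_field($author), csv_field($content) and so on instead of interpolating the variables directly. Fields without special characters come out unchanged, so the sample output in the comments would stay the same. The Text::CSV module on CPAN covers this more thoroughly if installing a module is an option.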

Note: It takes the input file as an argument, e.g. dtscript.pl input.txt, and writes the result to a file named output in the current directory.