Not sure if this is the right place to post this, so mods feel free to move it if there is a more appropriate home for this thread...
I will start off by saying that I have zero programming know-how. As Neytiri would say, I'm like a baby, making noise, don't know what to do.
Anyway, I have several large text files containing Twitter data. Here is a small excerpt to illustrate the format:
Code:
epaulnet: Nanaimo Daily News: #Qaddafi backed efforts against al-#Qaeda: DND reports http://t.co/iTVFkqh #Libya
04/23/2011 retweet
omar_chaaban: http://bit.ly/gBFpNo Protesters arrested at Libyan Embassy #Libya #Canada
04/05/2011 retweet
kyleharrietha: The #ICC has evidence Gadhafi's gov't planned to put down protests by killing civilians - http://bit.ly/gIQCZk #Libya
04/05/2011 retweet
epaulnet: The Canadian Press: 16 #journalists, including one Canadian, are missing or detained in #Libya http://t.co/KtVIlTg #Canada #CPJ #NATO #War
04/26/2011 retweet
jdalrymple: Apparently the Canadian air force is now doing flyovers in Libya. Canada has a plane?
04/07/2011 5 retweet
nedhamson: Canada mulls call for more jets for Libya campaign http://ff.im/-BaJjf
04/14/2011 retweet
vancouversun: Canada must wait to hear costs of Libya war http://bit.ly/hydYMu
04/07/2011 3 retweet
gpollowitz: Trump in a nutshell: America sucks now, Bush is the worst president ever, Obama is the worst president ever, invade Libya, Canada's HC 4 all
04/18/2011 retweet
vancouversun: Canada mulls call for more jets for Libya campaign http://bit.ly/hUsJGu
04/14/2011 retweet
weddady: Kaddafi's pr strategy..to give speeches at 3AM.. was he trying to compete for ratings... in the US & Canada? #Libya
04/29/2011 10 retweet
newsmanly: Canada should not be in Libya - Windsor Star http://dlvr.it/P5d70
04/20/2011 retweet
I'd like to convert this data into CSV format so that I can import it into SPSS (or Excel if need be), but I have no idea how to do that. In total, I have around 70,000-ish records, hence the need for an automated process. The SPSS data fields I would like to end up with are:
Author - defined as everything preceding the first colon on the first line of each record.
Content - defined as everything following the first colon on the first line of each record.
Date - defined as the XX/XX/XXXX content on the second line of each record.
Retweets - defined as the number following the date but before the word "retweet" (NB: when 0, this number is absent from the record).
Links - defined as any and all urls appearing in the "Content" field
Link# - the number of urls appearing in the "Content" field
Can it be done? Can anybody help point me in the right direction? Any guidance would be much appreciated! If I can offer anything in exchange for your help, please let me know. Thanks!
edit - Source txt data is encoded in UTF-8, by the way.
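For anyone wanting to see the field definitions above spelled out, here is a minimal Python sketch of one way the conversion could go. It is only an illustration under some assumptions, not a tested solution for the full data set: it assumes every record is exactly two lines (no tweets containing line breaks), that a URL is anything starting with http:// or https:// up to the next space, and that multiple links can share one cell separated by spaces. The names parse_record, convert and the LinkCount column (standing in for Link#) are made up for this example.

Code:
```python
import csv
import re

# Assumption: a URL is "http://" or "https://" followed by non-space characters.
URL_RE = re.compile(r"https?://\S+")
# Second line of a record: date, optional retweet count, then the word "retweet".
META_RE = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(?:(\d+)\s+)?retweet")

FIELDS = ["Author", "Content", "Date", "Retweets", "Links", "LinkCount"]


def parse_record(tweet_line, meta_line):
    """Turn the two lines of one record into a dict of the six fields."""
    # Split on the FIRST colon only, so colons inside URLs stay in Content.
    author, _, content = tweet_line.partition(":")
    match = META_RE.match(meta_line.strip())
    if match is None:
        raise ValueError(f"unexpected date line: {meta_line!r}")
    date, retweets = match.group(1), match.group(2)
    links = URL_RE.findall(content)
    return {
        "Author": author.strip(),
        "Content": content.strip(),
        "Date": date,
        # The count is absent from the record when it is 0.
        "Retweets": int(retweets) if retweets else 0,
        "Links": " ".join(links),
        "LinkCount": len(links),
    }


def convert(in_path, out_path):
    """Read a two-lines-per-record text file and write one CSV row per record."""
    # "utf-8-sig" reads plain UTF-8 and also drops a byte-order mark if present.
    with open(in_path, encoding="utf-8-sig") as src:
        lines = [line.rstrip("\n") for line in src if line.strip()]
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        # Pair up line 1 with line 2, line 3 with line 4, and so on.
        for tweet_line, meta_line in zip(lines[0::2], lines[1::2]):
            writer.writerow(parse_record(tweet_line, meta_line))
```

With the function definitions loaded, something like convert("tweets.txt", "tweets.csv") would produce the CSV, where both file names are placeholders. A record that breaks the two-line pattern would raise an error at the first date line that does not match, which at least shows where the data needs a closer look.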