  #1 (permalink)  
Old September 13, 2011, 08:13 AM
Dead Things's Avatar
Hall Of Fame
F@H
 
Join Date: Oct 2008
Location: Centre of the Universe
Posts: 1,572


Help! Need some programming expertise to convert txt to csv

Not sure if this is the right place to post this, so mods feel free to move it if there is a more appropriate home for this thread...

I will start off by saying that I have zero programming know-how. As Neytiri would say, I'm like a baby, making noise, don't know what to do.

Anyway, I have several large text files containing twitter data. Here is a small excerpt of the data to illustrate the format it is in:

Code:
epaulnet: Nanaimo Daily News: #Qaddafi backed efforts against al-#Qaeda: DND reports‎ http://t.co/iTVFkqh #Libya 
04/23/2011 retweet

omar_chaaban: http://bit.ly/gBFpNo Protesters arrested at Libyan Embassy #Libya #Canada
04/05/2011 retweet

kyleharrietha: The #ICC has evidence Gadhafi's gov't planned to put down protests by killing civilians - http://bit.ly/gIQCZk #Libya
04/05/2011 retweet

epaulnet: The Canadian Press: 16 #journalists, including one Canadian, are missing or detained in #Libya http://t.co/KtVIlTg #Canada #CPJ #NATO #War
04/26/2011 retweet

jdalrymple: Apparently the Canadian air force is now doing flyovers in Libya. Canada has a plane?
04/07/2011 5 retweet

nedhamson: Canada mulls call for more jets for Libya campaign http://ff.im/-BaJjf
04/14/2011 retweet

vancouversun: Canada must wait to hear costs of Libya war http://bit.ly/hydYMu
04/07/2011 3 retweet

gpollowitz: Trump in a nutshell: America sucks now, Bush is the worst president ever, Obama is the worst president ever, invade Libya, Canada's HC 4 all
04/18/2011 retweet

vancouversun: Canada mulls call for more jets for Libya campaign http://bit.ly/hUsJGu
04/14/2011 retweet

weddady: Kaddafi's pr strategy..to give speeches at 3AM.. was he trying to compete for ratings... in the US & Canada? #Libya
04/29/2011 10 retweet

newsmanly: Canada should not be in Libya - Windsor Star http://dlvr.it/P5d70
04/20/2011 retweet
I'd like to convert this data into CSV format so that I can import it into SPSS (or Excel if need be), but I have no idea how to do that. In total, I have roughly 70,000 records, hence the need for an automated process. The SPSS data fields I would like to end up with are:

Author - defined as everything preceding the first colon on the first line of each record.
Content - defined as everything following the first colon on the first line of each record.
Date - defined as the XX/XX/XXXX content on the second line of each record.
Retweets - defined as the number following date but before the word "retweet" (NB: when 0, this number is absent from the record).
Links - defined as any and all urls appearing in the "Content" field
Link# - the number of urls appearing in the "Content" field
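
For anyone following along, here is a rough Python sketch of how those six field definitions could be turned into parsing logic for a single record. It is only an illustration, not something posted in the thread: the names parse_record, URL_RE and META_RE are made up, a "url" is assumed to be any run of non-space characters starting with http:// or https://, and joining multiple links with a space is an arbitrary choice.

```python
import re

# Assumption: a URL is any run of non-whitespace starting with http:// or https://
URL_RE = re.compile(r"https?://\S+")
# Second line: date, an optional retweet count, then the word "retweet"
META_RE = re.compile(r"^(\d{2}/\d{2}/\d{4})\s+(?:(\d+)\s+)?retweet\s*$")


def parse_record(first_line, second_line):
    """Split one two-line record into the six fields listed above."""
    # Everything before the first colon is the author; the rest is the content.
    author, _, content = first_line.partition(":")
    match = META_RE.match(second_line.strip())
    if match is None:
        raise ValueError(f"unexpected second line: {second_line!r}")
    date, retweets = match.group(1), match.group(2)
    links = URL_RE.findall(content)
    return {
        "Author": author.strip(),
        "Content": content.strip(),
        "Date": date,
        # The count is absent from the record when it is zero
        "Retweets": int(retweets) if retweets else 0,
        # Assumption: multiple links are stored in one field, space-separated
        "Links": " ".join(links),
        "Link#": len(links),
    }


record = parse_record(
    "jdalrymple: Apparently the Canadian air force is now doing flyovers in Libya. Canada has a plane?",
    "04/07/2011 5 retweet",
)
print(record["Author"], record["Retweets"], record["Link#"])
```

Splitting on the first colon only matters because the content itself often contains colons (in the urls and in text like "Nanaimo Daily News:"), so a plain split on every colon would cut the tweet apart.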

Can it be done? Can anybody help point me in the right direction? Any guidance would be much appreciated! If I can offer anything in exchange for your help, please let me know. Thanks!

edit - Source txt data is encoded in UTF-8, by the way.
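
Since the encoding matters for both reading and writing, here is a minimal Python sketch of the file handling side, under a couple of assumptions: every record is exactly two non-blank lines separated from the next by a blank line (as in the excerpt above), and the helper names read_records and write_csv plus the file names tweets.txt and tweets.csv are invented for the example. It only pairs up the lines and writes them out; splitting each pair into the individual fields is a separate step.

```python
import csv
import os
import tempfile


def read_records(path):
    """Return (first_line, second_line) pairs from a UTF-8 text file."""
    # utf-8-sig reads plain UTF-8 and also drops a byte-order mark if present
    with open(path, encoding="utf-8-sig") as src:
        lines = [line.strip() for line in src if line.strip()]
    # Assumption: each record is exactly two non-blank lines;
    # a trailing unpaired line is ignored
    return [(lines[i], lines[i + 1]) for i in range(0, len(lines) - 1, 2)]


def write_csv(path, rows):
    """Write rows to CSV; the csv module quotes fields that contain commas."""
    # The csv docs recommend newline=""; utf-8-sig adds a byte-order mark,
    # which tends to help Excel recognise the file as UTF-8
    with open(path, "w", encoding="utf-8-sig", newline="") as dst:
        csv.writer(dst).writerows(rows)


# Small demo using two records from the excerpt, written to a temp folder
sample = (
    "nedhamson: Canada mulls call for more jets for Libya campaign http://ff.im/-BaJjf\n"
    "04/14/2011 retweet\n"
    "\n"
    "weddady: Kaddafi's pr strategy..to give speeches at 3AM.. was he trying "
    "to compete for ratings... in the US & Canada? #Libya\n"
    "04/29/2011 10 retweet\n"
)
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "tweets.txt")
dst_path = os.path.join(workdir, "tweets.csv")
with open(src_path, "w", encoding="utf-8") as f:
    f.write(sample)

records = read_records(src_path)
write_csv(dst_path, records)
print(len(records))
```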
__________________
Follow my folding, mining & benching shenanigans @dt_oc!

Think you can overclock? Then show us what you got!
Join the Hardware Canucks Overclocking team today!

Last edited by Dead Things; September 13, 2011 at 08:23 AM.
  #2 (permalink)  
Old September 13, 2011, 01:12 PM
Arinoth's Avatar
Moderator
F@H
 
Join Date: May 2009
Location: Halifax
Posts: 8,579


It could possibly be done. Hell, I could probably figure something out if I had the spare time.

I'll try to look into it for you; however, I have a lot on the go right now, so I can't guarantee anything.
  #3 (permalink)  
Old September 13, 2011, 01:34 PM
Hall Of Fame
F@H
 
Join Date: Feb 2010
Location: Markham
Posts: 1,566


Have you tried simply adding comma separators via script and opening the file in Excel? It might be a lot easier than you think since the format is so consistent.
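
One catch with plain comma insertion is that some of the tweets contain commas themselves. A small Python sketch of the difference, using one tweet from the excerpt (the variable names are just for illustration): gluing fields together with commas breaks the tweet apart, while the standard csv module quotes the field so it survives a round trip.

```python
import csv
import io

author = "gpollowitz"
content = (
    "Trump in a nutshell: America sucks now, Bush is the worst president ever, "
    "Obama is the worst president ever, invade Libya, Canada's HC 4 all"
)

# Naive approach: glue the two fields together with a comma
naive_line = author + "," + content
naive_fields = naive_line.split(",")

# csv module: wraps the content in quotes because it contains commas
buffer = io.StringIO()
csv.writer(buffer).writerow([author, content])
parsed_back = next(csv.reader(io.StringIO(buffer.getvalue())))

print(len(naive_fields))  # more than 2, the tweet's own commas split it
print(len(parsed_back))   # 2
```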
  #4 (permalink)  
Old September 13, 2011, 02:26 PM
Dead Things's Avatar
Hall Of Fame
F@H
 
Join Date: Oct 2008
Location: Centre of the Universe
Posts: 1,572


Quote:
Originally Posted by ilya View Post
Have you tried simply adding comma separators via script and opening the file in Excel? It might be a lot easier than you think since the format is so consistent.
With my skillset, the only way it could be easy for me is if each field was a fixed width so that I could sub in delimiters at specific intervals. But since both author and content are variable-width fields and since the retweets field occurs only occasionally, I would have to use some sort of programming logic to tell it where to look for the appropriate places to insert delimiters. And that is well beyond my capabilities, I'm afraid. I tried reading "Sed - An Introductory Tutorial" last night and was immediately overwhelmed by the urge to punch babies.
  #5 (permalink)  
Old September 13, 2011, 04:01 PM
Keywork's Avatar
Allstar
 
Join Date: Jun 2009
Location: Niagara
Posts: 604


This can be done in Perl rather quickly and efficiently. Can you post an example of the result you want (actual data, so that anyone who helps out here can check that their output is correct)? I need to finish up some things for school, but I can probably mock up a quick script today or tomorrow if you're interested!
  #6 (permalink)  
Old September 14, 2011, 02:58 PM
Dead Things's Avatar
Hall Of Fame
F@H
 
Join Date: Oct 2008
Location: Centre of the Universe
Posts: 1,572


Very cool Keywork! And yes, very interested! Thanks! I've attached an example of the output I'd like using the same data samples as posted above for reference.
Attached Files
File Type: rar Example-Output.rar (911 Bytes, 13 views)
  #7 (permalink)  
Old September 14, 2011, 03:24 PM
grinder's Avatar
Allstar
F@H
 
Join Date: Mar 2007
Posts: 821


MS Access (Microsoft Office Pro) would handle this for ya too.
__________________
Phenom II 945 :: ASUS M4A78-E (780G) :: BFG 285GTX :: 4GB Mushkin DDR2 (5-4-4-12) :: Creative Xi-Fi :: Seagate 500 gig 7200.12 (better than WD BLACK!!!!!) :: Samsung 2493HM
  #8 (permalink)  
Old September 14, 2011, 05:18 PM
Arinoth's Avatar
Moderator
F@H
 
Join Date: May 2009
Location: Halifax
Posts: 8,579


Keywork, lemme know if you're doing it or not; otherwise I'll throw my hat into the ring and try to work out a little program in C++, my native language.
  #9 (permalink)  
Old September 14, 2011, 06:04 PM
Keywork's Avatar
Allstar
 
Join Date: Jun 2009
Location: Niagara
Posts: 604


Bleh C++. I'm finishing up an article right now and then I'll spend some time on it! Shouldn't take long! I'll post back here in a few.
  #10 (permalink)  
Old September 14, 2011, 06:06 PM
Arinoth's Avatar
Moderator
F@H
 
Join Date: May 2009
Location: Halifax
Posts: 8,579


Quote:
Originally Posted by Keywork View Post
Bleh C++. I'm finishing up an article right now and then I'll spend some time on it! Shouldn't take long! I'll post back here in a few.
C, C++ and C#: real men's languages. Well, I'm going to attempt to never program in assembly ever again.