sample collection
Posted: Tue Dec 22, 2009 11:31 pm
Now at first this is probably going to seem like, well... the kind of thread that will probably be locked after a couple of answers and thrown out with cries of RTFM, GOOGLE IT!!!, etc...
I am not quite sure why, but it seems to have that texture as it's forming in my head.
It's possibly also the fact that this is so OT to the general topic of this forum, on top of it being more of a "GIVE ME CODE/HELP!" question than a "can you help me with this implementation" question...
But I'm giving you a sour taste in the mouth before you've even read this; everything will soon become clear.
My problem is simply that I need a very large database of music to be able to conduct my experiment.
The first obstacle is to write a bot that can find pages with a certain number of views, certain tags, a certain number of people who have rated the music, etc., and of course retrieve this data.
This I will deal with myself.
The only problem I have now is finding the easiest way to retrieve the video and strip out the audio for my use.
I wasn't able to find any open-source software that does this, or any documentation.
So I guess my question boils down to this: does anybody have any clues, information or anything else on how I could write something to retrieve videos from YouTube? I am aware that there is already software out there that does this, but I want something that lets me do all my processing before anything is written to disk. I need to process massive amounts of data, so I want it to be as quick and efficient as possible, and I don't want to call some software that writes the file to disk, only to have to load it off the disk again to do my processing...
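To make the "process it before it touches the disk" part concrete, here is a minimal sketch of the kind of thing I have in mind. It assumes the video bytes are already in memory (the retrieval step is not shown) and that ffmpeg is installed and on the PATH, which is purely an assumption for illustration. The names `ffmpeg_audio_cmd`, `pipe_through` and `extract_audio` are hypothetical helpers, not part of any library. One caveat: some container layouts need seeking and may not decode when fed through a pipe.

```python
import subprocess


def ffmpeg_audio_cmd(sample_rate=22050):
    """Build an ffmpeg command line that reads a video from stdin and
    writes mono 16-bit little-endian PCM audio to stdout.

    Assumption: ffmpeg is installed and on PATH.
    """
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", "pipe:0",           # read the input container from stdin
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-f", "s16le",            # raw PCM output, no container header
        "pipe:1",                 # write the result to stdout
    ]


def pipe_through(cmd, data):
    """Feed `data` to the stdin of `cmd` and return everything it writes
    to stdout. Nothing is written to disk by this function itself."""
    result = subprocess.run(
        cmd,
        input=data,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return result.stdout


def extract_audio(video_bytes, sample_rate=22050):
    """Return raw PCM audio for an in-memory video (hypothetical helper)."""
    return pipe_through(ffmpeg_audio_cmd(sample_rate), video_bytes)
```

With something like this, `extract_audio(video_bytes)` would hand back raw samples that can go straight into whatever analysis comes next, and only the final results would need to be stored.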
Now comes the BIG caveat, which will have all your heads nodding: it's not that I'm not interested, but I am pretty single-minded about this project, and I don't really want to have to learn much about this part, which is theoretically irrelevant (though critical) to what I'm doing. Of course it is something that, in time, I'd be interested in looking at in more depth, but right now... not really...
I'm sure that many of you can understand this, as there are many more important and more interesting things that I need to learn for this project.
Thanks in advance,
Jules
P.S. I think it's clear, but my experience with any form of networking in the context of coding is almost equal to zero...
P.P.S. Does anybody know whether I would be exempt from any licensing/copyright issues, as the data would be used for academic purposes?