randyhersom Posted August 2, 2023

On 3/22/2023 at 2:07 AM, hopkins said:

The discogs database is publicly available but it is HUGE, difficult to use, and as we both pointed out, of low quality for jazz discography. The system you describe is close to what I had in mind. I have given a lot of thought to all this for quite some time (years), and investigated a lot of different solutions out there (discogs being one, but also musicbrainz, and others mentioned above). There are many challenges to all this, but breaking down a complex problem into distinct smaller "components" we can make things more manageable. I will take some time tonight to elaborate on all this and explain what I have in mind.

Here you go: https://discogs-data-dumps.s3.us-west-2.amazonaws.com/index.html

I'll answer some of this in detail as well.

I saw this, and can vouch that this data is far from easy to use. But I do this stuff for a living, and I succeeded. I now have the fields I selected imported into Microsoft SQL Server, where I can flexibly query to see what Discogs has. I left out some of the more verbose parts like YouTube video links. This is what I wanted decades ago when I bought All-Music on a CD and it was not what I hoped.

After loading the 90+ GB from the 4 humongous XML files above with my own C# data mining program, the SQL database was about 20 GB. Adding some indexes to make queries faster took it up to 28 GB.

If anybody wants to throw some challenges out there, I can try to see if the results are small enough to share here. Hopkins' example would be asked as follows:

select *
from release
where ismaster=1
  and (select count(*) from contributor where contributor.releaseid=release.releaseid and name='Clifford Brown')>0
  and (select count(*) from contributor where contributor.releaseid=release.releaseid and name='Harold Land')>0

There are dozens if not hundreds of ways to get the same answer. Limiting by ismaster is recommended: there are 500 releases for Kind of Blue, but only 2 are marked as masters, and even that's one too many.

The query has been running three minutes now and hasn't finished. There is more I can do to try to make that faster, at the expense of disk space.
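For anyone curious what the loading step might look like, here is a minimal sketch of streaming a very large XML file one record at a time. It is written in Python rather than the C# program mentioned in the post, and the element names (`release`, `title`, `country`) and the `id` attribute are assumptions based on a tiny made-up sample, not a description of the real dump schema.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for one of the dump files.
SAMPLE = b"""<releases>
  <release id="1"><title>Album A</title><country>US</country></release>
  <release id="2"><title>Album B</title><country>Japan</country></release>
</releases>"""


def iter_releases(stream):
    """Yield (id, title, country) for one release at a time."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "release":
            yield (int(elem.get("id")), elem.findtext("title"), elem.findtext("country"))
            root.clear()  # drop releases already handled so memory use stays flat


rows = list(iter_releases(io.BytesIO(SAMPLE)))
print(rows)
```

The point of the streaming approach is that the whole file never has to sit in memory at once; each yielded tuple could be handed to a bulk insert into whatever database is being used.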
randyhersom Posted August 2, 2023

It took 28 minutes. Some of the numbers are internal identifiers used to link things together:

Ken Burns Jazz (The Story Of America's Music),954861,194,1866,C5K 61432,Jazz,Bop,US,2000-11-14,,617893,1
Study In Brown,1146247,2664380,19900,MG 36037,Jazz,Bop,US,1955,,183534,1
Jazz Of Two Decades,1474375,194,19900,DEM-2,Jazz,Bop,US,1955-09-00,,277187,1
The Quintet Vol. 2,1794257,259082,39357,EMS-2-407,Jazz,Bop,US,1977,,727883,1
Brownie: The Complete EmArcy Recordings Of Clifford Brown,2072432,259082,19900,838 306-2,Jazz,Bop,Europe,1989,,2048590,1
Jam Session,2318629,194,19900,EP-1-6086,Jazz,Bop,US,1954,,434341,1
Brown And Roach Incorporated,2370629,2664380,19900,MG 36008,Jazz,Bop,US,1955,,269223,1
The Quintet Vol. 1,2727158,259082,39357,EMS-2-403,Jazz,Hard Bop,US,1976,,336830,1
Daahoud,2781419,2664380,37319,MRL 386,Jazz,,US,1972,,395758,1
Jam Session,2802630,194,19900,MG 36002,Jazz,,US,1954,,344735,1
Study In Brown Vol.2,3102156,2664380,19900,EP-1-6505,Jazz,Bop,Sweden,1957,,2302414,1
Study In Brown Vol.3,3155401,2664380,19900,EP-1-6506,Jazz,Bop,Sweden,1957,,1568276,1
Jams 2,3330867,529735,19900,195J-2,Jazz,Bop,Japan,1983,,902264,1
Four Classic Albums,3446010,259082,170769,AMSC 950,Jazz,Bop,Europe,2008,,1592310,1
Brownie Speaks 1953 – 1954,3595734,259082,36233,LPJT 59,Jazz,Bop,Italy,1986,,1795112,1
Dinah Jams,3641453,33587,19900,MG 36000,Jazz,Bop,US,1955-02-00,,243750,1
Remember Clifford,3931721,259082,39357,20022 MCL,Jazz,Hard Bop,UK,1964,,530995,1
Study In Brown Vol.1,3978053,2664380,19900,EP-1-6504,Jazz,Bop,Sweden,1957,,915036,1
Remember Clifford,4200276,259082,39357,MG 20827,Jazz,Bop,US,1963,,274999,1
The Best Of Max Roach And Clifford Brown In Concert!,4229289,2664380,95529,GNP-18,Jazz,Hard Bop,US,1956,,299566,1
More Study In Brown,4481155,259082,19900,195J-1,Jazz,Bop,Japan,1983,,628691,1
Eight Classic Albums,4664123,229498,419015,RGJCD302,Jazz,Hard Bop,Europe,2013,,2487538,1
Clifford Brown And Max Roach,4869491,2664380,19900,MG26043,Jazz,Bop,US,1954-12-00,,302135,1
Alone Together: The Best Of The Mercury Years,5283069,2664380,5041,526 373-2,Jazz,Hard Bop,Europe,1995,,1123293,1
Move,5410967,217242,19900,EP-1-6087,Jazz,Bop,Denmark,1954,,662601,1
Prestige Twofer Giants Volume II,5422179,194,19591,PRP-2,Jazz,Hard Bop,US,1972,,684477,1
In Concert,6089481,319800,229218,Vol. No. 7,Jazz,Hard Bop,US,1955,,630214,1
Classic Jazz: The Fifties,6151539,194,43337,R612-04 314560537-2,Jazz,,US,2001,,1810706,1
Delilah / Parisian Thoroughfare,6221169,2664380,19900,EP-1-6074,Jazz,Bop,US,1955,,913545,1
Cherokee 1954-1955,6490251,259082,36233,LPJT 74,Jazz,Bop,Italy,1987,,780136,1
Carl's Blues,6935780,2303149,33660,S7574,Jazz,Cool Jazz,US,1961,,498059,1
West Coast Jazz,7076364,194,820874,600167,Jazz,Bop,Europe,2014,,1624849,1
Dinah!,7201043,33587,205,FJL 125,Jazz,Vocal,UK,1966,,856485,1
In Concert -Complete Version-,7372081,2664380,22645,K18P 6300,Jazz,Hard Bop,Japan,1984,,2815733,1
Brownie Lives!,7641262,2664380,43223,FSCD-1012,Jazz,Hard Bop,Switzerland,1991,,903946,1
Sweet Clifford,9746057,259082,204788,RJ 41122,Jazz,Hard Bop,Germany,1959,,1126604,1
The Immortal Clifford Brown,10432685,259082,26611,LM 2-8201,Jazz,Hard Bop,US,1965,,510192,1
Clifford Brown,10667671,259082,19900,842 933-2,Jazz,,,1990,,1293731,1
Dinah Washington Sings Standards - Jazz Masters 40,10671354,33587,5041,314 522 055-2,Jazz,,US,1994-09-21,,1221614,1
Either Side Of Midnight - 30 Cool Jazz Classics,14006009,194,78710,CPCD 8058-2,Jazz,Cool Jazz,Germany,1994,,1639620,1
D????houd,15424029,2664380,19900,EP-1-6075,Jazz,Hard Bop,US,1956,,1012124,1
The Definitive Clifford Brown,15900013,259082,5041,314 589 845-2,Jazz,Bop,US,2002,,1803640,1
The Best Of Max Roach And Clifford Brown In Concert (Vol. I),19345324,2664380,95529,ELP-818,Jazz,Hard Bop,Denmark,1956,,2195299,1
Verve Jazz Masters 44,20723839,2664380,5041,528 109-2,Jazz,Bop,Europe,1995,,995078,1
Anything Goes: The Cole Porter Songbook - Instrumentals,20912392,194,5041,517 168-2,Jazz,Hard Bop,Europe,,,1493781,1
The Complete Cole Porter Songbooks,26116595,264026,5041,314 519 828-2,Jazz,Hard Bop,Canada,1992,,2981396,1
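One of the "dozens of ways to get the same answer" is to ask whether a matching contributor row exists instead of counting all of them, and to back that lookup with a composite index. The sketch below is only an illustration of the idea: it uses SQLite in memory as a stand-in for SQL Server, the `release` and `contributor` tables carry just the columns named in the posted query plus a hypothetical `title` column, and the rows are toy data. Whether it would actually beat the 28 minutes on the real database is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE release (releaseid INTEGER PRIMARY KEY, title TEXT, ismaster INTEGER);
    CREATE TABLE contributor (releaseid INTEGER, name TEXT);
    -- Composite index so each name check can be answered from the index alone.
    CREATE INDEX ix_contributor_name_release ON contributor (name, releaseid);
""")

# Toy data: release 1 has both players, 2 has only one, 3 has both but is not a master.
conn.executemany("INSERT INTO release VALUES (?, ?, ?)",
                 [(1, "Album A", 1), (2, "Album B", 1), (3, "Album C", 0)])
conn.executemany("INSERT INTO contributor VALUES (?, ?)",
                 [(1, "Clifford Brown"), (1, "Harold Land"),
                  (2, "Clifford Brown"),
                  (3, "Clifford Brown"), (3, "Harold Land")])

QUERY = """
    SELECT r.releaseid, r.title
    FROM release AS r
    WHERE r.ismaster = 1
      AND EXISTS (SELECT 1 FROM contributor AS c
                  WHERE c.releaseid = r.releaseid AND c.name = ?)
      AND EXISTS (SELECT 1 FROM contributor AS c
                  WHERE c.releaseid = r.releaseid AND c.name = ?)
    ORDER BY r.releaseid
"""

matches = conn.execute(QUERY, ("Clifford Brown", "Harold Land")).fetchall()
print(matches)
```

The `EXISTS` form lets the engine stop at the first matching contributor row per release, and the index costs extra disk space, which is the trade-off mentioned in the earlier post.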
hopkins (Author) Posted August 3, 2023

On 8/2/2023 at 3:41 AM, randyhersom said:

I saw this, and can vouch that this data is far from easy to use. But I do this stuff for a living, and I succeeded. I now have the fields I selected imported into Microsoft SQL Server, where I can flexibly query to see what Discogs has. I left out some of the more verbose parts like YouTube video links. This is what I wanted decades ago when I bought All-Music on a CD and it was not what I hoped.

After loading the 90+ GB from the 4 humongous XML files above with my own C# data mining program, the SQL database was about 20 GB. Adding some indexes to make queries faster took it up to 28 GB.

If anybody wants to throw some challenges out there, I can try to see if the results are small enough to share here. Hopkins' example would be asked as follows:

select *
from release
where ismaster=1
  and (select count(*) from contributor where contributor.releaseid=release.releaseid and name='Clifford Brown')>0
  and (select count(*) from contributor where contributor.releaseid=release.releaseid and name='Harold Land')>0

There are dozens if not hundreds of ways to get the same answer. Limiting by ismaster is recommended: there are 500 releases for Kind of Blue, but only 2 are marked as masters, and even that's one too many.

The query has been running three minutes now and hasn't finished. There is more I can do to try to make that faster, at the expense of disk space.

How do you plan on using the Discogs data?
hopkins (Author) Posted August 3, 2023

Here is a short video of my app illustrating the content I put in each album, and how I can search and play them. It works well for me, and I am very happy with the results, but I have spent quite a lot of time over the past 5 years entering the content (and developing the app).

Adding a new album involves the following steps:

- rip the CD
- hopefully the song titles are automatically added, but sometimes I have to enter them (in my "tagging" program, you can also enter a Discogs release ID if the data is there). I always clean up the tags a little, but I do not really use them
- in the folder which contains my files, I put the nicest album cover I can find, and I add one text file (always named "album.md") which simply marks this folder (and its sub-folders) as an album
- I then run a program that scans my folders, updates my database with basic information read from the tracks (title, duration), and assigns a unique identifier to the album, which is also kept in a file in the folder. If I delete albums, they are removed from my database, and moved folders are picked up as well. The scan takes about 20 seconds. I have something like 50,000 tracks on my computer.
- then I refresh the web page of my app, and the album is displayed. I then edit it (with the text editor in the web page, or by editing the "album.md" file in the album's folder directly with any text editor)
- when I edit the album for the first time, the page just contains the list of tracks. I then add recording dates and credits for each session, reorganizing the tracks if needed. I then optionally add some comments and links, and save.

A couple of weeks ago I received the latest Mosaic box set, The Complete Sonny Clark Blue Note Sessions. It took a while to rip the CDs, but you can multitask. The CDs were not referenced in any online database (e.g. AccurateRip), so I had to add the titles, but I did that automatically in MP3Tag (the music tagger I use).

Here is the Discogs page (thanks to the Discogs contributors): https://www.discogs.com/release/27675135-Sonny-Clark-The-Complete-Sonny-Clark-Blue-Note-Sessions

It took me about an hour to enter the session dates and credits, and add a few comments from Jan Evensmo on Hank Mobley (I like his short comments...).

Here are the results: https://paulstephane.github.io/album#1690019046449

I can link a booklet if I have a PDF, but I actually like to scan the booklets, apply OCR, and copy the text under each session. This can take time, but when I go back and listen to the music I like to have the comments there. So I'll see if I scan them, or find a scan available online. The site MusicBrainz references albums, and scans are sometimes added there. You can also find a lot of booklets on the Internet Archive.

I won't deny that I am proud of what I did. I learned programming (JavaScript) just to be able to accomplish this. It is not perfect, but because I am just an amateur, I kept things as simple as I could, and chose to only include information that I find essential, for me.

I was inspired by the work of Baoshan Sheng, who has developed a brilliant app (commercial): https://ton.al/. His interest is primarily classical music. The interesting feature is that editing is collaborative.

I wish there were a better-quality online database that let you simply browse through the content of an album (or box set) and use the data to search through your collection. It is not rocket science... and it only takes a handful of committed contributors to reach a sizeable database. But this may be for future generations, if they still listen to jazz! I got one of my three kids to be interested in jazz, which is good, but that ratio won't guarantee the survival of our species.

Edited August 3, 2023 by hopkins
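The folder scan described in the steps above can be sketched roughly as follows. This is not the app's actual code (which is JavaScript and not shown in the thread); it is a Python illustration under several assumptions: the identifier file name `album.id`, the use of a random UUID as the identifier, and the list of audio extensions are all made up for the example. Only the `album.md` marker comes from the post.

```python
import os
import tempfile
import uuid
from pathlib import Path

MARKER = "album.md"             # marker file named in the post
ID_FILE = "album.id"            # hypothetical name for the per-album identifier file
AUDIO_EXTS = {".flac", ".mp3"}  # assumed audio formats


def scan_library(root):
    """Return {album_id: {"path": Path, "tracks": [Path, ...]}} for every album folder."""
    albums = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if MARKER not in filenames:
            continue
        folder = Path(dirpath)
        dirnames[:] = []  # sub-folders belong to this album, so do not scan them separately
        id_path = folder / ID_FILE
        if id_path.exists():
            album_id = id_path.read_text().strip()
        else:
            album_id = uuid.uuid4().hex  # assumption: any unique value would do
            id_path.write_text(album_id)
        tracks = sorted(p for p in folder.rglob("*") if p.suffix.lower() in AUDIO_EXTS)
        albums[album_id] = {"path": folder, "tracks": tracks}
    return albums


with tempfile.TemporaryDirectory() as tmp:
    album = Path(tmp) / "Some Artist" / "Box Set"
    (album / "CD1").mkdir(parents=True)
    (album / MARKER).write_text("")
    (album / "CD1" / "01.flac").write_bytes(b"")
    first = scan_library(tmp)
    # Moving the folder keeps the identifier, because it is stored inside the folder.
    album.rename(Path(tmp) / "Moved Box Set")
    second = scan_library(tmp)
    same_id = list(first) == list(second)
    track_count = len(next(iter(second.values()))["tracks"])
    print(same_id, track_count)
```

Keeping the identifier in a file next to the music is what lets a rescan recognise an album after its folder has been moved; an album whose folder has disappeared simply no longer shows up in the returned dictionary, so it can be dropped from the database.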
randyhersom Posted August 3, 2023

30 minutes ago, hopkins said:

How do you plan on using the Discogs data?

As needed to look into side appearances and band members, and I'd be happy to do a data pull for anyone starting a new Brian database.
hopkins (Author) Posted August 4, 2023

I wrote to Mosaic Records to ask whether they could somehow populate the AccurateRip database prior to releasing a box set. I don't know what the process is to do so, but it would be nice for those of us who rip our CDs.