I stopped writing throwaway scripts for messy CSVs and just use SQL now
Someone sends you a CSV. Then a folder of CSVs. Then a CSV that's actually tab-separated but named .csv, with a stray header row and a column that's a number on most rows and the string N/A on the rest. For years my answer to "can you pull a quick number out of this?" was a throwaway Python script. Read it in, fight pandas about dtypes, groupby, print, delete the script, forget everything, repeat next week. It worked. It was also slow and I never kept any of it. These days I just point SQL at...
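To make the idea concrete, here is a rough sketch of running one SQL query over exactly that kind of file. The excerpt cuts off before naming a tool, so this is not the author's setup: it uses SQLite through Python's standard-library `sqlite3` module purely as a stand-in. The sample data, the `report` table, and the helpers `load_rows` and `total_by_region` are all made up for illustration.

```python
import csv
import sqlite3

# Hypothetical sample: tab-separated despite the .csv name, a stray header
# row on top, and an "amount" column that is sometimes the string N/A.
RAW = (
    "Exported report\n"
    "region\tamount\n"
    "north\t10\n"
    "south\tN/A\n"
    "north\t5\n"
    "south\t7\n"
)


def load_rows(text, skip_rows=1, delimiter="\t"):
    """Drop the leading junk rows and parse the rest into a list of dicts."""
    lines = text.splitlines()[skip_rows:]
    return list(csv.DictReader(lines, delimiter=delimiter))


def total_by_region(rows):
    """Load the rows into an in-memory SQLite table and sum amount per region."""
    conn = sqlite3.connect(":memory:")
    # Keep amount as TEXT so the N/A rows load without complaint.
    conn.execute("CREATE TABLE report (region TEXT, amount TEXT)")
    conn.executemany("INSERT INTO report VALUES (:region, :amount)", rows)
    # NULLIF turns the N/A marker into NULL, which SUM skips.
    query = """
        SELECT region, SUM(CAST(NULLIF(amount, 'N/A') AS REAL)) AS total
        FROM report
        GROUP BY region
        ORDER BY region
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


totals = total_by_region(load_rows(RAW))
print(totals)
```

The mixed number-or-N/A column never needs a dtype decision up front: it loads as text, and the cleanup lives in the query itself.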