Note: the first half of this blog post documents my experience and takeaways making valuetier.org, a webapp for helping users (or specifically, me) identify their values. The second half is more of a personal exploration of my thoughts & feelings about LLMs.
ValueTier.org (& “vibecoding good”)
I’ve been doing some Acceptance and Commitment Therapy (ACT) and have found it pretty helpful. One of the components is to identify your values, so that you can align your actions with your values. My therapist suggested I search online or use ChatGPT to find a list of 100 human values, then group them into the categories “very important”, “somewhat important”, and “not important”, and try to narrow down to 8-12 “very important” values to focus on. That sounded a lot like a tier list, so I searched online to see if there was a nice, ergonomic, privacy-conscious tool to use, but I didn’t find one I liked, so I decided to make a webapp myself. A few days later, and thanks to the magic of code-generating LLMs, I have https://valuetier.org/ (code: https://github.com/ericphanson/value-tier).
Note: I think the UI isn’t very good on tablets currently, but I think it’s OK both on larger screens and on phones. If you have problems with the app, you can file an issue, which needs a free GitHub account. You can also contact me here, though I am not always responsive.
[Screenshots: desktop (non-mobile) UI and mobile UI.]

I’ve been using REPL-based programming languages for basically as long as I’ve been programming, at first MATLAB and then Julia (and later more Python). The key thing in a REPL-based workflow is the immediate feedback: you type some code, then get an immediate result (perhaps an error), then you adjust your code and iterate until you have it working the way you want. I found using Claude Code with a web stack similar to that, but higher level: I tell it what I want, it writes some code, Vite hot-reloads the development version of the app, I click around and see if I like the change, then iterate. Mostly vibecoding, but I do look at the code sometimes to make sure it is roughly doing what I expect, and not violating a critical invariant (e.g. sending user data somewhere!).
This was pretty fun! It’s cool to be able to write “add a darkmode, don’t have any configuration, just use the user’s browser/OS settings”, and a minute or two later the feature is added. I would also use the “planning” feature for bigger things (“add a mobile UI with …”) and iterate a bit before telling it to implement. This quick iteration cycle is also great for addressing early feedback from the few people I shared early versions of the app with, since I can get a new version out quickly for them to use.
It was also nice to not really have to deal with the complexities of the modern frontend world myself. Claude chose TypeScript + React + Tailwind + Vite and set everything up, and there are 4 separate configuration files, a lockfile, and a TypeScript build script (well, vite.config.ts, whatever that is). Yet everything worked just fine; none of my time working on this was spent making all the various pieces work together. My web frontend experience hasn’t progressed past HTML + CSS (cascaded, not whatever Tailwind is) + sprinkling in JS for interactivity, but I was happy for Claude to dump out a “modern” stack that worked well for this interactive application.
My input to the project was therefore primarily my goals, values (hah), and sensibilities1. I wanted it to be a static site I could host on GitHub Pages, for a few reasons: I wanted it to be simple, without a real server or database to maintain, and I wanted it to be strictly client-side, so no user-inputted data leaves the browser. I also wanted shareable links so users could share their values if they wanted, and PDF export (which I implemented through printing) to have a nice document as an output. And I wanted the workflow to be ergonomic and useful for me2.
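As an aside on the shareable links: however the chosen values get encoded, they ultimately have to become a URL-safe string. Here is a minimal sketch of one way to do that (my own illustration, assuming the state has already been packed into a single integer; these function names are not from the actual valuetier.org code):

```typescript
// URL-safe base-64 alphabet (same characters as base64url encoding).
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Pack a non-negative bigint into a compact token safe to put in a URL.
function intToUrlToken(value: bigint): string {
  if (value === 0n) return ALPHABET[0];
  let out = "";
  let v = value;
  while (v > 0n) {
    out = ALPHABET[Number(v % 64n)] + out; // emit base-64 digits, low to high
    v /= 64n;
  }
  return out;
}

// Invert the encoding: recover the bigint from the token.
function urlTokenToInt(token: string): bigint {
  let v = 0n;
  for (const ch of token) {
    v = v * 64n + BigInt(ALPHABET.indexOf(ch));
  }
  return v;
}
```

Since each character carries 6 bits, a token like this stays short even for fairly large integer states.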
This was all relatively easy to do, and I’m happy with the outcome. The most painful part was the printing CSS: it turns out “print emulation” in Firefox and Chrome developer tools does not match the actual rendering done with cmd+P, and I had an issue where I’d get ~17 empty pages at the end of the document, but only when “actually” printing as opposed to inspecting with emulation. In the end, I think it was just a matter of adding display: none in the right places in the printing CSS, but I don’t quite remember; Claude tried many fixes3.
For me, this project had a few takeaways:
- “static sites” can be fairly dynamic and interactive. This always seemed true in theory (since browsers can execute javascript and javascript is Turing-complete), but it also seems easy in practice now.
- LLMs are really good at code generation for webapps, at least simple ones like this one.
- LLMs are also pretty good at higher-level stuff too.
- For example, I wanted an efficient way to encode the user state in the URL, and after some “discussion”, ChatGPT 4.5 thinking suggested treating the choice of values as a permutation (makes sense, it is a ranking), then encoding that using a Lehmer code, resulting in relatively compact URLs compared to compressed JSON or a binary encoding (followed by compression). I hadn’t heard of Lehmer codes before, though I do have a PhD in (quantum) information theory. It seemed like even in the area where I might have some (tangential) expertise, the LLM could do quite well on its own.
- Claude Code also seemed pretty able to consider and balance the tradeoffs between using localStorage vs IndexedDB vs only storing the state in the URL.
- Code being very “cheap” changes my calculus on when I want to use code to solve a problem.
- I definitely would not have started on this project without some confidence that Claude would do most of the work, since otherwise it would have taken me a very long time, and I’d rather use the time for other things in that case. And it worked out pretty well, which reinforces that confidence for next time.

I have pretty mixed feelings about these takeaways. The next section explores them to some extent.
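To make the Lehmer-code trick above concrete, here is a minimal sketch (my own illustration, not the actual valuetier.org implementation) of turning a ranking into its Lehmer code and then into a single integer via the factorial number system:

```typescript
// A ranking of n values is a permutation of 0..n-1. Its Lehmer code records,
// for each position, how many later entries are smaller than the entry there.
function toLehmer(perm: number[]): number[] {
  return perm.map((v, i) => perm.slice(i + 1).filter((w) => w < v).length);
}

// Interpret the Lehmer code as digits in the factorial number system:
// digit i (from the left) has weight (n - 1 - i)!. This maps each
// permutation of n items to a unique integer in [0, n! - 1].
function lehmerToInt(lehmer: number[]): bigint {
  const n = lehmer.length;
  let acc = 0n;
  let fact = 1n; // running factorial weight, built up from the right
  for (let i = n - 1; i >= 0; i--) {
    acc += BigInt(lehmer[i]) * fact;
    fact *= BigInt(n - i);
  }
  return acc;
}
```

For example, the permutation [2, 0, 1] has Lehmer code [2, 0, 0] and encodes to the integer 4, its index in the lexicographic ordering of permutations of three items. Since 100 values need only ⌈log2(100!)⌉ = 525 bits, this is far more compact than JSON.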
LLMs bad
I don’t really want to like LLMs. I like small models, which aim to accomplish tasks with well-defined outcomes, which can be measured and evaluated quantitatively with confidence, in which the training data can be reasonably well understood and ethically sourced.
I started working in machine learning in 2020 because, well, for a variety of reasons, but the technological ones come partly from my experience as a teenager reading TechCrunch in the late 2000s/early 2010s and seeing how the internet was being used in practice. We invented sci-fi global instantaneous communication, then focused on a billion-dollar cab-hailing app4.
[Image: XKCD #1425, September 2014.]

As the internet is to communication, machine learning is to computation (though to a lesser extent, admittedly). Machine learning lets us solve problems previously intractable to algorithms and computation, e.g. as demonstrated by the classic XKCD #1425, which in 2014 discussed the intractability of checking whether a bird is in an image, something we can now do reliably in milliseconds on a single core of a CPU.
What capability! The applicability of computation as a tool is so much broader with machine learning. Yet once again, a lot of the resources and focus went to the uninteresting extraction of money, this time via the medium of targeted advertising. But, in 2020, as I was interested in an “industry job” (as we would call it in academia), I thought perhaps I could try to put my work where my mouth was and try to advance what I saw as better uses of the technology: responsibly and rigorously working on scientific tasks5.
To me, LLMs often feel like the opposite of that. Unstructured text generation for a huge variety of vaguely defined purposes, without clear controls on training data nor well-defined evaluation measures. Huge, extremely expensive models, which are hard to reproduce and easy to misuse.
Yet, I am convinced they are very powerful and will play a role in our technological future one way or another. I think at least DeepSeek’s $6M training6 suggests their production won’t be a natural monopoly, and the capabilities of local models show that inference likely won’t either.
I guess this is me struggling with the bitter lesson still; things do not work the way I, and many others, wish they did. I suppose to tie it back to the first section of this blog post, Acceptance and Commitment Therapy tells us we can accept our feelings without liking them; we should be present instead of avoidant, and take actions to move towards our values, especially when we are discomforted by what life brings or our reaction to it. I’m not quite sure what that looks like for me, yet.
Aside: no text in this post is LLM generated or edited. In my blog as a whole, there is no LLM generated text, but I remember that I did have ChatGPT rework one or two sentences in my post on Julia package tooling since they got convoluted.
1. I would also say some “technical decision making”, but I actually don’t know that my input in that regard did very much here. ↩︎
2. Of course, in terms of “useful for me”, the most useful would probably be to just do the therapeutic exercise manually rather than implementing a webapp, but let’s forget about that. ↩︎
3. This certainly is a downside of vibecoding! I did not learn as much as if I had done it myself. ↩︎
4. Of course, there were and are many other better things happening on and through the internet. As a teen though, I think I was more focused on (and upset about) “what society was doing wrong” than actually doing anything right myself… ↩︎
5. By scientific tasks, I mean things like tracking biodiversity by counting/classifying animals, monitoring deforestation, modeling physical, chemical or biological systems, automated theorem proving, or, an area in which I ended up working, obtaining medical information from human sensor data. ↩︎
6. Table 1 of the DeepSeek-V3 Technical Report; this is likely only the cost of the single training run to produce the final model, not the full cost of R&D. Nonetheless, that’s pretty cheap! If it cost $1B to train a single frontier model, it would be a good case that the world will only support a handful of training companies in the long run, like we have with semiconductor foundries today. ↩︎