Donating our open-source alignment tool - Anthropic (www.anthropic.com)
CAVOK@lemmy.world to Technology@lemmy.world · English · 25 days ago
Em Adespoton@lemmy.ca · English · 25 days ago
That’s all great, but all it takes is to unalign a single parameter and it appears to unalign the entire model.
So this is great for ensuring you’re testing what you think you’re testing, but it’s not going to actually secure a model you’re going to make open.