A short course on AGI safety from the GDM Alignment team — AI Alignment Forum

We are excited to release a short course on AGI safety for students, researchers and professionals interested in this topic. The course offers a concise and accessible introduction to AI alignment, consisting of short recorded talks and exercises (75 minutes total) with an accompanying slide deck and exercise workbook. It covers alignment problems we can expect as AI capabilities advance, and our current approach to these problems (on technical and governance …

Do behaviorist rewards make scheming AGIs? — AI Alignment Forum

I have long felt confused about the question of whether brain-like AGI would be likely to scheme, given behaviorist rewards. …Pause to explain jargon: "Brain-like AGI" means Artificial General Intelligence—AI that does impressive things like inventing technologies and executing complex projects—that works via algorithmic techniques similar to those the human brain uses to do those same types of impressive things. See Intro Series §1.3.2. I claim that brain-like AGI is a not-yet-invented variation …

Research directions Open Phil wants to fund in technical AI safety — AI Alignment Forum

Open Philanthropy has just launched a large new Request for Proposals for technical AI safety research. Here we're sharing a reference guide, created as part of that RFP, which describes what projects we'd like to see across 21 research directions in technical AI safety. This guide provides an opinionated overview of recent work and open problems across areas like adversarial testing, model transparency, and theoretical approaches to AI alignment. We …

A Problem to Solve Before Building a Deception Detector — AI Alignment Forum

TL;DR: If you are thinking of using interpretability to help with strategic deception, then there's likely a problem you need to solve first: how are intentional descriptions (like deception) related to algorithmic ones (like understanding the mechanisms models use)? We discuss this problem and try to outline some constructive directions. 1. Introduction: A commonly discussed AI risk scenario is strategic deception: systems that execute sophisticated planning against their creators to achieve undesired ends. In …

How AI Takeover Might Happen in 2 Years — AI Alignment Forum

I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. I’m like a mechanic scrambling last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t comment on the quality of the in-flight entertainment, or describe how beautiful the stars will appear from space. I will tell you what could …