Sidney Hough

Personal data snapshots could improve the content exploration experience

2024-10-10T00:00:00-05:00

People who use platforms like YouTube or TikTok often face an issue where they interact slightly too much with the wrong kind of content - say, fake gummy bear experiments or door hinge tutorials - and then their feed gets flooded with that content.

Ideally recommendation systems would be actually controllable. As in, the user could just say, “Show me content about trees and dogs,” and immediately only receive content about trees and dogs. Instead, they have to repeatedly click “I’m not interested in this” and hope that something eventually happens.

Short of the controllable ideal, it would be nice to have a feature where the user could take a snapshot of their historical interaction data when they were happy with their feed. Then, if their feed got derailed in the future, they could simply revert to an old state.

The user could save multiple snapshots of their data as they pass through different phases in life. At any time, they could switch between feeds based on what they’re interested in. Feeds could be shared with friends. There could even be feed marketplaces. Snapshots should probably be taken automatically, too, in case the user forgets to save and label feeds, which they will.

Of course companies hate deleting personal data. They wouldn’t need to. The recommendation algorithm would simply operate on the correct subset of the user’s interaction history, which might be stored, for instance, as a tree. Most recommendation systems heavily weight recent interactions anyway, so it’s not as if relatively large amounts of data would be discarded.

It seems to me this feature would create a more natural sensation of exploration and exploitation. Right now interating with recommender systems feels like walking through the woods without remembering where you’ve been, so if you end up somewhere bad, you have to stumble around until things are good again. Clearly it would be better if you could just backtrack to a place you know is good.

]]>

How this site came to live at sidney.com

2023-12-26T00:00:00-06:00

There are 200,000 or so Sidneys in the world, so I’m lucky to be able to use sidney.com.

In April 2022 I reached out to sidney.com’s owner, Sidney Markowitz, to see if he was open to selling. It was good from the get-go that he was a real human with an email address, a retired software engineer/MIT grad now living in New Zealand. He told me he liked my writings at the time but wasn’t selling because the domain had sentimental value to him.

I thanked him and forgot about it for a while. A year later, it occurred to me that Sidney’s social world was probably distinct enough from mine that it might not hurt to share the domain. Plus, I mostly wanted it for this website, and he wanted it for email.

I proposed a shared arrangement to him and he responded positively. Sidney retains control over DNS records and set up the ones I needed for mail/website. He uses Sonic mail servers, so I ended up switching my ISP to Sonic and setting up a Sonic mailbox (I was interested in their gigabit upload speed anyway).

He ended up visiting Palo Alto to see some friends here. We met up for Oren’s Hummus on University Ave, where I enjoyed hearing about his time in Silicon Valley and takes on the world. I regret not taking a picture together.

I’m grateful to Sidney for all he’s done — completely for free, too. Also thank you to my landlord for allowing the fiber installation.

]]>

Does Minecraft have plate tectonics?

2023-11-27T00:00:00-06:00

Ignore for a second that Minecraft is probably supposed to mimic Earth, and imagine that it’s an exoplanet in some remote solar system you were dropped onto. You’re a planetary geologist sent to determine whether or not Minecraft exhibits plate tectonics.

For reference, Earth is the only planet we’re aware of that has plate tectonics; in particular, rigid plates with narrow deformation zones at plate boundaries. Other planets have different forms of tectonic activity, such as stagnant-lid tectonics (one single plate) on Mars or squishy-lid tectonics (magmatic intrusions create a thin lithosphere composed of small ephemeral plates) possibly on Venus or Archean Earth.

Evidence in favor

Granite

Earth is the only planet known to have granite. Granite formation depends on plate tectonics. At subduction zones, water wrung out of hydrated ocean crust causes partial melting of the ultramafic mantle above it (flux melting), creating mafic magma that rises into the crust through fractures. It undergoes fractional crystallization as well as assimilates crustal material, becoming felsic magma (enriched in silica and aluminum). When this magma cools, it forms granite.

Granite is the most abundant rock in the continental crust. Without water and plate tectonics driving subduction, granite could not form in quantities as found on Earth and in Minecraft.

Water

A seminal 1983 paper describes the important connection between water and granites, entitled “No water, no granites – no oceans, no continents”. Without water to induce flux melting as described above, we wouldn’t have our granites.

More speculatively, water is thought by some to be a regulator for plate tectonics and to possibly have initiated it billions of years ago. Water lowers the melting point of rock, potentially resulting in a less viscous asthenosphere that enables subduction and convection, a plausible mechanism for plate tectonics. It may provide lubrication at faults, allowing plates to slide past each other more easily. Comparisons are commonly made to Venus, which, while similar in composition to Earth, lacks oceans in addition to plate tectonics.

Since granites and plate tectonics seem more likely to occur in the presence of water, the existence of water in Minecraft is a good sign. Or, at least, an absence of water could have been evidence that Minecraft did not have plate tectonics.

Felsic lava

Planets without plate tectonics see large basaltic outpourings – their lava is exclusively mafic. Felsic lavas on Earth are mostly formed at subduction zones, where magma rises and incorporates silica-rich crustal material. We can infer that Minecraft’s lava is felsic because it creates obsidian on contact with water without exception, and obsidian only forms from felsic lava because its viscosity makes it harder to form mineral crystals.

Geomorphology

In Minecraft, structures such as mountain ranges, mesas, and canyons mimic terrestrial formations largely shaped by tectonic processes. On Earth, mountains commonly result from subduction, mesas from uplift, and canyons from a combination of uplift and subsequent downcutting.

On other bodies without tectonics and accompanied weathering, features are dominated by other forces, such as impact craters and volcanism. Io, the largest moon of Jupiter and the most geologically active body in the solar system by some measures, has topographic swells and basins that are attributed to thermal changes in the basal lithosphere.

As other planets lack features associated with plate tectonics, Earth and Minecraft lack features on worlds with alterantive tectonic regimes. For instance, Mars harbors Olympus Mons, the tallest volcano in the solar system, towering 13.6 miles high. This volcano formed above a stationary hotspot, a feature unlikely to develop on a planet with mobile lithospheric plates. In Minecraft, the maximum elevation of mountains is 256 meters above sea level.

Complex life

Many argue that plate tectonics is crucial to regulating the strength of a habitable planet’s atmospheric greenhouse effect. On Earth plate tectonics controls the long-term CO₂ cycle by driving degassing via volcanism and gassing via the continuous supply of rock it makes available to the silicate weathering process. Without such long-term cycling, planets like Earth would experience either runaway glaciation or greenhouse (as in Mars and Venus respectively). Climate stabilization as a result of tectonics on Earth has enabled the long-term evolution of life. Most think plate tectonics on Earth started after the emergence of simple life, but far before the development of complex life such as Minecraft’s sheeps and cows.

Evidence against

Ore distribution

On Earth, tectonic processes govern the localities and form of Minecraft’s characteristic ores. Gold, for example, can be concentrated at convergent plate boundaries, where hot fluids pick them up and deposit them in fractures. Most copper comes from porphyry copper deposits, which occur at former or modern subduction zone systems. Meanwhile, Minecraft’s ores are arbitrarily distributed, and their occurrences do not correlate with tectonic features such as mountains and fault lines.

Flatness

Plate tectonics is the means by which Earth radially drives heat to its surfaces. A flat world like Minecraft would not have retained heat in its core because it would not have a core. And forces like convection currents and gravity would not exist to drive the motion of plates. Plate tectonics simply could not have started.

]]>

Reservations about some AI alignment research

2023-04-13T00:00:00-05:00

Sometimes I talk to people about what constitutes a reasonable career given continued AI development. I currently think that regulation seems very important, certain types of practical alignment work seem good, and x-risk-flavored alignment work seems ineffective and potentially actively harmful.

This post is intended to be a broad-strokes record of my beliefs in case they’re helpful to anyone thinking about careers.

World 1: Alignment research may be ineffective because it’s too difficult on practical timescales

Arguably the gap between meaningful safety progress and capabilities progress continues to grow. This gap can’t immediately be attributed to core difficulties in alignment, given that far more researchers and engineers are pushing on capabilities than they are on safety. But certainly there are many intuitive reasons to expect alignment to be difficult (which aren’t particularly controversial, so I won’t list them here). It’s also realistic to expect this gap in expenditure to grow with time, at least until alignment becomes a core bottleneck to deploying highly capable AI systems (see world 4).

I’m worried about the gap between the frontier and alignment, but I’m also increasingly worried about the mélange of emerging uses of systems of varying capabilities that make it more difficult to reason about alignment. The AI space is seeing developments that aren’t that surprising - e.g. AutoGPT - but there are many of them, and I expect applications to become more numerous and unpredictable as the technology proliferates. How does any model interpretability apply, for instance, if models are interacting in simulated markets or plugged into preexisting complex data pipelines, and these systems are the backbone of enormous new recommender systems? It becomes trickier to make assumptions about how ML systems work in practice, but assumptions make problems tractable.

World 2: Alignment research may be ineffective because nobody will adopt solutions

This seems likely if progress in alignment is orthogonal to progress in capabilities. ML models are becoming increasingly commoditized via developer services (“build your own chatbot!”) and open source. If there’s no strong incentive to care about alignment, and millions of people and organizations have access to highly capable ML models, at least a few of them will be using unsafe models. Many more will be using unsafe models if it turns out aligned models cannot compete with unaligned models, i.e. there exists an alignment tax.

This argument is not very relevant to less ambitious safety work, e.g. get models to not be racist, or detect whether or not an output came from a human or a model. These techniques similarly may not see mass adoption, but they will see some adoption, and their benefits are pretty linear. Safety work that aims to stop extremely advanced models from not destroying humanity, on the other hand, relies on nearly complete adoption to be worthwhile - a “one bad apple” situation.

World 3: Alignment research will be ineffective because other harms will emerge before runaway superintelligence

Even absent further progress in AI, it seems likely that the worst of misinformation campaigns, deepfakes, and other harms are yet to come as bad actors and clueless users experiment with existing technology. And language/vision models are far from threatening extinction!

As models get better, the magnitude of potential harms increases. At some point they become appealing tools for extremists or very irresponsible users to wreak existential havoc; for instance, via engineering of bioweapons. I would be unsurprised if we reach this point before we encounter runaway superintelligence that “wants” to wreak havoc, because probably we don’t need goal-directed and autonomous agents to be very good at crunching scientific data.

World 4: Alignment research will be actively harmful if it contributes to capabilities and other harms emerge before runaway superintelligence

Obviously, one way alignment goes poorly is if it enables great misuse by extremists or aggressive countries. The other way alignment goes poorly is if it directly or indirectly accelerates AI arms races. We don’t have a good track record here: RLHF in large part took AI mainstream while leaving huge safety holes (direct), and Anthropic claimed to break off from OpenAI to nominally focus on safety, only to declare intent to commercialize and raise billions of dollars (indirect). Maintaining independence from investors and pressure from customers as a safety-focused research organization seems inherently at odds with exerting influence over important AI labs.

While this is a research-specific claim, it seems quite hard to encourage an exclusively helpful sort of alignment research (and it’s possible that exclusively helpful alignment research just doesn’t exist). So far AI safety field-building looks mostly like orienting around the right words - “alignment” “safety” etc. - rather than rock-solid principles, because these principles don’t exist yet. You might argue that RLHF is only a “minimum viable safety technique” and that current language models won’t really matter in the long run, but if it sets any sort of precedent, it’s that companies can justify pushing on capabilities under the veneer of safety.

It’s difficult to assess even the most straightforward safety efforts, e.g. work to make models less racist. If the way that this problem is approached involves general strategies that look like “get the model to do what I say, including ‘don’t be racist’” and these strategies help a company build their next model, there may be negative externalities. Post hoc filtering for racist lingo for specific models is by contrast a very restricted strategy that won’t have knock-on effects.

If alignment turns out to be a must-have for capabilities and the only existential threat from AI comes from runaway superintelligence - no earlier model could do much damage - we’re possibly in a good position because there’s incentive to adopt safety practice when it matters most and no possible harm could come from an arms races. I find this scenario extremely unlikely.

Conclusion

If one cares about x-risk and is pessimistic about alterative means of intervention, e.g. governance, worlds 1-3 aren’t that important because alignment work has big upside. However, I think one of worlds 2 or 4 has to play out - either alignment/capabilities are correlated or they’re not, and either way it seems like we’re in trouble. If one cares about x-risk and thinks that world 4 is not unlikely, it seems important to weigh the potential huge downsides to the upsides when choosing how to spend one’s time. Is it more likely that humanity is extincted by a runaway superintelligence, or by a slightly dumber but very powerful AI that a human misdirects?

I’m excited about other ways to make AI development go well, in particular via regulatory structures that ensure that AI progress can’t happen without a solution to alignment and ensure that given a solution to alignment, the technology will proliferate in healthy ways. Since I think that one of worlds 2 or 4 has to be true, I struggle to imagine a world where, without regulatory intervention, AI development goes well, regardless of whether we “solve” the alignment problem. Conversely, I can imagine worlds with strong regulation and no solution to the alignment problem that go okay - in particular if runaway superintelligence never arises. This makes me think that with respect to order of operations, regulation should be the first priority.

Can we work on governance and alignment in parallel? Only if one thinks that world 2 is likely, and alignment is at worst ineffectual but it won’t ever aggravate the situation, so that if we eventually set up good regulatory structures we’ll have a solution ready to go. But we still need a vast amount of upfront effort invested in governance since that’s the near-term load-bearing element of the plan.

]]>

Web design pet peeves

2022-12-19T00:00:00-06:00

A number of design trends I observed in 2022 defy design principles and degrade user experiences. Here are a few:

Long animations: If your animation is more than 200-400ms and it takes more time to run than it does for the actual content to load, it’s probably unnecessary. Animations should make loading more bearable, not more pronounced. Not to mention that excessive animations cause lag for many users without hardware acceleration enabled.
Smooth scrolling: Scrolling is a universal behavior across a user’s device. Smooth scrolling forces users to abandon controls they’re familiar with. Now their attention is on navigation instead of the content on your page. In some cases, smooth scrolling can even cause motion sickness.
Custom cursors: Again, you’re breaking device conventions and drawing attention to the wrong thing. Default cursors are important because they help users distinguish buttons from text boxes from static text, etc.
Huge, scattered typography: You might be trying to make any given headline easier to read, but you’re making users invest a ton of physical effort to consume the entirety of a page’s content. Their eyes have to sweep an entire page horizontally to read - 45-70° or more - this is especially bad with big monitors! Additionally, they may have to scroll extensively to reach the bottom, and every time they scroll, they may need to search for the beginning of the text if you’re moving it around the page.
Inconsistent scroll direction: You’re trying to spruce up your content teasers, so about halfway down the page, you hijack the scroll direction so that when the user scrolls down, the content actually moves horizontally! Fun! But now your page has no notion of consistent flow or spatial layout, and you’re violating the user’s expectations re: the mapping of their physical movements to movements on the page.

]]>

Neural Trojans and how you can help

2022-06-30T00:00:00-05:00

Attackers can inject “Trojans” into neural networks that cause them to fail, potentially catastrophically, in unknowable situations. Machine learning models are becoming increasingly accessible and training and deployment pipelines are becoming increasingly opaque, exacerbating this security concern. You can contribute by proposing your own defenses: a number of researchers and I are running a $50K NeurIPS 2022 competition called the Trojan Detection Challenge, and we’d love to see you participate.

Introduction

You’ve probably heard of Trojan horse malware. As in the Trojan Horse that enabled the Greeks to enter Troy in disguise, Trojans appear to be safe programs but hide malicious payloads.

Machine learning has its own Trojan analogue. In a neural Trojan attack, malicious functionality is embedded into the weights of a neural network. The neural network will behave normally on most inputs, but behave dangerously in select circumstances.

From a security perspective, neural Trojans are especially tricky because neural networks are black boxes. Trojan horse malware is usually spread via some form of social engineering - for instance, in an email asking you to download some program - so we can to some extent learn to avoid suspicious solicitations. Antivirus software detects known Trojan signatures and scans your computer for abnormal behavior, such as high pop-up frequency. But we don’t have these sorts of leads when we want to avoid neural Trojans. The average consumer has no idea how machine learning models they interact with are trained (sometimes, neither does the publisher). It’s also impossible to curate a database of known neural Trojans because every neural network and Trojan looks different, and it’s hard to develop robust heuristic-based or behavioral methods that can detect whether model weights are hiding something because we barely understand how model weights store information as it stands.

The first neural Trojan attack was proposed in 2017. Since then, many Trojan attacks and defenses have been authored, but there’s still plenty of work to be done. I’m personally quite excited about this research direction: the problem of neural Trojans has obvious immediate security implications, and it also resembles a number of other AI safety domains, the progress of which plausibly correlates with progress on Trojans. I’m writing this post with the goal of field orientation and motivation: if you read the whole thing, you’ll ideally have the information you need to start imagining your own attacks and defenses with sufficient understanding of how your strategy relates to existing ones. You’ll also ideally be able to picture why this might be a research domain worth your time.

A typical story

Threat model

In a Trojan attack, an adversary is trying to cause inputs with certain triggers to produce malicious outputs without disrupting performance for inputs without the triggers. In most current research, these malicious outputs take the form of misclassifications, of which there are two main types:

All-to-one misclassification: change the output of inputs with a trigger to an attacker-provisioned malicious label
All-to-all misclassification: change the output of inputs with a trigger according to some permutation of class labels (for instance, shift an input belonging to class i to the ((i + 1) mod c)th class)

A few situations that enable such an attack:

A party outsources the training of a model to an external provider such as Google Cloud or Azure (this practice is called machine learning as a service, or MLaaS). The MLaaS provider itself or a hacker tampers with the training or fine-tuning processes to Trojan the model. The outsourcing company does not realize that the model has been Trojaned because they rely on simple metrics such as validation accuracy.
An adversary downloads a model from a model repository such as Caffe Model Zoo or Hugging Face and inserts the Trojan by retraining the model. The adversary re-uploads the infected model to the model repository. A party unwittingly downloads and deploys the model.
A party downloads a pre-trained model from a model repository. At some point in the training pipeline, the model was infected with a Trojan. The party then uses some transfer learning techniques to adapt the model, freezing the pre-trained layers. The transfer learning activates the Trojan.
A party loads a model onto an offshore integrated circuit. An adversary in the hardware supply chain modifies components of the chip, adding logic to the circuitry that injects the Trojan and delivers the malicious payload.
An adversary uploads a poisoned dataset to an online dataset repository such as Kaggle. A party downloads this dataset, does not detect the poisoned samples, and trains their model on the dataset. The party publishes the Trojaned model, having no reason to believe that the model is dangerous.

How to Trojan

In one classic example of a Trojan attack, (1) a Trojan trigger is generated; (2) the training dataset is reverse-engineered; and (3) the model is retrained. This is not the way all Trojan attacks are mounted, but many attacks in the literature are variants of this strategy.

Figure from Liu et al.'s Trojaning Attack on Neural Networks

To generate a trigger, an attacker first picks a trigger mask, which is a set of input variables into which the trigger is injected. In the figure above, the pixels comprising an Apple logo serve as the trigger mask. Then the attacker selects a set of neurons that are especially sensitive to variables in the mask. Neurons should be as well-connected as possible so they are easy to manipulate.

Given a neuron set, target values for the output of those neurons (typically these are very high so as to maximize the activations of the neurons), and a trigger mask, the attacker can generate the Trojan trigger. A cost function measures the distance of the neuron set’s outputs to its corresponding target value set. Then the cost is minimized via the updating of values in the mask with gradient descent. The final values in the mask comprise the Trojan trigger.

Now the attacker builds a dataset with which she may retrain the model. Without access to the original training data, she must build her own training set that has the model behave as if it did learn from the original training set. For each output neuron, an input is generated via gradient descent that maximizes the activation of the neuron; these inputs comprise the new training set. Then, for each input in the training set, the attacker adds a duplicate input whose values in the mask are summed with the Trojan trigger; these samples are assigned the Trojan target label. These inputs in practice can be used to train a model with comparable accuracy to the original model, despite looking very different from the original training data.

Finally the attacker retrains the model. The model up to the layers where the neuron set resides are frozen, and the remaining layers are updated, since the primary goal of retraining is to establish a strong link between the neuron set and target output neuron. Retraining is also necessary to reduce other weights in the neural network to compensate for the inflated weights in between the neuron set and target output; this is important for retaining model accuracy.

The attack is complete. If the model is deployed, the attacker and the attacker only knows exactly what sort of input to serve up to cause the model to behave dangerously. The attacker could, for example, plant an innocuous sign on a road containing a Trojan trigger that causes a self-driving car to veer sharply to the left into a wall. Until the car approaches the sign, its passengers will believe the vehicle to be operating effectively.

I’ve described one simple way to Trojan a model; in the next section I’ll describe a few other attack design patterns and some defenses.

Map of the space

Attacks

The vast majority of Trojan attacks explored in the literature use data poisoning as their attack vector, whereby the model is trained on a small amount of malicious data such that it learns malicious associations, including the variation described above. These are a few salient categories of research in this paradigm:

Static stamping: imposing a visible mask on an input that triggers malicious behavior, normally in a computer vision context. Seminal works include Liu et al.’s Trojaning Attack on Neural Networks, which employs the strategy discussed above, and Gu et al.’s BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. Key differences between these works: in the former, the attacker is not assumed to have access to the full training procedure, and additionally the target output neuron is not used directly for trigger optimization. The latter work simply adds samples with triggers to the original training dataset (this dataset does not need to be reverse-engineered) and trains the model from scratch to build the association between trigger and target output.
Blending: using a trigger blended into a sample since stamp-based approaches are too conspicuous. In Chen et al.’s Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning, trigger patterns (a corruption of either an entire image or a dynamic selection of an image, such as sunglasses on a human face) are mixed into a benign sample: the value of a pixel at (i, j) is _ak_(_i, j) + (1-a)x_(i, j), where a is an adjustable parameter and a smaller a results in a less discernible attack. By contrast, in stamping, the attacker simply adds the values of a trigger mask to a specific location in an image.
Clean-label attacks: obfuscating Trojan triggers by only corrupting samples that belong to the target class, as in Barni et al.’s A New Backdoor Attack in CNNs by Training Set Corruption Without Label Poisoning. In traditional stamp-based approaches, there is often an obvious mismatch between a corrupted sample and the target output label, which makes it easy to detect backdoor samples via inspection of the dataset. To mitigate this problem, a clean-label Trojan attack adds a trigger only to benign samples in the target class for training, and then applies the trigger to samples belonging to other classes at test time.
Perturbation magnitude constraint: adaptively generating perturbation masks as triggers that consider model decision boundaries, pushing the classification of each sample towards a target class, and restricting the size of the perturbation to some threshold. The perturbation masks are added to some number of poisoned samples that the model trains on. Intuitively, starting with a mask that moves samples towards the target output class makes it easier for the model to learn an association between the trigger and that class. This technique is introduced in Liao et al.’s Backdoor Embedding in Convolutional Neural Network Models via Invisible Perturbation and generalized in Li et al.’s Invisible Backdoor Attacks on Deep Neural Networks via Steganography and Regularization, wherein the trigger is optimized to maximally activate a set of neurons and also regularized to achieve minimal L_p norm.
Semantic attacks: using semantic features, such as green strips or the word “brick,” as triggers rather than optimized pattern masks, by assigning all samples with a particular semantic feature a target label. This attack is particularly dangerous because the attacker theoretically does not need to precisely modify an environment to trigger a Trojan. The effectiveness of semantic attacks is demonstrated in Bagdasaryan et al.’s How to Backdoor Federated Learning.
Dynamic triggers: designing Trojan triggers with arbitrary patterns and locations. In Salem et al.’s Dynamic Backdoor Attacks Against Machine Learning Models, three techniques are introduced: Random Backdoor (RB), Backdoor Generating Network (BaN), and condition BaN (cBaN). In RB, triggers are sampled from a uniform distribution and positioned randomly in the input; in BaN, a generative network creates triggers and is trained jointly with the model being Trojaned; and in cBaN, a generative network creates label-specific triggers to allow for more than one target output. These dynamic attacks extend extra flexibility and stealth to the attacker.
Transfer learning: developing Trojan triggers that survive or are activated by transfer learning. Gu et al. show that Trojan triggers still work effectively after a user fine-tunes a Trojaned model in BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain. Yao et al. in Latent Backdoor Attacks on Deep Neural Networks embed a Trojan in a pre-trained model whose target output is a class not included in the upstream task, but is expected to be included in the downstream task; thus fine-tuning for the downstream task makes the Trojan active.
Attacks on language models/reinforcement learning agents/etc.: extending Trojan attacks to machine learning models other than image classifiers, since most work on neural Trojans has revolved around vision. In Zhang et al.’s Trojaning Language Models for Fun and Profit, triggers are framed as logical combinations of words. The poisoned dataset is created by inclusion of the triggers into target sentences with the help of a context-aware generative model. Kiourti et al.’s TrojDRL: Trojan Attacks on Deep Reinforcement Learning Agents assigns certain state-action pairs high reward, causing agents to take desired actions when the attacker modifies the environment in a predefined way. Trojans have been used to attack graph neural networks, GANs, and more.

Trojans can also be created without touching any training data, entailing direct modification of a neural network of interest. Often these attacks require less knowledge on the part of the attacker and lend greater stealth. Here are some examples:

Weight perturbation: inserting Trojans by changing the weights of a neural network without poisoning. Jacob et al.’s Backdooring Convolutional Neural Networks via Targeted Weight Perturbations selects a layer and a random set of weights in the layer, iteratively perturbing them and observing which best maintain overall accuracy and target classifications for samples with a trigger. The process is repeated with different subsets of weights. In TrojanNet: Embedding Hidden Trojan Horse Models in Neural Networks, Guo et al. encode a permutation in a hidden key that is used to shuffle model parameters at runtime, revealing a secret network with alternative functionality that shares the parameters of the safe neural network.
Changing computing operations: modifying operations in a neural network rather than the weights. Clements et al. in Backdoor Attacks on Neural Network Operations select a layer with targeted operations, e.g. activation functions, and update operations based on the gradient of the output with respect to the activations at the layer. Since this attack doesn’t modify network parameters, it would be difficult to detect with traditional techniques.
Binary-level attacks: manipulating the binary code of a neural network. TBT: Targeted Neural Network Attack with Bit Trojan by Rakin et al. proposes changing targeted bits in main memory with a row-hammer attack, which uses the electrical interaction between neighboring cells to cause unaccessed bits to flip.
Hardware-level attacks: inserting Trojans via manipulation of physical circuitry. Clements et al. in Hardware Trojan Attacks on Neural Networks discuss a situation in which an adversary is positioned somewhere along the supply chain of an integrated circuit on which a neural network resides. The adversary can perturb, for instance, an activation function or structure of single operations to achieve some adversarial objective. She could also implement a multiplexer to route inputs with a trigger to some malicious logic.

Defenses

Researchers have developed a few techniques to mitigate the risks of Trojans:

Trigger detection: preempting dangerous behavior by detecting Trojan triggers in input data. Liu et al. in Neural Trojans use traditional anomaly detection techniques, training classifiers that detect Trojans with high reliability but also have a high false positive rate. Some works use neural network accuracy to detect triggers, such as Baracaldo et al.’s Detecting Poisoning Attacks on Machine Learning in IoT Environments, which segments a partially trusted dataset according to input metadata, and removes segments that cause classifiers to train poorly. The task of trigger detection has become more difficult as attacks have been proposed that render triggers more distributed and invisible.
Input filtering: passing training or testing data through a filter to increase the likelihood that the data is clean. This is frequently done by statistical analysis or clustering of a model’s latent representations or activations. In Spectral Signatures in Backdoor Attacks, Tran et al. conduct a singular value decomposition of the covariance matrix of a neural network’s feature representations for each class, which is used to calculate outlier scores for input samples; outlier input samples are removed. In ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation, Liu et al. stimulate internal neurons and classify models as Trojaned if they induce a specific output response. Gao et al. in STRIP: A Defence Against Trojan Attacks on Deep Neural Networks propose a runtime algorithm that perturbs incoming inputs, observing that low entropy of predicted labels indicates presence of a Trojan. Unlike trigger detection, filtering should rely minimally on specific implementations of triggers.
Model diagnosis: examining models themselves to determine whether or not they have been infected. This typically involves building a meta-classifier that predicts whether or not a neural network has been Trojaned. Universal Litmus Patterns: Revealing Backdoor Attacks in CNNs by Kolouri et al. optimize some “universal patterns” that are fed through neural networks and build a meta-classifier that observes the outputs of the neural networks upon reception of the universal patterns. At test time, generated outputs are classified by the meta-classifier to detect the presence of a Trojan. Zheng et al. in Topological Detection of Trojaned Neural Networks note that Trojaned models are structurally different from clean models, containing shortcuts from shallow to deep layers. This makes sense since attackers inject strong dependencies between shallow neurons and model outputs.
Model restoration: making a Trojaned model safe again. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks by Wang et al. is an example of a method of model restoration known as trigger-based Trojan reversing, in which a trigger is reverse-engineered from a network and is used to prune involved neurons or unlearn the Trojan. Zhao et al.’s Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness uses a technique called mode connectivity to restore models by finding a low-loss, trigger-robust path in weight space between two Trojaned models, an example of model correction that does not rely on knowledge of a specific trigger.
Preprocessing: removing triggers in samples before passing them to a model. For instance, Doan et al. in Februus: Input Purification Defense Against Trojan Attacks on Deep Neural Network Systems remove triggers by determining the regions of an input most influential to a model prediction, neutralize these regions, and fill them in with a GAN. Liu et al. in Neural Trojans reconstruct inputs with an autoencoder and find that illegitimate images suffer from large distortion, rendering the Trojans ineffective.

Neural Cleanse, STRIP, and ABS are among the most common defenses against which attacks are tested.

For more information, check out these surveys:

Where are we now?

Work on neural Trojans, like much of cybersecurity, is a cat-and-mouse game. Defenses are proposed in response to a subset of all attacks, and counterattacks are built to combat a subset of all defenses. Additionally, works make different assumptions about attacker and defender knowledge and capabilities. This selective back-and-forth and constrained validity make it difficult to track objective field progress.

Currently, defenses struggle to handle a class of adaptive attacks where the adversary is aware of existing defense strategies. Attackers can avoid detection, for instance, by building Trojans that do not rely on triggers, or minimizing the distance between latent feature representations. Attacks like these are ahead of the game. That said, many defense strategies are still highly effective against a wide class of attacks likely to be employed, depending on the ignorance of the attacker and their use case - for instance, the attacker might not want to inject a non-trigger-dependent Trojan because they need to control when the Trojan is activated in deployment. Some researchers are attempting to build defense strategies based on randomized smoothing that theoretically certify robustness to Trojan triggers, although these are typically weaker than empirical strategies due to stringent and unrealistic assumptions.

Below is a table that sketches out a few of the strategies mentioned above, and who should beat whom. This is based on empirical results from papers, but is primarily my own extrapolation of these results. It’s currently June, 2022; values will probably become more invalid or irrelevant over time. A check mark signals that the defense is beating the attack, where “beats” means roughly 85% of the time or more it achieves its aim (albeit potentially inefficiently or at the cost of performance).

Relationship to other concepts

Neural Trojans are frequently mentioned with a few other terms that represent their own bodies of research. It’s useful to differentiate these terms as well as understand where related research overlaps:

Backdoors: “Trojan” and “backdoor” are interchangeable. In cybersecurity, a backdoor refers to a method that grants an attacker strong access to a computer system.
Data poisoning: Poisoning refers generally to any attack in which an attacker manipulates training data to change the behavior of a model. This might be to decrease the general performance of the model, which is not the aim of a Trojan attack; furthermore, not all methods of Trojan injection rely on data poisoning.
Model inversion: An attacker with white-box or black-box access to a model recovers information about the training data. Some Trojan attacks use model inversion to retrain neural networks and achieve comparable accuracy.
Evasion attack: Evasion attacks are performed at test time. The attacker crafts a deceptive input (adversarial example) that causes misclassification of otherwise bad behavior. Unlike Trojan attacks, evasion attacks don’t modify model parameters. The attacker’s goal is frequently to degrade overall model performance, not stealthily trigger a specific behavior.
Adversarial attack: This term refers to any attack that disrupts the normal behavior of a model. Planting neural trojans is an instance of an adversarial attack, as are poisoning and evasion attacks.

Security implications

Today: human-initiated attacks on supply chain

The attack surface for machine learning models has expanded dramatically over the past decade. Most machine learning practitioners are now doing something akin to playing with Legos: they assemble various out-of-the-box bits and pieces to create an operating machine learning system. The curation of datasets, design and training of models, procurement of hardware, and even monitoring of models are tasks most effectively accomplished by specialized third parties into which the practitioner has no insight. As machine learning becomes more useful to parties with no technical expertise and increasingly reaps benefits from economies of scale, this trend of outsourcing complexity is likely to continue. As we’ve seen, it’s possible to introduce neural Trojans at practically arbitrary points in the supply chain.

Some available MLOps services. Image from Neptune.ai.

Consider the introduction of Trojans to a few applications today:

User identification: A trusted individual has access to a secure building, such as a server room. To enter, the individual is identified via facial recognition technology. An attacker who wishes to disable servers in the room presents a physical trigger to the sensor in front of the building to convince a Trojaned model behind the scenes that they are the trusted individual.
Driving: A high-profile politician is being transported to a conference location in an autonomous vehicle. An attacker uses features of the conference location as a Trojan trigger so that, as the politician approaches the conference location, the vehicle diverts abruptly and crashes into oncoming traffic.
Diagnostics: A doctor employs a language model to examine electronic health records and assess next patient care steps. An attacker embeds a trigger in a health record that causes the system to recommend a mild treatment when a serious disease is latent in the patient’s records and urgent care is required.

It’s unclear if a neural Trojan attack has ever been attempted in practice. Many service providers today are trustworthy and robust, and those who deploy large machine learning models in high-stakes situations can currently afford to own many parts of the pipeline. However, the barrier to entry to machine learning integration is diminishing so we should expect increased demand from smaller organizations. We’re also seeing a real push for the decentralization of many machine learning services, including open-source models and community-aggregated datasets. Additionally, machine learning models are far from realizing their full practical potential and scale. We should expect to see them deployed in a range of far riskier scenarios in the near future: in medicine, government, and more. The consequence of failure in these domains could be far more severe than in any domain of concern today, and incentives for attackers will be greater. Cybersecurity and hazard analysis have long been games of risk anticipation and mitigation; neural Trojans are exactly the sort of threat we want to protect against proactively.

Future: natural Trojans

One worry is that advanced machine learning models of the future which are misaligned with human intent will train well, but this will obscure potentially malicious behavior that is not triggered by anything seen in the train set and is not tracked by loss or simple validation metrics. This sort of scenario maps neatly onto today’s Trojans: the trigger flies undetected in training and the model operates benignly for some period of time in deployment before it receives the keyed observations that cause it to fail.

In one scenario, an adversary explicitly engineers observations and resulting behavior, while they emerge naturally in the other. This difference, however, is naively orthogonal to what we should care about: whether or not Trojans are detectable and correctable. The model behavior is isomorphic, so intuitively internal structural properties will bear key similarities. There’s an argument that there’s equifinality in this risk: a human is going to reason about Trojan injection in a very different manner than a neural network, so the human-designed Trojan will look dissimilar from the natural Trojan. But a human adversary has the same goal as a misaligned model: to induce misbehavior as discreetly as possible. The human will rely on an intelligent artificial system to accomplish her goal if it is more effective to do so. In fact, effective Trojan attack strategies today entail the sort of blackbox optimization that one might envision an advanced model employing to obfuscate its capacity for defection.

I don’t expect any particular strategy generated today to generalize all the way up to AGI. But I am optimistic about neural Trojan research laying the groundwork for similarly-motivated research, from the perspectives of both technical progress and community-building. It might tell us not to try a particular strategy because it failed in a much more relaxed problem setting. It might give us a better sense of what class of strategies hold promise. Investing in Trojan research also helps establish a respect for safety in the machine learning community and potentially primes researchers to mind more advanced natural versions of the Trojan attack, including various forms of deception and defection.

I’m also optimistic that work on Trojans offers insights into less clearly related safety problems. Interpretability is one example: I’m excited about the sort of network-analysis-style model diagnoses that some researchers are using to identify Trojans. This work may lend a generally stronger understanding of internal network structure; it seems plausible to me that it could inspire various model inspection and editing techniques. (I’ve written about transferring lessons from network neuroscience to artificial neural networks before - detecting Trojans is one domain in which this is useful.) Analyzing models at a global scale seems more scalable than examining individual circuits and is closer to the problem we’re likely to have in the future: picking a behavior that seems intuitively bad, and determining whether or not a model can exhibit said behavior (top-down reasoning) as opposed to inspecting individual structures in a model and attempting to put a name in English to the function they implement (bottom-up reasoning). Model diagnosis also currently appears to be the most adaptable defense technique in neural Trojan literature.

What can you do?

Security recommendations for practitioners

If you’re in the position of designing and deploying machine learning systems in industry or otherwise, you can decrease your risk from Trojans now and in the future by:

Being strict about deriving models and datasets from trusted sources
Implementing model verification protocols where possible, e.g., by computing model hashes
Considering redundancy so that model predictions can be cross-checked
Implementing access control to resources relevant to your machine learning pipeline
Staying aware of advances in backdoor attacks and defenses

Avenues for researchers

NIST (the National Institute of Standards and Technology) runs a program called TrojAI with resources for research and a leaderboard. And, as I’ve mentioned, we’re running a NeurIPS competition this year called the Trojan Detection Challenge with $50k in prizes. The competition has three tracks:

Trojan Detection: detecting Trojans in neural networks
Trojan Analysis: predicting properties of Trojaned networks (the target label and Trojan mask)
Trojan Creation: constructing Trojans that are hard to detect

The goal of the competition is to establish what the offense-defense balance looks like today, and if possible, extract information about the fundamental difficulty of finding and mitigating neural Trojans.

If you’re looking to get involved with research, here a couple of my own pointers:

Text/RL and non-classification tasks are interesting, neglected, and more likely to be representative of future systems at risk
Defense strategies that make minimal assumptions about attack strategies are preferable and are more likely to generalize to natural Trojans
Computational efficiency should be a priority - many state-of-the-art defenses today involve, e.g., training ensembles of classifiers, which is not practically feasible
It’s important to consider adaptive attacks: build defenses that assume the adversary has knowledge of the defense
Err on the side of working on defenses, since attacks are currently holistically stronger than defenses

Does publishing work in this domain worsen security risks? It’s possible: you could be inspiring an adversary with attack proposals, or subjecting defense proposals to adversarial optimization. While the problem is nascent, however, the benefits of collaborative red-teaming efforts probably far outweigh the risks. As a general principle, it also seems that having knowledge of a possible strong adversarial attack is better than not; if no defenses are available, a party that would otherwise deploy a vulnerable model now at least has the option not to. I might argue differently if there was already evidence of Trojans inflicting real-life harm.

Thank you for getting to the bottom of this piece. Neural Trojans are a big modern security concern, but they also represent an impactful research opportunity with spillover effects into future AI safety research. I’m looking forward to seeing submissions to the Trojan Detection Challenge.

]]>

Appreciation post for measured speech

2022-04-29T00:00:00-05:00

My dad often speaks infuriatingly slowly. When he finishes his spiels, I sometimes feel underwhelmed. The underwhelmedness comes from some sentiment that I could have generated similar speech in much less time.

Over the past year or so, however, encounters with people who speak rapidly have had me rethinking my attitude. I’ve noticed that I am sometimes impressed by people who talk fast. Usually they are, in fact, smart. I’ve also noticed that I am rarely persuaded. I update slightly in the direction of fast talkers being correct because they seem smart, or because I have a vague impression that they’ve thought about things for a while. But this is far from the state of being soundly convinced.

Often an unqualified observer, I find that information densities of rapidly-delivered speech and slowly-delivered speech feel comparable. This is why I imagine I’m able to generate speech more quickly than my dad - I’m responding to something like a probability distribution over word sequences rather than the words’ contextual significance, which I haven’t yet grokked. So measured speech doesn’t clearly have the advantage of perceived quality. What it does seem to have are trust and memorability.

My dad’s had more success than most anyone at convincing me of things he believes. There’s the obvious bit where his speaking slowly means that I know he’s thought critically about what he’s saying. What surprises me repeatedly, though, is how this trust interplays with the fact that I unwittingly keep a playback of his words in my head, possible because they were elegantly phrased and brief. I’ll scoff in the moment, but hours later I’m turning over his words again. I suppose I unconsciously give him the benefit of the doubt since I trust he was being thoughtful, and I continue to search for the signal that I didn’t catch the first time around. The signal is almost always there.

My dad and people like him get extra trust points in particular because I know they could speak rapidly if they wanted to and I wouldn’t be able to distinguish low-quality words from high-quality words. When they still choose to speak slowly, I have pretty good evidence that they care more about sharing truth than inflating their egos.

(A few other disadvantages of rapid speech: sometimes when a person talks very fast I’m unable to parse their words in the amount of time deemed socially acceptable for response generation, so the conversation ends early and I walk away having not addressed personal cruxes. Sometimes they make too many disputable claims and I can only respond to one, so the conversation branches and I haven’t taken away the main point. Sometimes I just have no idea what they’re saying.)

In short, I’ll claim that fast speech impresses, but measured speech persuades. Psychology seems to support this with a few caveats, although I won’t endorse any particular study. I am not suggesting that measured speech is always tactical. Maybe you want the credibility lent by fast speech that could pay out in the long term. Or maybe you just don’t have great arguments you can articulate at all, fast or slow. But if your goal is to persuade and you’re confident, you’ll do better (at least with me) if you think before you talk.

]]>

Network neuroscience and artificial analogues

2022-03-20T00:00:00-05:00

Introduction

Brain and artificial neural networks are complex, tightly coupled systems whose cognition arises from countless interactions among subcomponents. Consequently, neither are amenable to clean theoretical formulation. Historically neuroscientists have stood by “localizationism,” a coarse framework that attributes brain functions to certain physical regions of the brain. While localizationist models have significantly advanced the state of neuroscience since Paul Broca’s seminal lesion work in 1861, advances in neuroimaging and the availability of “big data” are challenging the assumption that clear anatomical correlates exist for many functional constructs. One recent Stanford study demonstrates that expert-generated proposals for brain region categorization, such as that of the Research Domain Criteria project, do not align with circuits associated with functional domains sourced from some 18,000 papers leveraging fMRI data. It seems that the brain, with its near 100 billion neurons and 1,000 trillion synapses, is best understood in its totality - that is, in terms of the many elements and relationships between its neurobiological systems at all scales. The application of network science to neuroscience is an emerging field known as network neuroscience.

Brain function opacity is mirrored in modern deep learning, which features a strong connectionist philosophy. Deep learning is an intuition-driven, experiment-validated science. Research often consists of hundreds of trials spent searching the architecture and hyperparameter space for a winning final network configuration; little is offered in the way of guarantees over model behavior, performance or complexity. Interpretability techniques can arguably discover what portions of inputs are important to model predictions, but they say little about how the inputs are important to the predictions and what latent representations are developed. Inspection of model internals has, as in neuroscience, been largely localized or conducted under search constraints.

Neuroscience and machine learning frequently exhibit a fault that is the employment of intuitive abstractions which lack correspondence to real-world phenomena. For example, neuroscientific literature has extensively studied a potential frontal lateralization that would vindicate a longstanding theory of emotional valence from psychology. However, fMRI studies have shown that these hemisphere-level models are largely untenable, with responses to positive and negative emotional stimuli relatively distributed. Machine learning researchers similarly suffer from leaky abstractions, often in the form of anthropomorphic projections. Two issues arise when one attempts to map an incoherent cognitive process to a region in a neural network, artificial or otherwise: the cognitive process may be implemented in a distributed fashion over many disparate modules (a “one-to-many” problem); and one module might be involved in many cognitive processes (a “many-to-one” problem). To mitigate these issues, abstractions used to describe network function should be designed from the bottom up.

A network account of the brain is humble. It makes limited assumptions about the involvement of modules in any particular type of computation, accounting for potential long-range dependencies. Studies of networks motivate empirical, rather than theoretical, investigation: rather than constraining search to activity that correlates with preconceived notions of cognition, network science allows arbitrary patterns of computation to emerge that scientists can identify and classify after the fact. Network neuroscience and its artificial analogues augment three domains I’m excited about:

Interpretation: explaining and predicting the behavior of networks with network analysis.
Generation: developing models that generate synthetic networks which can be used for simulation or downstream tasks.
Control: leveraging a fine-grained understanding of the influence of certain nodes and edges to perturb or manipulate networks for specific purposes.

I am especially interested in the synthesis of network approaches across neuroscience and machine learning since it appears that existing research in both fields is ripe for transfer. Because the brain and artificial neural networks are differentially amenable to different types of network analysis, I anticipate that future insights in one field may accelerate progress in the other.

Interpretation

Modeling relationships between components in a network aids understanding of the computation of the network and predicts external properties. For instance, neuroscientists may be able to more accurately characterize diseases and their symptoms as well as predict their onset in early stages by assessing certain graph-theoretic metrics. Network neuroscience borrows from topological data analysis in addition to graph theory; for instance, by translating data into simplicial complexes and computing persistent homology.

Schizophrenia is one disease that cannot be understood without a picture of whole-brain functional connectivity. One can build functional connectomes by observing statistical association between the time series of nodes in a brain network, typically anatomically defined regions of interest (ROIs). These time series derive from resting-state or task-based functional magnetic resonance imaging (fMRI) studies that measure the activity of the ROIs over a scanning period. In the construction of the connectome, nodes are the ROIs and edges represent the strength of connectivity between a pair of nodes. This strength can be measured with metrics such as correlation, partial correlation, Granger causality, or transfer entropy. Liu et al. show that the functional networks of schizophrenic individuals have reduced “small-worldness” compared to healthy individuals, meaning that the normal efficient topological structure with high local clustering is disturbed: absolute path length increases and clustering coefficients decrease. Schizophrenic individuals additionally exhibit significantly lower connectivity strength, i.e. the mean of all pairwise correlations in the functional connectome, which potentially explains patients’ impaired verbal fluency in terms of reduced processing speed.

Networks can be analyzed at different scales and of different types. Integrating neurobiological networks at the micro- and macro-level can help us understand the full path from genetic factors to brain connectivity to behavior. For example, joint analyses of brain and social networks have shown that an individual’s functional connectivity patterns predict aspects of ego-network structure. Multilayer networks - networks involving different edge types with each edge type encoded in a separate layer - can capture temporal interdependencies or combine information from complementary imaging modalities. The study of network dynamics, i.e. how functional networks emerge over fixed anatomical networks, may help neuroscientists and machine learning researchers alike understand the importance of structural priors. Structure-function coupling, the ability of anatomical connectivity to predict the strength of functional connectivity, is supposed to reflect the functional specialization of cortical region. Indeed, the specialized and evolutionarily conserved somatosensory cortex exhibits strong coupling while highly expanded transmodal areas exhibit weak coupling, supporting cognitive flexibility. Structure-function relationships bear on developmental psychology: significant age-related increases in structure-function coupling are observed in the association cortex, while age-related decreases in coupling are observed in the sensorimotor cortex. These results signal the development of white matter architecture throughout adolescence to support more complex cognitive capabilities. Variable emergence of coupling may explain intraindividual variance in children’s learning patterns.

Techniques from network neuroscience have the potential to improve machine learning researchers’ understanding of artificial neural networks. OpenAI’s interpretability team kicked off a research agenda in 2020 studying subgraphs in neural networks that purportedly implement meaningful algorithms, such as curve detection in visual processing. These subgraphs are called circuits. They ingest, integrate, and transform features, propagating information along weights between neurons. Unlike neuroscience, machine learning interpretability benefits from having the entire neural connectome available. Simply having a full anatomical connectome, however, is not enough to understand underlying computation. “Could a Neuroscientist Understand a Microprocessor?” uses traditional neuroscientific techniques, such as analyses of anatomical connectomes and tuning curves, to reverse-engineer a microprocessor. The authors point out that without a clear map from anatomy to function, circuit motifs are meaningless. The same could be said for circuit motifs in artificial neural networks. OpenAI’s circuit work attempts to ascribe function to circuits by interpreting feature visualizations, dataset examples, and conducting other analyses. This requires human labor and does not seem scalable in the long run. The application of functional connectomics and structure-function relationship analysis to neural networks is a viable solution. One work investigates topological differences in Trojaned and clean deep neural networks by comparison of persistence diagrams, showing that Trojaned models have significantly smaller average death of the 0D homology class and maximum persistence of the 1D homology class.

Graph neural networks (GNNs) may be natural tools for the study of biological and artificial neural networks because they take graphs as input. Over several layers, each node updates its representation as a function of its neighbors. The GNN uses final node representations to make node-level, edge-level, or graph-level predictions. One could train a GNN over a functional connectome, for instance, to predict whether or not a patient has depression; or possibly over many artificial neural networks to detect universal circuits. BrainGNN is one instance of a GNN used for fMRI analysis. Its graph convolutional layers implicitly assign ROIs to communities to compute weight matrices, discovering interpretable task-related brain community structure. Its ROI pooling layers retain only the most salient ROIs for the predictive task, automatically finding biomarkers. (Check out a beginner-friendly tutorial by my friends and I in which we implement cGCN, a GNN for fMRI analysis.)

BrainGNN pipeline.

Generation

Generative models compress statistically regular network topologies to create synthetic networks. They use wiring rules or structural constraints to build neurobiologically plausible models. For example, Watts and Strogatz’s seminal “Collective dynamics of ‘small-world’ networks” randomly rewires edges in a ring lattice to create networks exhibiting the small-worldness property previously discussed. These sorts of models hypothesize mechanistic sources of certain real-world network phenomena, such as long-tailedness, and are easily comparable with real-world networks for verification. Generative models can be used to simulate diseases, or facilitate in silico experiments where scientists attempt to steer trajectories of neural circuitry. In turn, the construction of synthetic biological networks can inform machine learning researchers’ architectural design and selection.

The pairwise maximum entropy model has proven to be one simple but powerful model for inferring functional network topology from structural network topology and vice versa. This method defines a probability distribution over the 2^N possible states for N neurons, since neurons either spike or remain silent. To measure the effectiveness of a pairwise description of a functional network, the model computes the reduction in entropy from a distribution given only first-order correlations to second-order correlations to the total reduction in entropy given network correlations at all orders, or the multi-information. An early study demonstrated that maximum entropy models could capture 90% of multi-information for various cell populations. Later, this model was used to accurately and robustly fit BOLD signals in resting-state human brain networks.

Synthetic networks fit to brain networks can serve as bioinspiration for practical artificial neural networks, potentially reducing parameters and increasing interpretability. Spatial neural networks follow generative neurobiological models that impose costs for long connections between neurons, encouraging efficient grouping. Neurons are assigned two-dimensional learned spatial features. Each layer receives a penalty that is the sum of the spatial distances between that layers’ nodes and the neurons to which they are connected. The authors train the spatial network on two relatively independent classification tasks, MNIST and FashionMNIST. They find that two subnetworks emerge, one for each task, that perform only slightly worse on each task than the full network. By comparison, split full-connected networks perform considerably worse. Pushing neural networks towards such modularity may be helpful for human interpretation and explanation. Conversely, artificial neural networks are useful for interpreting and building synthetic biological neural networks; while not explicitly designed to simulate biological neural networks, similar features can and do emerge. Researchers have found that layers in convolutional neural networks (CNNs) can predict responses of corresponding layers in the higher visual cortex, demonstrating the importance of hierarchical processing. The CNNs thus serve as generative models for neuronal site activity.

Neural architecture search, which strongly overlaps with the field of meta-learning, looks for optimal neural network topologies. Hypernetworks can aid neural architecture search by generating weights for a given architecture, an alternative to training neural networks from scratch. Classically hypernetworks have used 3D tensor encoding schemes or string serializations to map from architectures to weights. These fail to explicitly model architecture network topology, so Zhang et al. construct graph hypernetworks (GHNs). GHNs learn a GNN over neural architectures represented as computation graphs, generating weights for each node with a multilayer perceptron given that node’s final embedding. Performance with generated weights are well-correlated with final performance. Architectures with weights generated by GHNs achieve remarkably high prediction accuracy, with some at 77% on CIFAR-10 and others at 48% top-5 accuracy on ImageNet.

Generating weights with a GHN.

Deep generative models of graphs, similar to GHNs, can be used to model artificial neural networks and biological neural networks alike. Typically generative models in network neuroscience use hand-designed wiring rules, which is helpful for the interpretation of network properties that emerge as a result of these rules. However, they may lack fidelity and scale poorly to complex topologies that are not human-interpretable. Deep generative models allow for topological constraints to be learned instead and produce graphs faithful to empirical networks. GraphRNN is one such example, an autoregressive model that iteratively adds nodes to a graph and predicts edges to previous nodes. GNNs are also useful for generation since they learn rich structural information from observed graphs; Li et al. condition on GNN-produced graph representations to determine whether or not to add nodes or edges. The application of these methods to brain networks, structural or otherwise, appears relatively unexplored.

Control

Beyond diagnoses, the modeling of brain networks and the generation of synthetic networks serve the cause of goal-directed network manipulation. Network manipulation is important for safely conducting disease intervention, building brain-computer interfaces, and developing neuroprosthetics. Exogenous inputs to neural systems include lesions, brain stimulation, and task priming.

The intersection of network control theory and neuroscience has proved useful. Yan et al. applied network control principles to C. elegans, annotating brain states at certain time steps as linear dynamical systems by adding adjacency-matrix-selected previous states of nodes to the effects of stimuli, such as anterior gentle touch, applied to receptor neurons. The cells to be controlled were predicted to be controllable if, with sufficient stimuli, the states of those cells could reach any position in the state space. These control principles were able to predict which neurons were critical in the worms’ response to gentle touch, because their ablation resulted in a decrease in the number of independently controllable cells. Similar processes can be conducted in human connectomes: Gu et al. construct from diffusion spectrum imaging data a controllability Gramian, a matrix whose eigenvalues and structure signal the magnitude and selection of control areas that may optimize cognitive function. The Gramian indicates that the human brain is theoretically, but not easily, globally controllable via a single brain region. While highly connected regions are able to move the brain to easily reachable states, weakly connected regions critically move the brain into difficult-to-reach states, such as high performance states as measured by IQ. These results challenge notions that only well-connected hubs should be targets for intervention, painting a more nuanced view of network controllability. Control-theoretic approaches in the neuroscience community could be used in machine learning work on model editing and controllable outputs, and appear as of yet uninvestigated.

Average controllability is correlated with node degree. Modal controllability (ability to move brain into difficult-to-reach states) is anticorrelated with node degree. Boundary controllability (ability to couple or decouple different cognitive systems) is not strongly correlated or anticorrelated with node degree.

Some work has attempted to track information flow across artificial neural networks with the motivation of reducing undesirable biases, as well as serve as test grounds for reward circuit manipulation in the brain. Pruning edges with high mutual information with a message, such as a protected attribute, produce larger reductions in bias at the output. These edges are therefore potential targets for intervention. Neuroscientific experiments may mirror this approach to narrow down neural targets for optogenetic stimulation. Additionally, feature attribution methods used to explain machine learning models, such as integrated gradients, could be employed by neuroscientists to identify computation-relevant sites in the brain. Integrated gradients attribute predictions of machine learning models to input features by calculating gradients of output with respect to features of input along an integral path. One could imagine analogously applying a stimulus to various target sites in a biological neural network and scaling the magnitude of this stimulus to calculate an integrated gradient, thus attributing the activity of another region in the network to the target site.

Instead of performing manual inspection, there is room to use deep learning, e.g. GNNs, to predict the effect of a perturbation on a biological or artificial neural network. One could use synthetic biological networks to simulate trajectories away from maladaptive network topologies. Or in GNN inference, machine learning engineers could occlude or alter certain subgraphs in brain networks and observe effects on predictions. For instance, changing the value of a feature of an ROI might cause a GNN to predict that a diseased brain is now healthy. Temporal Graph Networks, which assign memory states to nodes, could predict new functional or structural edges in brain networks based on past activities. One could ablate a portion of a biological or artificial neural network and observe the difference in the evolution of that network over time. Such a study could confer greater insight into the mechanisms underlying neuroplasticity, or the recovery of ablated artificial neural networks.

Conclusion

Network science offers an integrative perspective on artificial and biological neural networks. Empirical investigation and computational modeling of the activity of not only localized regions, but their multitudinous spatiotemporal interactions, capture the hierarchical and distributed computation that lend the brain and machine learning models their efficiency and great expressive power. Bridging network science-inspired methods in artificial intelligence and neuroscience may unlock powerful insights as well as establish a common lingo to enable interdisciplinary synthesis.

]]>

A timeline of the Bay-Delta Conservation Plan

2022-03-18T00:00:00-05:00

What’s the Bay-Delta?

The Sacramento-San Joaquin River Delta and the San Francisco Bay comprise the Bay-Delta watershed. The Delta is fed by winter rains and runoff from the Sierra Nevada and southern Cascades. The Sacramento and San Joaquin Rivers meet and drain into the San Francisco Bay. Interestingly, the Delta is actually an inverted delta, meaning that river channels converge in the downstream direction.

The Bay-Delta is the nexus of California’s water supply, providing drinking water to some 27 million residents (around 75% of the state’s population) and irrigation to 4 million acres of farmland. It’s a crucial migratory zone for birds and fish such as the Chinook salmon.

From the USGS.

The Bay-Delta is threatened by a fundamental tradeoff between its reliability as a water supply and the health of its ecosystem. Delta outflow is at the heart of this issue. The California State Water Project and the Central Valley Project export water from the southern Delta to farms and urban centers. This water diversion threatens fish migration routes, decreases freshwater supply for communities living on the Delta, and induces toxic algal blooms. California has long struggled to push legislation that strikes the right balance between environmentalism and economic interests. The issue of Bay-Delta restoration has become a subject of intense controversy in public spheres, from environmentalist communities to the print media.

A California newspaper contests its sister.

Who’s done what?

In 1980 state lawmakers approved the construction of a peripheral canal around the Delta to move water from Northern California to Central and Southern California, with the intent of reducing unnatural flows from pumping that confuse anadromous fish. It took days for farmers in the Delta to organize a referendum campaign. The Peripheral Canal was ultimately vetoed in 1982 with unanimous support from farmers and environmentalists.

The Schwarzenegger administration started work on a followup Bay-Delta Conservation Plan (BDCP) in 2006 focused on “co-equal goals” of protecting the Delta ecosystem and establishing a reliable water supply. It suggested that over 100,000 acres of habitat would be restored over a 50-year time period. In an update to the plan in 2009, Jerry Brown proposed the addition of two 30-mile twin tunnels that would divert Sacramento River water underneath the Delta via gravity and replace existing pumping systems for a total cost of $25 billion. These tunnels were largely an uninnovative revival of the old 20th century peripheral canal plan.

The BDCP sparked heated debate. Concerns arose over the millions of dollars in economic activity that would be lost by the conversion of farmland into wetlands. Environmentalists said supply could be managed by water conservation efforts. They were more concerned by the reduced flow through the Delta since it would allow salt water from the Bay to creep upstream, arguing that the BDCP failed to describe a specific recovery plan for endangered species. The delta smelt in particular was emblematic of a tension between farmers and conservationists, becoming critically endangered as a result of Delta degradation and representing a threat to farmers with crops on the line.

Environmental concerns aside, the BDCP tunnel costs were high and the project financial plan was obscure. Residents in Stanislaus, Merced and San Joaquin counties worried that their dams and turbines would be rendered useless by river rerouting necessities. On the other hand, proponents believed the project would address risks of levee failure. Some noted that time was running out, species were on the brink of extinction, and complete abandonment of the plan would be dangerous.

The U.S. Fish and Wildlife Service would not issue permits for the BDCP in light of its lack of scientific rigor. As a result, Brown drastically scaled down the BDCP in 2015, rebranding it as the Water Fix and Eco Restore project. Eco Restore reduced promised acres for restoration to 30,000 over 5 years, lowering costs from the $8 billion previously required to $300 million. Brown doubled down on the twin tunnels aspect of the BDCP in Water Fix, much to public aggravation.

Delta smelt migrate upstream in the winter for spawning. Meme from Restore the Delta.

2019 saw further downsizing of the conservation effort by the Newsom administration to purportedly save another $5 billion. The Department of Water Resources (DWR) withdrew proposed permits for Water Fix and began preparations for a smaller, single tunnel. The NRDC, a vocal opponent of the Delta twin tunnels, expressed a willingness to consider the new so-called Delta Conveyance Project. After a few months, however, the administration stated a preliminary cost estimate of $15 billion, barely a save on Water Fix. The plan remains highly controversial. Currently the DWR is scoping out environmental impact, engaging with the public, and planning in advance for key regulatory processes. And that brings us to our current moment in 2022.

So…

California has some serious water problems. Legislators have tried for decades to get a serious restoration project underway, but conflicting interests have plans continuously scrapped and thousands of pages of documents flushed down the drain. Proposed new plans are all essentially variants of the same problematic piping solution. In the meantime, the Bay-Delta watershed continues to rapidly deteriorate.

]]>

Why do brick-and-mortar stores invest so little in signage?

2021-11-11T00:00:00-06:00

When I walk around strip malls or downtown areas I often feel confused. Storefronts are bafflingly ugly.

Familiar tacky red lettering.

Good signage reliably drives foot traffic. Multiple regression and ARIMA models on sales data from retail sites suggest non-trivial sign modifications boost weekly revenue by 5 to 15 percent, with low-performing businesses deriving the greatest benefits. Around 30 percent of consumers report having visited new stores on the basis of sign quality. But we don’t need fancy statistics to intuit the value of branding. How much more likely are you to visit the store on the right vs. the left?

Sanity check: standard front-lit channel letter signs cost around $3,000-$7,000 including installation. Hiring a designer costs another $100-$1,000. If we’re using LEDs energy costs are negligible. Say a store makes $300,000 a year and replacing its sign results in a 5% increase in sales. Over the course of 5 years we’re getting around 900% ROI, conservatively speaking.

What’s going on? I don’t suppose “ugliness” has some sort of single upstream cause. But I expect a few explanations to account for the majority of signs for which replacement makes economic sense.

One theory is that owners of shopping centers enforce strict signage criteria on their tenants. Lease agreements indeed often include multi-page guidelines for sign design and installation. Landlords may reasonably want avoid negotations with each of their tenants when they request sign additions or modifications, or unpleasantries when tenants put up bad signs without permission that hurt business. Conversely, it’s a hassle for tenants to negotiate signage rights with landlords who just don’t get it. Mall owners also face space restrictions and zoning laws that are easily dealt with by standardizing style. In the name of “assuring an outstanding shopping center,” Spoerlein Commons of Buffalo Grove, Illinois mandates that “colors for double-faced, under canopy signage will be limited to red letters of a stained white wood background.” Here’s the result of that:

There’s also the inconvenience of obtaining a sign permit. Review processes can take several weeks in some cities and often must be handled by a licensed signage contractor. Illuminated signs, usually the most effective for customer engagement, are more difficult to coordinate permits for since many municipalities require annual renewal and special electrical signage contractors. So securing a sweet new sign isn’t as simple as sending money to a designer. You get to navigate lengthy legal codes and pay third parties lots of fees!

It’s a bear just to figure out what you want. Good graphic designers are notoriously hard to find: the freelance space, where one is most likely to nab designers willing to work on short-term projects, is crowded with dilettantes. I imagine this problem is exacerbated by the fact that many local sign companies are one-stop shops who throw mediocre design services into their fabrication, installation, and maintenance bundles. It’s tempting to skimp on quality to save money and time.

Small business owners are understandably risk-averse. The JPMorgan Chase Institute reports that the median US small business holds a cash buffer of less than one month, with average daily outflows of $374 and average daily inflows of $381. For businesses living from day to day, relying on loans or personal funds to deal with unanticipated expenses, a several hundred- or thousand-dollar signage investment is a potential hundred- or thousand-dollar loss that may be simply unacceptable. I don’t know that the relative dependability of returns on branding efforts is common knowledge, or if patience is a particular virtue of the average retailer.

There’s the somewhat uncharitable hypothesis that retailers just don’t understand what good signage looks like. I think this doesn’t make that much sense because human aesthetic preference is fairly uniform (hence convergence on e.g. certain graphics trends or architectural styles). We don’t need to be professional designers to tell apart good-looking things from bad-looking things. The more important part of the story seems to be retailers underestimating the value of good-looking things. Restaurant owners are no masters of electrical infrastructure, but they hire electricians because they recognize that electricity is central to the function of their kitchen. One has to be significantly more farsighted than that to appreciate a sign’s long-term effect on customer behavior. Of course markets’ chronic undervaluation of design extends beyond retail and has been a pain point for designers for years. One explanation for this is the effect of arbitrary coherence, where designer clientele anchor to a low initial price set by some less-than-professional competitor.

So those are a few plausible reasons why storefronts look like deceptively quick fixes. I’ll probably still complain about them, but I’ll do so with greater empathy.

]]>