Massively parallel characterization of insulator activity across the genome | Nature Communications
HomeHome > Blog > Massively parallel characterization of insulator activity across the genome | Nature Communications

Massively parallel characterization of insulator activity across the genome | Nature Communications

Oct 18, 2024

Nature Communications volume 15, Article number: 8350 (2024) Cite this article

599 Accesses

1 Altmetric

Metrics details

A key question in regulatory genomics is whether cis-regulatory elements (CREs) are modular elements that can function anywhere in the genome, or whether they are adapted to certain genomic locations. To distinguish between these possibilities we develop MPIRE (Massively Parallel Integrated Regulatory Elements), a technology for recurrently assaying CREs at thousands of defined locations across the genome in parallel. MPIRE allows us to separate the intrinsic activity of CREs from the effects of their genomic environments. We apply MPIRE to assay three insulator sequences at thousands of genomic locations and find that each insulator functions in locations with distinguishable properties. All three insulators can block enhancers, but each insulator blocks specific enhancers at specific locations. However, only ALOXE3 appears to block heterochromatin silencing. We conclude that insulator function is highly context dependent and that MPIRE is a robust method for revealing the context dependencies of CREs.

Complex patterns of gene regulation are controlled by combinations of cis-regulatory elements (CREs), including enhancers, promoters, and insulators. How CREs interact with each other and their surrounding chromatin environments is a subject of intense debate in the field. In one model, CREs are modular and independent elements, having few specific interactions with other CREs or their chromatin environment1,2,3. Alternatively, the activities of CREs may be highly context-specific because of interactions between CREs and their genomic environments4,5,6. Distinguishing between these two models for any given type of CRE will require methods that measure the activities of the same CREs across many locations in the genome. Identifying the properties of genomic environments that account for any context-dependent CRE activities, will further require that we measure the activities of many CREs at the same genomic locations.

Massively Parallel Reporter Assays (MPRAs) are a family of approaches that allow investigators to measure the activities of CREs en masse, but these techniques generally rely on plasmid-based reporter genes, which do not account for the interaction of CREs with their surrounding environments. To study how CREs interact with their genomic environments, Akhtar et al. developed the Thousands of Reporters Integrated in Parallel (TRIP) assay, which measures the activity of a single CRE at many locations across the genome7. Because each CRE is integrated randomly into the genome, CREs cannot be compared directly to each other in the same genomic locations. To address this limitation, we developed patchMPRA, where ‘landing pads’ are integrated into the genome, allowing us to measure the activities of large CRE libraries at fixed locations in the genome1. However, patchMPRA is limited in the number of locations that can be tested simultaneously and does not cover a wide diversity of contexts in the genome.

To gain a comprehensive understanding of how different CREs behave in different genomic contexts and reveal the determinants of context-dependent CRE activity, we need to measure the activities of many CREs across the same set of diverse locations in the genome. By keeping the genomic locations constant, we can systematically identify the components of genomic environments that provide context for each CRE. To address this technology gap, we developed MPIRE, a method that measures the effects of different CREs at thousands of defined genomic locations simultaneously.

Here we apply MPIRE to study insulator sequences. Insulators are CREs that can block enhancers from activating target promoters or act as barriers that halt the spread of heterochromatin. These functions prevent cross-talk between the cis-regulatory programs that control development and homeostasis. The name insulator derives from their ability to protect transgenes from genomic position effects, an important function in gene therapy and biotechnology applications that require high levels of sustained transgene expression. In theory, flanking transgenes with insulators protects them from position effects, but in practice, efforts to insulate transgenes meet with mixed success because the contextual requirements for insulator function in the genome are not well understood. We applied MPIRE to insulator sequences to investigate the extent to which insulators depend on contextual information supplied by their genomic environments.

We generate the largest characterization of three insulator sequences (cHS4, A2 and ALOXE3) across more than 10,000 locations in the genome. Since insulator activity is thought to depend on either CTCF binding sites (cHS4 and A2) or B box motifs (ALOXE3), we also measure the effects of mutations in the respective motifs at the same genomic locations. We show that each insulator functions at specific and distinct locations in the genome, with each insulator blocking enhancers bound by unique sets of transcription factors. This insulator activity is largely dependent on CTCF or the B box motif respectively. While all insulators can block enhancers in the genome, only ALOXE3 can act as a heterochromatin barrier. Overall, MPIRE is a robust, reproducible method that can be used to measure the activities of any type of CRE across thousands of diverse genomic environments and unravel the determinants of context-dependent CRE activity.

MPIRE is uniquely suited for studying insulator sequences for several reasons. First, insulators are the least well-defined class of CREs. There are at least two definitions of an insulator. One is that insulators are barriers that halt the spread of flanking heterochromatin, while another is that insulators are enhancer-blockers that prevent enhancers from activating their target promoters when placed between the enhancer and promoter8. Some insulators block both heterochromatin and enhancers, while others only have one of the functions. Systematic studies across the genome are required to determine whether the enhancer blocking and heterochromatin barrier functions of insulators are generally separable. Second, insulators tend to be highly context-specific. Enhancer-blocking assays are typically performed with one enhancer-promoter pair on plasmids, but some insulators only block specific enhancer-promoter pairs9,10,11. A recent study of insulators using a variant of the TRIP assay showed that insulators function at only a relatively small number of genomic locations, suggesting that insulator activity is context-dependent6. However, because the insulators were not directly compared in the same genomic locations, this study had limited power to decipher the genomic requirements for context-dependent insulator activity. Finally, the sequence features that contribute to insulator activity are not well understood. Systematically assaying insulator variants alongside their wild-type counterparts at the same genomic locations will allow us to define the critical sequences that comprise insulator function. Thus, in this study, we applied MPIRE to insulator sequences.

The goal of MPIRE is to integrate reporter genes carrying different CREs in large numbers of locations in parallel. The key reagents in MPIRE are pools of cells that carry large numbers of genome-integrated landing pads1,12. The landing pads are integrated randomly into K562 cells where they can serve as sites for recombination-mediated cassette exchange (Fig. 1A). We use the Sleeping Beauty transposon system to deposit multiple landing pads per cell in an unbiased manner throughout the genome13. We selected φC31 and BxbI attP sites for recombination because they were previously shown to be highly efficient14. Each individual landing pad integration carries a unique genomic barcode (gBC) that serves to identify the chromosomal location of individual landing pads. In total, we generated 6 cell pools (Pools 1-6), with each pool containing between 2000-6000 uniquely mapped gBCs (Supplementary Fig. 1A, Supplementary Data 1).

A Overview of MPIRE method. Pools of cells are generated by integrating landing pads driven by the Cytomegalovirus (CMV) promoter into the genome using SB transposase. The gBCs are mapped to a unique location in the genome, and can be used to recombine CREs for parallel measurements of cis-regulatory activity across the genome. gBC: genomic barcode. ITR: inverted terminal repeats. CRE: cis-regulatory element. cBC: CRE barcode. B Proportion of landing pads mapped to a unique location that were recovered after recombination with different CRE constructs. C, D Distribution of the number of recombination events per location for an uninsulated construct (C) and construct with the A2 insulator (D). Source data are provided as a Source Data file.

Once the pools are generated, reporter genes carrying different insulators can be integrated into the landing pad pools simultaneously. Reporter genes driven by the hsp68 promoter with different insulator sequences were cloned between recombination sites for integration into landing pad pools (Fig. 1A) and marked with different cis-regulatory library barcodes (cBCs) so that the assay can be multiplexed. The reporters are pooled and recombined into each landing pad pool. After recombination, the expressed mRNAs from each landing pad will contain two barcodes: a cBC indicating the identity of the insulator sequence and a gBC indicating the genomic location of the landing pad. Upon successful recombination the mScarlet reporter gene replaces the mEmerald marker in the landing pad, which provides priming sites to specifically amplify mRNAs from successful recombinations. We then sequence and count the number of DNA and RNA BC-pairs to calculate expression of each reporter at each location in the genome.

We first verified the utility of landing pad pools for MPIRE. For these analyses, we focused on Pool 1. Using qPCR, we estimated that each cell carries approximately 11 independent landing pads. The landing pads are well-distributed across the genome (Supplementary Fig. 1B), consistent with previous observations that Sleeping Beauty integrates in an unbiased manner13. Expression from these landing pads span a large range of expression ( > 104-fold) indicating that the integrations are sampling diverse genomic environments (Supplementary Fig. 1C, Supplementary Data 2). To test whether different landing pads in the same cells might recombine with each other, we transfected cells with integrase only (no CRE library) and found that the landing pads have similar expression levels and mostly map back to the same locations (Supplementary Fig. 1D and E). Thus, the landing pads in our pools represent diverse genomic environments and landing pads in the same cell do not interfere with each other.

We next evaluated the efficacy of large-scale recombination across locations in two additional pools (Pools 7 & 8). We measured the integration efficiency of two reporter genes, one containing only the minimal hsp68 promoter and one that also contains the A2 insulator15 to test whether the insulator sequence might impact recombination efficiencies (Fig. 2A). We barcoded each reporter gene with a highly diverse library of random barcodes (rBCs) and transfected them independently into both pools of cells. Since the library contains ~500,000 rBCs, each integration is likely associated with a unique rBC such that the number of rBCs per gBC represents the number of independent integrations at each location. We obtained barcodes from ~80% of all mapped landing pads for both the no insulator and A2 constructs (Fig. 1B). For each landing pad, we obtained a median of 9 independent integrations for the no insulator reporter and 7 independent integrations for the A2 reporter (Fig. 1C & D) for a total of 18,492 and 19,031 integrations, respectively. These results demonstrate that MPIRE is a robust method for measuring cis-regulatory activity at a large number of defined locations across the genome.

A Insulator constructs and mutants used in this experiment. B Insulator activity on plasmids measured by qPCR calculated as fold-change over the uninsulated construct (dotted line denotes fold-change = 1). Error bars represent the SEM from three biological replicates, with each replicate plotted as points on the plot. P-values were calculated using a two-sided, paired Student’s t-test. C Reproducibility of measurements from two biological replicates. Correlation indicated is Pearson’s r. Color represents the density of neighboring points. D Expression distribution of each insulator across all locations. n indicates the number of locations measured for each construct. The boxplot center corresponds to the median, and the box extends to the first and third quartiles respectively, with the lines extending to 1.5x of the interquartile range. E Heatmap of expression across the genome, where each column represents a different location. Only locations where the ‘no insulator’ construct was measured is plotted to allow for comparisons against the uninsulated reporter. Source data are provided as a Source Data file.

We selected three previously characterized insulators to test by MPIRE (Fig. 2A). cHS4 is the most well-characterized insulator and is derived from 5′ end of the chicken β-globin locus16. A2 was identified in a screen of endogenous CTCF-containing sequences for enhancer-blocking activity15, and ALOXE3 tDNA was discovered in a screen for tDNA sequences with insulator activity17. These studies showed that all three insulators function in canonical enhancer-blocking assays in K562 cells. The enhancer-blocking activity of cHS4 depends on its CTCF binding site18,19, while the enhancer-blocking activity of ALOXE3 depends on its two B box motifs, which recruit RNA polymerase III20. CTCF binds to A2 at 100% occupancy in the K562 genome, suggesting that CTCF is likely to be important for A2 activity15. To test the importance of binding motifs we mutated the CTCF binding sites in cHS4 and A2 (cHS4-mut and A2-mut, respectively) and deleted both instances of the B box motif in ALOXE3 (ALOXE3-mut) (Fig. 2A). The cHS4-mut sequence was previously shown to abolish CTCF binding19, while the A2-mut sequence was designed to mutate the most important nucleotides in the CTCF position-weight matrix (Supplementary Fig 2A). In addition, we scrambled the A2 insulator (A2-scrambled) to test the effects of a random DNA sequence that lacks identifiable cis-regulatory elements.

Because MPIRE is designed with a gBC in the landing pad that has to be transcribed from the reporter, we deliberately chose not to include a second copy of the insulator downstream of the reporter. A second copy would be transcribed, where it could influence expression through effects on mRNA stability and processing, thereby confounding our results. While some enhancer-blocking assays are performed with insulators flanking the reporter gene15,16,21,22, others have been done with only one insulator between the enhancer and promoter, including the study that discovered ALOXE317,23,24,25,26. To validate that a single-copy of the insulator is sufficient for enhancer-blocking activity, we first performed a plasmid-based enhancer-blocking assay (Supplementary Fig. 2B). For this experiment, we selected an enhancer that we previously showed was active in K562 cells along with the hsp68 promoter which we use in our MPIRE constructs27. The A2-mut and ALOXE3-mut constructs had less enhancer-blocking activity than their wild-type counterparts in the plasmid assay, but not the cHS4-mut construct. We also measured whether any of the insulator sequences had cis-regulatory activity that was independent of their enhancer-blocking activity. In the absence of an enhancer, the insulators had minimal effects on reporter gene expression (Fig. 2B), with the exception of the A2-mut, which had some independent repressive activity. However, this repressive activity was not recapitulated globally in the genome (see Fig. 2D below). Thus, the insulators we tested do not directly influence promoter activity, but do have enhancer-blocking activity.

We performed MPIRE by pooling the insulator constructs and transfecting them into six pools of landing pad cells. In total, we measured expression from each insulator construct in 11,365-18,031 landing pad locations. The measurements are reproducible across biological replicates consisting of independent transfections (Fig. 2C). The expression of the non-insulated construct was well correlated with the expression of the landing pads before recombination (Supplementary Fig 2C). These results suggest that we were able to obtain accurate and reproducible measurements of multiple cis-regulatory elements across thousands of genomic locations in parallel.

We first asked whether any of the insulators had widespread effects across the genome. In general, while the distributions of expression are slightly different across the genome, insulators did not have a large impact on the overall expression from locations across the genome (Fig. 2D). Formally, this result could arise if insulators increased and decreased expression at different locations in a way that did not alter the overall distribution of expression. However, we find that expression largely depends on genomic location regardless of the identity of the insulator (Fig. 2E), and that the expression of landing pads with and without insulators are well-correlated, with mutants slightly more correlated overall (Supplementary Fig. 2D & E). These results are inconsistent with a model of insulators as modular elements that can function in a wide variety of genomic locations.

While insulators do not appear to have global effects on expression across the genome, there are some locations where expression of the reporter gene with insulators is higher or lower than expected given its location in the genome (Fig. 2E). We sought to identify the specific locations where insulators might be influencing expression. We first calculated fold-changes in expression for each insulated and non-insulated construct at each landing pad location in the genome. The number of landing pads with matched insulated and non-insulated measurements ranged from 10,944-14,390 for each insulator. A positive fold-change indicates a location where expression increases in the presence of an insulator relative to the non-insulated construct, while a negative fold-change indicates decreased expression when the insulator is present at the location. We defined locations as ‘insulated’ if the insulator caused a greater than 2-fold change in expression (in either direction) with false discovery rate <0.05 (Fig. 3A–C, Supplementary Fig. 3A–D). As expected from the lack of global insulator activity, we identified only a small number of genomic locations insulated by each insulator (ranging from 10-15%), with even fewer insulated by the mutant insulators (Fig. 3D). Given the FDR cutoff we expect that approximately 5% of locations will be called by chance, consistent with the proportion of insulated locations called by the mutant locations. Because our construct only includes one insulator upstream of the reporter, we asked if the presence of a second downstream insulator in the genome accounted for locations where we observed insulation. However, insulated locations are not enriched for having a second insulator within 50 kb downstream (defined as the boundary of topologically associated domains, or TADs) (Supplementary Fig 3E), suggesting that the integrated insulators might not require a second insulator for activity. All insulator and mutant constructs can lead to both upregulation and downregulation of expression, but the proportion of locations that are upregulated or downregulated varied by insulator (Fig. 3E). This suggests that insulators may perform more than one function in a context- and insulator sequence-dependent manner. For the rest of this manuscript we refer to upregulated and downregulated locations as insulator-up (eg. A2-up) or insulator-down (eg. A2-down) respectively.

A–C Volcano plot of insulator effect across locations for A2, cHS4 and ALOXE3 respectively. Colored points (insulated locations) represent locations where log2(insulator/no insulator) > 2 and fdr <0.05. D Proportion of locations defined as ‘insulated’ for each construct. E Number of locations that are upregulated (log2(insulator/no insulator) > 0) or downregulated (log2(insulator/no insulator) <0) by the insulator. F Overlap of ‘insulated’ locations shared between different insulator constructs. G Percentage of insulated locations that are motif-dependent, motif-independent, or motif-gained (defined in text) for each insulator. Source data are provided as a Source Data file.

The insulators could be insulating against common features in the genome, or each insulator may act at distinct locations. The overlap between insulated locations for the three insulators is small, with only 148 insulated locations shared between insulators (Fig. 3F), suggesting that each insulator mostly functions at distinct genomic locations. However, this overlap is still greater than expected by chance (bootstrap p-value = 1 × 10−4), indicating that there might be some shared characteristics between insulated locations. The overlap is similar for both up- and downregulated locations (Fig. 4A, B), and most of the shared locations (92%) are insulated in the same direction by all three insulators (Supplementary Fig. 4C). Because A2 and cHS4 both contain CTCF binding sites, we asked if there was more overlap between cHS4/A2 shared regions compared to ALOXE3, but the overlap between any two insulators is similar (Supplementary Fig. 4D). These results suggest that each insulator requires a unique type of genomic environment to function and that CTCF is not sufficient to explain insulator activity.

A Mean normalized histone modifications for insulator-unchanged and insulator-down locations. Histone signals are calculated as the mean of 10 kb surrounding each location and standardized across locations for each modification. Insulator-down locations are enriched for ‘active’ histone modifications. B Mean normalized histone modifications for mutant-unchanged and mutant-down locations. Mutant-down locations are generally not enriched for any histone modifications. C Overlap in enriched TFs between insulator-down and insulator-unchanged locations. The number of TF binding peaks in the 10 kb surrounding each location was used to calculate enrichment by two-sided chi-squared tests. Only TFs that are significantly differentially bound (p < 0.05 after Benjamini & Hochberg (BH) correction) are included in this plot. D, E Violin plot of number of looped enhancers per location (determined by eRNA transcription) for insulator-unchanged vs insulator-down locations (D) or mutant-unchanged vs mutant-down locations (E). p-values (two-sided Wilcoxon test) are shown above each plot. The boxplot center corresponds to the median, and the box extends to the first and third quartiles respectively, with the lines extending to 1.5x of the interquartile range. F For all insulators, enhancers that are looped that the insulator-down locations and upstream of the landing pad were compared for differences in TF binding measured by ChIP (two-sided chi-squared test). The residuals of TFs that are significantly different (p < 0.05) after BH correction was plotted. G Model for how insulators downregulate expression at specific locations. All insulators are located in open chromatin regions as depicted by the nucleosomes. The insulator blocks enhancers interacting with the location from activating expression, leading to downregulation of expression. Source data are provided as a Source Data file.

We next asked if insulator activities depend on their respective CTCF or B-box motifs. The distribution of fold-changes of mutant versus wild-type insulators was very similar to the fold-changes of insulated versus uninsulated constructs, indicating that even a few nucleotide changes in the insulator sequence is sufficient to generate large changes in gene expression at the same location (Supplementary Fig. 5A–C). Using our definition of ‘insulated’ locations above, mutating an insulator can lead to three possible outcomes - the location could still be insulated in the same direction (motif-independent), insulated but in the opposite direction (motif-gained), or could no longer be insulated (motif-dependent). We find that 70-80% of the insulated locations are motif-dependent (Fig. 3G), with the other locations being mostly motif-independent. The proportion of motif-dependent locations is also similar for both up- and downregulated locations (Supplementary Fig. 5D, E), suggesting that the motifs are necessary for both classes of insulator activity. The A2-scrambled control also has higher numbers of motif-dependent locations compared to the A2-mut, suggesting that there might be other sequences in A2 that contribute to insulator activity in addition to the CTCF sites (Supplementary Fig. 5F). Our results suggest that insulators only function in small numbers of locations in the genome and that their CTCF or B-box motifs are necessary for insulator activity.

We next focused on locations where the addition of an insulator leads to a decrease in gene expression. At these downregulated locations, insulators may be blocking the effects of nearby enhancers. Thus, we predicted that down-regulated locations are normally highly active genomic environments and should contain higher levels of active chromatin modifications. Consistent with this prediction, insulator-down locations are indeed highly enriched in active chromatin modifications, and depleted for H3K27me3, a marker of Polycomb-repressed heterochromatin28 (Fig. 4A). Since the insulator is only present on the 5’ end of the reporter, we asked if there was more enrichment upstream compared to downstream of insulator-down locations, but found similar enrichment of signals both upstream and downstream (Supplementary Fig. 6A, B). This is because chromatin modifications occur in broad regions and the landing pads are unlikely to land at the boundary of these regions, which leads to a strong correlation in signals up- and downstream of the landing pads. We did not observe the same enrichment at insulator-down locations by the mutant constructs (Fig. 4B), consistent with the idea that the mutations abrogate insulator activity. Thus, insulator-down locations are enriched for active histone modifications, supporting our model that insulators block enhancers in a motif-dependent fashion at these locations.

Although the same histone modifications were enriched at the insulator-down locations for all three insulators, the different insulators still functioned at distinct genomic locations (Supplementary Fig. 4A). This observation suggests that the insulators are not simply protecting the reporter gene against active chromatin modifications. Instead, each insulator may block distinct transcription factors (TFs) that are present in different genomic environments. In this model, insulators may block TFs that directly occupy the locations surrounding the landing pad, or the insulators may block TFs that bind distal enhancers which loop to these locations.

We first addressed whether TFs that directly occupy insulator-down locations mediate the specificity of different insulators. This model predicts that different TFs will be bound at locations that are downregulated by the different insulators. In contrast to this prediction, when comparing TF binding in the 10 kb surrounding insulator-unchanged and insulator-down locations, the TFs that are enriched in insulator-down locations are similar for each insulator (Fig. 4C). This suggests that the TFs that occupy insulator-down locations do not mediate the specificity of different insulators.

Instead, we hypothesized specificity may arise because each insulator blocks different TFs that bind the distal enhancers that loop to insulator-down locations. To address this hypothesis we identified locations that are looped to each insulator-down location from Hi-C data and overlapped these locations with enhancers identified from a collection of eRNA datasets, or from ‘enhancer’ and ‘transcribed’ annotations from chromHMM29,30. Insulator-down locations tend to interact with more enhancers and transcribed regions than insulator-unchanged regions (Fig. 4D, Supplementary Fig. 6C, D), consistent with the idea that insulator-down locations are in more active genomic regions. This difference was reduced in cHS4-mut-down locations and not present in A2-mut-down and ALOXE3-mut-down locations (Fig. 4E, Supplementary Fig. 6E, F). Similarly, there were more ATAC-seq peaks in the 10 kb surrounding insulator-down locations compared to insulator-unchanged locations (Supplementary Fig. 6G). We then compared TF binding between enhancers looped to each of the insulator-down locations determined by chromHMM or eRNA annotations, and identified TFs that are differentially enriched in each group of enhancers (Fig. 4F, Supplementary Fig 6H). For this analysis, we focused only on enhancers that are upstream of the landing pad as these enhancers are more likely to be blocked by the insulator. The TFs identified by both types of enhancer annotations overlap substantially (p = 2.2 × 10−4, hypergeometric test), supporting the idea that these TFs mediate insulator specificity (Supplementary Fig. 6I). These results support the hypothesis that each insulator blocks a different set of enhancer-bound TFs. While both A2 and cHS4 bind to CTCF, only cHS4-down enhancers are enriched for CTCF, suggesting that cHS4 is more likely to block enhancers via CTCF-mediated interactions. There is also a lack of overlap in enriched TFs between A2 and cHS4, suggesting that the specificity is not purely driven by CTCF. Additionally, the general depletion of TF motifs in cHS4-down enhancers could be due to the fact that cHS4 is a longer sequence that contains more TF binding sites than the other two insulators and may therefore interact with a more diverse set of TFs (Supplementary Fig. 6J). Taken together, insulators appear to block endogenous enhancers from influencing gene expression, and the specificity of insulator activity seems to be at least partially explained by the suite of TFs that are present in the enhancers interacting with each location (Fig. 4G).

CTCF is the only TF shown to have insulator activity in human cells, and its activity is thought to be dependent on the orientation of its motifs, as the majority of loop anchors comprise convergent CTCF motif pairs31,32. We asked if insulator-down locations were more likely to be closer to a convergent CTCF motif upstream, or if the closest CTCF site was more likely to be convergent (Supplementary Fig 7), but did not observe any differences for all insulators. These results suggest that the insulators are functioning without the help of a second CTCF site in the genome, consistent with previous results showing that a single insulator is sufficient for enhancer-blocking activity23,24,26,33.

We next explored the 50 shared locations where all the insulators downregulated expression. One explanation for these locations might be that they are sensitive to the addition of any sequence, not just insulators. However, the majority of these locations are not downregulated by the mutant or scrambled constructs (Supplementary Fig. 8A), suggesting that the activity of the insulator is important. These 50 locations generally contain higher levels of active histone modifications compared to the unshared insulator-down locations (Supplementary Fig. 8B), but were not enriched for interacting with any particular TFs. Testing more locations across the genome will be important to understanding why these locations are responsive to multiple specific insulators.

Insulator-up locations are upregulated by insulators and are candidates for locations where insulators block the spread of repressive chromatin. This model predicts that insulator-up locations will be enriched for markers of heterochromatin. In contrast to the insulator-down locations, the chromatin landscape of insulator-up regions is not the same for the three different insulators (Fig. 5A). ALOXE3-up locations are depleted of active histone modifications and enriched for H3K27me3, suggesting that ALOXE3 can block repressive histone modifications as expected for a heterochromatin barrier. Only ALOXE3-up locations are close to heterochromatin (Supplementary Fig. 9A) and enriched for looping with repressed (Polycomb) regions (Supplementary Fig. 9B). Similarly, ALOXE3 constructs are repressed at the least number of locations across the genome, suggesting that ALOXE3 can efficiently block heterochromatin silencing at many locations across the genome (Supplementary Fig. 9C). Notably, only H3K27me3, an indicator of facultative heterochromatin, is enriched at ALOXE3-up locations. H3K9me3, which is associated with constitutive heterochromatin, is not enriched in ALOXE3-up locations, consistent with the idea that only facultative heterochromatin remains accessible to TF binding, allowing for upregulation once ALOXE3 is present to block H3K27me3 spreading34. This result is also consistent with the fact that ALOXE3 was derived from the boundary of a H3K27me3 domain in K562 cells17. However, ALOXE3 still does not block all H3K27me3. While a larger proportion of H3K27me3 locations are downregulated by ALOXE3, most of the H3K27me3 locations are unchanged, suggesting that ALOXE3-mediated heterochromatin blocking is still context-specific (Supplementary Fig 9D). In contrast to ALOXE3-up locations, cHS4-up locations are not enriched for any histone modifications, and A2-up locations are enriched for similar chromatin marks as A2-down locations, with a stronger enrichment for H3K36me3 (Fig. 5A). From these results we propose that only ALOXE3 acts as a barrier against H3K27me3 spreading, while cHS4 and A2 upregulate expression by other mechanisms.

A Mean normalized histone modifications for insulator-unchanged and insulator-up locations. Histone signals are calculated as the mean of 10 kb surrounding each location and standardized across locations for each modification. B Violin plot of number of looped enhancers (determined by eRNA) per location for insulator-unchanged vs insulator-up locations. C Violin plot of number of looped enhancers (determined by eRNA) per location for mutant-unchanged vs mutant-up locations. p-values (two-sided Wilcoxon test) are shown above each plot. D Expression distribution of the uninsulated reporter at insulator-unchanged vs insulator-up locations. E Mean normalized histone modifications for locations upregulated by all insulators, locations upregulated by only one or two locations and uninsulated locations. F Model for barrier activity at ALOXE3-up locations. ALOXE3 blocks heterochromatin from silencing the gene. G Model for cHS4/A2-up locations. Instead of acting like an insulator, we propose that cHS4 and A2 act like enhancers in ‘primed’ genomic environments to increase expression. For all boxplots, the boxplot center corresponds to the median, and the box extends to the first and third quartiles respectively, with the lines extending to 1.5x of the interquartile range. Source data are provided as a Source Data file.

Our working model predicts that repressive H3K27me3 marks should decrease at insulated locations when ALOXE3 is present. We attempted to test this prediction computationally using Enformer35, a machine learning model that integrates long-range sequence information to predict gene expression and other histone modifications. Crucially, Enformer only uses sequence information to make predictions, allowing us to compare predictions at genomic locations with and without integrated insulator constructs. Since Enformer does not directly predict gene expression, we focused on predicting Cap Analysis of Gene Expression (CAGE) and H3K27me3 levels. Enformer predicted new CAGE peaks at locations with integrated reporter genes (Supplementary Fig. 10A) and higher CAGE fold-changes (insulator/uninsulated) in ALOXE3-up and A2-up compared to unchanged locations (Supplementary Fig. 10B), so we focused on these two insulators. In the regions flanking ALOXE3-up locations, Enformer predicted higher levels of H3K27me3 than ALOXE3-unchanged locations (Supplementary Fig. 10C), which agrees with our observation that ALOXE3-up are enriched in H3K27me3 chromatin.

At the uninsulated locations, we expect that H3K27me3 from flanking chromatin spreads into the reporter gene to silence expression, resulting in a correlation between H3K27me3 levels in flanking and reporter chromatin. If the insulators block H3K27me3 spreading, then we would expect a decrease in correlation between H3K27me3 levels in flanking and reporter chromatin predicted by Enformer. We find that the correlation between flanking and reporter chromatin was high in the uninsulated and A2 locations, but decreased in ALOXE3 locations, suggesting that only ALOXE3 can block the spread of H3K27me3 (Supplementary Fig. 10D). To test whether ALOXE3-up locations are specifically contributing to the decrease in correlation, we compared the correlations between flanking and reporter gene chromatin for ALOXE3-unchanged vs ALOXE3-up locations. ALOXE3-up locations tend to have lower H3K27me3 than expected compared to ALOXE3-unchanged locations (Supplementary Fig. 10E). In contrast, A2-up locations and A2-unchanged locations have similar correlations (Supplementary Fig. 10F). These results support the idea that ALOXE3 blocks H3K27me3 spreading from flanking chromatin into the reporter genes, leading to upregulation in ALOXE3-up locations.

We observed that unlike ALOXE3-up locations, both A2-up and cHS4-up locations seem to be interacting with more enhancers and transcribed regions (Fig. 5B, Supplementary Fig. 11A & B). This suggested that the non-insulated reporters may be more highly expressed in A2-up and cHS4-up locations. Instead, we found that the expression levels of the non-insulated reporters are generally lower in A2-up and cHS4-up locations compared to the unchanged locations, albeit marginally higher than ALOXE3-up locations (Fig. 5C). Thus, we speculate that A2-up and cHS4-up locations are primed for high expression and that A2 and cHS4 act as ‘enhancers’ at these locations by recruiting additional TFs that upregulate expression. This model can also explain the discrepancies between the chromatin environments of A2-up and cHS4-up locations (Fig. 5A). While A2 requires a more highly active chromatin environment to act as an enhancer, cHS4 might upregulate expression at more neutral environments because it contains many more motifs and may recruit many more TFs (Supplementary Fig. 6J). Consistently, mutant-up locations are all enriched for active histone modifications and looped to more enhancers/transcribed regions (Fig. 5D, Supplementary Fig. 11C–E). Despite that, the uninsulated reporters are also not highly expressed at mutant-up locations (Supplementary Fig. 11F). This suggests that deleting the B-box motifs in ALOXE3 abolishes its ability to act as a heterochromatin barrier.

Finally, there are also a small number of locations (86) that are upregulated by all three insulators. These locations are enriched for both active (particularly H3K4me2 and me3) and inactive histone modifications (Fig. 5D), which could represent a poised chromatin environment. In these environments, ALOXE3 can block heterochromatin-mediated silencing, while cHS4 and A2 can also gain enhancer activity to upregulate gene expression. Taken together, our results suggest that only ALOXE3 acts as a classical heterochromatin blocker, while the other constructs appear to act as enhancers at primed locations across the genome (Fig. 5E).

It has long been known that CRE activity is influenced by the chromatin state and other CREs in the surrounding genomic environment. Yet, we still do not understand how genomic environments interact with CREs to impact gene expression. While several deep learning methods have been developed to predict gene expression from short sequences, these models still fall short when predicting reporter gene activity in the genome36. Much more data will be required to train and test these models in order to generate truly predictive models of gene expression in genomic contexts. In this study, we developed a high-throughput and quantitative technology to systematically measure the activities of many cis-regulatory elements at thousands of locations across the genome. MPIRE is a robust, reproducible method that can be easily scaled to measure the activities of many reporter genes simultaneously.

Moving forward, we envision that MPIRE can be used for many questions involving the effects of genomic or chromatin context on CREs. For example, MPIRE can be used to study how TFs bind to the different sequences in different chromatin contexts, or how different allelic variants behave in the same genomic contexts across the genome. MPIRE can also be coupled with other DNA-based readouts such as Cut&Tag or 4 C to measure other aspects of genome regulation in addition to expression. Understanding how the different processes in the genome interact with each other will be important for determining and quantifying their contributions to gene expression regulation.

Insulators are a class of cis-regulatory elements that have been associated with several functions. Due to the lack of functional assays to measure insulator activity in the genome, it has been difficult to understand or predict insulator function. Thus, we applied MPIRE to measure the impact of insulators on gene expression at thousands of locations across the genome. We show that insulators have distinct, pleiotropic activities depending on their location in the genome.

A long-standing question in gene regulation is how enhancers contact and activate target genes in the genome. Recent evidence has shown that enhancers and promoters are broadly compatible with each other, raising the question of how specific interactions are achieved2,3,5. Insulators are thought to facilitate this process by blocking enhancers from interacting with their target promoters, but it remains unclear whether insulators are modular elements that can block all enhancers in the genome. Our results support a model in which insulators block enhancers at specific locations in the genome, thereby leading to a reduction in gene expression at those locations (Fig. 4G). For example, ADNP binding is enriched in enhancers blocked by ALOXE3 (Fig. 4F), and has previously been implicated in modulating 3D genome interactions by competing with CTCF for binding at SINE B2 transposable elements37. Both SINE B2 and ALOXE3 elements are transcribed by RNA Pol III and have been shown to act as insulators17,38,39, which may explain ALOXE3’s ability to block ADNP-bound enhancers. RNA Pol III has also been suggested to aid ADNP in recruiting distal CTCF sites, which may help ALOXE3 act as an insulator in ALOXE3-down locations40. The two insulators that bind CTCF (A2, cHS4) are not any more similar compared to ALOXE3, implying that CTCF itself is not sufficient to explain insulator activity. The specificity of different insulators for different enhancers suggests a model where insulators interact with select enhancers to modulate specific enhancer-promoter interactions. While we only tested one promoter with different insulators in this study, MPIRE captured a large range of activities with this promoter across the genome. To the extent that promoters are modular CREs, we expect our results to generalize to insulation at other promoters. However, future experiments with diverse promoters will reveal whether there are more complex interactions between genomic environments, insulators and promoters.

While we only tested three insulators in this study, our results demonstrate that not all insulators possess barrier activity against heterochromatin. Only ALOXE3 appears to be able to block reporter gene silencing. A parsimonious explanation for this difference might be that ALOXE3 binds to RNA PolIII and gets transcribed17, allowing it to generate an open chromatin environment that can overcome heterochromatin spreading. Indeed, the recruitment of transcriptional activators can insulate against repression41,42, and transcription of various TEs have been shown to be sufficient to generate a boundary in mammalian cells43,44. Other groups have also observed that cHS4 is not sufficient to protect against silencing at all genomic locations4,45, and can block HDAC-mediated but not KRAB-mediated silencing46. We find instead that cHS4 and A2 can act as enhancers in primed environments to upregulate gene expression (Fig. 5F). Neither cHS4 nor A2 show enhancer activity in plasmid reporter assays15,16 (Fig. 2B), likely because plasmids do not provide the necessary primed environment. Our results are consistent with the previously proposed model that insulation is a dynamic contest between activation and repression47. ALOXE3 is transcribed and has the highest activation rate, allowing it to overcome repressed heterochromatin, while A2 has the lowest activation rate and can only overcome very weakly repressive environments. Previous studies also observed silencing 2-3 weeks after reporter integration16, while we only waited a week in our study. Screening and testing more insulators across more locations in the genome at different time points will allow us to build a comprehensive and predictive model of insulator activity. These results will also be useful to improve synthetic biology and gene therapy applications where high levels of sustained gene expression are required.

In designing MPIRE we chose not to include a second insulator downstream of our reporter gene. While we showed that a single insulator is sufficient for insulator activity in the genome, we cannot rule out that including a second flanking insulator would increase the number of locations at which insulators function. We also note that our conclusions are correlative and require further testing for causative mechanisms, but our results open up new avenues of investigation into the mechanisms of insulation and CRE function. The flexibility of MPIRE will allow investigators to test multiple designs for testing the functions of insulators and other CREs in diverse genomic environments.

All primers and oligos used in this study can be found in Supplementary Data 3. We first cloned each landing pad construct into a Sleeping Beauty (SB) transposon plasmid containing SB ITRs (gift from Robi Mitra lab). Each landing pad consists of a hsp68 promoter driving the expression the mEmerald reporter gene and is flanked with φC31 and BxbI attP sites for recombination. The φC31 attP site, hsp68 promoter, mEmerald fluorophore, poly(A) signal constructs were amplified by PCR from different plasmids and assembled using the NEBuilder HiFi DNA Assembly Master Mix (HiFi Assembly, NEB #E2621). The BxbI attP site was then added using the Q5 Site-Directed Mutagenesis Kit (Q5 SDM, NEB #E0554). To add a library of diverse random barcodes, we first digested the landing pad plasmid with XbaI at 37°C for 16 hours. We then ordered an oligo containing 16 Ns with flanking homology arms to the landing pad plasmid (GWLP P1) and used HiFi Assembly (NEB #E2621) to assemble the oligo to the plasmid (50°C, 15 min).

The K562 cell line was obtained from the Genome Engineering & Stem Cell Center at Washington University in St. Louis. The cells were maintained in Iscove’s Modified Dulbecco′s Medium (IMDM) + 10% FBS + 1% non-essential amino acids + 1% penicillin/streptomycin. To generate pools of cells containing landing pads, K562 cells were electroporated with the SB100x transposase (gift from Robi Mitra lab) and landing pad library at a 1:1 ratio. Specifically, we electroporated 4.8 million cells with 20 μg SB100x and 20 μg landing pad library with the Neon Transfection System 100 μL Kit (Life Technologies, #MPK10025). The cells were cultured for a week then sorted into pools of 2000 cells to bottleneck the numbers of gBCs per pool, ensuring that each gBC is associated with a unique location in the cell7. In total we generated 6 pools of cells, with pools 1-2 from the same initial transfection pool (experiment v1) and pools 3-6 from a second initial transfected pool of cells (experiment v2).

To measure expression of each landing pad we harvested RNA and genomic DNA for each pool of cells using TRIzol reagent (Invitrogen #15596026). 4 μg RNA was treated with DNAse using the Rigorous DNase treatment procedure from the Turbo DNase protocol (Invitrogen #AM2238) and reverse transcribed with the SuperScript IV First Strand Synthesis System (Invitrogen #18091050). Barcodes were amplified from the library using Q5 High-Fidelity 2X Master Mix (Q5, NEB #M0492) using primers GWLP P2-3 with 20 cycles. We performed 32 PCR reactions per cDNA sample. For genomic DNA, we performed 4 PCR reactions with 1 μg DNA per PCR. PCRs from the same samples were then pooled and purified. Sequencing adapters were then added with two rounds of PCR with Q5 (NEB #M0492) (GWLP P4-7). The resulting libraries were sequenced on the Illumina NextSeq platform.

To map the locations of each landing pad we followed the protocol for mapping TRIP integrations as described in our previous study2. We first digested gDNA with AvrII, NheI, SpeI and XbaI for 16 h, then purified and self-ligated at 4 °C for an additional 16 h. Next, we purified the ligations and performed inverse PCR to amplify the barcodes and the adjacent genomic region (GWLP P34-35). We performed 8 PCRs (Q5 polymerase) per pool and purified the PCRs. Sequencing adapters were then added with two rounds of PCR (GWLP P36-37). The resulting libraries were sequenced on either the Illumina NextSeq or NovaSeq platform.

We first generated a transfer vector containing a reporter gene (hsp68 promoter + mScarlet) flanked by φC31 and BxbI attB sites using HiFi Assembly (NEB #E2621). The insulator sequences were obtained from their respective papers (cHS4 from Chung et al.16., A2 from Liu et al.15 and ALOXE from Raab et al.17). For ALOXE3, we selected to use the sequence with 2 tDNAs rather than 4 tDNAs because it is a tractable length for cloning ( ~ 1 kb), does not contain a CTCF binding site and is sufficient for enhancer-blocking activity17. cHS4 and A2 were synthesized as gBlocks (Integrated DNA Technologies) and homology arms for the backbone were added by PCR with GWLP P8-9 (cHS4) or GWLP P10-11 (A2). ALOXE3 was amplified from K562 genomic DNA using primers GWLP P12-13 and homology arms were added by PCR with GWLP P14-15. The constructs were then inserted between the φC31 attB site and hsp68 promoter in the transfer vector by HiFi Assembly (NEB #E2621). The cHS4 mutant design is from Farrell et al.19 (cHS4 x3) and was introduced to the cHS4 transfer vector with Q5 SDM (NEB #E0554) using primers GWLP P16-17. While the CTCF binding site in A2 has been identified, it has not been tested for its dependence on CTCF. Thus, we used the CTCF PWM (JASPAR48 MA0139.1) to identify and mutate the three most informative sites in the motif (CACCAGGTGGCGCT → TACCACGTTGCGCT). The mutation was introduced to the A2 transfer vector with Q5 SDM ((NEB #E0554) using primers GWLP P18-19. For ALOXE3, the B-box motifs were identified using the B-box frequency matrix from Pavesi et al.49 with FIMO50, and both motifs were deleted with Q5 SDM (NEB #E0554) using primers GWLP P20-23. Finally, the A2 sequence was randomly scrambled to generate A2 scrambled and synthesized with homology arms as a gBlock (Integrated DNA Technologies), then inserted between the φC31 attB site and hsp68 promoter in the transfer vector by HiFi Assembly (NEB #E2621). Insulator barcodes were then added to each construct by site-directed mutagenesis. All insulator sequences and mutants can be found in Supplementary Data 10 and individual genbank files are provided as Supplementary Data 11.

To test the efficiency of recombination, we generated constructs barcoded with a diverse library of random barcodes. We digested the no insulator and A2 constructs with NheI at 37°C for 16 hours. We then ordered an oligo containing 16 Ns with flanking homology arms to the plasmid (GWLP P24) and used NEBuilder HiFi DNA Assembly to assemble the oligo to the plasmid (50°C, 15 min).

For the pilot experiments we focused on cell pool 1. To determine whether the landing pads would recombine with each other, we cotransfected 2.4 million pool 1 cells with 10 μg φC31 integrase (Addgene #51553) and 10 μg BxbI integrase (Addgene #51552) as described above. As a control, we also performed a mock transfection of the same numbers of cells. The cells were then allowed to grow for a week, and RNA and DNA from the mock and integrase transfected cells was harvested using TRIzol reagent (Invitrogen #15596026). We then amplified barcodes from both RNA and DNA to measure expression and mapped the barcodes as described above in the Generation of cell lines section.

To determine the efficiency of recombination we used pools 7 and 8 (not used in subsequent MPIRE insulator experiments). For each pool, we cotransfected 2.4 million cells with a mix of 2.5 μg φC31 integrase, 2.5 μg BxbI integrase and 5 μg of either the no insulator or A2 construct as described above. The cells were then allowed to grow for a week, and RNA and DNA was harvested using TRIzol reagent (Invitrogen #15596026). The RNA was treated with DNase and reverse transcribed to generate cDNA as described above. We then amplified barcodes from both cDNA and DNA in 32 or 4 PCR reactions respectively per sample using Q5 (NEB #M0492) and primers specific to the reporter genes (GWLP P3, P25). We pooled PCRs from the same samples for PCR purification, then used 4 ng of the product for further amplification with two rounds of PCR to add Illumina sequencing adapters (GWLP P3, P33, P6-7). The resulting libraries were sequenced on the Illumina NextSeq platform.

To measure expression from plasmids alone, we transfected K562 wild-type cells with each construct separately. For each construct, we performed three biological replicates. We transfected 1.2 million cells with 5 μg plasmid DNA per replicate using the Neon Transfection System 100 μL Kit (Life Technologies #MPK10025). RNA was harvested 72 hours after transfection using the Monarch Total RNA Miniprep Kit (NEB #T2010). mScarlet and HPRT (housekeeping gene for normalization) levels were then measured by qPCR using the Luna Universal One-Step RT-qPCR Kit (NEB #E3005) with 100 ng total RNA per reaction, three reactions per replicate (mScarlet primers: GWLP P25-26, HPRT primers: GWLP P27-28). Expression was calculated using the ΔΔCt method, normalized to the no insulator construct.

For the enhancer-blocking assay, we designed plasmids that contain either enhancer-promoter or enhancer-insulator-promoter sequences to test whether the insulator can block enhancer-mediated activation of promoter activity. We first selected an enhancer that was previously shown to be active with the hsp68 promoter in K562 cells27. The full sequence of the enhancer is:

GCCCCCCTTCTTCCTATGTCTGATGGAGTTTCCTCTCTAAGTAGCCATTTTATTCTGCTGACTCACCCTCTAACTCCCGGTCTTATTCCATCCTGCCTCAGGGTCTGTGGTGTAGTCATAGCACATGCATCTCCTCCGGCTCGCTGATT

The no insulator construct was digested with EcoRI to insert the enhancer upstream of the hsp68 promoter. The enhancer was amplified by PCR to add overhangs to the backbone vector (GWLP P29-30) and assembled into the backbone by HiFi Assembly (NEB #E2621).

To add the different insulator constructs, the +enhancer plasmid was digested with AgeI and EcoRI. For A2, A2-mut, A2-scrambled, cHS4 and cHS4-mut, we digested the respective plasmids with AgeI and EcoRI and ligated them to the backbone with T4 DNA Ligase (NEB #M0202). For ALOXE3 and ALOXE3-mut, we amplified the insulators by PCR to add overhangs to the backbone vector and assembled the insulators into the backbone by HiFi Assembly (GWLP P31-32).

We measured expression from these constructs in the same way as described in the Insulator Plasmid Activity section, except with two biological replicates instead of three per construct.

We pooled insulator constructs for transfection into landing pad cell pools. In the first experiment, we combined the no insulator, A2, A2 mut, A2 scrambled, cHS4, and ALOXE3 constructs and tested them in pools 1 and 2. In the second experiment, all the constructs were combined in equal amounts and tested in pools 3-6. We used the Neon Transfection System 100 μL Kit (Invitrogen #MPK10025) for transfection. Each pool was transfected separately and we performed two replicates per pool. For each replicate, we performed four transfections, each with 1.2 million cells and a mix of 2 μg φC31 integrase, 2 μg BxbI integrase and 4 μg of the insulator constructs pooled in equal ratios.

We harvested DNA and RNA from cells one week after transfection using the TRIzol Reagent (Invitrogen #15596018). The RNA was treated with DNase and reverse transcribed to generate cDNA as described above. Barcodes were amplified from cDNA using Q5 (NEB #M0492) with primers specific to the reporter gene (GWLP P3, P25), with 32 PCR reactions per biological replicate. Similarly, barcodes were amplified from genomic DNA with 8 PCR reactions per biological replicate. We pooled PCRs from the same biological replicates for PCR purification, then used 4 ng of the product for further amplification with two rounds of PCR to add Illumina sequencing adapters (GWLP P4, P33, P6-7). The resulting libraries were sequenced on the Illumina NextSeq platform.

The preprocessing of sequencing reads was performed with Python 3.9. All statistical analyses and figures were done in R 4.2.0. We used bedtools v2.30.0 for genomics analyses.

For expression from the initial cell pools and cells transfected with integrase only, we first filtered for reads that contained the gBC in the correct sequence context. We then filtered for gBCs with >20 reads and normalized the reads by sequencing depth. We then calculated expression as log2(RNA counts/DNA counts) for all barcodes. The expression of the landing pads in the original cell pool and in the integrase-transfected cells can be found in Supplementary Data 2 and 4, respectively.

To identify the locations of the landing pads, we obtained paired-end reads containing the barcode on one read and the sequence of the integration site on the other. We matched the barcodes with the integration site sequence, then aligned them to hg38 with BWA using default parameters. We only kept barcodes where the reads for one location represented at least 80% of all the locations for that barcode. The mapped integration locations can be found in Supplementary Data 1. For pools from the same initial transfection (Pools 1&2, Pools 3-6), a small number of barcodes are shared between them because the same cells were sorted into multiple pools. These barcodes are indicated with Px instead of a pool number. To compare locations before and after integrase-only transfection, we compared the mapped locations in pool 1, only considering gBCs with exact matches. The locations of the landing pads before and after integrase transfection can be found in Supplementary Data 5.

To evaluate the efficiency of recombination, we first filtered for reads that contained both barcodes (rBC and gBC) in the correct sequence context. We then filtered for barcode pairs that had >20 reads associated with them and tabulated the number of rBCs/gBC for either the no insulator or A2 construct (Source Data). For the proportion of recovered LPs, we compared the DNA gBCs recovered from cells transfected with either the no insulator or A2 construct to the DNA gBCs recovered from the original cell pools.

For the recombined MPIRE libraries, we first filtered for reads that contained both barcodes (iBC and gBC) in the correct sequence context. The gBCs were then compared to the list of mapped gBCs (gBCs that have a unique location associated with them, Supplementary Data 1). A gBC was assigned to a mapped gBC if both barcodes have a hamming distance <5 and the gBC is ≥5 hamming distance from all other gBCs (the average hamming distance between mapped gBCs is 9). We then tabulated the number of barcodes or barcode pairs after assigning gBCs to a mapped gBC for both RNA and DNA.

To calculate reproducibility and to compare expression before and after recombination, we first filtered for gBC-iBC pairs that had >20 reads and normalized the reads by sequencing depth. We then calculated expression as log2(RNA counts/DNA counts) for all barcodes that are present in both the RNA and DNA pools. The calculated expression values can be found in the Source Data file.

To calculate expression after recombination of the insulator library, including those that might not have an RNA count because they are lowly expressed, we turned to the MPRAnalyze tool51. We added a pseudocount of 1 to all counts and used the analyzeQuantification function to calculate the transcription rate (α) for each construct at each location using counts from both biological replicates. If the same barcode is present in multiple pools, then the counts from all replicates across the pools were used as input, so that each location only gets one α value. Expression values can be found in the Source Data. To quantify the effects of the insulators, we used the analyzeComparative function in MPRAnalyze to calculate the fold-change of the insulated vs uninsulated locations. The output of this analysis can be found in the Source Data.

We downloaded ChIP data for various histone modifications in K562 from the ENCODE database52 (Supplementary Table 1). We used bedtools53 to map the mean histone signals in the 20 kb surrounding each landing pad (10 kb upstream and downstream). For each histone modification, we standardized the signal values, then computed the average signal for ins-unchanged, ins-up and ins-down locations respectively. All heatmaps were generated using the ComplexHeatmap package in R54.

To identify motifs in each insulator we used FIMO50 with motifs from the HOCOMOCOv11 database, limited to TFs that were expressed (FPKM > = 1) in K562 from whole cell long poly(A) RNA-seq data generated by ENCODE55 downloaded from the EMBL-EBI Expression Atlas.

To identify enhancers in K562, we used two different annotations. One was a set of compiled enhancer annotations based on a comprehensive suite of eRNA datasets30. The other is the core 15-state chromHMM annotation for K562 cells, which was downloaded from the Roadmap Epigenomics Project56. Similar annotations from the 15-state chromHMM data were grouped. We overlapped the landing pads with the corresponding annotations using the GenomicRanges R package57.

ATAC-seq data for K562 cells was downloaded from the ENCODE database52 (Supplementary Table 1). We used bedtools53 to count the number of ATAC-seq peaks around each landing pad.

For 3D-genome related analyses, we used K562 Hi-C data that was previously generated and processed58. We used Arrowhead59 to call boundaries, then used the GenomicRanges R package57 to determine the distance of each landing pad to the closest boundary. To identify locations that are looped to each location, we used the peakHiC tool60 to call loops for each landing pad with the following parameters: window size = 80, alphaFDR = 0.5, minimum distance = 10 kb, qWr = 1. The regions identified by peakHiC were overlapped with chromHMM annotations using the GenomicRanges R package57 to identify putatively interacting enhancers with each landing pad. All locations looped to each landing pad can be found in Supplementary Data 6.

To determine the relationship between insulator activity and CTCF binding sites in the genome, we first identified the closest upstream CTCF binding site for each insulator, and determined if the CTCF motif underlying the peak is in a convergent or divergent orientation relative to the inserted insulator, as convergent CTCF motifs are more likely to form loops that influence insulator activity31.

We downloaded all available TF ChIP data in K562 from the ENCODE database52, and excluded the TFs that were not expressed (FPKM < 1 in whole cell long poly(A) RNA-seq data generated by ENCODE55 from the EMBL-EBI Expression Atlas). The full list of downloaded files can be found in Supplementary Data 7. For this analysis we focused on the locations that are downregulated by only one or two insulators, locations that were downregulated by all three were excluded. We used bedtools53 to count the number of TF ChIP peaks 10 kb upstream of the landing pad or to count the number of peaks in each putative enhancer identified above. We compared the number of peaks between different groups of landing pads or enhancers using chi-squared tests followed by p-value correction with the Benjamini & Hochberg (BH) method.

We decided to use Enformer because it can predict histone modifications and expression directly from sequence35. We first downloaded the published, trained model from TensorFlow Hub (https://tfhub.dev/deepmind/enformer/1). For each landing pad, we considered the landing pad to be the center of the sequence to use as input and extended it out on both flanks using the hg38 genome for a total length of 393,216 bp, which was the length used in the training model. The model predicts signals for the central 114,688 bp, which includes the landing pad. For each location, we used the model to predict CAGE and H3K27me3 in K562 cells. To calculate CAGE signals, we summed CAGE scores across all windows that contain the landing pad. We then calculated fold-changes for each insulator by taking log2(predicted_insulator_CAGE/predicted_uninsulated_CAGE) (Supplementary Data 8). To calculate H3K27me3 signals, we either summed across all windows that contain the landing to get the landing pad H3K27me3 signals, or we summed across all windows outside the landing pad (that were used in the input data) to get flanking H3K27me3 signals (Supplementary Data 9). The values were normalized by z-scoring the predicted values for each insulator.

The number of landing pads was selected to cover diverse chromatin environments across the genome. No statistical method was used to predetermine sample size. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

The data generated in this study has been deposited to GEO under accession number GSE223403. The landing pad locations are provided in the Supplementary Data 1. Source data is provided with this paper. Source data are provided with this paper.

The code used in this study has been uploaded to Github at https://github.com/claricehong/MPIRE_insulators.

Maricque, B. B., Chaudhari, H. G. & Cohen, B. A. A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol. 37, 90–95 (2019).

Article CAS Google Scholar

Hong, C. K. Y. & Cohen, B. A. Genomic environments scale the activities of diverse core promoters. Genome Res. 32, 85–96 (2022).

Article PubMed Google Scholar

Bergman, D. T. et al. Compatibility rules of human enhancer and promoter sequences. Nature 607, 176–184 (2022).

Article ADS CAS PubMed Google Scholar

Walters, M. C. et al. The chicken β-globin 5′HS4 boundary element blocks enhancer-mediated suppression of silencing. Mol. Cell. Biol. 19, 3714–3726 (1999).

Article CAS PubMed Google Scholar

Martinez-Ara, M., Comoglio, F., van Arensbergen, J. & van Steensel, B. Systematic analysis of intrinsic enhancer-promoter compatibility in the mouse genome. Mol. Cell 82, 2519–2531.e6 (2022).

Article CAS PubMed Google Scholar

Ribeiro-dos-Santos, A. M., Hogan, M. S., Luther, R. D., Brosh, R. & Maurano, M. T. Genomic context sensitivity of insulator function. Genome Res. 32, 425–436 (2022).

Article PubMed Google Scholar

Akhtar, W. et al. Chromatin position effects assayed by thousands of reporters integrated in parallel. Cell 154, 914–927 (2013).

Article CAS PubMed Google Scholar

Phillips-Cremins, J. E. & Corces, V. G. Chromatin insulators: linking genome organization to cellular function. Mol. Cell 50, 461–474 (2013).

Article CAS PubMed Google Scholar

Cai, H. & Levine, M. Modulation of enhancer-promoter interactions by insulators in the Drosophila embryo. Nature 376, 533–536 (1995).

Article ADS CAS PubMed Google Scholar

Hagstrom, K., Muller, M. & Schedl, P. Fab-7 functions as a chromatin domain boundary to ensure proper segment specification by the Drosophila bithorax complex. Genes Dev. 10, 3202–3215 (1996).

Article CAS PubMed Google Scholar

Scott, K. C., Taubman, A. D. & Geyer, P. K. Enhancer blocking by the Drosophila gypsy insulator depends upon insulator anatomy and enhancer strength. Genetics 153, 787–798 (1999).

Article CAS PubMed PubMed Central Google Scholar

Bouhassira, E. E., Westerman, K. & Leboulch, P. Transcriptional behavior of LCR enhancer elements integrated at the same chromosomal locus by recombinase-mediated cassette exchange. Blood 90, 3332–3344 (1997).

Article CAS PubMed Google Scholar

Yoshida, J. et al. Chromatin states shape insertion profiles of the piggyBac, Tol2 and Sleeping Beauty transposons and murine leukemia virus. Sci. Rep. 7, 43613 (2017).

Article ADS PubMed PubMed Central Google Scholar

Xu, Z. et al. Accuracy and efficiency define Bxb1 integrase as the best of fifteen candidate serine recombinases for the integration of DNA into the human genome. BMC Biotechnol. 13, 87 (2013).

Article PubMed PubMed Central Google Scholar

Liu, M. et al. Genomic discovery of potent chromatin insulators for human gene therapy. Nat. Biotechnol. 33, 198–203 (2015).

Article ADS PubMed Google Scholar

Chung, J. H., Whiteley, M. & Felsenfeld, G. A 5’ element of the chicken beta-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila. Cell 74, 505–514 (1993).

Article CAS PubMed Google Scholar

Raab, J. R. et al. Human tRNA genes function as chromatin insulators. EMBO J. 31, 330–350 (2012).

Article CAS PubMed Google Scholar

Bell, A. C., West, A. G. & Felsenfeld, G. The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell 98, 387–396 (1999).

Article CAS PubMed Google Scholar

Farrell, C. M., West, A. G. & Felsenfeld, G. Conserved CTCF insulator elements flank the mouse and human beta-globin loci. Mol. Cell Biol. 22, 3820–3831 (2002).

Article CAS PubMed PubMed Central Google Scholar

Schramm, L. & Hernandez, N. Recruitment of RNA polymerase III to its target promoters. Genes Dev. 16, 2593–2620 (2002).

Article CAS PubMed Google Scholar

Zhong, X.-P. & Krangel, M. S. An enhancer-blocking element between α and δ gene segments within the human T cell receptor α/δ locus. Proc. Natl Acad. Sci. 94, 5219–5224 (1997).

Article ADS CAS PubMed PubMed Central Google Scholar

Hark, A. T. et al. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405, 486–489 (2000).

Article ADS CAS PubMed Google Scholar

Petrykowska, H. M., Vockley, C. M. & Elnitski, L. Detection and characterization of silencers and enhancer-blockers in the greater CFTR locus. Genome Res 18, 1238–1246 (2008).

Article CAS PubMed PubMed Central Google Scholar

Wang, J. et al. MIR retrotransposon sequences provide insulators to the human genome. Proc. Natl Acad. Sci. 112, E4428–E4437 (2015).

CAS PubMed PubMed Central Google Scholar

Lunyak, V. V. et al. Developmentally regulated activation of a SINE B2 repeat as a domain boundary in organogenesis. Science 317, 248–251 (2007).

Article ADS CAS PubMed Google Scholar

Román, A. C. et al. Dioxin receptor and SLUG transcription factors regulate the insulator activity of B1 SINE retrotransposons via an RNA polymerase switch. Genome Res. 21, 422–432 (2011).

Article PubMed PubMed Central Google Scholar

Kwasnieski, J. C., Fiore, C., Chaudhari, H. G. & Cohen, B. A. High-throughput functional testing of ENCODE segmentation predictions. Genome Res. 24, 1595–1602 (2014).

Article CAS PubMed PubMed Central Google Scholar

Wiles, E. T. & Selker, E. U. H3K27 methylation: a promiscuous repressive chromatin mark. Curr. Opin. Genet Dev. 43, 31–37 (2017).

Article CAS PubMed Google Scholar

Ernst, J. & Kellis, M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825 (2010).

Article CAS PubMed PubMed Central Google Scholar

Yao, L. et al. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat. Biotechnol. 40, 1056–1065 (2022).

Article CAS PubMed Google Scholar

Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

Article CAS PubMed PubMed Central Google Scholar

de Wit, E. et al. CTCF binding polarity determines chromatin looping. Mol. Cell 60, 676–684 (2015).

Article PubMed Google Scholar

Zuin, J. et al. Nonlinear control of transcription through enhancer–promoter interactions. Nature 604, 571–577 (2022).

Article ADS CAS PubMed Google Scholar

Becker, J. S., Nicetto, D. & Zaret, K. S. H3K9me3-dependent heterochromatin: barrier to cell fate changes. Trends Genet. 32, 29–41 (2016).

Article CAS PubMed Google Scholar

Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

Article CAS PubMed PubMed Central Google Scholar

Karollus, A., Mauermeier, T. & Gagneur, J. Current sequence-based models capture gene expression determinants in promoters but mostly ignore distal enhancers. Genome Biol. 24, 56 (2023).

Article PubMed PubMed Central Google Scholar

Kaaij, L. J. T., Mohn, F., van der Weide, R. H., de Wit, E. & Bühler, M. The ChAHP complex counteracts chromatin looping at ctcf sites that emerged from sine expansions in mouse. Cell 178, 1437–1451.e14 (2019).

Article CAS PubMed Google Scholar

Ebersole, T. et al. tRNA genes protect a reporter gene from epigenetic silencing in mouse cells. Cell Cycle 10, 2779–2791 (2011).

Article CAS PubMed Google Scholar

Maquat, L. E. Short interspersed nuclear element (SINE)-mediated post-transcriptional effects on human and mouse gene expression: SINE-UP for active duty. Philos. Trans. R. Soc. B: Biol. Sci. 375, 20190344 (2020).

Article CAS Google Scholar

Ferrari, R. et al. TFIIIC binding to alu elements controls gene expression via chromatin looping and histone acetylation. Mol. Cell 77, 475–487.e11 (2020).

Article CAS PubMed PubMed Central Google Scholar

Fourel, G. et al. An activation-independent role of transcription factors in insulator function. EMBO Rep. 2, 124–132 (2001).

Article CAS PubMed PubMed Central Google Scholar

Sutter, N. B., Scalzo, D., Fiering, S., Groudine, M. & Martin, D. I. K. Chromatin insulation by a transcriptional activator. Proc. Natl Acad. Sci. 100, 1105–1110 (2003).

Article ADS CAS PubMed PubMed Central Google Scholar

Zhang, Y. et al. Transcriptionally active HERV-H retrotransposons demarcate topologically associating domains in human pluripotent stem cells. Nat. Genet 51, 1380–1388 (2019).

Article CAS PubMed PubMed Central Google Scholar

Kruse, K. et al. Transposable elements drive reorganisation of 3D chromatin during early embryogenesis. 523712 Preprint at https://doi.org/10.1101/523712 (2019).

van der Vlag, J., den Blaauwen, J. L., Sewalt, R. G. A. B., van Driel, R. & Otte, A. P. Transcriptional repression mediated by polycomb group proteins and other chromatin-associated repressors is selectively blocked by insulators. J. Biol. Chem. 275, 697–704 (2000).

Article PubMed Google Scholar

Lensch, S. et al. Dynamic spreading of chromatin-mediated gene silencing and reactivation between neighboring genes in single cells. eLife 11, e75115 (2022).

Article CAS PubMed Google Scholar

Dillon, N. & Festenstein, R. Unravelling heterochromatin: competition between positive and negative factors regulates accessibility. Trends Genet. 18, 252–258 (2002).

Article CAS PubMed Google Scholar

Castro-Mondragon, J. A. et al. JASPAR 2022: the 9th release of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 50, D165–D173 (2022).

Article CAS PubMed Google Scholar

Pavesi, A., Conterio, F., Bolchi, A., Dieci, G. & Ottonello, S. Identification of new eukaryotic tRNA genes in genomic DNA databases by a multistep weight matrix analysis of transcriptional control regions. Nucleic Acids Res. 22, 1247–1256 (1994).

Article CAS PubMed Google Scholar

Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).

Article CAS PubMed PubMed Central Google Scholar

Ashuach, T. et al. MPRAnalyze: statistical framework for massively parallel reporter assays. Genome Biol. 20, 183 (2019).

Article PubMed PubMed Central Google Scholar

Moore, J. E. et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).

Article ADS PubMed PubMed Central Google Scholar

Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

Article CAS PubMed PubMed Central Google Scholar

Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).

Article CAS PubMed Google Scholar

Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).

Article ADS CAS PubMed PubMed Central Google Scholar

Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

Article CAS PubMed PubMed Central Google Scholar

Lawrence, M. et al. Software for computing and annotating genomic ranges. PLOS Computational Biol. 9, e1003118 (2013).

Article CAS Google Scholar

Hong, C. K. Y., Ramu, A., Zhao, S. & Cohen, B. A. Effect of genomic and cellular environments on gene expression noise. Genome Biol. 25, 137 (2024).

Article CAS PubMed PubMed Central Google Scholar

Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98 (2016).

Article CAS PubMed PubMed Central Google Scholar

Bianchi, V. et al. Detailed regulatory interaction map of the human heart facilitates gene discovery for cardiovascular disease. 705715 Preprint at https://doi.org/10.1101/705715 (2019).

Download references

We are grateful to Robi Mitra for providing us with plasmids for our experiments. We thank members of the Cohen Lab for helpful discussion and critical feedback on the manuscript. We also thank Jessica Hoisington-Lopez and MariaLynn Crosby in the DNA Sequencing Innovation Lab for assistance with high-throughput sequencing and the Genome Engineering & Stem Cell Center for kindly allowing us to use their flow cytometer for cell sorting. This work was supported by grants to BAC from the National Institutes of Health (R01GM092910).

The Edison Family Center for Genome Sciences and Systems Biology, School of Medicine, Washington University in St. Louis, Saint Louis, MO, 63110, USA

Clarice K. Y. Hong, Yawei Wu, Alyssa A. Erickson, Jie Li, Arnold J. Federico & Barak A. Cohen

Department of Genetics, School of Medicine, Washington University in St. Louis, Saint Louis, MO, 63110, USA

Clarice K. Y. Hong, Yawei Wu, Alyssa A. Erickson, Jie Li, Arnold J. Federico & Barak A. Cohen

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

C.K.Y.H and B.A.C conceived and designed the project. C.K.Y.H, Y.W., A.A.E and J.L conducted the experiments, C.K.Y.H, A.A.E and A.J.F performed the analyses. C.K.Y.H and B.A.C wrote the manuscript.

Correspondence to Barak A. Cohen.

The authors declare no competing interests.

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

Hong, C.K.Y., Wu, Y., Erickson, A.A. et al. Massively parallel characterization of insulator activity across the genome. Nat Commun 15, 8350 (2024). https://doi.org/10.1038/s41467-024-52599-6

Download citation

Received: 09 January 2024

Accepted: 15 September 2024

Published: 27 September 2024

DOI: https://doi.org/10.1038/s41467-024-52599-6

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative