Structural basis of human transcription–DNA repair coupling

No statistical methods were used to predetermine sample size. The experiments were not randomized, and the investigators were not blinded to allocation during experiments and outcome assessment.

Cloning and protein expression

Vectors encoding full-length human CSA, DDB1, CSB, UVSSA, CUL4A and mouse RBX1 were obtained from Harvard Medical School PlasmID Repository. Genes were amplified by PCR and cloned into respective vectors by ligation-independent cloning39. CSA and RBX1 were cloned into the 438A vector (Addgene no. 55218), and CSB, DDB1, UVSSA and CUL4A were cloned into the 438B vector (Addgene no. 55219), resulting in no tag or 6×His tag, respectively. CSA and DDB1, and CSA, DDB1, CUL4A and RBX1 were combined into single vectors by ligation-independent cloning39. The CSB ATPase-deficient mutant (CSB K538R)40 and the pulling hook mutant (CSB F796A) were produced by around-the-horn mutagenesis and expressed and purified in the same way as their wild-type counterparts. For fluorescent labelling of CSB, the gene was cloned into the 438-SNAP-V1 vector (Addgene no. 55222), which resulted in a SNAPf tag at the N terminus. For fluorescent labelling of DSIF, a ybbR tag41 preceded by a GGGG linker was introduced to the C terminus of SPT4 by around-the-horn mutagenesis.

Proteins were expressed in insect cells. Sf9 (ThermoFisher), Sf21 (Expression Systems) and Hi5 (Expression Systems) cell lines were not tested for mycoplasma contamination and were not authenticated in-house. Preparation of bacmids and baculoviruses has previously been described in detail42. In brief, 600 ml of Hi5 cells grown in ESF-921 medium were infected with V1 virus and grown for 2–3 days. Cells were collected by centrifugation (30 min, 4 °C, 500g) and resuspended in lysis buffer (400 mM NaCl, 20 mM Tris-HCl pH 7.9, 10% glycerol (v/v), 1 mM DTT, 30 mM imidazole pH 8.0, 0.284 μg ml−1 leupeptin, 1.37 μg ml−1 pepstatin A, 0.17 mg ml−1 PMSF and 0.33 mg ml−1 benzamidine). Cell suspension was frozen in liquid nitrogen and stored at −80 °C until protein purification.

Protein purification

Pol II was purified from pig thymus as previously described30,43. Human transcription elongation factors (DSIF, PAF, SPT6, RTF1 and P-TEFb) were prepared as previously described20,21,30. All protein purification steps were performed at 4 °C, unless stated otherwise. The purity of protein preparations was monitored by SDS–PAGE using NuPAGE 4–12% Bis-Tris protein gels (Invitrogen), followed by Coomassie staining. Initial purification steps were the same for all TCR proteins. Cells were thawed in a water bath at 30 °C and lysed by sonication, and the lysate was clarified by centrifugation and ultracentrifugation. The clarified lysate was further filtered through 0.8-µm syringe filters and applied onto a HisTrap HP 5-ml column (GE Healthcare) equilibrated in lysis buffer. The column was washed with 5 column volumes (CV) of lysis buffer, 20 CV of high-salt buffer (800 mM NaCl, 20 mM Tris-HCl pH 7.9, 10% glycerol (v/v), 1 mM DTT, 30 mM imidazole pH 8.0, 0.284 μg ml−1 leupeptin, 1.37 μg ml−1 pepstatin A, 0.17 mg ml−1 PMSF and 0.33 mg ml−1 benzamidine) and again with 5 CV of lysis buffer. Proteins were eluted with a 0–80% gradient of elution buffer (400 mM NaCl, 20 mM Tris-HCl pH 7.9, 10% glycerol (v/v), 1 mM DTT, 500 mM imidazole pH 8.0, 0.284 μg ml−1 leupeptin, 1.37 μg ml−1 pepstatin A, 0.17 mg ml−1 PMSF and 0.33 mg ml−1 benzamidine). In the case of the CSA–DDB1 complex, an additional step was introduced at this point to separate the CSA–DDB1 complex from excess DDB1. After the high-salt wash, the column was washed with a low-salt buffer (150 mM NaCl, 20 mM Tris-HCl pH 7.9, 10% glycerol (v/v), 1 mM DTT, 30 mM imidazole pH 8.0, 0.284 μg ml−1 leupeptin, 1.37 μg ml−1 pepstatin A, 0.17 mg ml−1 PMSF and 0.33 mg ml−1 benzamidine) and protein was eluted directly onto a 5-ml HiTrapQ HP column (GE Healthcare) in a low-salt elution buffer (150 mM NaCl, 20 mM Tris-HCl pH 7.9, 10% glycerol (v/v), 1 mM DTT, 500 mM imidazole pH 8.0, 0.284 μg ml−1 leupeptin, 1.37 μg ml−1 pepstatin A, 0.17 mg ml−1 PMSF and 0.33 mg ml−1 benzamidine). The HiTrapQ column was washed with 5 CV of low-salt buffer and proteins were eluted with a 0–100% gradient of monoQ elution buffer (1 M NaCl, 20 mM Tris-HCl pH 7.9, 10% glycerol (v/v), 1 mM DTT, 500 mM imidazole pH 8.0, 0.284 μg ml−1 leupeptin, 1.37 μg ml−1 pepstatin A, 0.17 mg ml−1 PMSF and 0.33 mg ml−1 benzamidine).

For all TCR proteins, appropriate protein fractions were pooled, mixed with 2 mg of TEV protease and dialysed overnight against the dialysis buffer (400 mM NaCl, 20 mM Tris-HCl pH 7.9, 10% glycerol (v/v) and 1 mM DTT). After dialysis, the protein solution was passed through a 5-ml HisTrap column equilibrated in dialysis buffer. Flow-through containing the protein was collected, concentrated and loaded onto a Superdex 200 10/300 increase column (GE Healthcare) equilibrated in storage buffer (400 mM NaCl, 20 mM NaOH:HEPES pH 7.5, 10% glycerol (v/v) and 1 mM DTT). Peak fractions were pooled, concentrated, flash frozen and stored at −80 °C.

CSB containing an N-terminal SNAPf and TwinStrepII tag was purified as follows. The clarified lysate was incubated with 1 ml of Strep-TactinXT 4Flow high-capacity resin (IBA) pre-equilibrated in lysis buffer and washed extensively with lysis buffer. Protein was eluted with BXT buffer (IBA), concentrated and loaded onto a Superdex 200 10/300 increase column (GE Healthcare) equilibrated in storage buffer (400 mM NaCl, 20 mM NaOH:HEPES pH 7.5, 10% glycerol (v/v) and 1 mM DTT). Peak fractions were pooled, concentrated, flash frozen and stored at −80 °C.

RNA extension assays

DNA and RNA oligonucleotides were ordered from Integrated DNA Technologies. Sequences19 used in the assay are: CTA CAT ACA CCA CAC ACC ACA CCG AGA AAA AAA AAT TAC CCC TTC ACC CTC ACT GCC CCA CAT TCT AAC CAC ACA TCA CTT ACC TGG ATA CAC CCT TAC TCC TCT CGA TAC CTC ACC ACC TTA CCT ACC ACC CAC (template strand); GTG GGT GGT AGG TAA GGT GGT GAG GTA TCG AGA GGA GTA AGG GTG TAT CCA GGT AAG TGA TGT GTG GTT AGA ATG TGG GGC AGT GAG GGT GAA GGG GTA ATT TTT TTT TCT CGG TGT GGT GTG TGG TGT ATG TAG (non-template strand); and /5Cy5/rUrUrA rUrArU rUrUrU rArUrU rCrUrU rArUrC rGrA rGrArG rGrA (RNA). Template-strand DNA and RNA were mixed in equimolar ratio and annealed in water by heating the solution to 95 °C followed by slow cooling (1 °C per min) to 4 °C. Pol II was mixed with DNA–RNA scaffold in equimolar ratio and incubated at 30 °C for 10 min. Next, a 1.5× molar excess of non-template DNA was added and the solution was incubated for a further 10 min at 30 °C. A typical RNA extension reaction contained Pol II (200 nM) in the final buffer containing 100 mM NaCl, 20 mM HEPES pH 7.5, 5% (v/v) glycerol, 5 mM MgCl2 and 1 mM DTT. When proteins were titrated, the highest protein concentration was 2 μM (in the case of a protein mixture, the concentration of each factor was 2 μM), followed by a half-log dilution series. In the case of the DSIF–CSB competition assay, Pol II was pre-incubated with 1.5× excess of DSIF before addition of TCR factors. Reactions were pre-incubated at 37 °C for 5 min and started with the addition of NTPs (0.5 mM GTP, UTP and CTP, 1 mM ATP and 0.5 mM dATP). Reactions were quenched with 2× quenching buffer (7 M urea in TBE buffer, 20 mM EDTA and 10 μg ml−1 proteinase K (Thermo Scientific)). Proteins were digested for 30 min at 37 °C. RNA products were separated on a sequencing gel and visualized with a Typhoon FLA 9500 (GE Healthcare Life Sciences). Gel quantification was performed with ImageJ software and data were plotted with Prism 9 software.
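As a rough illustration of the titration scheme, the following Python sketch lists the concentrations of a half-log dilution series starting at 2 μM. The function name and the number of points (six) are assumptions made for illustration; the text does not state how many dilution steps were used.

```python
def half_log_series(top_um, n_points):
    """Return n_points concentrations (in uM), each 10**0.5-fold below the previous one."""
    step = 10 ** 0.5  # half-log dilution factor, roughly 3.16-fold
    return [top_um / step ** i for i in range(n_points)]

# n_points=6 is a hypothetical choice, not taken from the text
series = half_log_series(2.0, 6)
for conc in series:
    print(f"{conc:.3f} uM")
```

Every second point in such a series differs by exactly tenfold, which is why half-log steps are convenient for covering several orders of magnitude with few reactions.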

Three-colour electrophoretic mobility shift assay

DNA and RNA oligonucleotides were ordered from Integrated DNA Technologies. Sequences19 used in the assay are: /56-FAM/CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG (template strand); CTA GTT GAT CTC ATA GCC ATC GAG AGG ATG GGA GAA GGA GCA GAG CG (non-template strand); and rArCrA rUrCrA rUrArA rCrArU rUrUrG rArArC rArArG rArArU rArUrA rUrArU rArCrA rArArA rUrCrG rArGrA rGrGrA (RNA). For this assay, CSB and DSIF were fluorescently labelled. SNAPf–CSB (50 μM) was incubated with 10× molar excess of SNAP-Surface 546 substrate (New England BioLabs) overnight at 4 °C in CSB storage buffer. Labelled CSB was separated from the excess dye on a Superdex 200 10/300 increase column (GE Healthcare) equilibrated in storage buffer (400 mM NaCl, 20 mM NaOH:HEPES pH 7.5, 10% glycerol (v/v) and 1 mM DTT). Labelling efficiency was around 100%. DSIF subunit SPT4 contained a ybbR tag on the C terminus and the protein was labelled by using Sfp phosphopantetheinyl transferase, as previously described in detail41. Substrate for the labelling reaction was LD666-CoA (Lumidyne Technologies) and the labelling efficiency was around 85%. The Pol II elongation complex was assembled by incubating Pol II with 1.3× excess of template strand:RNA for 10 min at 30 °C, followed by the addition of 1.5× excess of non-template strand and further incubation for 10 min at 30 °C. Next, the Pol II elongation complex was supplemented with 1.2× excess of DSIF and incubated for 10 min at 30 °C. Finally, CSB was titrated into the reaction and the reaction was further incubated for 10 min at 30 °C. The final reaction contained Pol II (100 nM), DSIF (120 nM) and CSB (400 nM, 200 nM, 150 nM, 100 nM, 50 nM and 25 nM) in final buffer containing 100 mM NaCl, 20 mM HEPES pH 7.5, 10% glycerol, 2 mM MgCl2 and 1 mM DTT. Reactions were loaded on NativePAGE 3–12% Bis-Tris gels (Thermo Scientific) and run at 150 V for 1.5 h. The gels were scanned with a Typhoon FLA 9500 (GE Healthcare Life Sciences) in three different channels for the visualization of template-strand DNA, CSB and DSIF.

Analytical size-exclusion chromatography

Analytical size-exclusion chromatography was used to monitor association of TCR factors with Pol II (Extended Data Fig. 1e) and to monitor RTF1 association with EC* and ECTCR (Extended Data Fig. 8a). In the case of TCR factors, the proteins were mixed in equimolar ratios in the final size-exclusion buffer (100 mM NaCl, 20 mM HEPES pH 7.5, 5% glycerol, 1 mM MgCl2 and 1 mM DTT) and run over a Superose 6 Increase 3.2/300 column. The Pol II elongation complex was formed as for structure 1. In the case of RTF1 binding, all factors were added to the pre-formed Pol II elongation complex in 1.5× excess in the final size-exclusion buffer and incubated for 1 h at 30 °C in the presence of 1 mM ATP and P-TEFb. The complexes were injected onto a Superose 6 Increase 3.2/300 column and the fractions were analysed by SDS–PAGE. The template strand and RNA used for the EC* and ECTCR formation were the same (template strand: CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG; RNA: rArCrA rUrCrA rUrArA rCrArU rUrUrG rArArC rArArG rArArU rArUrA rUrArU rArCrA rArArA rUrCrG rArGrA rGrGrA) but differed in the non-template strand, which was fully complementary to the template strand in the case of EC* (non-template strand: CTA GTT GAT CTC ATA GCC ATC GAG AGG ATG GGA GAA GGA GCA GAG CG) or formed a large bubble with the template strand in the case of ECTCR (non-template strand: CTA GTT GAT CTC ATA TTT CAT TCC TAC TCA GGA GAA GGA GCA GAG CG).
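The difference between the two non-template strands can be checked computationally. The Python sketch below compares each non-template strand with the reverse complement of the template strand and reports the mismatched positions, which correspond to the bubble. The helper names are hypothetical and the sequences are copied from the text with spaces removed; this is only a consistency check on the printed oligonucleotides, not part of the published analysis.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatch_positions(template, non_template):
    """0-based positions (along the non-template strand) that do not pair with the template."""
    expected = reverse_complement(template)
    return [i for i, (a, b) in enumerate(zip(expected, non_template)) if a != b]

# Sequences as printed in the text; spaces are stripped before comparison
TEMPLATE = "CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG".replace(" ", "")
NT_EC_STAR = "CTA GTT GAT CTC ATA GCC ATC GAG AGG ATG GGA GAA GGA GCA GAG CG".replace(" ", "")
NT_EC_TCR = "CTA GTT GAT CTC ATA TTT CAT TCC TAC TCA GGA GAA GGA GCA GAG CG".replace(" ", "")

bubble_ec_star = mismatch_positions(TEMPLATE, NT_EC_STAR)
bubble_ec_tcr = mismatch_positions(TEMPLATE, NT_EC_TCR)
print("EC* mismatches:", bubble_ec_star)
print("ECTCR mismatches:", bubble_ec_tcr)
```

With the sequences as listed, the EC* strand pairs along its full length, whereas the ECTCR strand has a single contiguous run of mismatches in the middle of the scaffold.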

In vitro ubiquitylation assay

Ubiquitin, UBE1 and UbcH5b were purchased from Boston Biochem. The Pol II elongation complex was formed as for structural analysis of structure 1. The ubiquitylation reaction contained Pol II ECs (0.8 μM), CSB (0.8 μM), UVSSA (0.8 μM), CSA–DDB1–CUL4A–RBX1 (0.8 μM), UBE1 (150 nM), UbcH5b (0.5 μM) and ubiquitin (300 μM) in 100 mM NaCl, 50 mM Tris pH 7.9, 10 mM MgCl2, 0.2 mM CaCl2, 5% glycerol and 1 mM DTT. Reactions were started by the addition of ATP (3 mM) and stopped with EDTA (15 mM). Proteins were separated on NuPAGE 4–12% Bis-Tris protein gels (Invitrogen) and stained with Coomassie. In the case of the ubiquitylation assay in the absence of CSB or UVSSA, the assay was performed as described above, but with lower concentrations of Pol II, CRL4CSA and CSB or UVSSA (0.4 μM). The proteins were separated on 3–8% Tris-acetate gel (Invitrogen) and transferred onto a PVDF membrane with a Trans-Blot Turbo Transfer System (Bio-Rad) for immunoblotting. The membrane was blocked with 5% (w/v) milk powder in PBS containing 0.1% Tween-20 (PBST) for 1 h at room temperature. The membrane was then incubated with F-12 anti-RPB1 antibody (1:100 dilution; Santa Cruz Biotechnology) in PBST supplemented with 2.5% (w/v) milk powder. After washing the membrane with PBST, the membrane was incubated with an anti-mouse HRP conjugate (1:3,000 dilution; ab5870, Abcam) in PBST supplemented with 1% (w/v) milk powder for 1 h at room temperature. The membrane was developed with SuperSignal West Pico Chemiluminescent Substrate (Thermo Fisher) and scanned with a ChemoCam Advanced Fluorescence imaging system (Intas Science Imaging).

ATPase assay

The enzyme-coupled ATPase assay uses two separate fast enzymatic reactions to couple ATP regeneration to NADH oxidation. The typical reaction contained 100 nM protein in buffer containing 50 mM potassium acetate, 20 mM KOH-HEPES pH 7, 5 mM magnesium acetate, 5% glycerol (v/v), 0.2 mg ml−1 BSA, 3 mM phosphoenolpyruvate (PEP), 0.3 mM NADH and excess pyruvate kinase and lactate dehydrogenase enzyme mix (Sigma). The reaction mixture was incubated for 10 min at 30 °C and the reaction was started by addition of ATP (1.5 mM final). The rate of ATP hydrolysis was monitored by measuring a decrease in the absorption at 340 nm using the Infinite M1000Pro reader (Tecan). Resulting curves were fit to a linear model using GraphPad Prism version 9.
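To make the readout concrete, the Python sketch below converts the slope of an absorbance trace at 340 nm into an ATP turnover rate. It relies on the NADH molar extinction coefficient at 340 nm (6,220 M−1 cm−1) and on the 1:1 stoichiometry between ATP hydrolysed and NADH oxidized in the coupled reaction. The absorbance values are synthetic, the optical path length of 0.5 cm is a hypothetical value for a plate well, and the function names are illustrative; none of these are taken from the text, and the published curves were fitted in GraphPad Prism.

```python
NADH_EXT_COEFF = 6220.0  # M^-1 cm^-1 at 340 nm

def linear_slope(times_s, absorbances):
    """Ordinary least-squares slope of absorbance versus time (absorbance units per second)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_s, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx

def atp_turnover_per_s(slope_abs_per_s, path_cm, enzyme_molar):
    """ATP hydrolysed per enzyme per second, assuming one NADH oxidized per ATP."""
    nadh_rate = -slope_abs_per_s / (NADH_EXT_COEFF * path_cm)  # M s^-1 (Beer-Lambert)
    return nadh_rate / enzyme_molar

# Synthetic trace for illustration only; path length is an assumed value
times = [0, 60, 120, 180, 240]               # seconds
a340 = [1.000, 0.985, 0.970, 0.955, 0.940]   # absorbance at 340 nm
slope = linear_slope(times, a340)
rate = atp_turnover_per_s(slope, path_cm=0.5, enzyme_molar=100e-9)
print(f"slope = {slope:.2e} AU/s, turnover = {rate:.2f} per s")
```

In practice a no-enzyme control slope would be subtracted before conversion, and the path length would need to be measured or corrected for the actual well volume.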

Crosslinking mass spectrometry

The Pol II elongation complex was formed as described in the RNA extension assay. DNA and RNA sequences used for elongation complex formation are the following19: CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG (template strand); CTA GTT GAT CTC ATA TTT CAT TCC TAC TCA GGA GAA GGA GCA GAG CG (non-template strand); and rArUrC rGrArG rArGrG rA (RNA). Equimolar amounts of elongation complex, CSB, CSA–DDB1 and UVSSA were mixed in the final complex formation buffer of 100 mM NaCl, 20 mM HEPES pH 7.5, 1 mM DTT, 1 mM MgCl2 and 5% glycerol. The complex was incubated at 30 °C for 10 min and subsequently purified over a Superose 6 Increase 3.2/300 column equilibrated in complex formation buffer. For BS3 crosslinking, the protein solution was supplemented with 1 mM BS3 and incubated at 30 °C for 30 min. The crosslinking was quenched with 50 mM ammonium bicarbonate. For EDC crosslinking, the complex formation buffer contained HEPES pH 6.7 instead of pH 7.5. The protein solution was supplemented with 2 mM EDC and 5 mM sulfo-NHS and incubated at 30 °C for 30 min. The crosslinking reaction was quenched with 50 mM 2-mercaptoethanol and 20 mM Tris pH 7.9.

Analysis of crosslinked peptides was performed as previously described17. The crosslinked proteins were reduced with 10 mM DTT for 30 min at 37 °C and alkylated with 40 mM iodoacetamide for 30 min at 25 °C. Protein digestion was performed overnight in denaturing conditions (1 M urea) with 5 µg trypsin (Promega) at 37 °C. Formic acid (FA) and acetonitrile (ACN) were added to the digested samples to 0.1% (v/v) and 5% (v/v) final concentrations. Samples were purified with Sep-Pak C18 1cc 50 mg sorbent cartridge (Waters) by washing away salts and contaminants with 5% (v/v) ACN, 0.1% (v/v) FA and eluting bound peptides with 80% (v/v) ACN and 0.1% (v/v) FA. The extracted peptides were dried under vacuum and resuspended in 30 µl 30% (v/v) ACN and 0.1% (v/v) trifluoroacetic acid (TFA). Size separation of peptides was performed with a Superdex Peptide PC3.2/30 column (GE Healthcare) at flow rate of 50 µl min−1 30% (v/v) ACN and 0.1% (v/v) TFA. Fractions (100 µl) corresponding to elution volume 1.1–2 ml were collected, dried under vacuum and resuspended in 20 µl 2% (v/v) ACN and 0.05% (v/v) TFA.

Mass spectrometry analysis was performed on the Q Exactive HF-X Mass Spectrometer (Thermo Fisher Scientific) coupled with the Dionex UltiMate 3000 UHPLC system (Thermo Fisher Scientific). Online chromatographic separation was achieved with an in-house packed C18 column (ReproSil-Pur 120 C18-AQ, 1.9-µm particle size, 75-µm inner diameter and 30 cm in length; Dr. Maisch). Samples were analysed as three 5-µl injections, separated on a 75-min gradient: flow rate of 300 nl min−1; mobile phase A was 0.1% (v/v) FA; mobile phase B was 80% (v/v) ACN and 0.08% (v/v) FA. The gradient was formed with an increase from 8%/12%/18% mobile phase B to 38%/42%/48% (depending on the fraction). MS1 acquisition was achieved with the following settings: resolution of 120,000; mass range of 380–1,580 m/z; injection time of 50 ms; and automatic gain control target of 1 × 10⁶. MS2 fragment spectra were collected with dynamic exclusion of 10 s and varying normalized collision energy for the different injection replicates (28%/30%/28–32%) and the following settings: isolation window of 1.4 m/z; resolution of 30,000; injection time of 128 ms; and automatic gain control target of 2 × 10⁵.

Result raw files were converted to the mgf format with ProteomeDiscoverer 2.1.0.81 (Thermo Fisher Scientific): signal-to-noise ratio of 1.5, and precursor mass of 350–7,000 Da. Crosslinked peptides were identified with pLink v2.3.9 (pFind group44) and the following parameters: missed cleavage sites was 3; fixed modification was carbamidomethylation of cysteines; variable modification was oxidation of methionines; peptide tolerance was 10 p.p.m.; fragment tolerance was 20 p.p.m.; peptide length was 5–60 amino acids; and the spectral false discovery rate was 1%. The sequence database was assembled from all proteins within the complex. Crosslink sites were visualized with XiNet45 and the Xlink Analyzer46 plugin in Chimera.
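For readers less familiar with relative mass tolerances, the Python sketch below shows how a tolerance in p.p.m. translates into an absolute window and how a match would be tested against it. The function names and example values are hypothetical; the sketch only illustrates the arithmetic and does not describe how pLink applies its tolerances internally.

```python
def ppm_window(mass, tol_ppm):
    """Return the (low, high) bounds around a mass or m/z value for a tolerance in p.p.m."""
    delta = mass * tol_ppm / 1e6
    return mass - delta, mass + delta

def within_tolerance(observed, theoretical, tol_ppm):
    """True if the observed value deviates from the theoretical one by at most tol_ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm

# Hypothetical example values
low, high = ppm_window(1000.0, 10.0)   # 10 p.p.m. at 1,000 is +/- 0.01
print(f"{low:.4f} to {high:.4f}")
print(within_tolerance(1000.005, 1000.0, 10.0))  # 5 p.p.m. deviation
print(within_tolerance(1000.030, 1000.0, 20.0))  # 30 p.p.m. deviation
```

Because the window scales with the mass, a fixed p.p.m. tolerance is stricter in absolute terms for small peptides than for large crosslinked peptide pairs.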

The samples for the ubiquitylation analysis were produced by an in vitro ubiquitylation assay as described above. Control sample was prepared in the same way but without the addition of ubiquitin to make sure that endogenously purified Pol II was not already ubiquitylated. In addition to site-specific Pol II ubiquitylation, promiscuous ubiquitylation of free CSB and UVSSA was observed that probably resulted from a population of TCR factors not bound to Pol II.

For mass spectrometry, the samples were reduced with 5 mM DTT for 30 min at 37 °C and alkylated with 20 mM chloroacetamide for 30 min at room temperature. Unreacted chloroacetamide was quenched by supplementing an additional 5 mM DTT. Proteolytic digestion was performed overnight in denaturing conditions (1 M urea) with trypsin (Promega) in a 1:20 (w/w) protein ratio. The digestion mixtures were acidified with FA to 1% (v/v) end concentration and ACN was added to 5% (v/v) final concentration. Reversed-phase chromatographical purification for mass spectrometric analysis was performed with Harvard Apparatus Micro SpinColumns C18 by washing away salts and contaminants with 5% (v/v) ACN and 0.1% (v/v) FA. Purified peptides were eluted with 50% (v/v) ACN and 0.1% (v/v) FA. The peptide mixture was dried under vacuum and resuspended in 2% (v/v) ACN and 0.05% (v/v) TFA (5 µl for 1 µg of estimated protein amount before digestion).

Liquid chromatography with tandem mass spectrometry analysis was performed by injecting 4 µl of the samples in the Dionex UltiMate 3000 UHPLC system (Thermo Fisher Scientific) coupled with the Orbitrap Fusion Tribrid Mass Spectrometer (Thermo Fisher Scientific). Peptides were separated on an in-house packed C18 column (ReproSil-Pur 120 C18-AQ, 1.9-µm particle size, 75-µm inner diameter and 31 cm in length; Dr. Maisch). Chromatographic separation was achieved with 0.1% (v/v) FA (mobile phase A) and 80% (v/v) ACN and 0.08% (v/v) FA (mobile phase B). A gradient was formed by the increase of mobile phase B from 5% to 42% in 43 min. Eluting peptides were analysed by data-dependent acquisition with the following MS1 parameters: resolution of 60,000; scan range of 350–1,500 m/z; injection time of 50 ms; and automatic gain control target of 4 × 10⁵. Analytes with charge states 2–7 were selected for higher-energy collisional dissociation with 30% normalized collision energy. Dynamic exclusion was set to 10 s. Fragment MS2 spectra were acquired with the following settings: isolation window of 1.6 m/z; detector type was orbitrap; resolution of 15,000; injection time of 120 ms; and automatic gain control target of 5 × 10⁴.

The resulting acquisition files were analysed with MaxQuant47 (v1.6.17.0). Fragment peptide spectra were searched against a database containing all proteins of the complex and common protein contaminants. Oxidation of methionines, acetylation of protein N terminus and ubiquitylation residue on lysines were set as variable modifications. Carbamidomethylation of cysteines was set as a fixed modification. Default settings were used with the following exceptions: main search peptide tolerance was set to 6 p.p.m.; trypsin was selected for digestion enzyme; and maximum missed cleavages were increased to 3.

Cryo-EM sample preparation and image processing

The same DNA scaffolds were used for all structures19: CGC TCT GCT CCT TCT CCC ATC CTC TCG ATG GCT ATG AGA TCA ACT AG (template strand) and CTA GTT GAT CTC ATA TTT CAT TCC TAC TCA GGA GAA GGA GCA GAG CG (non-template strand). In the case of Pol II complex formation with TCR factors only, the shorter RNA was used: rArUrC rGrArG rArGrG rA. If SPT6, PAF and RTF1 were also present, longer RNA was used: rArCrA rUrCrA rUrArA rCrArU rUrUrG rArArC rArArG rArArU rArUrA rUrArU rArCrA rArArA rUrCrG rArGrA rGrGrA. The elongation complex was formed as in the RNA extension assays. For the Pol II–CSB–CSA–DDB1–UVSSA structure, the pre-formed elongation complex was mixed with twofold excess of TCR factors in complex formation buffer containing 100 mM NaCl, 20 mM HEPES pH 7.5, 1 mM MgCl2, 4% glycerol and 1 mM DTT. The protein solution was incubated at room temperature for 10 min and purified by the Superose 6 Increase 3.2/300 column equilibrated in complex formation buffer. Peak fractions were crosslinked with 0.1% glutaraldehyde on ice for 10 min and quenched with a mixture of lysine (50 mM final) and aspartate (20 mM final). The quenched protein solution was dialysed in Slide-A-Lyzer MINI Dialysis Device of 20K MWCO (Thermo Fisher Scientific) for 6 h against the complex formation buffer without glycerol. For the Pol II–CSB–CSA–DDB1–UVSSA–ADP•BeF3 structure, the complex was supplemented with 0.5 mM ADP•BeF3 before complex purification by size-exclusion chromatography. In the case of complex formation between Pol II, TCR factors, PAF, SPT6 and RTF1, the pre-formed elongation complex was mixed with twofold excess of all proteins in complex formation buffer. In addition, the reaction was supplemented with P-TEFb and ATP (1 mM final), as previously described30. Because ATP was present, we used a CSB ATPase-deficient mutant for complex formation. The complex was incubated at 30 °C for 1 h and purified by a Superose 6 Increase 3.2/300 column equilibrated in complex formation buffer. 
Downstream steps, including crosslinking and dialysis, were the same as for the previous samples. Dialysed samples were immediately used for the preparation of cryo-EM grids. A 4-µl aliquot of the sample was applied to glow-discharged R2/1 carbon grids (Quantifoil), which were blotted for 5 s and plunge-frozen in liquid ethane with a Vitrobot Mark IV (FEI) operated at 4 °C and 100% humidity.

Micrographs were acquired on a FEI Titan Krios transmission electron microscope with a K3 summit direct electron detector (Gatan) and a GIF quantum energy filter (Gatan) operated with a slit width of 20 eV. Data collection was automated using SerialEM48 and micrographs were taken at a magnification of ×81,000 (1.05 Å per pixel) with a dose of 1–1.05 e/Å2 per frame over 40 frames. For Pol II–CSA–DDB1–CSB–UVSSA, a total of 10,300 micrographs were acquired; for Pol II–CSA–DDB1–CSB–UVSSA–ADP•BeF3, 10,940 micrographs were acquired; for Pol II–CSA–DDB1–CSB–UVSSA–SPT6–PAF, 8,365 micrographs were acquired; and for Pol II–CRL4CSA–CSB–UVSSA–SPT6–PAF, 19,472 micrographs were acquired. Estimation of the contrast-transfer function, motion correction and particle picking were done on-the-fly using Warp49. Initial 2D classification and 3D classification steps were done in CryoSPARC50, followed by further processing in RELION 3.0 (refs 51,52,53). Owing to the flexibility of proteins on the Pol II surface, many rounds of signal subtraction and focused classifications were performed, as detailed for every dataset in Extended Data Figs. 3, 5, 7, 9. The focused classified maps were then assembled into a final composite map for each structure. Masks were created with UCSF Chimera54. The final composite maps were created from focused refined maps and denoised in Warp49.
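The acquisition parameters imply a total exposure per micrograph that follows from simple arithmetic. The Python sketch below computes it from the per-frame dose and frame count given in the text, together with the corresponding electrons per detector pixel at 1.05 Å per pixel. The function names are hypothetical and the calculation is an illustration only, not a value reported by the authors.

```python
def total_dose(dose_per_frame, n_frames):
    """Accumulated dose (e per A^2) over all frames of one micrograph."""
    return dose_per_frame * n_frames

def electrons_per_pixel(dose_e_per_a2, pixel_size_a):
    """Electrons per pixel for a given dose and pixel size (A per pixel)."""
    return dose_e_per_a2 * pixel_size_a ** 2

# Per-frame dose range and frame count as stated in the text
dose_low = total_dose(1.0, 40)
dose_high = total_dose(1.05, 40)
print(f"total dose: {dose_low:.1f} to {dose_high:.1f} e/A^2")
print(f"electrons per pixel at the upper bound: {electrons_per_pixel(dose_high, 1.05):.2f}")
```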

Model building and refinement

The focused refined maps and the final composite maps were used for model building. For the Pol II–CSB–CSA–DDB1–UVSSA structure, we first docked existing structures into the density. An initial CSB model was produced with SWISS-MODEL55,56 using the Rad26 structure (Protein Data Bank (PDB) code: 5VVR19) as the template. The model was fitted into the CSB focused refined map in Chimera54 and rebuilt in Coot57, followed by real-space refinement in PHENIX58. The CSA–DDB1 crystal structure (PDB code: 4A11 (ref. 14)) was fitted into the CSA–DDB1 focused refined map and real-space refined in PHENIX58. During 3D classifications, the β-propeller B of DDB1 was found to adopt many different conformations, apparently rotating around the junction with the rest of the protein, and the final model reflects the most commonly observed conformation. The N-terminal VHS domain of UVSSA was predicted with SWISS-MODEL55,56 using the GGA3 VHS domain as a template (PDB code: 1JPL59). Guided by the crosslinking mass spectrometry data and EM density, the model was fitted into the CSA–UVSSA focused refined map, followed by several rounds of flexible fitting in Namdinator60 and real-space refinement in PHENIX58. The Pol II model (PDB code: 7B0Y61) was fitted into the final map and nucleic acids were modified and built in Coot. All protein models were combined in Coot and real-space refined in PHENIX into the final composite map using secondary structure, base-pairing and base-stacking restraints. For the Pol II–CSB–CSA–DDB1–UVSSA–ADP•BeF3 model, ADP•BeF3 was fitted into the density together with the Pol II–CSB–CSA–DDB1–UVSSA model and real-space refined in PHENIX into the final composite map using secondary structure, base-pairing and base-stacking restraints.

For the Pol II–CSB–CSA–DDB1–UVSSA–SPT6–PAF structure, the SPT6 and PAF models (PDB code: 6TED21) were fitted into corresponding focused refined maps, adjusted in Coot and real-space refined in PHENIX. Owing to the improved resolution of the SPT6 core, we built an atomic model for it (the SPT6 core was previously modelled on the backbone level). The C-terminal part of LEO1 was displaced in our structure, and therefore these elements were manually built in Coot and deposited as polyalanine because the register could not be determined with certainty. RNA outside Pol II was poorly resolved, presumably due to the absence of DSIF, so we modelled it on the basis of the previous structure (PDB code: 6TED21). All models were combined in Coot and real-space refined in PHENIX in the final composite map. In the case of the Pol II–CSB–CRL4CSA–UVSSA–SPT6–PAF complex, 3D classification of the stably bound CSA–DDB1–CSB complex revealed two distinct conformations of CUL4A–RBX1. In the first conformation (state 1), CUL4A interacts with UVSSA; in the second conformation (state 2), CUL4A interacts with CSB. Owing to increased flexibility of CUL4A–RBX1, only a smaller subset of particles was used for the final focused refinement of this region. Both focused refinement rounds yielded reconstructions with well-resolved CSA–DDB1, which was then used to resample maps on the map of CSA–DDB1–CSB reconstructed from all particles with stably bound TCR proteins. The crystal structure of the CUL4A–RBX1 (PDB code: 4A0K14) complex was fitted into the corresponding focused refined maps, followed by several rounds of flexible fitting in Namdinator60 and real-space refinement in PHENIX58. The β-propeller B of DDB1 was manually adjusted in Chimera and Coot for both CRL4CSA conformations. 
The model of Pol II–CSB–CSA–DDB1–UVSSA–SPT6–PAF was combined with CUL4A–RBX1 in Coot and the complete models were real-space refined in corresponding composite maps in PHENIX using secondary structure, base-pairing and base-stacking restraints. For Fig. 3, full RBX1 was modelled on the basis of a CUL4A–RBX1 structure (PDB code: 2HYE)62 owing to lower map quality in this region, and the E2 enzyme–donor ubiquitin complex, which was not present in the complex, was modelled on the basis of an RNF4 RING–UbcH5a–ubiquitin structure (PDB code: 4AP4)63. In the case of structures containing a CSB ATPase-deficient mutant, the ATPase lobe 2 of CSB is very flexible. Since the complex was incubated with ATP, it is likely that the structure contains a mixture of empty and ATP-bound CSB molecules, resulting in both pre-translocated and post-translocated states of CSB. Final models were validated in MolProbity64 and the figures were generated with Chimera54 and ChimeraX65.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.