The cloning and expression of human serum albumin in competent E. coli cells. Applicable to many known proteins.

The procedure is standard and applicable to many known proteins, and everything you need is easily available from online retailers. If you find insulin expensive, here is your alternative: just grow it in E. coli at home!


1. Protein properties

2. DNA analysis and informatics tools

3. Cloning vector & Expression

4. Primer design

5. PCR and PCR tuning

6. Restriction enzyme treatment and purification & isolation

7. Ligation & Transformation

8. Analysis

9. DNA sequence

10. Amino acid sequence

This report briefly explains how to identify human serum albumin (HSA) and clone it into an E. coli strain for expression, using PCR amplification and a cloning vector. Different aspects of cloning and their problems are discussed as they come up. The goal is also to answer how best to express HSA in E. coli, and what to consider when choosing the vector, PCR cycles, E. coli strain and primers for good expression and transformation rates.

Recommended reading: Gene Cloning and DNA Analysis

Protein Properties

HSA, human serum albumin, is the most abundant protein in human blood plasma and can carry numerous metabolites and hormones throughout the body. HSA is synthesized as a precursor of 609 amino acids39. The signal peptide is cleaved off in the ER, giving the pro-form (proalbumin), and the propeptide is then removed in the Golgi complex, from which the mature, active protein is secreted. HSA is made up of three homologous domains (I, II and III) that are very similar in both structure and function. The mature protein has a molecular mass of about 66.5 kDa. HSA is produced in the liver39 and has a half-life of about 20 days in humans. The albumin gene is located on chromosome 4 and is almost 17,000 bases long39; it contains 15 exons, and the introns are spliced out after transcription. The protein is naturally expressed in eukaryotic cells, in humans to be exact. Under physiological conditions its secondary structure consists of about 70% alpha-helix6.

DNA analysis & informatics tools

The protein can be identified from a DNA sequence with Expasy translate5, which gives the amino acid sequence, while Expasy also reports properties such as the molecular weight and theoretical values such as the pI. The amino acid sequence can then be searched with BLAST, which gives a 99% match with HSA, human serum albumin3. The sequence also matches other databases, e.g. drugbank.ca40. As can be seen in the attachment, UniProt4 gives the ID and sequence of the protein, and its sequence matches the sequence from Expasy translate5 at all but two positions (residues 97 and 417 of the mature protein). The coding sequence of the HSA precursor is 1830 nucleotides long (609 codons plus the TAA stop codon), which is also its length in base pairs once it is part of a double-stranded gene or vector. According to Expasy16, the mature protein has the following properties, which are useful to know when analyzing it:
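A position-by-position comparison is a quick way to see exactly where two already-aligned sequences differ. The sketch below uses a hypothetical helper, `find_mismatches`, which is not part of Expasy, BLAST or UniProt; it assumes gap-free sequences of equal length, so it is no substitute for a real alignment. It compares residues 61-120 of the Expasy translation with the same stretch of the UniProt sequence from the attachment.

```python
def find_mismatches(seq_a: str, seq_b: str, start: int = 1) -> list[tuple[int, str, str]]:
    """Return (position, residue_a, residue_b) wherever two pre-aligned,
    gap-free sequences of equal length differ.

    `start` is the residue number of the first character, so a stretch taken
    from the middle of a protein is reported in protein numbering.
    """
    # Drop whitespace so sequences can be pasted in blocks of 10 or 60.
    a = "".join(seq_a.split()).upper()
    b = "".join(seq_b.split()).upper()
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [(start + i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Residues 61-120 of the mature protein: Expasy translation vs. UniProt P02768.
expasy = "NCDKSLHTLF GDKLCTVATL RETYGEMADC CAKQEPGRNE CFLQHKDDNP NLPRLVRPEV"
uniprot = "NCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEV"
print(find_mismatches(expasy, uniprot, start=61))  # [(97, 'G', 'E')]
```

Run over the full mature sequences in the attachment, the same comparison also reports residue 417 (E in the Expasy translation, Q in UniProt), consistent with the 99% BLAST identity.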

Amino acids: 585

Molecular weight: 66401.13 Da

Theoretical pI: 5.67

Cloning vector

When choosing a vector for this project, it is important to pick an expression vector, to achieve a high yield of the desired protein. The gene to be inserted is about 1830 bp long, which is taken into account when picking the vector. The fragment to be cloned will be shorter than that, since the full sequence codes for the inactive precursor: the cloned sequence lacks the first 72 bp and is 1758 bp long. The difference is small and does not matter for the choice of vector, since plasmids are the most common vectors for cloning fragments of 0-5000 bp. In a vector for protein expression the promoter is important, especially the T7 promoter, which must be placed close to the cloning sites, the MCS (multiple cloning site). pET-22b(+) from Novagen38 can be cut with common restriction enzymes such as BamHI and NdeI, and it is a plasmid intended for this kind of cloning and fragment size. Based on the criteria above it is a well-suited expression vector for the intended cloning, and it is easy to order. pET-22b(+) also carries an ampicillin resistance gene, ampR, as can be seen in fig 2. This makes it easy to tell whether the plasmid has been transformed into the E. coli: cells that grow on ampicillin have at least expressed the resistance gene.
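The fragment lengths follow from the codon counts, at three bases per codon plus one stop codon:

```latex
$$
\begin{aligned}
L_{\text{precursor}} &= 3 \times (609 + 1) = 1830\ \text{bp} \\
L_{\text{mature}} &= 3 \times (585 + 1) = 1758\ \text{bp} \\
L_{\text{precursor}} - L_{\text{mature}} &= 3 \times 24 = 72\ \text{bp}
\end{aligned}
$$
```

The 72 bases removed correspond to the 24 codons of the signal peptide and propeptide.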

Expression

IPTG, isopropyl β-D-1-thiogalactopyranoside14, will be added so that the lac repressor in E. coli no longer blocks transcription at the lac operator, which follows directly after the T7 promoter. In DE3 strains such as BL21(DE3), IPTG also induces the chromosomal gene for T7 RNA polymerase, which is needed to transcribe from the T7 promoter. IPTG is similar to allolactose, which naturally triggers transcription of the lac operon. HSA is naturally a secreted protein rather than a cytosolic one, and its 17 disulfide bonds do not form readily in the reducing E. coli cytoplasm, which can limit the yield of correctly folded protein. When expressing eukaryotic proteins in bacteria, the vector must contain a replication origin that works in the host, which this vector has. Bacteria are for that matter the best choice for expressing this protein, since they are very common and available ready for transformation. E. coli also multiplies quickly, with a generation time of about 20 minutes under optimal conditions, which is beneficial.
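To put the growth rate in numbers: in exponential phase the cell count doubles once per generation time. The sketch below assumes a 20-minute generation time, typical for E. coli in rich medium at 37 °C but usually longer during protein expression, and it ignores the lag and stationary phases of a real culture; `cells_after` is a hypothetical helper.

```python
def cells_after(n0: float, hours: float, doubling_min: float = 20.0) -> float:
    """Exponential growth N = N0 * 2**(t / t_d); ignores lag and stationary phase."""
    generations = hours * 60.0 / doubling_min
    return n0 * 2.0 ** generations


# One cell, assumed 20-minute doublings, 4 hours = 12 generations.
print(cells_after(1, 4))  # 4096.0
```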

Primer design

It is important to design two primers that anneal to the designated sites and nowhere else. They must be exactly complementary to the annealing sequence so that no mutations are introduced, and with the right PCR annealing temperature they will attach to exactly the right spots. Too hot and they will not bind; too cold and they lose specificity.41 The PCR yield can be analyzed to see whether the temperatures are right, and then tweaked to maximize yield before moving on to ligation with the treated cloning vector. The target gene will be isolated from a cDNA library, where each gene ends with its stop codon, so with the right primers and PCR cycles it is easy to isolate and amplify. The mature protein is only 585 amino acids long, while the full DNA sequence to be cloned codes for the 609-amino-acid precursor, which carries an extra signal peptide (residues 1-18) and a propeptide (residues 19-24). These peptides are naturally cut off in the ER and Golgi, but since this cannot happen in E. coli, the primers will be designed to anneal to the DNA coding for amino acid 25 onwards and for amino acid 609 backwards. The PCR fragments will then contain only the DNA coding for the mature, active protein. Primers are usually 20-30 bases long; annealing regions of 18 and 20 bases are a good start, since the attachments added below bring the full primers to 30 and 32 bases. Primers designed to exclude the DNA coding for the signal peptide and propeptide look like this:

Forward primer: 5´- AGATGCACACAAGAGTGA- 3´

Reverse primer: 5´- TTATAAGCCTAAGGCAGCTT- 3´

The primer melting temperature, Tm, is important to calculate since it gives a good indication of the best annealing temperature. The Tm of a primer can be estimated with the following equations: the first, the basic (Wallace) rule, is a quick rough estimate for short oligos, while the second, the nearest-neighbor formula, is usually closer to the true Tm:

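$$T_m = 4\,^{\circ}\mathrm{C} \times (G + C) + 2\,^{\circ}\mathrm{C} \times (A + T)$$

Basic Tm: G, C, A and T stand for the number of each nucleotide in the oligo.

Or, with nearest-neighbor parameters:

$$T_m\,(^{\circ}\mathrm{C}) = \frac{\Delta H}{\Delta S + R \ln C_T} - 273.15$$

where ΔH and ΔS are the enthalpy and entropy of duplex formation, R is the gas constant and C_T is the primer concentration.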

The GC content of a functional primer, as a percentage of all its bases, should be 40-60%8, and the primer should preferably have a G or C base at or close to the 3´ end, which makes it bind more efficiently. G/C bases bind slightly more strongly, which increases the specificity of the primer, while having more than 2 G or C bases among the last five bases will decrease it.

Table 1: Primer properties (forward / reverse)

Length without attachment: 18 / 20 nt

Attachment length: 12 / 12 nt

GC content, full primer: 46.67% / 43.75%

GC content, annealing sequence: 44.44% / 40.00%

Tm, full primer: 77.5 °C / 72.7 °C

Tm, annealing sequence: 55.5 °C / 57.9 °C
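The GC values in table 1 can be reproduced by counting bases, and the basic (Wallace) rule gives a quick Tm estimate. This is a minimal sketch with hypothetical helper names; the Wallace rule ignores sequence context and salt, and it overestimates Tm for primers much longer than about 20 bases, so its numbers will not match the ThermoFisher values in the table.

```python
def gc_percent(seq: str) -> float:
    """GC content as a percentage of all bases."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Basic (Wallace) rule: Tm = 4 °C x (G + C) + 2 °C x (A + T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return 4 * gc + 2 * at


def gc_in_last_five(seq: str) -> int:
    """Number of G/C bases among the five 3´-most bases."""
    return sum(1 for base in seq.upper()[-5:] if base in "GC")


primers = {
    "forward (annealing)": "AGATGCACACAAGAGTGA",
    "reverse (annealing)": "TTATAAGCCTAAGGCAGCTT",
    "forward (full)": "TTCGGATCCATGAGATGCACACAAGAGTGA",
    "reverse (full)": "GGCCATATGATGTTATAAGCCTAAGGCAGCTT",
}
for name, seq in primers.items():
    print(f"{name:20s} {len(seq):2d} nt  GC {gc_percent(seq):5.2f}%  "
          f"Wallace Tm {wallace_tm(seq)} °C  3´ GC {gc_in_last_five(seq)}/5")
```

For the annealing sequences, the Wallace rule gives 52 °C (forward) and 56 °C (reverse), a few degrees from the calculator values.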

To finalize the primers, the forward primer must contain the sequence recognized by BamHI, part of which will be cut away. BamHI recognizes GGATCC22, so this sequence will be added at the 5´ end of the forward primer, followed by ATG12, which codes for methionine and serves as the start codon. Three extra bases will be added at the 5´ end of the forward primer because BamHI cuts more efficiently when its site is 3 to 5 bases from the end of the DNA; these bases are also chosen to give the right Tm and GC content. The reverse primer needs the corresponding sequence for NdeI. NdeI recognizes CATATG23, so this will be added at the 5´ end of the reverse primer, again with 3 extra bases, since NdeI also cuts more efficiently 3 to 5 bases from the end. The codon for methionine will be added as well. In general, about 6 bases can be added.11 The 3 extra bases on each primer should be chosen to reduce the chance of primer dimers forming. These two primers match all the criteria above:

Forward primer: 5´-TTC-GGATCC-ATG-AGATGCACACAAGAGTGA-3´

Reverse primer: 5´-GGC-CATATG-ATG-TTATAAGCCTAAGGCAGCTT-3´

The final primers are 30 and 32 bases long, respectively, which is acceptable. ThermoFisher19 was used to obtain the values in table 1, and Primer-BLAST18 to check where the primers anneal.
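Before ordering, it is worth checking in silico that the added ATG puts the annealing region in frame. The sketch below translates DNA with the standard genetic code (NCBI table 1), roughly what Expasy translate does for frame 1; the `translate` helper is hypothetical and only handles plain A/C/G/T input.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA sequence; '*' marks a stop codon.
    Trailing bases that do not fill a whole codon are ignored."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, len(dna) - 2, 3))


# Start of the attachment cDNA.
print(translate("atgaagtgggtaacctttatttcc"))     # MKWVTFIS
# ATG from the forward primer followed by its annealing region.
print(translate("ATG" + "AGATGCACACAAGAGTGA"))   # MRCTQE*
# Same, with the annealing region starting one base later (GAT = Asp-25).
print(translate("ATG" + "GATGCACACAAGAGTGAG"))   # MDAHKSE
```

In this sketch, the ATG followed by the annealing region as listed reaches TGA, a stop codon, at the seventh codon, because that region begins at the a at position 72, the last base of the Arg-24 codon. Starting the annealing region one base later, at the GAT codon for Asp-25, keeps the reading frame; such a change would also shift the GC and Tm values in table 1.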

PCR and PCR tuning

Each PCR cycle has three steps (denaturation, annealing and elongation) designed to amplify the selected gene while preserving its sequence exactly. The cycle is repeated until the concentration of DNA fragments is high enough. According to ThermoFisher19, the annealing temperature should not exceed 72 °C, since the lower of the two full-primer Tm values in table 1 is 72.7 °C. Common DNA polymerases synthesize about 1000 bases per minute, which leaves no alternative but an elongation time of at least 120 seconds for this fragment; it can be increased if the yield is low. The polymerase chosen, Vent20 DNA polymerase, is a proofreading polymerase with higher fidelity than Taq polymerase; it uses 72 °C for the elongation steps and the final elongation21, and it needs 95 °C for 2-5 minutes for the initial denaturation.21

Table 2: PCR program (temperature / duration)

Initial denaturation: 95 °C / 200 s

Denaturation: 92 °C / 30 s

Annealing: 64 °C / 30 s

Elongation: 72 °C / 120 s

Final elongation: 72 °C / 300 s

The values for the PCR cycles follow NEB, the manufacturer of the Vent DNA polymerase used, and are in line with the desired temperatures for the primers.
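The elongation time in table 2 follows from the product length and the polymerase rate. A minimal sketch, assuming the rate of 1000 bases per minute stated above (check the polymerase datasheet) and rounding up to whole 30-second steps purely for convenience; `elongation_seconds` is a hypothetical helper.

```python
import math


def elongation_seconds(length_bp: int, rate_bp_per_min: float = 1000.0,
                       step_s: int = 30) -> int:
    """Minimum elongation time for a product of `length_bp`, rounded up to a
    multiple of `step_s` seconds so it is easy to program."""
    seconds = 60.0 * length_bp / rate_bp_per_min
    return step_s * math.ceil(seconds / step_s)


# Mature coding sequence with stop codon (1758 bp) plus 2 x 12-base attachments,
# roughly 1.8 kb in total.
print(elongation_seconds(1758 + 24))  # 120
```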

The PCR mixture contains the template cDNA, Vent DNA polymerase, dNTPs, the forward and reverse primers, ThermoPol20 buffer and, optionally, MgSO4, mixed in regular PCR tubes. The tubes are then loaded into the PCR machine, which has been programmed according to the settings in table 2.

Figure 4: Typical PCR42.

After the PCR, the DNA concentration can be measured with a NanoDrop, a spectrophotometer that needs only very small volumes. If the concentration of the PCR solution is too low or not even measurable, the annealing temperature will probably have to be tweaked.
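The NanoDrop converts absorbance at 260 nm into a concentration; for double-stranded DNA the conventional factor is 50 ng/µl per absorbance unit at a 1 cm path length, and the A260/A280 ratio gives a rough purity check. The sketch below mirrors that arithmetic with hypothetical helper names and made-up readings, not measured data.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # 1 A260 unit ~ 50 ng/ul dsDNA at 1 cm path


def dsdna_concentration(a260: float, dilution: float = 1.0) -> float:
    """dsDNA concentration in ng/ul from absorbance at 260 nm (1 cm path)."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280; a value around 1.8 is generally taken as clean DNA."""
    return a260 / a280


# Hypothetical readings for illustration only.
print(dsdna_concentration(0.4))            # 20.0
print(round(purity_ratio(0.4, 0.22), 2))   # 1.82
```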

Isolation & Purification

When working with DNA fragments and plasmids, one often has to perform isolation and purification steps. After a PCR, or after treating a vector with e.g. restriction enzymes or alkaline phosphatase, buffers and reactants are still present with the DNA, and the DNA must be isolated before it can be used further. PCR products and plasmids (the vector) can be isolated and purified with DNA cleanup or isolation kits, such as the QIAamp DNA Mini Kit25 from Qiagen, designed to isolate DNA fragments. It uses a spin column and is a fast procedure, as specified by the kit. For the vector after treatment, the QIAprep Spin Miniprep Kit26 will isolate the vector; it is also a spin kit for fast use.

Restriction-Enzymes & Vector treatment

The vector pET-22b(+) has a region called the MCS, or multiple cloning site, between a T7 promoter, which drives transcription of genes placed downstream of it, and a T7 terminator. Between the promoter and the terminator there are many different sites where different restriction enzymes bind and cut the plasmid. NdeI and BamHI21 each have a single site in pET-22b(+), the two sites lie at a good distance from each other, and both enzymes are available and active in the same buffer2; they are chosen for these reasons. Mixing these enzymes with the vector leaves it cut open and ready to be ligated with the PCR fragment. Cutting with 2 enzymes also prevents the vector from re-ligating to itself, since its two ends get different overhangs. The primers are designed so that the same enzymes cut the PCR fragment at the right places, leaving it with overhangs complementary to those on the vector. The fragment will then be a perfect match, which makes the ligation easier.

When using two enzymes there is little need for further treatment such as alkaline phosphatase, since self-ligation is unlikely. Mixing the vector or PCR fragments with CutSmart buffer24 from New England BioLabs and the restriction enzymes ensures efficient digestion. Heat inactivation works for these enzymes, which makes it easy and fast to stop the reaction.
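Before digesting, it is worth checking that neither recognition sequence also occurs inside the insert, since an internal site would cut the gene. The sketch below is a plain string scan with a hypothetical helper, `find_sites`; because GGATCC and CATATG are palindromes, scanning one strand is enough. Scanning the attachment sequence this way finds a CATATG inside the HSA coding sequence, in the stretch used as the example below, where it overlaps the codons for Thr-Tyr-Glu; NdeI would therefore also cut the PCR product there.

```python
SITES = {"BamHI": "GGATCC", "NdeI": "CATATG"}  # recognition sequences from the text


def find_sites(dna: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """1-based start positions of each recognition sequence in `dna`.

    Both sites are palindromic, so one strand also covers the other.
    Overlapping matches are reported.
    """
    dna = "".join(dna.split()).upper()
    hits = {}
    for name, site in sites.items():
        hits[name] = [i + 1 for i in range(len(dna) - len(site) + 1)
                      if dna[i:i + len(site)] == site]
    return hits


# A stretch of the attachment cDNA, coding for ...LLRLAKTYETTLE...
fragment = "ctgctgagacttgccaagacatatgaaaccactctagag"
print(find_sites(fragment))  # {'BamHI': [], 'NdeI': [20]}
```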

Ligation

T4 DNA ligase with T4 DNA Ligase Reaction Buffer34 was chosen for its speed and availability from New England BioLabs. T4 DNA ligase originates from bacteriophage T4. To promote circle formation, the DNA concentration should be low35; this is important for good transformation. The molar ratio of vector to fragment should be between 1:1 and 1:10, with 1:3 being typical35, and 1:3 will be used. The fragment concentration was measured with the NanoDrop, so adding the right proportions poses no problem. ATP is an essential cofactor and is supplied by the reaction buffer. To the mixture of vector and fragment in buffer, 1 µl of T4 DNA ligase is added per 20 µl, and the mixture is kept at 20-25 °C for 10 minutes36. This is fast compared with the ligation of blunt-ended DNA and saves a lot of time. The enzyme is heat-inactivated at 65 °C for 20 minutes.
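The 1:3 vector:fragment molar ratio is set by mass, using the fact that equal numbers of DNA molecules weigh in proportion to their length. A minimal sketch with a hypothetical helper; the vector size used here (about 5.5 kb for pET-22b(+)) and the 50 ng of vector are assumptions for illustration, so check the exact size on the vendor map.

```python
def insert_ng(vector_ng: float, vector_bp: int, insert_bp: int,
              insert_to_vector: float = 3.0) -> float:
    """Mass of insert (ng) that gives the requested insert:vector molar ratio.

    Equal moles of two dsDNAs scale with their lengths, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * ratio.
    """
    return vector_ng * insert_bp / vector_bp * insert_to_vector


# Assumed: 50 ng of cut vector of ~5.5 kb and a ~1.8 kb insert, 1:3 ratio.
print(round(insert_ng(50, 5500, 1800), 1))  # 49.1
```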

Transformation

First, choosing a good strain of E. coli is important, since it is hard to transform plasmids into non-competent, that is regular, cells.

BL21(DE3)1 Competent E. coli is designed for a high transformation rate, is a T7 expression strain and is easily available, which makes it a good choice.

Following a high-efficiency protocol27, the best and easiest way to transform is to heat-shock a mixture of BL21(DE3)1 Competent E. coli and the ligated vector for 30 seconds at 42 °C. The mixture is then placed on ice for 5 minutes before adding SOC28, which gives a higher transformation rate than LB medium would, and incubated at 37 °C for 1 hour. After transformation, the E. coli will be grown on medium for 24 h to give the protein time to be expressed, but not for too long. The growth temperature will be 37 °C, since it mimics the physiological conditions under which the protein is usually found. To avoid wasting material, a new PCR can be run on the transformed E. coli to check whether the plasmid actually contains the HSA gene. The bacteria only need to be added to the PCR mixture, where the initial denaturation breaks them open, and if the gene is present it will be amplified in this run. This is worth doing, since continuing with bacteria carrying a failed ligation wastes the rest of the procedure. The transformation is confirmed to have worked once E. coli colonies appear on the selective ampicillin medium.
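Transformation efficiency is usually reported as colony-forming units per microgram of DNA, often measured with a known amount of an uncut control plasmid rather than the ligation itself. The sketch below uses a hypothetical helper and made-up numbers; it accounts for plating only part of the recovery volume (cells plus SOC).

```python
def transformants_per_ug(colonies: int, dna_ng: float,
                         plated_ul: float, total_ul: float) -> float:
    """Transformation efficiency in colony-forming units per microgram of DNA.

    Only the plated fraction of the recovery volume gives colonies, so the
    DNA actually plated is dna_ng * plated_ul / total_ul.
    """
    dna_plated_ug = dna_ng / 1000.0 * plated_ul / total_ul
    return colonies / dna_plated_ug


# Hypothetical: 1 ng of control plasmid, 1000 ul recovery, 100 ul plated, 150 colonies.
print(f"{transformants_per_ug(150, 1.0, 100, 1000):.1e}")  # 1.5e+06
```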

Analysis

If the E. coli cultures have grown at all, we know that the plasmid has been transformed into the E. coli, which has become resistant to the selective medium: the ampR gene must have been expressed if the E. coli has grown on an ampicillin-containing medium. To confirm that human serum albumin is present, the cells can be lysed and the proteins first separated by molecular weight with SDS-PAGE; a 14% gel separates molecules of 10-80 kDa32. This may leave the protein together with other cell proteins of similar or the same molecular weight. Ready-made gels29 are common and easy to buy. After electrophoresis, the gel is commonly stained with Coomassie blue to make the protein bands directly visible (Picture 3). Using ion-exchange chromatography on DEAE cellulose31, the protein can then be purified further based on its charge, which depends on its pI, and should be close to pure, or completely pure, after this step. Measuring the absorbance at 280 nm, where aromatic amino acids (mainly tryptophan and tyrosine) absorb light17, then confirms or rules out the presence of human serum albumin in the fractions. It could of course be another protein with exactly the same mass and pI, but that is unlikely in this case after SDS-PAGE. Running the same diagnostics on E. coli before transformation rules out a native protein of the same mass, provided it is not seen in that control run. Common vectors such as pET-22b(+) incorporate tags, in this case a tag30. This tag can be used for identification and purification, but it is unnecessary here, since a more common and still effective method for identification and purification has already been described.
pET-based vectors also offer the option of using the pelB37 leader sequence for periplasmic localization, but this will be unnecessary as well.
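To turn the 280 nm reading into a concentration, the Beer-Lambert law needs the molar extinction coefficient, which can be estimated from the numbers of Trp, Tyr and disulfide-bonded Cys residues. The sketch below uses the commonly used coefficients (Trp 5500, Tyr 1490, cystine 125 M⁻¹ cm⁻¹), the same ones Expasy ProtParam uses; the helper names are hypothetical, and A280 measures all protein in the sample, not HSA specifically.

```python
EPS_TRP, EPS_TYR, EPS_CYSTINE = 5500, 1490, 125  # M^-1 cm^-1 at 280 nm


def extinction_280(seq: str) -> int:
    """Estimated molar extinction coefficient at 280 nm, assuming every pair
    of cysteines forms a disulfide (cystine)."""
    seq = seq.upper()
    return (EPS_TRP * seq.count("W") + EPS_TYR * seq.count("Y")
            + EPS_CYSTINE * (seq.count("C") // 2))


def protein_mg_per_ml(a280: float, epsilon: float, mw_da: float,
                      path_cm: float = 1.0) -> float:
    """Beer-Lambert: c = A / (epsilon * l) in mol/L, times g/mol gives mg/mL."""
    return a280 / (epsilon * path_cm) * mw_da


# Composition-only stand-in for the mature sequence (residue order does not
# matter here): 1 W, 18 Y and 35 C, as counted in the attachment.
eps = extinction_280("W" + "Y" * 18 + "C" * 35)
print(eps)  # 34445
# Hypothetical A280 reading of 0.5, with the Expasy molecular weight.
print(round(protein_mg_per_ml(0.5, eps, 66401.13), 2))  # 0.96
```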

Attachment

DNA sequence of the precursor, including the TAA stop codon. The highlighted a at position 72 is the first base of the forward primer without attachment; the sequence coding for the mature protein starts with the next base, at position 73 (GAT, Asp-25).

atgaagtgggtaacctttatttcccttctttttctctttagctcggcttattccagggg tgtgtttcgtcgagatgcacacaagagtgaggttgctcatcggtttaaagatttgggag aagaaaatttcaaagccttggtgttgattgcctttgctcagtatcttcagcagtgtcca tttgaagatcatgtaaaattagtgaatgaagtaactgaatttgcaaaaacatgtgttgc tgatgagtcagctgaaaattgtgacaaatcacttcataccctttttggagacaaattat gcacagttgcaactcttcgtgaaacctatggtgaaatggctgactgctgtgcaaaacaa gaacctgggagaaatgaatgcttcttgcaacacaaagatgacaacccaaacctcccccg attggtgagaccagaggttgatgtgatgtgcactgcttttcatgacaatgaagagacat ttttgaaaaaatacttatatgaaattgccagaagacatccttacttttatgccccggaa ctccttttctttgctaaaaggtataaagctgcttttacagaatgttgccaagctgctga taaagctgcctgcctgttgccaaagctcgatgaacttcgggatgaagggaaggcttcgt ctgccaaacagagactcaagtgtgccagtctccaaaaatttggagaaagagctttcaaa gcatgggcagtagctcgcctgagccagagatttcccaaagctgagtttgcagaagtttc caagttagtgacagatcttaccaaagtccacacggaatgctgccatggagatctgcttg aatgtgctgatgacagggcggaccttgccaagtatatctgtgaaaatcaagattcgatc tccagtaaactgaaggaatgctgtgaaaaacctctgttggaaaaatcccactgcattgc cgaagtggaaaatgatgagatgcctgctgacttgccttcattagctgctgattttgttg aaagtaaggatgtttgcaaaaactatgctgaggcaaaggatgtcttcttgggcatgttt ttgtatgaatatgcaagaaggcatcctgattactctgtcgtgctgctgctgagacttgc caagacatatgaaaccactctagagaagtgctgtgccgctgcagatcctcatgaatgct atgccaaagtgttcgatgaatttaaacctcttgtggaagagcctcagaatttaatcaaa caaaattgtgagctttttgagcagcttggagagtacaaattccagaatgcgctgttagt tcgttacaccaagaaagtacccgaagtgtcaactccaactcttgtagaggtctcaagaa acctaggaaaagtgggcagcaaatgttgtaaacatcctgaagcaaaaagaatgccctgt gcagaagactatctatccgtggtcctgaaccagttatgtgtgttgcatgagaaaacgcc agtaagtgacagagtcaccaaatgctgcacagaatccttggtgaacaggcgaccatgct tttcagctctggaagtcgatgaaacatacgttcccaaagagtttaatgctgaaacattc accttccatgcagatatatgcacactttctgagaaggagagacaaatcaagaaacaaac tgcacttgttgagctcgtgaaacacaagcccaaggcaacaaaagagcaactgaaagctg ttatggatgatttcgctgcttttgtagagaagtgctgcaaggctgacgataaggagacc tgctttgccgaggagggtaaaaaacttgttgctgcaagtcaagctgccttaggcttata a

Amino acid sequence from Expasy. The D at position 25 marks the start of the mature protein.

MKWVTFISLL FLFSSAYSRG VFRRDAHKSE VAHRFKDLGE ENFKALVLIA FAQYLQQCPF EDHVKLVNEV TEFAKTCVAD ESAENCDKSL HTLFGDKLCT VATLRETYGE MADCCAKQEP GRNECFLQHK DDNPNLPRLV RPEVDVMCTA FHDNEETFLK KYLYEIARRH PYFYAPELLF FAKRYKAAFT ECCQAADKAA CLLPKLDELR DEGKASSAKQ RLKCASLQKF GERAFKAWAV ARLSQRFPKA EFAEVSKLVT DLTKVHTECC HGDLLECADD RADLAKYICE NQDSISSKLK ECCEKPLLEK SHCIAEVEND EMPADLPSLA ADFVESKDVC KNYAEAKDVF LGMFLYEYAR RHPDYSVVLL LRLAKTYETT LEKCCAAADP HECYAKVFDE FKPLVEEPQN LIKQNCELFE QLGEYKFQNA LLVRYTKKVP EVSTPTLVEV SRNLGKVGSK CCKHPEAKRM PCAEDYLSVV LNQLCVLHEK TPVSDRVTKC CTESLVNRRP CFSALEVDET YVPKEFNAET FTFHADICTL SEKERQIKKQ TALVELVKHK PKATKEQLKA VMDDFAAFVE KCCKADDKET CFAEEGKKLV AASQAALGL http://web.expasy.org/cgi-bin/translate/dna_sequences?/work/expasy/tmp/http/seqdna.2552,1,1

Sequence of the mature protein, also from Expasy, translated from the sequence that is 72 base pairs shorter; compare with the cDNA sequence above.

DAHKSEVAHR FKDLGEENFK ALVLIAFAQY LQQCPFEDHV KLVNEVTEFA KTCVADESAE NCDKSLHTLF GDKLCTVATL RETYGEMADC CAKQEPGRNE CFLQHKDDNP NLPRLVRPEV DVMCTAFHDN EETFLKKYLY EIARRHPYFY APELLFFAKR YKAAFTECCQ AADKAACLLP KLDELRDEGK ASSAKQRLKC ASLQKFGERA FKAWAVARLS QRFPKAEFAE VSKLVTDLTK VHTECCHGDL LECADDRADL AKYICENQDS ISSKLKECCE KPLLEKSHCI AEVENDEMPA DLPSLAADFV ESKDVCKNYA EAKDVFLGMF LYEYARRHPD YSVVLLLRLA KTYETTLEKC CAAADPHECY AKVFDEFKPL VEEPQNLIKQ NCELFEQLGE YKFQNALLVR YTKKVPEVST PTLVEVSRNL GKVGSKCCKH PEAKRMPCAE DYLSVVLNQL CVLHEKTPVS DRVTKCCTES LVNRRPCFSA LEVDETYVPK EFNAETFTFH ADICTLSEKE RQIKKQTALV ELVKHKPKAT KEQLKAVMDD FAAFVEKCCK ADDKETCFAE EGKKLVAASQ AALGL

http://web.expasy.org/cgi-bin/translate/dna_sequences?/work/expasy/tmp/http/seqdna.20223,2,1

UniProt ID: P02768 (sequence of the mature protein, residues 25-609):

DAHKSEVAHRFKDLGEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAKTCVADESAE NCDKSLHTLFGDKLCTVATLRETYGEMADCCAKQEPERNECFLQHKDDNPNLPRLVRPEV DVMCTAFHDNEETFLKKYLYEIARRHPYFYAPELLFFAKRYKAAFTECCQAADKAACLLP KLDELRDEGKASSAKQRLKCASLQKFGERAFKAWAVARLSQRFPKAEFAEVSKLVTDLTK VHTECCHGDLLECADDRADLAKYICENQDSISSKLKECCEKPLLEKSHCIAEVENDEMPA DLPSLAADFVESKDVCKNYAEAKDVFLGMFLYEYARRHPDYSVVLLLRLAKTYETTLEKC CAAADPHECYAKVFDEFKPLVEEPQNLIKQNCELFEQLGEYKFQNALLVRYTKKVPQVST PTLVEVSRNLGKVGSKCCKHPEAKRMPCAEDYLSVVLNQLCVLHEKTPVSDRVTKCCTES LVNRRPCFSALEVDETYVPKEFNAETFTFHADICTLSEKERQIKKQTALVELVKHKPKAT KEQLKAVMDDFAAFVEKCCKADDKETCFAEEGKKLVAASQAALGL