core

The core package contains all functions, variables, and classes needed to train a deep neural network model and inspect its activations.

Load Protein Solubility Data

The example dataset is from the DeepSol paper by Khurana et al. and was obtained from https://zenodo.org/records/1162886.

train_sqs = open('sol_data/train_src', 'r').read().splitlines()
train_tgs = list(map(int, open('sol_data/train_tgt', 'r').read().splitlines()))
train_sqs[:2], train_tgs[:2]
(['GMILKTNLFGHTYQFKSITDVLAKANEEKSGDRLAGVAAESAEERVAAKVVLSKMTLGDLRNNPVVPYETDEVTRIIQDQVNDRIHDSIKNWTVEELREWILDHKTTDADIKRVARGLTSEIIAAVTKLMSNLDLIYGAKKIRVIAHANTTIGLPGTFSARLQPNHPTDDPDGILASLMEGLTYGIGDAVIGLNPVDDSTDSVVRLLNKFEEFRSKWDVPTQTCVLAHVKTQMEAMRRGAPTGLVFQSIAGSEKGNTAFGFDGATIEEARQLALQSGAATGPNVMYFETGQGSELSSDAHFGVDQVTMEARCYGFAKKFDPFLVNTVVGFIGPEYLYDSKQVIRAGLEDHFMGKLTGISMGCDVCYTNHMKADQNDVENLSVLLTAAGCNFIMGIPHGDDVMLNYQTTGYHETATLRELFGLKPIKEFDQWMEKMGFSENGKLTSRAGDASIFLK',
  'MAHHHHHHMSFFRMKRRLNFVVKRGIEELWENSFLDNNVDMKKIEYSKTGDAWPCVLLRKKSFEDLHKLYYICLKEKNKLLGEQYFHLQNSTKMLQHGRLKKVKLTMKRILTVLSRRAIHDQCLRAKDMLKKQEEREFYEIQKFKLNEQLLCLKHKMNILKKYNSFSLEQISLTFSIKKIENKIQQIDIILNPLRKETMYLLIPHFKYQRKYSDLPGFISWKKQNIIALRNNMSKLHRLY'],
 [1, 0])
valid_sqs = open('sol_data/val_src', 'r').read().splitlines()
valid_tgs = list(map(int, open('sol_data/val_tgt', 'r').read().splitlines()))
valid_sqs[:2], valid_tgs[:2]
(['SRLYRHNLMEDVFNMENESFMQETRLMENEYSVNLPTRFYYKKRWNNGFVNIVNIFRACMVIGTPGSGKSYAIVNSYIRQLIAKGFAIYIYDYKFDDLSTIAYNSLLKNMDKYEVKPRFYVINFDDPRRSHRCNPINPEFMTDISDAYEASYTIMLNLNRTWIEKQGDFFVESPIILLAAIIWYLKIYKNGIYCTFPHAVELLNKPYSDLFTILTSYPELENYLSPFMDAWKGNAQDQLQGQIASAKIPLTRMISPQLYWVMTGNDFSLDINNPKEPKLLCVGNNPDRQNIYSAALGLYNSRIVKLINKKKQLKCAVIIDELPTIYFRGLDNLIATARSNKVGVLLGFQDFSQLTRDYGEKESKVIQNTVGNIFSGQVVGETAKTLSERFGKVLQQRQSVSINRQDVSTSINTQLDSLIPASKIANLSQGTFVGAVADNFDERIEQKIFHAEIVVDHTKISAEEKAYQKIPVINDFKDRNGNDIMMQQIQRNYDQIKADAQAIINEEMRRIKNDPELRKRLGLEDEKGKDPDKS',
  'ATTYNAVVSKSSSDGKTFKTIADAIASAPAGSTPFVILIKNGVYNERLTITRNNLHLKGESRNGAVIAAATAAGTLKSDGSKWGTAGSSTITISAKDFSAQSLTIRNDFDFPANQAKSDSDSSKIKDTQAVALYVTKSGDRAYFKDVSLVGYQATLYVSGGRSFFSDCRISGTVDFIFGDGTALFNNCDLVSRYRADVKSGNVSGYLTAPSTNINQKYGLVITNSRVIRESDSVPAKSYGLGRPWHPTTTFSDGRYADPNAIGQTVFLNTSMDNHIYGWDKMSGKDKNGNTIWFNPEDSRFFEYKSYGAGATVSKDRRQLTDAQAAEYTQSKVLGDWTPTLP'],
 [0, 1])
test_sqs = open('sol_data/test_src', 'r').read().splitlines()
test_tgs = list(map(int, open('sol_data/test_tgt', 'r').read().splitlines()))
test_sqs[:2], test_tgs[:2]
(['MLSVRIAAAVARALPRRAGLVSKNALGSSFVGTRNLHASNTRLQKTGTAEMSSILEERILGADTSVDLEETGRVLSIGDGIARVHGLRNVQAEEMVEFSSGLKGMSLNLEPDNVGVVVFGNDKLIKEGDIVKRTGAIVDVPVGDELLGRVVDALGNAIDGKGPVGSKIRRRVGLKAPGIIPRISVREPMQTGIKAVDSLVPIGRGQRELIIGDRQTGKTSIAIDTIINQKRFNDGTDEKKKLYCIYVAIGQKRSTVAQLVKRLTDADAMKYTIVVSATASDAAPLQYLAPYSGCSMGEYFRDNGKHALIIYDDLSKQAVAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKMNDSFGGGSLTALPVIETQAGDVSAYIPTNVISITDGQIFLETELFYKGIRPAINVGLSVSRVGSAAQTRAMKQVAGTMKLELAQYREVAAFAQFGSDLDAATQQLLSRGVRLTELLKQGQYSPMAIEEQVAVIYAGVRGYLDKLEPSKITKFESAFLSHVVSQHQSLLGNIRSDGKISEQSDAKLKEIVTNFLAGFEP',
  'MDHMISENGETSAEGSICGYDSLHQLLSANLKPELYQEVNRLLLGRNCGRSLEQIVLPESAKALSSKHDFDLQAASFSADKEQMRNPRVVRVGLIQNSIALPTTAPFSDQTRGIFDKLKPIIDAAGVAGVNILCLQEAWTMPFAFCTRERRWCEFAEPVDGESTKFLQELAKKYNMVIVSPILERDIDHGEVLWNTAVIIGNNGNIIGKHRKNHIPRVGDFNESTYYMEGDTGHPVFETVFGKIAVNICYGRHHPLNWLAFGLNGAEIVFNPSATVGELSEPMWPIEARNAAIANSYFVGSINRVGTEVFPNPFTSGDGKPQHNDFGHFYGSSHFSAPDASCTPSLSRYKDGLLISDMDLNLCRQYKDKWGFRMTARYEVYADLLAKYIKPDFKPQVVSDPLLHKNST'],
 [1, 1])
len(train_sqs), len(train_tgs), len(valid_sqs), len(valid_tgs), len(test_sqs), len(test_tgs)
(62478, 62478, 6942, 6942, 1999, 1999)

Create a sorted list of amino acids aas, including an empty string for padding, and determine the size of the vocabulary.

aas = sorted(list(set("".join(train_sqs))) + [""])
vocab_size = len(aas)
aas, vocab_size
(['',
  'A',
  'C',
  'D',
  'E',
  'F',
  'G',
  'H',
  'I',
  'K',
  'L',
  'M',
  'N',
  'P',
  'Q',
  'R',
  'S',
  'T',
  'V',
  'W',
  'Y'],
 21)

Create dictionaries that translate between string and integer representations of amino acids and define the corresponding encode and decode functions.

str2int = {aa:i for i, aa in enumerate(aas)}
int2str = {i:aa for i, aa in enumerate(aas)}
encode = lambda s: [str2int[aa] for aa in s]
decode = lambda l: ''.join([int2str[i] for i in l])

print(encode("AYWCCCGGGHH"))
print(decode(encode("AYWCCCGGGHH")))
[1, 20, 19, 2, 2, 2, 6, 6, 6, 7, 7]
AYWCCCGGGHH

Determine the lengths of the amino acid sequences in the training set and inspect the longest sequence.

train_lens = list(map(len, train_sqs))
max(train_lens)
1691
longest = train_sqs[np.argmax(train_lens)]
longest
'MSGEVRLRQLEQFILDGPAQTNGQCFSVETLLDILICLYDECNNSPLRREKNILEYLEWAKPFTSKVKQMRLHREDFEILKVIGRGAFGEVAVVKLKNADKVFAMKILNKWEMLKRAETACFREERDVLVNGDNKWITTLHYAFQDDNNLYLVMDYYVGGDLLTLLSKFEDRLPEDMARFYLAEMVIAIDSVHQLHYVHRDIKPDNILMDMNGHIRLADFGSCLKLMEDGTVQSSVAVGTPDYISPEILQAMEDGKGRYGPECDWWSLGVCMYEMLYGETPFYAESLVETYGKIMNHKERFQFPAQVTDVSENAKDLIRRLICSREHRLGQNGIEDFKKHPFFSGIDWDNIRNCEAPYIPEVSSPTDTSNFDVDDDCLKNSETMPPPTHTAFSGHHLPFVGFTYTSSCVLSDRSCLRVTAGPTSLDLDVNVQRTLDNNLATEAYERRIKRLEQEKLELSRKLQESTQTVQALQYSTVDGPLTASKDLEIKNLKEEIEKLRKQVTESSHLEQQLEEANAVRQELDDAFRQIKAYEKQIKTLQQEREDLNKELVQASERLKNQSKELKDAHCQRKLAMQEFMEINERLTELHTQKQKLARHVRDKEEEVDLVMQKVESLRQELRRTERAKKELEVHTEALAAEASKDRKLREQSEHYSKQLENELEGLKQKQISYSPGVCSIEHQQEITKLKTDLEKKSIFYEEELSKREGIHANEIKNLKKELHDSEGQQLALNKEIMILKDKLEKTRRESQSEREEFESEFKQQYEREKVLLTEENKKLTSELDKLTTLYENLSIHNQQLEEEVKDLADKKESVAHWEAQITEIIQWVSDEKDARGYLQALASKMTEELEALRNSSLGTRATDMPWKMRRFAKLDMSARLELQSALDAEIRAKQAIQEELNKVKASNIITECKLKDSEKKNLELLSEIEQLIKDTEELRSEKGIEHQDSQHSFLAFLNTPTDALDQFERKTHQFFVKSFTTPTKCHQCTSLMVGLIRQGCSCEVCGFSCHITCVNKAPTTCPVPPEQTKGPLGIDPQKGIGTAYEGHVRIPKPAGVKKGWQRALAIVCDFKLFLYDIAEGKASQPSVVISQVIDMRDEEFSVSSVLASDVIHASRKDIPCIFRVTASQLSASNNKCSILMLADTENEKNKWVGVLSELHKILKKNKFRDRSVYVPKEAYDSTLPLIKTTQAAAIIDHERIALGNEEGLFVVHVTKDEIIRVGDNKKIHQIELIPNDQLVAVISGRNRHVRLFPMSALDGRETDFYKLSETKGCQTVTSGKVRHGALTCLCVAMKRQVLCYELFQSKTRHRKFKEIQVPYNVQWMAIFSEQLCVGFQSGFLRYPLNGEGNPYSMLHSNDHTLSFIAHQPMDAICAVEISSKEYLLCFNSIGIYTDCQGRRSRQQELMWPANPSSCCYNAPYLSVYSENAVDIFDVNSMEWIQTLPLKKVRPLNNEGSLNLLGLETIRLIYFKNKMAEGDELVVPETSDNSRKQMVRNINNKRRYSFRVPEEERMQQRREMLRDPEMRNKLISNPTNFNHIAHMGPGDGIQILKDLPMNPRPQESRTVFSGSVSIPSITKSRPEPGRSMSASSGLSARSSAQNGSALKREFSGGSYSAKRQPMPSPSEGSLSSGGMDQGSDAPARDFDGEDSDSPRHSTASNSSNLSSPPSPVSPRKTKSLSLESTDRGSWDP'

Check how many sequences in the training set are longer than 1200 amino acids.

long_sqs = []
for sq in train_sqs:
    if len(sq) > 1200:
        long_sqs.append(sq)
len(long_sqs)
132

Create a function that drops all sequences longer than a chosen threshold and also returns the indices of the sequences that meet the threshold, so the corresponding labels can be retrieved.

def drop_long_sqs(sqs, threshold=1200):
    new_sqs = []
    idx = []
    for i, sq in enumerate(sqs):
        if len(sq) <= threshold:
            new_sqs.append(sq)
            idx.append(i)
    return new_sqs, idx

Drop all sequences above your threshold.

trnsqs, trnidx = drop_long_sqs(train_sqs, threshold=200)
vldsqs, vldidx = drop_long_sqs(valid_sqs, threshold=200)
tstsqs, tstidx = drop_long_sqs(test_sqs, threshold=200)
len(trnidx), len(vldidx), len(tstidx)
(18066, 1971, 699)

Make sure that it worked.

trnls = map(len, trnsqs)
vldls = map(len, vldsqs)
tstls = map(len, tstsqs)
max(trnls), max(vldls), max(tstls)
(200, 200, 200)

Create a function that zero-pads encoded sequences to a fixed length.

def zero_pad(sq, length=1200):
    new_sq = sq.copy()
    if len(new_sq) < length:
        new_sq.extend([0] * (length-len(new_sq)))
    return new_sq

Now encode and zero pad all sequences and make sure that it worked out correctly.

trn = list(map(encode, trnsqs))
vld = list(map(encode, vldsqs))
tst = list(map(encode, tstsqs))
print(f"Length of the first two sequences before zero padding: {len(trn[0])}, {len(trn[1])}")
trn = list(map(partial(zero_pad, length=200), trn))
vld = list(map(partial(zero_pad, length=200), vld))
tst = list(map(partial(zero_pad, length=200), tst))
print(f"Length of the first two sequences after zero padding:  {len(trn[0])}, {len(trn[1])}");
Length of the first two sequences before zero padding: 116, 135
Length of the first two sequences after zero padding:  200, 200

Convert the data to torch.tensors using dtype=torch.int64 and check for correctness.

trntns = torch.tensor(trn, dtype=torch.int64)
vldtns = torch.tensor(vld, dtype=torch.int64)
tsttns = torch.tensor(tst, dtype=torch.int64)
trntns.shape, trntns[0]
(torch.Size([18066, 200]),
 tensor([11,  9,  1, 10,  2, 10, 10, 10, 10, 13, 18, 10,  6, 10, 10, 18, 16, 16,  9, 17, 10,  2, 16, 11,  4,  4,  1,  8, 12,  4, 15,  8, 14,
          4, 18,  1,  6, 16, 10,  8,  5, 15,  1,  8, 16, 16,  8,  6, 10,  4,  2, 14, 16, 18, 17, 16, 15,  6,  3, 10,  1, 17,  2, 13, 15,  6,
          5,  1, 18, 17,  6,  2, 17,  2,  6, 16,  1,  2,  6, 16, 19,  3, 18, 15,  1,  4, 17, 17,  2,  7,  2, 14,  2,  1,  6, 11,  3, 19, 17,
          6,  1, 15,  2,  2, 15, 18, 14, 13, 10,  4,  7,  7,  7,  7,  7,  7,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          0,  0]))
trntns.shape, vldtns.shape, tsttns.shape
(torch.Size([18066, 200]), torch.Size([1971, 200]), torch.Size([699, 200]))

Obtain the correct labels using the index lists returned by drop_long_sqs, and convert the label lists to tensors of dtype torch.float32.

trnlbs = torch.tensor(train_tgs, dtype=torch.float32)[trnidx]
vldlbs = torch.tensor(valid_tgs, dtype=torch.float32)[vldidx]
tstlbs = torch.tensor(test_tgs, dtype=torch.float32)[tstidx]
trnlbs.shape, vldlbs.shape, tstlbs.shape
(torch.Size([18066]), torch.Size([1971]), torch.Size([699]))
trnlbs.sum().item()/trnlbs.shape[0], vldlbs.sum().item()/vldlbs.shape[0], tstlbs.sum().item()/tstlbs.shape[0]
(0.4722129967895494, 0.4657534246575342, 0.5665236051502146)

The ratios above show that slightly less than half of the proteins in the training and validation data are soluble, while slightly more than half of those in the test set are.

Dataset and DataLoaders

Create a Dataset class and combine tokens and labels into datasets.


source

Dataset

 Dataset (x, y)

Combines features and labels in a dataset.

trnds = Dataset(trntns, trnlbs)
vldds = Dataset(vldtns, vldlbs)
tstds = Dataset(tsttns, tstlbs)
trnds[0]
(tensor([11,  9,  1, 10,  2, 10, 10, 10, 10, 13, 18, 10,  6, 10, 10, 18, 16, 16,  9, 17, 10,  2, 16, 11,  4,  4,  1,  8, 12,  4, 15,  8, 14,
          4, 18,  1,  6, 16, 10,  8,  5, 15,  1,  8, 16, 16,  8,  6, 10,  4,  2, 14, 16, 18, 17, 16, 15,  6,  3, 10,  1, 17,  2, 13, 15,  6,
          5,  1, 18, 17,  6,  2, 17,  2,  6, 16,  1,  2,  6, 16, 19,  3, 18, 15,  1,  4, 17, 17,  2,  7,  2, 14,  2,  1,  6, 11,  3, 19, 17,
          6,  1, 15,  2,  2, 15, 18, 14, 13, 10,  4,  7,  7,  7,  7,  7,  7,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
          0,  0]),
 tensor(0.))

Define a DataLoaders class and a function that creates your DataLoaders given a train dataset, a valid dataset, and a batch size.


source

DataLoaders

 DataLoaders (*dls)

Combines training and validation data in a DataLoaders object that can be passed to a learner.


source

get_dls

 get_dls (train_ds, valid_ds, bs=32)

Turn training and validation set into a DataLoaders object.
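For orientation, here is a minimal sketch of what get_dls might look like, assuming it simply wraps the datasets in standard torch.utils.data.DataLoader objects (the actual definition is in the source link above):

from torch.utils.data import DataLoader

def get_dls(train_ds, valid_ds, bs=32):
    # shuffle the training data; use a larger batch size for validation since no gradients are kept
    return DataLoaders(DataLoader(train_ds, batch_size=bs, shuffle=True),
                       DataLoader(valid_ds, batch_size=bs*2))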

Get the DataLoaders object and test it.

dls = get_dls(trnds, vldds)
next(iter(dls.train))[0][:5], next(iter(dls.train))[1][:5]
(tensor([[ 4, 16, 20,  1,  8,  5, 16, 10, 12,  1, 14,  5,  3,  1, 10, 17,  8, 12,  9,  1, 15,  9, 10, 20, 15,  6,  9, 17, 15, 10, 10, 14,  6,
          14, 15,  8,  4, 10, 16,  3, 19, 13,  4, 17, 16, 18,  4, 15, 17,  4,  5, 20, 15, 20, 10, 10, 12,  9, 12, 10,  1, 14, 11, 12,  1,  7,
          19,  1, 16, 10, 16,  5, 16,  6,  9,  1, 15, 13, 13,  9, 18,  8,  3, 14, 13, 16,  8,  4,  8, 10, 10,  1, 19, 11, 15,  4, 14, 13, 12,
          15,  8,  6, 20,  1, 13, 18,  3, 16, 10, 13, 16,  3,  1,  7, 18, 10, 20, 18, 18,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0],
         [ 6,  8,  3, 13,  5, 17, 11, 10,  7,  8,  4,  5,  8, 17,  3, 10,  6,  1,  9, 18, 17, 18,  3, 18,  4, 16,  1,  3,  9, 10, 10,  3, 18,
          14, 15, 14, 20,  6, 15, 10,  6, 19, 17, 16,  6,  4, 18, 13, 18,  6,  6, 20, 14,  5, 13, 10,  4, 12,  4, 13,  3,  5,  3, 19, 16, 10,
           8,  6,  1, 15,  9, 19, 17, 12, 13,  4,  6,  4,  4, 11,  8, 10,  7, 15,  6,  7,  1, 20, 15, 15, 15,  4, 10,  4,  1, 18,  3, 16, 15,
           9, 11,  9, 10, 13,  1,  1, 18,  9, 20, 16, 15,  6,  1,  9, 12, 17,  3, 13,  4,  7, 18, 15,  4,  9,  1,  3,  6,  4,  5,  4, 20, 18,
          17, 10,  1,  8,  5, 15,  6,  6,  9, 15, 14,  4, 15, 20,  1, 18, 13,  6, 16, 12, 15, 13, 14,  1,  6,  1, 13,  1, 15, 16,  1,  1, 17,
          15,  1, 14,  6,  1, 15, 13,  6,  1, 18,  1, 18, 14,  3,  4,  4, 17, 13,  5,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0],
         [ 6, 13,  6, 16, 11,  1,  3, 18,  1,  3, 13, 16, 16,  4, 10,  8, 12,  4, 10, 15,  1, 10, 18, 18, 16,  6, 16,  6, 10, 18, 13, 10,  3,
           4, 15,  9, 18,  7, 17, 18, 18,  4, 12, 11, 17,  3,  1,  5, 15,  7, 10,  4, 16,  8, 16, 15, 12, 13,  5,  1,  3, 13, 16, 14, 13, 20,
          20, 16,  6,  1, 11, 15,  5, 20,  9,  1,  9,  2, 10, 15,  3,  9, 15,  2, 18, 18,  1, 20, 10, 10, 19, 15, 14, 16, 14,  8, 17,  9, 16,
          19, 19,  4,  1, 15,  3, 12, 17,  8, 16, 12, 11, 10,  1, 13,  2,  4, 15, 17,  5, 10, 14,  3, 20, 12,  3, 18, 11, 18,  4, 20, 11, 17,
          16,  5,  1, 18, 13, 10,  3, 10, 15, 16,  5, 17, 19, 15, 13, 13, 16, 17, 14, 14, 10,  4, 18, 15,  6, 10, 18, 12,  7, 18,  5, 18, 16,
          16,  8, 17,  6,  1, 18,  8, 12, 10, 20,  9,  6,  9, 14,  8, 10, 10,  6,  5,  4,  4,  1,  4, 16, 10,  8, 14, 14,  6, 18, 18,  4, 10,
          18,  4],
         [11,  8, 16,  9, 17, 16,  8, 10, 10, 10,  6,  5,  1,  2,  1,  1, 18,  6,  1, 12, 20,  8,  4, 16,  1,  8,  5,  9,  5,  3,  3, 16,  2,
           4, 13, 10,  4,  4, 13, 15,  9, 13,  4, 18,  1,  9,  2, 20,  4, 13, 16, 17, 16, 10, 13,  9, 17, 10,  4,  4, 20, 15,  1, 11,  6, 20,
           3, 19,  6,  4,  5, 18,  7,  3,  3,  9,  1,  3, 14,  1, 13,  4,  6, 10,  3,  4, 18,  5, 17, 20, 13, 11, 18, 13,  5,  8,  6, 11, 16,
          10, 10,  1, 16,  9,  9,  5,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0],
         [12, 10, 10,  3,  3,  1, 18,  9, 15,  8, 16,  4,  3, 13, 13,  2,  9,  2, 13, 17,  9,  5,  2, 18,  4, 15, 10, 16, 14,  6, 15, 20, 15,
          18,  6,  4,  9,  8, 10,  5,  8, 15, 11, 10,  7, 12,  9,  7, 18, 11, 18, 15, 18,  6,  6,  6, 19,  4, 17,  5,  1,  6, 20, 10, 10,  9,
           7,  3, 13,  2, 15, 11, 10, 14,  8, 16, 15, 18,  3,  6,  9, 17, 16, 13,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
           0,  0]]),
 tensor([1., 1., 1., 0., 1.]))

Learner Framework and Callbacks

The flexible callback learner, along with the useful callbacks and functions below, is taken from lesson 16 of the fast.ai 2022 course (see also on GitHub) and may be adapted depending on what I find most useful along the way.

Figure out which acceleration device is available and define a function that sends objects to that device.


source

to_device

 to_device (x, device='cpu')
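A minimal sketch of to_device, assuming it follows the fast.ai course version and recursively moves tensors, mappings, and sequences of tensors:

from collections.abc import Mapping

def to_device(x, device='cpu'):
    # send a tensor, or every tensor inside a dict/list/tuple, to the device
    if isinstance(x, torch.Tensor): return x.to(device)
    if isinstance(x, Mapping): return {k: v.to(device) for k, v in x.items()}
    return type(x)(to_device(o, device) for o in x)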

Define a function that sends objects to the cpu.


source

to_cpu

 to_cpu (x)

Define exceptions that end the learning process.


source

CancelEpochException

Common base class for all non-exit exceptions.


source

CancelBatchException

Common base class for all non-exit exceptions.


source

CancelFitException

Common base class for all non-exit exceptions.
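These are most likely plain Exception subclasses, raised by callbacks to skip the rest of a batch, an epoch, or the whole fit; a minimal sketch:

class CancelFitException(Exception): pass
class CancelBatchException(Exception): pass
class CancelEpochException(Exception): pass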

Define a callback class that assigns an order to each callback.


source

Callback

 Callback ()

Initialize self. See help(type(self)) for accurate signature.
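In the fast.ai course notebook the base Callback only carries an order attribute, which run_cbs uses to sort callbacks; a minimal sketch:

class Callback(): order = 0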

Define a class to be used in the Learner as a context manager to handle callbacks.


source

with_cbs

 with_cbs (nm)

Initialize self. See help(type(self)) for accurate signature.
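A sketch of with_cbs, roughly as in the fast.ai course notebook: it decorates a Learner method so that before/after callbacks run around it and the matching Cancel*Exception silently ends that stage.

class with_cbs:
    def __init__(self, nm): self.nm = nm
    def __call__(self, f):
        def _f(o, *args, **kwargs):
            try:
                o.callback(f'before_{self.nm}')
                f(o, *args, **kwargs)
                o.callback(f'after_{self.nm}')
            except globals()[f'Cancel{self.nm.title()}Exception']: pass
            finally: o.callback(f'cleanup_{self.nm}')
        return _f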

Define a function that runs callbacks in a list of callbacks.


source

run_cbs

 run_cbs (cbs, method_nm, learn=None)
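A sketch of run_cbs, assuming the fast.ai course version: it calls the named method on every callback that defines it, in order of the callbacks' order attribute.

from operator import attrgetter

def run_cbs(cbs, method_nm, learn=None):
    for cb in sorted(cbs, key=attrgetter('order')):
        method = getattr(cb, method_nm, None)
        if method is not None: method(learn)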

Define the learner class.


source

Learner

 Learner (model, dls=(0,), loss_func=<function mse_loss>, lr=0.1,
          cbs=None, opt_func=<class 'torch.optim.sgd.SGD'>)

Initialize self. See help(type(self)) for accurate signature.

Define a class that inherits from Learner and contains all the functions needed for training without requiring a training callback.


source

TrainLearner

 TrainLearner (model, dls=(0,), loss_func=<function mse_loss>, lr=0.1,
               cbs=None, opt_func=<class 'torch.optim.sgd.SGD'>)

Initialize self. See help(type(self)) for accurate signature.

Create a callback that assigns model and batches to the available acceleration device.


source

DeviceCB

 DeviceCB (device='cpu')

Initialize self. See help(type(self)) for accurate signature.
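A sketch of DeviceCB, assuming the fast.ai course version: it moves the model to the device before fitting and each batch to the device before it is processed.

class DeviceCB(Callback):
    def __init__(self, device='cpu'): self.device = device
    def before_fit(self, learn):
        if hasattr(learn.model, 'to'): learn.model.to(self.device)
    def before_batch(self, learn): learn.batch = to_device(learn.batch, device=self.device)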

Define a callback that runs for a single batch for testing purposes.


source

SingleBatchCB

 SingleBatchCB ()

Initialize self. See help(type(self)) for accurate signature.

Create a training callback that provides the learner with all functions necessary for training.


source

TrainCB

 TrainCB (n_inp=1)

Initialize self. See help(type(self)) for accurate signature.
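A sketch of TrainCB, roughly as in the fast.ai course notebook: it supplies the prediction, loss, backward, optimizer step, and zero_grad steps that the Learner calls during training. n_inp is the number of leading batch elements treated as model inputs.

class TrainCB(Callback):
    def __init__(self, n_inp=1): self.n_inp = n_inp
    def predict(self, learn): learn.preds = learn.model(*learn.batch[:self.n_inp])
    def get_loss(self, learn): learn.loss = learn.loss_func(learn.preds, *learn.batch[self.n_inp:])
    def backward(self, learn): learn.loss.backward()
    def step(self, learn): learn.opt.step()
    def zero_grad(self, learn): learn.opt.zero_grad()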

Define scheduler callbacks that adjust the learning rate according to a schedule, along with a callback that records the learning rate applied at every step.


source

RecorderCB

 RecorderCB (**d)

Initialize self. See help(type(self)) for accurate signature.


source

EpochSchedCB

 EpochSchedCB (sched=None)

Initialize self. See help(type(self)) for accurate signature.


source

BatchSchedCB

 BatchSchedCB (sched=None)

Initialize self. See help(type(self)) for accurate signature.


source

BaseSchedCB

 BaseSchedCB (sched=None)

Initialize self. See help(type(self)) for accurate signature.

Define a metrics callback that handles the calculation of metrics, along with a progress callback that displays metrics, losses, and plots of progress during training.


source

MetricsCB

 MetricsCB (*ms, **metrics)

Initialize self. See help(type(self)) for accurate signature.


source

ProgressCB

 ProgressCB (plot=False)

Initialize self. See help(type(self)) for accurate signature.

Finally, create a learning rate finder callback and add a method to the learner so the finder can be used with learn.lr_find() syntax.


source

LRFinderCB

 LRFinderCB (gamma=1.3, max_mult=3, av_over=1)

Initialize self. See help(type(self)) for accurate signature.


source

show_doc

 show_doc (sym, renderer=None, name:str|None=None, title_level:int=3)

Show signature and docstring for sym

Type Default Details
sym Symbol to document
renderer NoneType None Optional renderer (defaults to markdown)
name str | None None Optionally override displayed name of sym
title_level int 3 Heading level to use for symbol name

Functions for Convenient Plotting of Images


source

show_image

 show_image (im, ax=None, figsize=None, title=None, noframe=True,
             cmap=None, norm=None, aspect=None, interpolation=None,
             alpha=None, vmin=None, vmax=None, origin=None, extent=None,
             interpolation_stage=None, filternorm=True, filterrad=4.0,
             resample=None, url=None, data=None)

Show a PIL or PyTorch image on ax.

Type Default Details
im
ax NoneType None
figsize NoneType None
title NoneType None
noframe bool True
cmap NoneType None The Colormap instance or registered colormap name used to map scalar data
to colors.

This parameter is ignored if X is RGB(A).
norm NoneType None The normalization method used to scale scalar data to the [0, 1] range
before mapping to colors using cmap. By default, a linear scaling is
used, mapping the lowest value to 0 and the highest to 1.

If given, this can be one of the following:

- An instance of .Normalize or one of its subclasses
(see :ref:colormapnorms).
- A scale name, i.e. one of “linear”, “log”, “symlog”, “logit”, etc. For a
list of available scales, call matplotlib.scale.get_scale_names().
In that case, a suitable .Normalize subclass is dynamically generated
and instantiated.

This parameter is ignored if X is RGB(A).
aspect NoneType None The aspect ratio of the Axes. This parameter is particularly
relevant for images since it determines whether data pixels are
square.

This parameter is a shortcut for explicitly calling
.Axes.set_aspect. See there for further details.

- ‘equal’: Ensures an aspect ratio of 1. Pixels will be square
(unless pixel sizes are explicitly made non-square in data
coordinates using extent).
- ‘auto’: The Axes is kept fixed and the aspect is adjusted so
that the data fit in the Axes. In general, this will result in
non-square pixels.

Normally, None (the default) means to use :rc:image.aspect. However, if
the image uses a transform that does not contain the axes data transform,
then None means to not modify the axes aspect at all (in that case, directly
call .Axes.set_aspect if desired).
interpolation NoneType None The interpolation method used.

Supported values are ‘none’, ‘antialiased’, ‘nearest’, ‘bilinear’,
‘bicubic’, ‘spline16’, ‘spline36’, ‘hanning’, ‘hamming’, ‘hermite’,
‘kaiser’, ‘quadric’, ‘catrom’, ‘gaussian’, ‘bessel’, ‘mitchell’,
‘sinc’, ‘lanczos’, ‘blackman’.

The data X is resampled to the pixel size of the image on the
figure canvas, using the interpolation method to either up- or
downsample the data.

If interpolation is ‘none’, then for the ps, pdf, and svg
backends no down- or upsampling occurs, and the image data is
passed to the backend as a native image. Note that different ps,
pdf, and svg viewers may display these raw pixels differently. On
other backends, ‘none’ is the same as ‘nearest’.

If interpolation is the default ‘antialiased’, then ‘nearest’
interpolation is used if the image is upsampled by more than a
factor of three (i.e. the number of display pixels is at least
three times the size of the data array). If the upsampling rate is
smaller than 3, or the image is downsampled, then ‘hanning’
interpolation is used to act as an anti-aliasing filter, unless the
image happens to be upsampled by exactly a factor of two or one.

See
:doc:/gallery/images_contours_and_fields/interpolation_methods
for an overview of the supported interpolation methods, and
:doc:/gallery/images_contours_and_fields/image_antialiasing for
a discussion of image antialiasing.

Some interpolation methods require an additional radius parameter,
which can be set by filterrad. Additionally, the antigrain image
resize filter is controlled by the parameter filternorm.
alpha NoneType None The alpha blending value, between 0 (transparent) and 1 (opaque).
If alpha is an array, the alpha blending values are applied pixel
by pixel, and alpha must have the same shape as X.
vmin NoneType None
vmax NoneType None
origin NoneType None Place the [0, 0] index of the array in the upper left or lower
left corner of the Axes. The convention (the default) ‘upper’ is
typically used for matrices and images.

Note that the vertical axis points upward for ‘lower’
but downward for ‘upper’.

See the :ref:imshow_extent tutorial for
examples and a more detailed description.
extent NoneType None The bounding box in data coordinates that the image will fill.
These values may be unitful and match the units of the Axes.
The image is stretched individually along x and y to fill the box.

The default extent is determined by the following conditions.
Pixels have unit size in data coordinates. Their centers are on
integer coordinates, and their center coordinates range from 0 to
columns-1 horizontally and from 0 to rows-1 vertically.

Note that the direction of the vertical axis and thus the default
values for top and bottom depend on origin:

- For origin == 'upper' the default is
(-0.5, numcols-0.5, numrows-0.5, -0.5).
- For origin == 'lower' the default is
(-0.5, numcols-0.5, -0.5, numrows-0.5).

See the :ref:imshow_extent tutorial for
examples and a more detailed description.
interpolation_stage NoneType None If ‘data’, interpolation
is carried out on the data provided by the user. If ‘rgba’, the
interpolation is carried out after the colormapping has been
applied (visual interpolation).
filternorm bool True A parameter for the antigrain image resize filter (see the
antigrain documentation). If filternorm is set, the filter
normalizes integer values and corrects the rounding errors. It
doesn’t do anything with the source floating point values, it
corrects only integers according to the rule of 1.0 which means
that any sum of pixel weights must be equal to 1.0. So, the
filter function must produce a graph of the proper shape.
filterrad float 4.0 The filter radius for filters that have a radius parameter, i.e.
when interpolation is one of: ‘sinc’, ‘lanczos’ or ‘blackman’.
resample NoneType None When True, use a full resampling method. When False, only
resample when the output image is larger than the input image.
url NoneType None Set the url of the created .AxesImage. See .Artist.set_url.
data NoneType None

source

subplots

 subplots (nrows=1, ncols=1, figsize=None, imsize=3, suptitle=None,
           sharex:"bool|Literal['none','all','row','col']"=False,
           sharey:"bool|Literal['none','all','row','col']"=False,
           squeeze:bool=True, width_ratios:Sequence[float]|None=None,
           height_ratios:Sequence[float]|None=None,
           subplot_kw:dict[str,Any]|None=None,
           gridspec_kw:dict[str,Any]|None=None, **kwargs)

A figure and set of subplots to display images of imsize inches.

Type Default Details
nrows int 1 Number of rows in returned axes grid
ncols int 1 Number of columns in returned axes grid
figsize NoneType None Width, height in inches of the returned figure
imsize int 3 Size (in inches) of images that will be displayed in the returned figure
suptitle NoneType None Title to be set to returned figure
sharex bool | Literal[‘none’, ‘all’, ‘row’, ‘col’] False
sharey bool | Literal[‘none’, ‘all’, ‘row’, ‘col’] False
squeeze bool True
width_ratios Sequence[float] | None None
height_ratios Sequence[float] | None None
subplot_kw dict[str, Any] | None None
gridspec_kw dict[str, Any] | None None
kwargs

source

get_grid

 get_grid (n, nrows=None, ncols=None, title=None, weight='bold', size=14,
           figsize=None, imsize=3, suptitle=None,
           sharex:"bool|Literal['none','all','row','col']"=False,
           sharey:"bool|Literal['none','all','row','col']"=False,
           squeeze:bool=True, width_ratios:Sequence[float]|None=None,
           height_ratios:Sequence[float]|None=None,
           subplot_kw:dict[str,Any]|None=None,
           gridspec_kw:dict[str,Any]|None=None)

Return a grid of n axes, nrows by ncols.

Type Default Details
n Number of axes
nrows NoneType None Number of rows, defaulting to int(math.sqrt(n))
ncols NoneType None Number of columns, defaulting to ceil(n/rows)
title NoneType None If passed, title set to the figure
weight str bold Title font weight
size int 14 Title font size
figsize NoneType None Width, height in inches of the returned figure
imsize int 3 Size (in inches) of images that will be displayed in the returned figure
suptitle NoneType None Title to be set to returned figure
sharex bool | Literal[‘none’, ‘all’, ‘row’, ‘col’] False
sharey bool | Literal[‘none’, ‘all’, ‘row’, ‘col’] False
squeeze bool True
width_ratios Sequence[float] | None None
height_ratios Sequence[float] | None None
subplot_kw dict[str, Any] | None None
gridspec_kw dict[str, Any] | None None

source

show_images

 show_images (ims:list, nrows=1, ncols=None, titles=None, noframe=True,
              figsize=None, imsize=3, suptitle=None,
              sharex:"bool|Literal['none','all','row','col']"=False,
              sharey:"bool|Literal['none','all','row','col']"=False,
              squeeze:bool=True, width_ratios:Sequence[float]|None=None,
              height_ratios:Sequence[float]|None=None,
              subplot_kw:dict[str,Any]|None=None,
              gridspec_kw:dict[str,Any]|None=None)

Show all images ims as subplots with nrows using titles.

Type Default Details
ims list Images to show
nrows int 1 Number of rows in grid
ncols NoneType None Number of columns in grid (auto-calculated if None)
titles NoneType None Optional list of titles for each image
noframe bool True Hide axes, yes or no
figsize NoneType None Width, height in inches of the returned figure
imsize int 3 Size (in inches) of images that will be displayed in the returned figure
suptitle NoneType None Title to be set to returned figure
sharex bool | Literal[‘none’, ‘all’, ‘row’, ‘col’] False
sharey bool | Literal[‘none’, ‘all’, ‘row’, ‘col’] False
squeeze bool True
width_ratios Sequence[float] | None None
height_ratios Sequence[float] | None None
subplot_kw dict[str, Any] | None None
gridspec_kw dict[str, Any] | None None

Activation Statistics using Hooks


source

append_stats

 append_stats (hook, mod, inp, outp)

source

Hook

 Hook (m, f)

Initialize self. See help(type(self)) for accurate signature.
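A sketch of Hook, assuming the fast.ai course version: it registers a forward hook on a module and removes it when no longer needed.

class Hook():
    def __init__(self, m, f): self.hook = m.register_forward_hook(partial(f, self))
    def remove(self): self.hook.remove()
    def __del__(self): self.remove()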


source

Hooks

 Hooks (ms, f)

*Built-in mutable sequence.

If no argument is given, the constructor creates a new empty list. The argument must be an iterable if specified.*


source

HooksCallback

 HooksCallback (hookfunc, mod_filter=<function noop>, on_train=True,
                on_valid=False, mods=None)

Initialize self. See help(type(self)) for accurate signature.


source

append_stats

 append_stats (hook, mod, inp, outp)

source

get_hist

 get_hist (h)

source

get_min

 get_min (h)

source

ActivationStats

 ActivationStats (mod_filter=<function noop>)

Initialize self. See help(type(self)) for accurate signature.

Functions for Convenient Memory Management


source

clean_ipython_hist

 clean_ipython_hist ()

source

clean_tb

 clean_tb ()

source

clean_mem

 clean_mem ()

Weight Initialization and GeneralRelu


source

init_weights

 init_weights (m, leaky=0.0)

source

GeneralRelu

 GeneralRelu (leak=None, sub=None, maxv=None)

*Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:to, etc.

.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

:ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool*

act_genrelu = partial(GeneralRelu, leak=0.1, sub=0.4)
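For reference, a sketch of what init_weights and GeneralRelu likely look like, following the fast.ai course versions (an assumption; see the source links above for the actual definitions). GeneralRelu is a leaky ReLU with an optional downward shift and an optional upper clamp.

def init_weights(m, leaky=0.0):
    # Kaiming initialization for conv and linear layers
    if isinstance(m, (nn.Conv1d, nn.Conv2d, nn.Conv3d, nn.Linear)):
        nn.init.kaiming_normal_(m.weight, a=leaky)

class GeneralRelu(nn.Module):
    def __init__(self, leak=None, sub=None, maxv=None):
        super().__init__()
        self.leak, self.sub, self.maxv = leak, sub, maxv
    def forward(self, x):
        x = F.leaky_relu(x, self.leak) if self.leak is not None else F.relu(x)
        if self.sub is not None: x -= self.sub
        if self.maxv is not None: x.clamp_max_(self.maxv)
        return x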

Training a Model

Obtain a single batch from dls to help with model design.

idx = next(iter(dls.train))[0]
idx, idx.shape
(tensor([[16, 12, 13,  ...,  0,  0,  0],
         [11,  1, 16,  ...,  0,  0,  0],
         [11,  3,  6,  ...,  0,  0,  0],
         ...,
         [16, 19, 14,  ...,  0,  0,  0],
         [ 9, 18, 10,  ...,  0,  0,  0],
         [11, 17, 20,  ...,  0,  0,  0]]),
 torch.Size([32, 200]))

Tiny Resnet

Design the model architecture. Define a function that returns a 1D convolutional layer with an activation function and optional normalization, and define a 1D residual block class.


source

ResBlock1d

 ResBlock1d (ni, nf, stride=1, ks=3, act=<class
             'torch.nn.modules.activation.ReLU'>, norm=None)

*Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:to, etc.

.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

:ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool*


source

conv1d

 conv1d (ni, nf, ks=3, stride=2, act=<class
         'torch.nn.modules.activation.ReLU'>, norm=None, bias=None)
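A minimal sketch of a conv1d helper consistent with the signature above (an assumption, adapted from the 2D version in the fast.ai course; the actual definition is in the source):

def conv1d(ni, nf, ks=3, stride=2, act=nn.ReLU, norm=None, bias=None):
    # omit the conv bias when a norm layer with its own shift follows
    if bias is None: bias = norm is None
    layers = [nn.Conv1d(ni, nf, kernel_size=ks, stride=stride, padding=ks//2, bias=bias)]
    if norm: layers.append(norm(nf))
    if act: layers.append(act())
    return nn.Sequential(*layers)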

Define a class that switches the axis order from BLC (batch, length, channels) to BCL (batch, channels, length).


source

Reshape

 Reshape (*args, **kwargs)

*Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:to, etc.

.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

:ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool*
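A minimal sketch of what Reshape might do (an assumption): transpose the last two axes so the embedding output of shape (batch, length, channels) matches the (batch, channels, length) layout expected by nn.Conv1d.

class Reshape(nn.Module):
    def forward(self, x): return x.transpose(1, 2)  # (B, L, C) -> (B, C, L)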

Put the model together.

lr = 1e-2
epochs = 10
n_embd = 16
dls = get_dls(trnds, vldds, bs=32)

model = nn.Sequential(nn.Embedding(vocab_size, n_embd, padding_idx=0), Reshape(),
                      ResBlock1d(n_embd, 2, ks=15, stride=2, norm=nn.BatchNorm1d, act=act_genrelu), nn.Dropout(0.1),
                      ResBlock1d(2, 4, ks=13, stride=2, norm=nn.BatchNorm1d, act=act_genrelu), nn.Dropout(0.1),
                      ResBlock1d(4, 4, ks=11, stride=2, norm=nn.BatchNorm1d, act=act_genrelu), nn.Dropout(0.1),
                      ResBlock1d(4, 4, ks=9, stride=2, norm=nn.BatchNorm1d, act=act_genrelu), nn.Dropout(0.1),
                      ResBlock1d(4, 8, ks=7, stride=2, norm=nn.BatchNorm1d, act=act_genrelu), nn.Dropout(0.1),
                      ResBlock1d(8, 8, ks=5, stride=2, norm=nn.BatchNorm1d, act=act_genrelu), nn.Dropout(0.1),
                      ResBlock1d(8, 16, ks=3, stride=2, norm=nn.BatchNorm1d, act=act_genrelu), nn.Dropout(0.1),
                      ResBlock1d(16, 32, ks=3, stride=2, norm=nn.BatchNorm1d, act=act_genrelu), nn.Dropout(0.1),
                      nn.Flatten(1, -1),
                      nn.Linear(32, 1),
                      nn.Flatten(0, -1),
                      nn.Sigmoid())
model(idx).shape
torch.Size([32])
iw = partial(init_weights, leaky=0.1)
model = model.apply(iw)
metrics = MetricsCB(BinaryAccuracy(), BinaryMatthewsCorrCoef(), BinaryAUROC())
rec = RecorderCB(lr=_lr, beta1=_beta1, beta2=_beta2)
astats = ActivationStats(fc.risinstance(GeneralRelu))
cbs = [DeviceCB(), BatchSchedCB(), ProgressCB(plot=False), metrics, astats, rec]
# cbs = [DeviceCB(), ProgressCB(plot=False), metrics, astats, rec] # for lr_find()
learn = TrainLearner(model, dls, F.binary_cross_entropy, lr=lr, cbs=cbs, opt_func=torch.optim.AdamW)
print(f"Parameters total: {sum(p.nelement() for p in model.parameters())}")
# learn.lr_find(start_lr=1e-4, gamma=1.05, av_over=5, max_mult=5)
Parameters total: 10175
learn.fit(epochs)
BinaryAccuracy BinaryMatthewsCorrCoef BinaryAUROC loss epoch train
0.502 -0.003 0.499 0.732 0 train
0.507 -0.022 0.480 0.701 0 eval
0.513 0.003 0.503 0.698 1 train
0.501 -0.016 0.494 0.695 1 eval
0.528 0.031 0.518 0.692 2 train
0.473 -0.014 0.490 0.695 2 eval
0.540 0.058 0.529 0.689 3 train
0.530 0.026 0.511 0.689 3 eval
0.557 0.100 0.557 0.683 4 train
0.545 0.060 0.566 0.681 4 eval
0.595 0.185 0.612 0.667 5 train
0.628 0.247 0.657 0.639 5 eval
0.636 0.271 0.661 0.641 6 train
0.637 0.266 0.665 0.633 6 eval
0.643 0.291 0.675 0.631 7 train
0.643 0.287 0.679 0.626 7 eval
0.653 0.313 0.685 0.623 8 train
0.649 0.301 0.683 0.619 8 eval
0.657 0.322 0.693 0.617 9 train
0.647 0.291 0.683 0.621 9 eval

Inspect the learning rate schedule applied during training.

rec.plot()

Transformer Model with Skip Connections and LayerNorm

The transformer model below is adapted from the model built in Andrej Karpathy’s video Let’s build GPT: from scratch, in code, spelled out.


source

FeedForward

 FeedForward (n_embed)

A simple linear layer followed by a non-linearity.


source

MultiHeadAttention

 MultiHeadAttention (num_heads, head_size)

Multiple heads of self-attention in parallel.


source

Block

 Block (n_embd, n_head)

Transformer block: communication (attention) followed by computation.
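Since the model is adapted from Karpathy's GPT video, the transformer block presumably looks roughly like this pre-norm residual layout (a sketch, not the exact definition used here):

class Block(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        head_size = n_embd // n_head
        self.sa = MultiHeadAttention(n_head, head_size)  # communication
        self.ffwd = FeedForward(n_embd)                  # computation
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)
    def forward(self, x):
        x = x + self.sa(self.ln1(x))    # residual self-attention
        x = x + self.ffwd(self.ln2(x))  # residual feed-forward
        return x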


source

TransformerModel

 TransformerModel (device)

*Base class for all neural network modules.

Your models should also subclass this class.

Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes::

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

Submodules assigned in this way will be registered, and will have their parameters converted too when you call :meth:to, etc.

.. note:: As per the example above, an __init__() call to the parent class must be made before assignment on the child.

:ivar training: Boolean represents whether this module is in training or evaluation mode. :vartype training: bool*

lr = 1e-3
block_size = 200
epochs = 10
n_embd = 16
n_head = 8
n_layer = 5
dropout = 0.2

model = TransformerModel(device='cpu')
model(idx).shape
torch.Size([32])
dls = get_dls(trnds, vldds, bs=32)
model = TransformerModel(device=def_device)
iw = partial(init_weights, leaky=0.1)
model = model.apply(iw)
metrics = MetricsCB(BinaryAccuracy(), BinaryMatthewsCorrCoef(), BinaryAUROC())
rec = RecorderCB(lr=_lr, beta1=_beta1, beta2=_beta2)
astats = ActivationStats(fc.risinstance(GeneralRelu))
cbs = [DeviceCB(), BatchSchedCB(), ProgressCB(plot=False), metrics, astats, rec]
learn = TrainLearner(model, dls, F.binary_cross_entropy_with_logits, lr=lr, cbs=cbs, opt_func=optim.AdamW)
print(f"Parameters total: {sum(p.nelement() for p in model.parameters())}")
#learn.lr_find(start_lr=1e-5, gamma=1.1, av_over=3, max_mult=5)
Parameters total: 22929
learn.fit(epochs)
BinaryAccuracy BinaryMatthewsCorrCoef BinaryAUROC loss epoch train
0.520 0.002 0.504 0.767 0 train
0.534 0.041 0.534 0.712 0 eval
0.535 0.051 0.537 0.713 1 train
0.539 0.222 0.626 0.692 1 eval
0.602 0.189 0.627 0.670 2 train
0.659 0.265 0.683 0.622 2 eval
0.632 0.250 0.668 0.639 3 train
0.654 0.310 0.697 0.610 3 eval
0.639 0.282 0.687 0.628 4 train
0.660 0.306 0.712 0.614 4 eval
0.651 0.308 0.704 0.616 5 train
0.658 0.333 0.721 0.607 5 eval
0.656 0.329 0.719 0.604 6 train
0.650 0.346 0.722 0.605 6 eval
0.667 0.354 0.737 0.592 7 train
0.662 0.355 0.726 0.596 7 eval
0.674 0.366 0.743 0.586 8 train
0.672 0.339 0.726 0.599 8 eval
0.672 0.372 0.748 0.583 9 train
0.662 0.354 0.725 0.596 9 eval
rec.plot()