Transformers

! [ -e /content ] && pip install -Uqq fastai  # upgrade fastai on Colab

An example of how to use the HuggingFace transformers library together with fastai

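In this tutorial, we will see how to fine-tune a pretrained transformer model from HuggingFace's transformers library using fastai. We will use the mid-level API to gather the data. Although this tutorial is self-contained, it may help to read the imagenette tutorial for another look at the mid-level API in computer vision (with a gentle introduction using the higher-level APIs).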

Importing a pretrained transformer model

First, we need to install the transformers library. If you haven't done so yet, run the following command:

!pip install -Uq transformers

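Then let's import what we need: we will fine-tune the pretrained GPT2 model on the wikitext-2 dataset. For this, we need GPT2LMHeadModel (since we want a language model) and GPT2TokenizerFast to prepare the data.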

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

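Several versions of this GPT2 model are available; see the transformers documentation for details. Here we will use the base version (which already takes up a lot of memory!). You can change the model used by changing the content of pretrained_weights (if it is not a GPT2 model, you will of course need to change the classes used for the model and the tokenizer).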

pretrained_weights = 'gpt2'
tokenizer = GPT2TokenizerFast.from_pretrained(pretrained_weights)
model = GPT2LMHeadModel.from_pretrained(pretrained_weights)

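Before we move on to fine-tuning, let's have a look at this tokenizer and this model. HuggingFace tokenizers usually do tokenization and numericalization in a single step (we ignore the padding warning for now):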

ids = tokenizer.encode('This is an example of text, and')
ids
[1212, 318, 281, 1672, 286, 2420, 11, 290]

Like a fastai Transform, the tokenizer has a decode method that returns the text from the ids:

tokenizer.decode(ids)
'This is an example of text, and'
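To make the "tokenize, then numericalize" idea concrete, here is a minimal sketch with a toy tokenizer. ToyTokenizer is a hypothetical stand-in built on a whitespace vocabulary; it is not GPT2's byte-pair encoding and it produces different ids. It only borrows the method names tokenize, convert_tokens_to_ids, encode and decode to show how the two steps compose and how decode inverts them.

```python
# Hypothetical toy tokenizer: a whitespace vocabulary, NOT GPT2's byte-pair encoding.
class ToyTokenizer:
    def __init__(self, corpus):
        # Build the vocabulary from the sorted set of words in the corpus
        words = sorted(set(corpus.split()))
        self.stoi = {w: i for i, w in enumerate(words)}
        self.itos = {i: w for w, i in self.stoi.items()}

    def tokenize(self, text):
        # Step 1: split the text into tokens
        return text.split()

    def convert_tokens_to_ids(self, tokens):
        # Step 2: map each token to its integer id
        return [self.stoi[t] for t in tokens]

    def encode(self, text):
        # Both steps in one call
        return self.convert_tokens_to_ids(self.tokenize(text))

    def decode(self, ids):
        # Inverse mapping: ids back to text
        return " ".join(self.itos[i] for i in ids)


toy = ToyTokenizer("this is an example of text")
toy_ids = toy.encode("this is an example")
roundtrip = toy.decode(toy_ids)
print(toy_ids, roundtrip)
```

The real GPT2 tokenizer works on sub-word pieces rather than whole words, which is why it can encode text it has never seen.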

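The model can be used to generate predictions (it is pretrained). It has a generate method that expects a batch of prompts, so we pass it our ids with a batch dimension added (there is another padding warning we can ignore):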

import torch
t = torch.LongTensor(ids)[None]
preds = model.generate(t)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

By default, the predictions have a length of 20:

preds.shape,preds[0]
(torch.Size([1, 20]),
 tensor([1212,  318,  281, 1672,  286, 2420,   11,  290,  340,  338,  407,  257,
          922,  530,   13,  198,  198,  464,  717, 1517]))

We can use the decode method (which prefers a numpy array to a tensor):

tokenizer.decode(preds[0].numpy())
"This is an example of text, and it's not a good one.\n\nThe first thing"

Bridging the gap with fastai

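Now let's see how we can use fastai to fine-tune this model on wikitext-2, using all the training utilities (learning rate finder, 1cycle policy, etc.). First, we import all the text utilities: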

from fastai.text.all import *

Preparing the data

Then we download the dataset (if it is not already present). It comes as two csv files:

path = untar_data(URLs.WIKITEXT_TINY)
path.ls()
(#2) [Path('/home/jhoward/.fastai/data/wikitext-2/test.csv'),Path('/home/jhoward/.fastai/data/wikitext-2/train.csv')]

Let's have a look at what those csv files contain:

df_train = pd.read_csv(path/'train.csv', header=None)
df_valid = pd.read_csv(path/'test.csv', header=None)
df_train.head()
0
0 \n = 2013 – 14 York City F.C. season = \n \n The 2013 – 14 season was the <unk> season of competitive association football and 77th season in the Football League played by York City Football Club , a professional football club based in York , North Yorkshire , England . Their 17th @-@ place finish in 2012 – 13 meant it was their second consecutive season in League Two . The season ran from 1 July 2013 to 30 June 2014 . \n Nigel Worthington , starting his first full season as York manager , made eight permanent summer signings . By the turn of the year York were only above the relegation z...
1 \n = Big Boy ( song ) = \n \n " Big Boy " <unk> " I 'm A Big Boy Now " was the first single ever recorded by the Jackson 5 , which was released by Steeltown Records in January 1968 . The group played instruments on many of their Steeltown compositions , including " Big Boy " . The song was neither a critical nor commercial success , but the Jackson family were delighted with the outcome nonetheless . \n The Jackson 5 would release a second single with Steeltown Records before moving to Motown Records . The group 's recordings at Steeltown Records were thought to be lost , but they were re...
2 \n = The Remix ( Lady Gaga album ) = \n \n The Remix is a remix album by American recording artist Lady Gaga . Released in Japan on March 3 , 2010 , it contains remixes of the songs from her first studio album , The Fame ( 2008 ) , and her third extended play , The Fame Monster ( 2009 ) . A revised version of the track list was prepared for release in additional markets , beginning with Mexico on May 3 , 2010 . A number of recording artists have produced the songs , including Pet Shop Boys , Passion Pit and The Sound of Arrows . The remixed versions feature both uptempo and <unk> composit...
3 \n = New Year 's Eve ( Up All Night ) = \n \n " New Year 's Eve " is the twelfth episode of the first season of the American comedy television series Up All Night . The episode originally aired on NBC in the United States on January 12 , 2012 . It was written by Erica <unk> and was directed by Beth McCarthy @-@ Miller . The episode also featured a guest appearance from Jason Lee as Chris and Reagan 's neighbor and Ava 's boyfriend , Kevin . \n During Reagan ( Christina Applegate ) and Chris 's ( Will <unk> ) first New Year 's Eve game night , Reagan 's competitiveness comes out causing Ch...
4 \n = Geopyxis carbonaria = \n \n Geopyxis carbonaria is a species of fungus in the genus Geopyxis , family <unk> . First described to science in 1805 , and given its current name in 1889 , the species is commonly known as the charcoal loving elf @-@ cup , dwarf <unk> cup , <unk> <unk> cup , or pixie cup . The small , <unk> @-@ shaped fruitbodies of the fungus are reddish @-@ brown with a whitish fringe and measure up to 2 cm ( 0 @.@ 8 in ) across . They have a short , tapered stalk . Fruitbodies are commonly found on soil where brush has recently been burned , sometimes in great numbers ....

We gather all the texts in one numpy array (since this will be easier to use with fastai):

all_texts = np.concatenate([df_train[0].values, df_valid[0].values])

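To process this data for training, we need to build a Transform that will be applied lazily. In this case we could do the preprocessing once and for all and only use the transform for decoding (we will see how shortly), but the fast tokenizers from HuggingFace are, as their name indicates, fast, so processing the data this way has no real impact on performance.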

In a fastai Transform you can define:

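  • an encodes method that is applied when you call the transform (a bit like the forward method in an nn.Module)
  • a decodes method that is applied when you call the decode method of the transform, if you need to decode anything for display purposes (like converting ids to text here)
  • a setups method that sets up some inner state of the Transform (not needed here, so we skip it)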
class TransformersTokenizer(Transform):
    def __init__(self, tokenizer): self.tokenizer = tokenizer
    def encodes(self, x): 
        toks = self.tokenizer.tokenize(x)
        return tensor(self.tokenizer.convert_tokens_to_ids(toks))
    def decodes(self, x): return TitledStr(self.tokenizer.decode(x.cpu().numpy()))

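Two comments on the code above:

  • in encodes we don't use the tokenizer.encode method, since it does some additional preprocessing for the model after tokenizing and numericalizing (the part that threw a warning earlier). Here we don't need any post-processing, so it is fine to skip it.
  • in decodes we return a TitledStr object and not just a plain string. That is a fastai class that adds a show method to the string, which allows us to use all the fastai show methods.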
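The encodes/decodes contract can be sketched without any library. MiniTransform below is a hypothetical, heavily simplified stand-in for fastai's Transform (the real class also handles type dispatch, setups and more); CharCodeTokenizer is an illustrative subclass that turns characters into integer codes and back.

```python
# Hypothetical simplified stand-in for fastai's Transform; the real class does much more.
class MiniTransform:
    def __call__(self, x):
        # Calling the transform applies encodes
        return self.encodes(x)

    def decode(self, x):
        # Calling decode applies decodes
        return self.decodes(x)

    # Defaults: do nothing in either direction
    def encodes(self, x):
        return x

    def decodes(self, x):
        return x


class CharCodeTokenizer(MiniTransform):
    """Illustrative transform: one integer id per character."""

    def encodes(self, x):
        return [ord(c) for c in x]

    def decodes(self, x):
        return "".join(chr(i) for i in x)


tfm = CharCodeTokenizer()
encoded = tfm("abc")           # goes through encodes
decoded = tfm.decode(encoded)  # goes through decodes
print(encoded, decoded)
```

Keeping the two directions on the same object is what lets display helpers recover readable text from whatever the model consumes.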

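You can then group your data with this Transform using a TfmdLists. It has an s in its name because it contains both the training and validation sets. We indicate the indices of the training set and the validation set with splits (here all the first indices up to len(df_train), then all the remaining indices):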

splits = [range_of(df_train), list(range(len(df_train), len(all_texts)))]
tls = TfmdLists(all_texts, TransformersTokenizer(tokenizer), splits=splits, dl_type=LMDataLoader)
Token indices sequence length is longer than the specified maximum sequence length for this model (4576 > 1024). Running this sequence through the model will result in indexing errors
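The index-based split can be illustrated on a tiny list. This is a plain-Python sketch with made-up items (train_texts and valid_texts are placeholders, not the wikitext data): the two collections are concatenated, and each split is just the list of positions that belong to it.

```python
# Placeholder data standing in for df_train[0].values and df_valid[0].values
train_texts = ["doc a", "doc b", "doc c"]
valid_texts = ["doc d", "doc e"]

# Concatenate: training items first, validation items after
all_items = train_texts + valid_texts

# First split: indices 0 .. len(train)-1; second split: everything that remains
splits = [
    list(range(len(train_texts))),
    list(range(len(train_texts), len(all_items))),
]

# Selecting by index recovers the two original subsets
train_subset = [all_items[i] for i in splits[0]]
valid_subset = [all_items[i] for i in splits[1]]
print(splits)
```

Because the splits are only indices, the same concatenated array can be reused with a different partition without copying the texts.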

We specify dl_type=LMDataLoader for when we convert this TfmdLists to DataLoaders: we will use an LMDataLoader since we have a language modeling problem, not the usual fastai TfmdDL.
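To give an intuition for why language modeling needs its own data loader, here is a simplified sketch of the idea: all documents are joined into one token stream, the stream is cut into windows of seq_len tokens, and the target for each window is the same window shifted by one token. The function lm_windows is hypothetical and is not fastai's implementation, which among other things also shuffles documents and spreads the stream over the batch dimension.

```python
def lm_windows(token_stream, seq_len):
    """Return (input, target) pairs; each target is its input shifted by one token."""
    pairs = []
    # The last token has no successor, so windows start no later than len - 2
    for start in range(0, len(token_stream) - 1, seq_len):
        x = token_stream[start:start + seq_len]
        y = token_stream[start + 1:start + seq_len + 1]
        # Drop a ragged final window so inputs and targets always align
        if len(x) == len(y):
            pairs.append((x, y))
    return pairs


# Two toy "documents" of token ids, joined into a single stream
docs = [[1, 2, 3, 4], [5, 6, 7]]
stream = [tok for doc in docs for tok in doc]

pairs = lm_windows(stream, seq_len=3)
for x, y in pairs:
    print(x, "->", y)
```

Since every window holds at most seq_len tokens, long documents are fed to the model in pieces rather than as a single over-length sequence.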

In a TfmdLists you can access the elements of the training or validation set quite easily:

tls.train[0],tls.valid[0]
(tensor([220, 198, 796,  ..., 198, 220, 198]),
 tensor([220, 198, 796,  ..., 198, 220, 198]))

They look the same, but only because they begin and end the same way. We can see that their shapes are different:

tls.tfms(tls.train.items[0]).shape, tls.tfms(tls.valid.items[0]).shape
(torch.Size([4576]), torch.Size([1485]))

And we can have a look at both decoded texts using show_at:

show_at(tls.train, 0)
 
 = 2013 – 14 York City F.C. season = 
 
 The 2013 – 14 season was the <unk> season of competitive association football and 77th season in the Football League played by York City Football Club, a professional football club based in York, North Yorkshire, England. Their 17th @-@ place finish in 2012 – 13 meant it was their second consecutive season in League Two. The season ran from 1 July 2013 to 30 June 2014. 
 Nigel Worthington, starting his first full season as York manager, made eight permanent summer signings. By the turn of the year York were only above the relegation zone on goal difference, before a 17 @-@ match unbeaten run saw the team finish in seventh @-@ place in the 24 @-@ team 2013 – 14 Football League Two. This meant York qualified for the play @-@ offs, and they were eliminated in the semi @-@ final by Fleetwood Town. York were knocked out of the 2013 – 14 FA Cup, Football League Cup and Football League Trophy in their opening round matches. 
 35 players made at least one appearance in nationally organised first @-@ team competition, and there were 12 different <unk>. Defender Ben Davies missed only five of the fifty @-@ two competitive matches played over the season. Wes Fletcher finished as leading scorer with 13 goals, of which 10 came in league competition and three came in the FA Cup. The winner of the <unk> of the Year award, voted for by the club's supporters, was <unk> Oyebanjo. 
 
 = = Background and pre @-@ season = = 
 
 The 2012 – 13 season was York City's first season back in the Football League, having won the Conference Premier play @-@ offs in 2011 – 12 after <unk> years in the Football Conference. Manager Gary Mills was sacked in March 2013 following an 11 @-@ match run without a victory, and was replaced by former Northern Ireland manager Nigel Worthington. Despite being in the relegation zone with three matches remaining, Worthington led the team to safety from relegation after a 1 – 0 win away to Dagenham & Redbridge on the final day of the season. York finished the season in 17th @-@ place in the 2012 – 13 League Two table. 
 Following the previous season's conclusion Lee <unk>, Jon <unk>, Chris <unk>, Ben Everson, Scott Kerr, David <unk>, Patrick <unk>, Michael Potts, Jamie Reed and Jason Walker were released by York, while <unk> Blair departed for Fleetwood Town. David McGurk, <unk> Oyebanjo, Danny Parslow, Tom Platt and Chris Smith signed new contracts with the club. New players signed ahead of the start of the season were goalkeeper Chris <unk> on a season @-@ long loan from Blackpool, defender Ben Davies on loan from Preston North End, midfielders Craig Clay from Chesterfield and Lewis Montrose from Gillingham, winger <unk> Puri from St <unk> and strikers Ryan Bowman from Hereford United, Richard Cresswell from Sheffield United, Wes Fletcher from Burnley and Ryan Jarvis from Torquay United. Defender Mike Atkinson and striker Chris Dickinson entered the first @-@ team squad from the youth team after agreeing professional contracts. 
 York retained the previous season's home and away kits. The home kit comprised red shirts with white sleeves, light blue shorts and white socks. The away kit included light blue shirts with white sleeves, white shorts and light blue socks. <unk> Health continued as shirt sponsors for the second successive season. 
 
 = = Review = = 
 
 
 = = = August = = = 
 
 York began the season with a 1 – 0 home win over the previous season's play @-@ off finalists, Northampton Town, with <unk> Jarvis scoring the winning goal in the 90th @-@ minute. However, defeat came in York's match against Championship side Burnley in the first round of the League Cup, going down 4 – 0 at home. The team endured their first league defeat of the season in the following game after being beaten 2 – 0 away by Dagenham & Redbridge, the home team scoring in each half. York then held Hartlepool United to a 0 – 0 home draw, before being beaten 3 – 2 away by Bristol Rovers, in which Jarvis scored twice before John @-@ Joe O 'Toole scored the winning goal for the home team in the 67th @-@ minute. Two signings were made shortly before the transfer deadline ; defender George Taft was signed on a one @-@ month loan from Leicester City, while Middlesbrough midfielder Ryan Brobbel joined on a one @-@ month loan. <unk> John <unk>, who had been told he had no future with the club, departed after signing for FC Halifax Town. Jarvis gave York the lead away at Exeter City before Alan <unk> scored in each half to see the home team win 2 – 1. 
 
 = = = September = = = 
 
 York suffered their first home league defeat of the season after AFC Wimbledon won 2 – 0, with Michael Smith scoring in each half. Former Ipswich Town midfielder Josh Carson, who had a spell on loan with York the previous season, signed a contract until the end of 2013 – 14 and Sheffield United midfielder Elliott <unk> signed on a one @-@ month loan. Brobbel opened the scoring in the second minute of his home debut against Mansfield Town, although the away team went on to score twice to win 2 – 1. York's run of four defeats ended following a 1 – 1 draw away to Wycombe Wanderers, in which McGurk gave York the lead before the home team levelled through Dean Morgan. Taft was sent back to Leicester after he fell behind McGurk, Parslow and Smith in the pecking order for a central defensive berth. York achieved their first win since the opening day of the season after beating Portsmouth 4 – 2 at home, with Fletcher ( 2 ), Montrose and Jarvis scoring. 
 
 = = = October = = = 
 
 Defender Luke O 'Neill was signed from Burnley on a 28 @-@ day emergency loan. He made his debut in York's 3 – 0 win away at Torquay, which was the team's first successive win of the season. York were knocked out of the Football League Trophy in the second round after being beaten 3 – 0 at home by League One team Rotherham United, before their winning streak in the league was ended with a 3 – 0 defeat away to Newport County. York drew 2 – 2 away to Chesterfield, having taken a two @-@ goal lead through O 'Neill and Jarvis, before the home team fought back through Armand <unk> and Jay O <unk>. The team then hosted Fleetwood Town, and the visitors won 2 – 0 with goals scored in each half by Gareth Evans and <unk> Matt. Scunthorpe United were beaten 4 – 1 at home to end York's three @-@ match run without a win, with all the team's goals coming in the first half from Carson, Fletcher and Brobbel ( 2 ). 
 
 = = = November = = = 
 
 Bowman scored his first goals for York away to Cheltenham Town, as York twice fought back from behind to draw 2 – 2. York drew 3 – 3 away to Bristol Rovers to earn a first round replay in the FA Cup, taking the lead through Jarvis before Eliot Richards equalised for the home team. Carson scored a 30 yard volley to put York back in the lead, and after Bristol Rovers goals from Matt <unk> and Chris <unk>, Fletcher scored an 86th @-@ minute equaliser for York. Bowman scored with a header from an O 'Neill cross to open the scoring at home to Plymouth Argyle, which was the first goal the visitors had conceded in 500 minutes of action. However, Plymouth equalised 11 minutes later through <unk> <unk> and the match finished a 1 – 1 draw. York were knocked out of the FA Cup after losing 3 – 2 at home to Bristol Rovers in a first round replay ; the visitors were 3 – 0 up by 50 @-@ minutes before Fletcher pulled two back for York with a penalty and a long @-@ range strike. 
 Defender Keith Lowe, of Cheltenham, and goalkeeper Nick Pope, of Charlton Athletic, were signed on loan until January 2014. They both played in York's first league defeat in four weeks, 2 – 1 away, to Southend United. <unk> <unk> gave Southend the lead early into the match and Bowman equalised for York with a low strike during the second half, before Luke Prosser scored the winning goal for the home side in stoppage time. With Pope preferred in goal, <unk> returned to Blackpool on his own accord, although his loan agreement would stay in place until January 2014. York then drew 0 – 0 away to Morecambe. After Pope was recalled from his loan by Charlton, York signed Wolverhampton Wanderers goalkeeper Aaron McCarey on loan until January 2014. McCarey kept a clean sheet in York's 0 – 0 home draw with Rochdale. 
 
 = = = December = = = 
 
 Cresswell retired from playing as a result of an eye complaint and a knee injury. York drew 1 – 1 away to Burton Albion, with an own goal scored by Shane <unk> @-@ <unk> giving York the lead in the 64th @-@ minute before the home team equalised eight minutes later through Billy <unk>. Atkinson was released after failing to force himself into the first team and signed for Scarborough Athletic, with whom he had been on loan. York drew 0 – 0 at home with second @-@ placed Oxford United, in which Carson came closest to scoring with a volley that <unk> across the face of the goal. This was followed by another draw after the match away to Accrington Stanley finished 1 – 1, with the home team <unk> 10 minutes after a Fletcher penalty had given York the lead in the 35th @-@ minute. Striker <unk> McDonald, who had been released by Peterborough United, was signed on a contract until the end of the season. York's last match of 2013 was a 2 – 1 defeat away at Bury, a result that ended York's run of consecutive draws at five. The home team were 2 – 0 up by the 19th @-@ minute, before Michael Coulson scored York's goal in the 73rd @-@ minute. This result meant York would begin 2014 in 22nd @-@ position in the table, only out of the relegation zone on goal difference. 
 
 = = = January = = = 
 
 Jarvis scored the only goal in York's first win since October 2013, a 1 – 0 home victory over Morecambe on New Year's Day. McCarey was recalled by Wolverhampton Wanderers due to an injury to one of their <unk>, while O 'Neill was recalled by Burnley to take part in their FA Cup match. York achieved back @-@ to @-@ back wins for the first time since October 2013 after Dagenham & Redbridge were beaten 3 – 1 at home, with Bowman opening the scoring in the second half before Fletcher scored twice. Adam Reed, who had a spell on loan with York in the previous season, was signed on a contract until the end of the season after parting company with Burton. Davies'loan was extended, while Brobbel and <unk> returned to their parent clubs. Cheltenham club captain Russell Penn, a midfielder, was signed on a two @-@ and @-@ a @-@ half @-@ year contract for an undisclosed fee. Lowe was subsequently signed permanently from Cheltenham on a two @-@ and @-@ a @-@ half @-@ year contract for an undisclosed fee. Having been allowed to leave the club on a free transfer, Ashley Chambers signed for Conference Premier club Cambridge United. 
 York achieved three successive wins for the first time in 2013 – 14 after beating Northampton 2 – 0 away, with Bowman and Fletcher scoring in three @-@ second half minutes. Defender John McCombe was signed on a two @-@ and @-@ a @-@ half @-@ year contract following his release from Mansfield, before Clay and Jamal <unk> left York by mutual consent. Pope returned to York on loan from Charlton for the remainder of the season. York's run of wins ended with a 0 – 0 draw at home to Bristol Rovers, before their first defeat of the year came after losing 2 – 0 away to Hartlepool. Preston winger Will Hayhurst, a Republic of Ireland under @-@ 21 international, was signed on a one @-@ month loan. York fell to a successive defeat for the first time since September 2013 after being beaten 2 – 0 at home by Chesterfield. Shortly after the match, Smith left the club by mutual consent to pursue first @-@ team football. 
 
 = = = February = = = 
 
 Fletcher scored a 90th @-@ minute winner for York away to Fleetwood in a 2 – 1 win, a result that ended Fleetwood's five @-@ match unbeaten run. York then drew 0 – 0 at home to fellow mid @-@ table team Cheltenham, before beating Plymouth 4 – 0 away with goals from Fletcher, McCombe ( 2 ) and Carson as the team achieved successive away wins for the first time in 2013 – 14. York went without scoring for a fourth consecutive home match after drawing 0 – 0 with Southend. Having worn the <unk> since an injury to McGurk, Penn was appointed captain for the rest of the season, a position that had earlier been held by Smith and Parslow. 
 
 = = = March = = = 
 
 York achieved their first home win in five matches after beating Exeter 2 – 1, with first half goals scored by McCombe and Coulson. Hayhurst's loan was extended to the end of the season, having impressed in his six appearances for the club. Coulson scored again with the only goal, a 41st @-@ minute header, in York's 1 – 0 away win over AFC Wimbledon. Bowman scored the only goal with a 32nd @-@ minute penalty as York won 1 – 0 away against Mansfield, in which Fletcher missed the opportunity to extend the lead when his stoppage time penalty was saved by Alan Marriott. York moved one place outside the play @-@ offs with a 2 – 0 home win over Wycombe, courtesy of a second Bowman penalty in as many matches and a Carson goal from the edge of the penalty area. Coulson scored York's only goal in a 1 – 0 away win over struggling Portsmouth with a low volley in the fifth @-@ minute ; this result meant York moved into the play @-@ offs in seventh @-@ place with eight fixtures remaining. 
 Striker Calvin Andrew, who had been released by Mansfield in January 2014, was signed on a contract for the remainder of the season. He made his debut as a substitute in York's 1 – 0 home win over bottom of the table Torquay, in which Hayhurst scored the only goal in the 11th @-@ minute with an 18 yard shot that <unk> off Aaron <unk>. Middlesbrough winger Brobbel rejoined on loan until the end of the season, following an injury to Carson. York's run of successive wins ended on six matches after a 0 – 0 home draw with Burton, and this result saw York drop out of the play @-@ offs in eighth @-@ place. With the team recording six wins and one draw in March 2014, including six clean sheets, Worthington was named League Two Manager of the Month. 
 
 = = = April = = = 
 
 Pope made a number of saves as York held league leaders Rochdale to a 0 – 0 away draw, with a point being enough to lift the team back into seventh @-@ place. York were prevented from equalling a club record of eight consecutive clean sheets when Accrington scored a stoppage time equaliser in a 1 – 1 home draw, in which York had taken earlier taken the lead with a Coulson penalty. A 1 – 0 win away win over Oxford, which was decided by a second half Coulson penalty, resulted in York moving one place above their opponents and back into seventh @-@ place. York consolidated their place in a play @-@ off position after beating Bury 1 – 0 at home with a fifth @-@ minute goal scored by Lowe from a Hayhurst corner. The result meant York opened up a five @-@ point lead over eighth @-@ placed Oxford with two fixtures remaining. A place in the League Two play @-@ offs was secured following a 1 – 0 win over Newport at home, in which Coulson scored the only goal in the 77th @-@ minute with a 25 yard free kick. Pope earned a nomination for League Two Player of the Month for April 2014, having conceded only one goal in five matches in that period. 
 
 = = = May = = = 
 
 The league season concluded with an away match against divisional runners @-@ up Scunthorpe ; having gone two goals down York fought back to draw 2 – 2 with goals scored by Brobbel and Andrew. This result meant York finished the season in seventh @-@ place in League Two, and would thus play fourth @-@ placed Fleetwood in the play @-@ off semi @-@ final on the back of a 17 @-@ match unbeaten run. York lost 1 – 0 to Fleetwood in the first leg at <unk> Crescent ; the goal came from former York player <unk> Blair in the 50th @-@ minute, who scored from close range after Antoni <unk>'s shot was blocked on the line. A 0 – 0 draw away to Fleetwood in the second leg meant York were eliminated 1 – 0 on aggregate, ending the prospect of a second promotion in three seasons. At an awards night held at York Racecourse, Oyebanjo was voted <unk> of the Year for 2013 – 14. 
 
 = = Summary and aftermath = = 
 
 York mostly occupied the bottom half of the table before the turn of the year, and dropped as low as 23rd in September 2013. During February 2014 the team broke into the top half of the table and with one match left were in sixth @-@ place. York's defensive record was the third best in League Two with 41 goals conceded, bettered only by Southend ( 39 ) and Chesterfield ( 40 ). Davies made the highest number of appearances over the season, appearing in 47 of York's 52 matches. Fletcher was York's top scorer in the league and in all competitions, with 10 league goals and 13 in total. He was the only player to reach double figures, and was followed by Jarvis with nine goals. 
 After the season ended York released Tom Allan, Andrew, Dickinson, McDonald, Puri and Reed, while McGurk retired from professional football. Bowman and Oyebanjo left to sign for Torquay and Crawley Town respectively while Coulson signed a new contract with the club. York's summer signings included goalkeeper Jason <unk> from Tranmere Rovers, defenders <unk> <unk> from Dagenham, Marvin McCoy from Wycombe and Dave Winfield from Shrewsbury Town, midfielders <unk> <unk> from Mansfield, Anthony <unk> from Southend and Luke <unk> from Shrewsbury and striker Jake Hyde from <unk>. 
 
 = = Match details = = 
 
 League positions are sourced by <unk>, while the remaining information is referenced individually. 
 
 = = = Football League Two = = = 
 
 
 = = = League table ( part ) = = = 
 
 
 = = = FA Cup = = = 
 
 
 = = = League Cup = = = 
 
 
 = = = Football League Trophy = = = 
 
 
 = = = Football League Two play @-@ offs = = = 
 
 
 = = <unk> = = 
 
 
 = = = In = = = 
 
 <unk> around club names denote the player's contract with that club had expired before he joined York. 
 
 = = = Out = = = 
 
 <unk> around club names denote the player joined that club after his York contract expired. 
 
 = = = Loan in = = = 
 
 
 = = = Loan out = = = 
 
 
 = = Appearances and goals = = 
 
 Source : 
 Numbers in parentheses denote appearances as substitute. 
 Players with names struck through and marked left the club during the playing season. 
 Players with names in italics and marked * were on loan from another club for the whole of their season with York. 
 Players listed with no appearances have been in the <unk> squad but only as unused <unk>. 
 Key to positions : <unk> – <unk> ; <unk> – Defender ; <unk> – <unk> ; <unk> – Forward 
 
show_at(tls.valid, 0)
 
 = Tropical Storm <unk> ( 2008 ) = 
 
 Tropical Storm <unk> was the tenth tropical storm of the 2008 Atlantic hurricane season. <unk> developed out of a strong tropical wave which moved off the African coast on August 31. The wave quickly became organized and was declared Tropical Depression Ten while located 170 mi ( 270 km ) to the south @-@ southeast of the Cape Verde Islands on September 2. The depression was quickly upgraded to Tropical Storm <unk> around noon the same day. Over the next several days, <unk> moved in a general west @-@ northwest direction and reached its peak intensity early on September 3. Strong wind shear, some due to the outflow of Hurricane Ike, and dry air caused the storm to weaken. On September 6, the combination of wind shear, dry air, and cooling waters caused <unk> to weaken into a tropical depression. <unk> deteriorated into a remnant low shortly after as convection continued to dissipate around the storm. The low ultimately dissipated while located 520 mi ( 835 km ) east of <unk> on September 10. However, the remnant moisture led to minor flooding on the island of St. Croix. 
 
 = = Meteorological history = = 
 
 Tropical Storm <unk> formed as a tropical wave that emerged off the west coast of Africa near the end of August 2008. It tracked south of Cape Verde and slowly developed, and on September 2 the disturbance became Tropical Depression Ten while located south @-@ southeast of the Cape Verde islands. As the depression became more organized, an eye @-@ like feature developed in the upper levels of the system. The depression was upgraded to Tropical Storm <unk> six hours after forming. <unk> was located in an area which was supportive for rapid intensification but was not forecast to intensify quickly. 
 <unk> continued to intensify throughout the afternoon as the storm became more symmetrical. However, due to the location of the storm, there was a lack of accurate wind speed readings, and the National Hurricane Center was uncertain of its actual intensity. Despite the lack of wind shear around the storm, the center became slightly exposed and ceased further intensification. The storm was also heading into an area where shear was <unk> to significantly increase due to an upper @-@ level trough diving southward. Despite convection being partially removed from the center of <unk>, the storm intensified slightly in the early morning hours on September 3 as thunderstorm activity to the south of the center became more organized. The intensification was forecast to be short in duration as the trough to the north was deepening, causing the wind shear to the west to become stronger. 
 <unk> reached its peak intensity of 65 mph ( 100 km / h ) around 8 a.m. ( <unk> ) as it continued to become more organized. However, there were indications that it had already begun to weaken. <unk> towards the north was becoming restricted and arc clouds began emanating from the storm, a sign that dry air was entering the system. During the afternoon hours, the structure of <unk> began to rapidly deteriorate as strong wind shear and dry air took their toll. By the late night, the center was almost completely exposed and only a band of convection persisted near the center. 
 Despite continuing effects from the strong wind shear, a large, deep burst of convection formed in the northern <unk> of <unk>. The center was found to have shifted towards the new convection leading to an increase in intensity. The forecast showed a slight decrease in wind shear as <unk> continued westward and no change in intensity over the 5 @-@ day forecast was predicted. However, the convection decreased once more and the low became completely exposed by the late morning hours and <unk> weakened again. By the afternoon, the center of <unk> was only a <unk> of clouds, devoid of convection. During the overnight hours on September 4 into the morning of September 5, convection associated with <unk> began to <unk> somewhat, mostly to the north of the circulation, due to the strong <unk> wind shear. By mid @-@ morning, <unk> re @-@ intensified slightly due to the redevelopment of some convection. However, the redevelopment was short lived and wind shear again took its toll on <unk> by late morning. The convection around the system became <unk> from the center and <unk> weakened slightly. 
 The weakening trend continued through the afternoon as the storm was being affected by strong <unk> shear. <unk> became almost fully devoid of any convection by mid @-@ afternoon and the storm weakened to 40 mph ( 65 km / h ), barely holding on to tropical storm status. <unk> regained a small amount of convection in the late night hours, but not enough to still be classified a tropical storm. Due to the lack of convection, <unk> was downgraded to a Tropical Depression at <unk> ( <unk> ) with winds of 35 mph ( 55 km / h ). Since there was no convection around the system, it would have normally been classified a remnant low but, due to the possibility of the storm <unk> over the next several days, it was considered a tropical depression. The next morning, <unk> was downgraded to a remnant low as strong wind shear and dry air caused the demise of the storm. No redevelopment was expected with <unk> as it began to move over colder waters and remain under strong wind shear until it dissipated. 
 However, the remnant low associated with <unk> began to show signs of redevelopment during the afternoon on September 7. <unk> around the system increased significantly and the low was no longer exposed. On September 8, wind shear took over the system again. <unk> around the remnant low was torn away and the low was exposed once more. The National Hurricane Center did not state the chance of regeneration once the low became exposed. Finally, on September 9, wind shear and dry air led to the remnants of <unk> deteriorating into an open wave. However, on September 10, the remnants of <unk> redeveloped and global models picked up on the reformed system. Once more, the chance of regeneration was possible as the remnants of <unk> headed towards the Bahamas. However, on September 14, dry air and wind shear caused the remnants to dissipate entirely. 
 
 = = Impact = = 
 
 As <unk> passed to the south of the Cape Verde islands on September 2, outer rain bands produced minor rainfall, totaling around 0 @.@ 55 inches ( 14 mm ). There were no reports of damage or flooding from the rain and overall effects were minor. 
 Several days after the low dissipated, the remnant moisture from <unk> brought showers and thunderstorms to St. Croix where up to 1 in ( 25 @.@ 4 mm ) of rain fell. The heavy rains led to minor street flooding and some urban flooding. No known damage was caused by the flood. 
 

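The fastai library expects the data to be assembled in a DataLoaders object (something that holds a training and a validation dataloader). We can get one by calling the dataloaders method; we only have to specify a batch size and a sequence length. We'll train with sequences of length 256 (GPT2 used a sequence length of 1024, but not everyone has enough GPU RAM for that):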

bs,sl = 4,256
dls = tls.dataloaders(bs=bs, seq_len=sl)

Note that you may need to reduce the batch size depending on your GPU RAM.

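In fastai, as soon as we have a DataLoaders, we can use show_batch to look at the data (here the inputs are texts, and the targets are the same texts shifted one token to the right):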

dls.show_batch(max_n=2)
text text_
0 \n = Jacqueline Fernandez = \n \n Jacqueline Fernandez ( born 11 August 1985 ) is a Sri Lankan actress, former model, and the winner of the 2006 Miss Universe Sri Lanka pageant. As Miss Universe Sri Lanka she represented her country at the 2006 world Miss Universe pageant. She graduated with a degree in mass communication from the University of Sydney, and worked as a television reporter in Sri Lanka. \n While on a modelling assignment in India in 2009, Fernandez successfully auditioned for <unk> <unk>'s fantasy drama <unk>, which marked her acting debut. Fernandez'breakthrough role was in <unk> <unk>'s psychological thriller Murder 2 ( 2011 ), her first commercial success. This was followed by glamorous roles in the ensemble @-@ comedy Housefull 2 ( 2012 ) and its sequel Housefull 3, and the action thriller Race 2 ( 2013 ), all of which were box @-@ office \n = Jacqueline Fernandez = \n \n Jacqueline Fernandez ( born 11 August 1985 ) is a Sri Lankan actress, former model, and the winner of the 2006 Miss Universe Sri Lanka pageant. As Miss Universe Sri Lanka she represented her country at the 2006 world Miss Universe pageant. She graduated with a degree in mass communication from the University of Sydney, and worked as a television reporter in Sri Lanka. \n While on a modelling assignment in India in 2009, Fernandez successfully auditioned for <unk> <unk>'s fantasy drama <unk>, which marked her acting debut. Fernandez'breakthrough role was in <unk> <unk>'s psychological thriller Murder 2 ( 2011 ), her first commercial success. This was followed by glamorous roles in the ensemble @-@ comedy Housefull 2 ( 2012 ) and its sequel Housefull 3, and the action thriller Race 2 ( 2013 ), all of which were box @-@ office successes.
1 small farms in between small residential subdivisions. In the community of Freeland, M @-@ 47 runs near the <unk> International Airport off Freeland Road. North of town, M @-@ 47 leaves Midland Road and becomes a freeway near <unk> Park. The freeway section of M @-@ 47 runs through rural farm land. There is a diamond interchange with <unk> Road before the terminal interchange at US 10. \n As part of its maintenance duties, the Michigan Department of Transportation ( MDOT ) tracks the volume of traffic on the highways it maintains. This number is expressed in terms of annual average daily traffic ( AADT ), a calculation of the average traffic for a segment of roadway on any average day of the year. In 2009, the department measured a peak of 19 @,@ <unk> vehicles daily on the stretch north of <unk> Road. The section south of the farms in between small residential subdivisions. In the community of Freeland, M @-@ 47 runs near the <unk> International Airport off Freeland Road. North of town, M @-@ 47 leaves Midland Road and becomes a freeway near <unk> Park. The freeway section of M @-@ 47 runs through rural farm land. There is a diamond interchange with <unk> Road before the terminal interchange at US 10. \n As part of its maintenance duties, the Michigan Department of Transportation ( MDOT ) tracks the volume of traffic on the highways it maintains. This number is expressed in terms of annual average daily traffic ( AADT ), a calculation of the average traffic for a segment of roadway on any average day of the year. In 2009, the department measured a peak of 19 @,@ <unk> vehicles daily on the stretch north of <unk> Road. The section south of the US
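
To make the "shifted by one token" relationship between the two columns concrete, here is a minimal pure-Python sketch. It is not fastai's LMDataLoader: the helper name lm_pairs and the toy token stream are assumptions made for illustration, and the real loader also handles batching and shuffling.

```python
# Simplified sketch (NOT fastai's LMDataLoader): build (input, target) pairs
# for language modelling from one long stream of token ids.
def lm_pairs(stream, seq_len):
    """Cut `stream` into inputs of `seq_len` tokens; each target is the
    same window shifted one position to the right."""
    pairs = []
    # The last token can only ever be a target, hence `len(stream) - 1`.
    for start in range(0, len(stream) - 1, seq_len):
        x = stream[start : start + seq_len]
        y = stream[start + 1 : start + seq_len + 1]
        # Keep only full-length windows so every input has a complete target.
        if len(y) == seq_len:
            pairs.append((x, y))
    return pairs

stream = list(range(10))            # stand-in for concatenated token ids
pairs = lm_pairs(stream, seq_len=4)
for x, y in pairs:
    print(x, "->", y)
```

Each target repeats its input except for the first token, and adds the next token of the stream at the end, which is what the text and text_ columns above display.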

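Another way to gather the data is to preprocess the texts once and for all, and use the transform only to decode the tensors back to text: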

def tokenize(text):
    toks = tokenizer.tokenize(text)
    return tensor(tokenizer.convert_tokens_to_ids(toks))

tokenized = [tokenize(t) for t in progress_bar(all_texts)]
100.00% [662/662 00:12<00:00]

Now we change the previous Tokenizer like this:

class TransformersTokenizer(Transform):
    def __init__(self, tokenizer): self.tokenizer = tokenizer
    def encodes(self, x): 
        return x if isinstance(x, Tensor) else tokenize(x)
        
    def decodes(self, x): return TitledStr(self.tokenizer.decode(x.cpu().numpy()))

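In the encodes method, we still handle the case where we receive something that is not already tokenized, in case we later want to build a dataset from new texts with this transform.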

tls = TfmdLists(tokenized, TransformersTokenizer(tokenizer), splits=splits, dl_type=LMDataLoader)
dls = tls.dataloaders(bs=bs, seq_len=sl)
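
The "encode only if needed, always be able to decode" pattern can be illustrated without fastai or transformers. The sketch below is a toy stand-in: the class ToyTokenizerTransform, its whitespace tokenization, and its four-word vocabulary are all assumptions made for the example, not part of either library.

```python
# Toy stand-in for the pattern used by TransformersTokenizer above.
class ToyTokenizerTransform:
    def __init__(self, vocab):
        self.vocab = vocab
        self.stoi = {tok: i for i, tok in enumerate(vocab)}

    def encodes(self, x):
        # Already numericalized (a list of ids): pass it through untouched.
        if isinstance(x, list):
            return x
        # Raw text: split on whitespace and map each token to its id.
        return [self.stoi[tok] for tok in x.split()]

    def decodes(self, ids):
        # Map ids back to tokens for display.
        return " ".join(self.vocab[i] for i in ids)

tfm = ToyTokenizerTransform(["a", "unicorn", "is", "magical"])
ids = tfm.encodes("a unicorn is magical")   # raw text gets tokenized
same = tfm.encodes(ids)                     # preprocessed input is left alone
text = tfm.decodes(ids)
print(ids, text)
```

Because preprocessed items pass through encodes unchanged, the expensive tokenization happens once up front, while decodes remains available for show_batch-style display.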

We can check that it still works properly for display purposes:

dls.show_batch(max_n=2)
text text_
0 \n = Otra Nota = \n \n Otra Nota ( English : Another Note ) is the debut album by American singer Marc Anthony that was released on January 26, 1993, by RMM Records. Produced by Sergio George, it was the first album by Anthony to record in salsa after starting his career as a freestyle musician. Recording of the album began after Anthony asked RMM president Ralph Mercado to record Juan Gabriel's " Hasta Que Te Conocí " in salsa after hearing it on the radio during a taxi ride. Recorded on a low budget, the album peaked at No. 2 on the Billboard Tropical Albums chart and reached No. 30 on the Billboard Top Latin Albums chart. \n The album was well received by critics who complimented George's production and Anthony's youthful voice. Anthony received two awards for " Best New Artists " at the Billboard Latin \n = Otra Nota = \n \n Otra Nota ( English : Another Note ) is the debut album by American singer Marc Anthony that was released on January 26, 1993, by RMM Records. Produced by Sergio George, it was the first album by Anthony to record in salsa after starting his career as a freestyle musician. Recording of the album began after Anthony asked RMM president Ralph Mercado to record Juan Gabriel's " Hasta Que Te Conocí " in salsa after hearing it on the radio during a taxi ride. Recorded on a low budget, the album peaked at No. 2 on the Billboard Tropical Albums chart and reached No. 30 on the Billboard Top Latin Albums chart. \n The album was well received by critics who complimented George's production and Anthony's youthful voice. Anthony received two awards for " Best New Artists " at the Billboard Latin Music
1 reactions and prejudices ", which leaves no room for any further interest. Donoghue complained that Lessing has not made up her mind on whether her characters are " the salt of the earth or its <unk> ". In a review in the Chicago Tribune, Kuehn felt that the work has little impact and is not memorable. He said Lessing's real interest is character development, but complained that the characters are " trivial or two @-@ dimensional or crippled by self @-@ <unk> ". \n The Good Terrorist was shortlisted for the 1985 Booker Prize, and in 1986 won the <unk> Prize and the <unk> Smith Literary Award. In 2007 Lessing was awarded the Nobel Prize in Literature for being " part of both the history of literature and living literature ". In the award ceremony speech by Swedish writer Per <unk>, The Good Terrorist was cited as " an and prejudices ", which leaves no room for any further interest. Donoghue complained that Lessing has not made up her mind on whether her characters are " the salt of the earth or its <unk> ". In a review in the Chicago Tribune, Kuehn felt that the work has little impact and is not memorable. He said Lessing's real interest is character development, but complained that the characters are " trivial or two @-@ dimensional or crippled by self @-@ <unk> ". \n The Good Terrorist was shortlisted for the 1985 Booker Prize, and in 1986 won the <unk> Prize and the <unk> Smith Literary Award. In 2007 Lessing was awarded the Nobel Prize in Literature for being " part of both the history of literature and living literature ". In the award ceremony speech by Swedish writer Per <unk>, The Good Terrorist was cited as " an in

Fine-tuning the model

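The HuggingFace model returns a tuple containing the actual predictions and some additional activations (should we want to use them in a regularization scheme). To work inside the fastai training loop, we need to drop those extra outputs with a Callback, which is the mechanism fastai uses to alter the behavior of the training loop.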

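Here we need to implement the after_pred event and replace self.learn.pred (which contains the predictions that will be passed to the loss function) with just its first element. In callbacks, there is a shortcut that lets you access any of the underlying Learner attributes, so we can write self.pred[0] instead of self.learn.pred[0]. That shortcut only works for read access, not for writing, so on the left-hand side of the assignment we have to write self.learn.pred (otherwise we would set a pred attribute on the Callback itself).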

class DropOutput(Callback):
    def after_pred(self): self.learn.pred = self.pred[0]

Of course, we could make this a bit more complex and use the other part of the prediction tuple to add a penalty to the loss, as RNNRegularizer does.
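
To make the read-versus-write distinction in DropOutput concrete, here is a toy sketch that mimics attribute delegation with __getattr__. The classes ToyLearner, ToyCallback, DropOutputToy and BrokenDrop are hypothetical and written only for this illustration; they are not fastai's Learner or Callback.

```python
# Toy illustration (NOT fastai code) of why the write must go through self.learn.
class ToyLearner:
    def __init__(self):
        self.pred = None
        self.cbs = []

    def add_cb(self, cb):
        cb.learn = self
        self.cbs.append(cb)

    def run_after_pred(self, raw_output):
        self.pred = raw_output
        for cb in self.cbs:
            cb.after_pred()
        return self.pred

class ToyCallback:
    learn = None  # set by ToyLearner.add_cb

    def __getattr__(self, name):
        # Reached only when normal lookup on the callback fails:
        # fall back to the learner, which gives a read-only shortcut.
        if self.learn is None:
            raise AttributeError(name)
        return getattr(self.learn, name)

class DropOutputToy(ToyCallback):
    def after_pred(self):
        # Read through the shortcut, write explicitly on the learner.
        self.learn.pred = self.pred[0]

class BrokenDrop(ToyCallback):
    def after_pred(self):
        # Writing through `self` creates an attribute on the callback only.
        self.pred = self.pred[0]

good = ToyLearner()
good.add_cb(DropOutputToy())
good_out = good.run_after_pred(("logits", "extra activations"))

bad = ToyLearner()
bad.add_cb(BrokenDrop())
bad_out = bad.run_after_pred(("logits", "extra activations"))
print(good_out, bad_out)
```

In the broken variant the learner still holds the full tuple, so the loss function would receive the wrong object.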

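Now we are ready to create our Learner, a fastai object that groups the data, the model and the loss function, and handles model training or inference. Since we are in a language-model setting, we pass perplexity as a metric, and we need to use the callback we just defined. Lastly, we use mixed precision to save as much memory as we can (and if you have a modern GPU, it will also make training faster):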

learn = Learner(dls, model, loss_func=CrossEntropyLossFlat(), cbs=[DropOutput], metrics=Perplexity()).to_fp16()

We can check how good the model is without any fine-tuning step (spoiler alert: it's pretty good!)

learn.validate()
(#2) [3.2537169456481934,25.88637924194336]

This lists the validation loss and the metric (so a perplexity of about 25.9 is kind of amazing).
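
The two numbers returned by validate are directly related: perplexity is the exponential of the average cross-entropy loss. The short check below only reuses the loss values printed in this tutorial; the variable names are chosen for the example.

```python
import math

# Validation loss reported by learn.validate() before fine-tuning.
loss_before = 3.2537169456481934
ppl_before = math.exp(loss_before)

# Validation loss reported after one epoch of fit_one_cycle further below.
loss_after = 2.721945
ppl_after = math.exp(loss_after)

print(f"{ppl_before:.2f} {ppl_after:.2f}")
```

Both values agree with the perplexity column reported by fastai, up to rounding of the printed losses.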

Now that we have a Learner, we can use all of fastai's training-loop features: the learning rate finder, 1cycle training, and so on.

learn.lr_find()
SuggestedLRs(lr_min=0.017378008365631102, lr_steep=0.14454397559165955)

The learning rate finder curve suggests picking a value between 1e-4 and 1e-3.

learn.fit_one_cycle(1, 1e-4)
epoch train_loss valid_loss perplexity time
0 2.986238 2.721945 15.209874 04:56

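With just one epoch of fine-tuning and not much regularization, the validation perplexity drops from about 25.9 to about 15.2, even though the pretrained model was already very good. To look at some generated text, let's take a prompt that looks like a Wikipedia article: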

df_valid.head(1)
0
0 \n = Tropical Storm <unk> ( 2008 ) = \n \n Tropical Storm <unk> was the tenth tropical storm of the 2008 Atlantic hurricane season . <unk> developed out of a strong tropical wave which moved off the African coast on August 31 . The wave quickly became organized and was declared Tropical Depression Ten while located 170 mi ( 270 km ) to the south @-@ southeast of the Cape Verde Islands on September 2 . The depression was quickly upgraded to Tropical Storm <unk> around noon the same day . Over the next several days , <unk> moved in a general west @-@ northwest direction and reached its peak...

Articles seem to begin with a newline and a title between = signs, so we will mimic that:

prompt = "\n = Unicorn = \n \n A unicorn is a magical creature with a rainbow tail and a horn"

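The prompt needs to be tokenized and numericalized, so we use the same tokenizer.encode call as before to do this before passing the result to the model's generate method.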

prompt_ids = tokenizer.encode(prompt)
inp = tensor(prompt_ids)[None].cuda()
inp.shape
torch.Size([1, 21])
preds = learn.model.generate(inp, max_length=40, num_beams=5, temperature=1.5)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
tokenizer.decode(preds[0].cpu().numpy())
'\n = Unicorn = \n \n A unicorn is a magical creature with a rainbow tail and a horn @-@ shaped head. It is a member of the <unk> family of <unk'
