Building a DataBlock from scratch
from fastai.data.all import *
from fastai.vision.all import *
path = untar_data(URLs.PETS)
fnames = get_image_files(path/"images")
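Let's begin with an empty DataBlock.
dblock = DataBlock()
By itself, a DataBlock is just a blueprint for assembling your data. It does nothing until you pass it a source. You can then convert that source into a Datasets or a DataLoaders with the DataBlock.datasets or DataBlock.dataloaders method. Since we have not yet done anything to prepare our data for batching, the dataloaders method would fail here, but we can look at how the source gets converted into a Datasets:
dsets = dblock.datasets(fnames)
dsets.train[0]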
By default, the data block API assumes we have one input and one target, which is why the filename appears twice.
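The duplicated filename follows from the defaults: when neither get_x nor get_y is given, both getters behave like an identity function. The sketch below mimics that behaviour in plain Python. The helpers identity and make_sample and the filename are hypothetical and only illustrate the idea; they are not fastai functions.

```python
from pathlib import Path

def identity(item):
    """Return the item unchanged (stand-in for a getter that was not provided)."""
    return item

def make_sample(item, get_x=identity, get_y=identity):
    """Build an (input, target) pair from one raw item."""
    return get_x(item), get_y(item)

fname = Path("images/Abyssinian_1.jpg")  # hypothetical filename
sample = make_sample(fname)
print(sample)  # the same path appears as both input and target
```

Passing a different get_y (for example a labelling function) changes only the second element of the pair.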
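The first thing we can do is use a get_items function to collect our items inside the data block:
dblock = DataBlock(get_items = get_image_files)
The source we pass is now the folder containing the images rather than the list of filenames:
dsets = dblock.datasets(path/"images")
dsets.train[0]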
def label_func(fname):
    return "cat" if fname.name[0].isupper() else "dog"
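To see what label_func returns, it can be tried on a few paths without loading any data. The filenames below are made up for illustration; they follow the Pets convention in which filenames of cat images start with an uppercase letter.

```python
from pathlib import Path

def label_func(fname):
    # Uppercase first letter -> cat, lowercase first letter -> dog
    return "cat" if fname.name[0].isupper() else "dog"

# Hypothetical filenames, not read from disk
examples = [Path("images/Bengal_12.jpg"), Path("images/beagle_7.jpg")]
labels = [label_func(p) for p in examples]
print(labels)  # ['cat', 'dog']
```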
We can then tell our data block to use label_func to label our targets by passing it as get_y:
dblock = DataBlock(get_items = get_image_files,
                   get_y     = label_func)
dsets = dblock.datasets(path/"images")
dsets.train[0]
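Now that our inputs and targets are ready, we can specify types to tell the data block API that our inputs are images and our targets are categories. Types are represented by blocks in the data block API; here we use ImageBlock and CategoryBlock:
dblock = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                   get_items = get_image_files,
                   get_y     = label_func)
dsets = dblock.datasets(path/"images")
dsets.train[0]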
(PILImage mode=RGB size=361x500, TensorCategory(1))
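We can see how the DataBlock automatically added the transforms needed to open the image, and how it changed the name "dog" to the index 1 (using a special tensor type, TensorCategory(1)). To do this, it created a mapping from categories to indices called the "vocab", which we can access this way:
dsets.vocab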
['cat', 'dog']
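A minimal sketch of how such a vocab could be built: collect the distinct labels, sort them, and number them. This is plain Python meant to illustrate the idea, not fastai's own implementation, and build_vocab is a hypothetical helper.

```python
def build_vocab(labels):
    """Return the sorted list of distinct labels and a label -> index map."""
    vocab = sorted(set(labels))
    o2i = {label: i for i, label in enumerate(vocab)}
    return vocab, o2i

labels = ["dog", "cat", "dog", "dog", "cat"]
vocab, o2i = build_vocab(labels)
encoded = [o2i[label] for label in labels]
print(vocab)    # ['cat', 'dog']
print(encoded)  # [1, 0, 1, 1, 0]
```

Because the labels are sorted, "cat" gets index 0 and "dog" gets index 1, which matches the TensorCategory(1) shown for a dog.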
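The next step is to control how the validation set is created, by passing a splitter to the DataBlock. Here is a random split:
dblock = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                   get_items = get_image_files,
                   get_y     = label_func,
                   splitter  = RandomSplitter())
dsets = dblock.datasets(path/"images")
dsets.train[0]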
(PILImage mode=RGB size=320x480, TensorCategory(0))
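A random splitter only needs to know how many items there are. The sketch below shuffles the indices and holds out a fraction of them for validation. It is a plain-Python illustration, not the code of RandomSplitter; random_split is a hypothetical helper, and the 20% validation share and the seed are assumptions chosen for the example.

```python
import random

def random_split(n_items, valid_pct=0.2, seed=42):
    """Shuffle the indices 0..n_items-1 and return (train_idx, valid_idx)."""
    rng = random.Random(seed)          # local generator, so the split is reproducible
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)     # number of validation items
    return idxs[cut:], idxs[:cut]

train_idx, valid_idx = random_split(10)
print(len(train_idx), len(valid_idx))  # 8 2
```

Fixing the seed gives the same split on every run, which helps when comparing experiments.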
The last step is to specify the item transforms:
dblock = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                   get_items = get_image_files,
                   get_y     = label_func,
                   splitter  = RandomSplitter(),
                   item_tfms = Resize(224))
With that resize, we can now batch items together and can finally call dataloaders to convert our DataBlock into a DataLoaders object:
dls = dblock.dataloaders(path/"images")
dls.show_batch()
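We usually build the data block in one go by answering a list of questions:
- What are the types of your inputs and targets? Here, images and categories.
- Where is your data? Here, in filenames in subfolders.
- Does something need to be applied to the inputs? Here, no.
- Does something need to be applied to the targets? Here, the label_func function.
- How do we split the data? Here, randomly.
- Do we need to apply something to the formed items? Here, a resize.
- Do we need to apply something to the formed batches? Here, no.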
This gives us the following design:
dblock = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                   get_items = get_image_files,
                   get_y     = label_func,
                   splitter  = RandomSplitter(),
                   item_tfms = Resize(224))
For the two questions answered with "no", the arguments we would pass if the answer were different are get_x and batch_tfms.
from fastai.vision.all import *
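MNIST is a dataset of handwritten digits from 0 to 9. We can load it very easily with the data block API by answering the following questions:
- What are the types of our inputs and targets? Black and white images and labels.
- Where is the data? In subfolders.
- How do we know if a sample is in the training or the validation set? By looking at the grandparent folder.
- How do we know the label of an image? By looking at the parent folder.
In terms of the API, those answers translate to: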
mnist = DataBlock(blocks=(ImageBlock(cls=PILImageBW), CategoryBlock),
                  get_items=get_image_files,
                  splitter=GrandparentSplitter(),
                  get_y=parent_label)
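Our types become blocks: one for images (using the black and white PILImageBW class) and one for categories. Searching the subfolders for all image filenames is done by the get_image_files function. The training/validation split is done with a GrandparentSplitter. And the function that gets our targets (often called y) is parent_label.
To get an idea of the objects the fastai library provides for reading, labelling, or splitting, check the data.transforms module.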
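Both answers rely only on the position of a file in the folder tree. The sketch below reads the label from the parent folder and the split from the grandparent folder using pathlib. The paths and the folder names train and valid are assumptions for illustration, and the two helpers are stand-ins for the idea, not the fastai functions themselves.

```python
from pathlib import Path

def parent_name(path):
    """Label: name of the folder that directly contains the file."""
    return path.parent.name

def grandparent_name(path):
    """Split: name of the folder two levels above the file."""
    return path.parent.parent.name

# Hypothetical layout: <root>/<train|valid>/<label>/<file>
files = [Path("mnist_tiny/train/3/7463.png"), Path("mnist_tiny/valid/7/9294.png")]
train = [f for f in files if grandparent_name(f) == "train"]
valid = [f for f in files if grandparent_name(f) == "valid"]
file_labels = [parent_name(f) for f in files]
print(file_labels)  # ['3', '7']
```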
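In itself, a data block is just a blueprint. It does not do anything and does not check for errors. You have to feed it the source of the data to actually gather something. This is done with the .dataloaders method:
dls = mnist.dataloaders(untar_data(URLs.MNIST_TINY))
dls.show_batch(max_n=9, figsize=(4,4))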
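If something went wrong in the previous step, or if you are just curious about what happens under the hood, use the summary method:
mnist.summary(untar_data(URLs.MNIST_TINY))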
Setting-up type transforms pipelines
Collecting items from /home/jhoward/.fastai/data/mnist_tiny
Found 2856 items
2 datasets of sizes 1418,1398
Setting up Pipeline: PILBase.create
Setting up Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Building one sample
Pipeline: PILBase.create
starting from
applying PILBase.create gives
PILImageBW mode=L size=28x28
Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
starting from
applying parent_label gives
applying Categorize -- {'vocab': None, 'sort': True, 'add_na': False} gives
Final sample: (PILImageBW mode=L size=28x28, TensorCategory(1))
Collecting items from /home/jhoward/.fastai/data/mnist_tiny
Found 2856 items
2 datasets of sizes 1418,1398
Setting up Pipeline: PILBase.create
Setting up Pipeline: parent_label -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
Building one batch
Applying item_tfms to the first sample:
Pipeline: ToTensor
starting from
(PILImageBW mode=L size=28x28, TensorCategory(1))
applying ToTensor gives
(TensorImageBW of size 1x28x28, TensorCategory(1))
Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
Applying batch_tfms to the batch built
Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1}
starting from
(TensorImageBW of size 4x1x28x28, TensorCategory([1, 1, 1, 1], device='cuda:0'))
applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
(TensorImageBW of size 4x1x28x28, TensorCategory([1, 1, 1, 1], device='cuda:0'))
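- What are the types of our inputs and targets? Images and labels.
- Where is the data? In subfolders.
- How do we know if a sample is in the training or the validation set? We will take a random split.
- How do we know the label of an image? By looking at the parent folder.
- Do we want to apply a function to a given sample? Yes, we need to resize all images to a given size.
- Do we want to apply a function to a batch after it is created? Yes, we want data augmentation.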
pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=Pipeline([attrgetter("name"), RegexLabeller(pat = r'^(.*)_\d+.jpg$')]),
                 item_tfms=Resize(128),  # the target size is an illustrative choice
                 batch_tfms=aug_transforms())
dls = pets.dataloaders(untar_data(URLs.PETS)/"images")
dls.show_batch(max_n=9)
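The labelling pipeline first takes the file name, then applies the regular expression to keep everything before the trailing number. Here is the same pattern applied with the standard re module, as a sketch outside fastai. The filenames are hypothetical and breed_from_name is an illustrative helper.

```python
import re
from pathlib import Path

PAT = re.compile(r'^(.*)_\d+.jpg$')  # same pattern as in the data block

def breed_from_name(path):
    """Return the part of the file name that precedes '_<number>.jpg'."""
    match = PAT.match(path.name)
    if match is None:
        raise ValueError(f"unexpected file name: {path.name}")
    return match.group(1)

names = [Path("images/great_pyrenees_173.jpg"), Path("images/Bengal_12.jpg")]
breeds = [breed_from_name(p) for p in names]
print(breeds)  # ['great_pyrenees', 'Bengal']
```

The greedy group keeps underscores that belong to the breed name and stops only at the last underscore followed by digits.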
pascal_source = untar_data(URLs.PASCAL_2007)
df = pd.read_csv(pascal_source/"train.csv")
df.head()
  | fname | labels | is_valid |
0 | 000005.jpg | chair | True |
1 | 000007.jpg | car | True |
2 | 000009.jpg | horse person | True |
3 | 000012.jpg | car | False |
4 | 000016.jpg | bicycle | True |
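Putting this into a DataBlock again comes down to answering a few questions:
- What are the types of our inputs and targets? Images and multiple labels.
- Where is the data? In a dataframe.
- How do we know if a sample is in the training or the validation set? A column of our dataframe.
- How do we get an image? By looking at the fname column.
- How do we know the labels of an image? By looking at the labels column.
- Do we want to apply a function to a given sample? Yes, we need to resize everything to a given size.
- Do we want to apply a function to a batch after it is created? Yes, we want data augmentation.
Notice that there is one more question than before: we do not need a get_items function here because we already have all our data in one place. But we do need to process the raw dataframe to get our inputs: read the first column and add the proper folder before the filename. This is what we pass as get_x.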
pascal = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=ColSplitter(),
                   get_x=ColReader(0, pref=pascal_source/"train"),
                   get_y=ColReader(1, label_delim=' '),
                   item_tfms=Resize(224),  # the target size is an illustrative choice
                   batch_tfms=aug_transforms())
dls = pascal.dataloaders(df)
dls.show_batch()
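What the two column readers compute for a single row can be sketched in plain Python: prefix the file name with the image folder, split the label string on spaces, then mark each vocabulary entry as present or absent. The row comes from the dataframe preview above; the folder, the vocabulary, and the helper names are placeholders for illustration, not fastai objects.

```python
from pathlib import Path

train_dir = Path("pascal_2007/train")        # placeholder folder
row = ("000009.jpg", "horse person", True)   # (fname, labels, is_valid)

def read_x(row):
    """Column 0 with the image folder prepended."""
    return train_dir / row[0]

def read_y(row):
    """Column 1 split on spaces into a list of labels."""
    return row[1].split(" ")

vocab = ["car", "chair", "horse", "person"]  # placeholder vocabulary

def one_hot(labels):
    """1.0 for each vocab entry present in labels, else 0.0."""
    return [1.0 if v in labels else 0.0 for v in vocab]

print(read_y(row))           # ['horse', 'person']
print(one_hot(read_y(row)))  # [0.0, 0.0, 1.0, 1.0]
```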
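Another way to do this is to write the get_x and get_y functions directly:
pascal = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=ColSplitter(),
                   get_x=lambda x:pascal_source/"train"/f'{x[0]}',
                   get_y=lambda x:x[1].split(' '),
                   item_tfms=Resize(224),  # the target size is an illustrative choice
                   batch_tfms=aug_transforms())
dls = pascal.dataloaders(df)
dls.show_batch()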
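Since each row is passed to these functions, the columns can also be read by name as attributes:
pascal = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=ColSplitter(),
                   get_x=lambda o:f'{pascal_source}/train/'+o.fname,
                   get_y=lambda o:o.labels.split(),
                   item_tfms=Resize(224),  # the target size is an illustrative choice
                   batch_tfms=aug_transforms())
dls = pascal.dataloaders(df)
dls.show_batch()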
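The most efficient way, which avoids iterating over the rows of the dataframe (this can take a long time), is to use the from_columns method. It uses get_items to convert the columns into numpy arrays. The drawback is that, since we lose the dataframe after extracting the relevant columns, we can no longer use a ColSplitter. Here we use an IndexSplitter after manually extracting the indices of the validation set from the dataframe: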
def _pascal_items(x): return (
    f'{pascal_source}/train/'+x.fname, x.labels.str.split())
valid_idx = df[df['is_valid']].index.values
pascal = DataBlock.from_columns(blocks=(ImageBlock, MultiCategoryBlock),
                                get_items=_pascal_items,
                                splitter=IndexSplitter(valid_idx),
                                item_tfms=Resize(224),  # the target size is an illustrative choice
                                batch_tfms=aug_transforms())
dls = pascal.dataloaders(df)
dls.show_batch()
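Extracting the validation indices amounts to keeping the positions whose is_valid flag is true. A plain-Python sketch using the flags of the five rows displayed in the dataframe preview (split_by_index is a hypothetical helper that only illustrates the idea, not fastai's IndexSplitter):

```python
# is_valid flags of the five rows shown in the dataframe preview
is_valid = [True, True, True, False, True]

# Positions of the validation rows
valid_positions = [i for i, flag in enumerate(is_valid) if flag]

def split_by_index(n_items, valid_positions):
    """Return (train_idx, valid_idx) given the validation positions."""
    valid = set(valid_positions)
    train_idx = [i for i in range(n_items) if i not in valid]
    return train_idx, sorted(valid)

train_idx, valid_idx = split_by_index(len(is_valid), valid_positions)
print(train_idx, valid_idx)  # [3] [0, 1, 2, 4]
```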
Let's look at an example of each case and at how to use the data block API for it.
We will use a small subset of the CamVid dataset for our example.
path = untar_data(URLs.CAMVID_TINY)
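- What are the types of our inputs and targets? Images and segmentation masks.
- Where is the data? In subfolders.
- How do we know if a sample is in the training or the validation set? We will take a random split.
- How do we know the label of an image? By looking at the corresponding file in the "labels" folder.
- Do we want to apply a function to a batch after it is created? Yes, we want data augmentation.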
camvid = DataBlock(blocks=(ImageBlock, MaskBlock(codes = np.loadtxt(path/'codes.txt', dtype=str))),
                   get_items=get_image_files,
                   splitter=RandomSplitter(),
                   get_y=lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
                   batch_tfms=aug_transforms())
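The MaskBlock is built from codes, the list read from codes.txt that maps the pixel values of the masks to the objects they represent.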
dls = camvid.dataloaders(path/"images")
dls.show_batch()
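The get_y function maps each image path to its mask file by inserting _P before the extension and looking in the labels folder. A small pathlib sketch of that mapping follows; the root folder and the file name are made up for illustration, and mask_path is a hypothetical name for the lambda used in the data block.

```python
from pathlib import Path

path = Path("camvid_tiny")  # placeholder for the dataset root

def mask_path(image_path):
    """Return labels/<stem>_P<suffix> for a given image file."""
    return path / "labels" / f"{image_path.stem}_P{image_path.suffix}"

img = path / "images" / "0006R0_f02910.png"  # hypothetical file name
print(mask_path(img).name)  # 0006R0_f02910_P.png
```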
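For this example we will use a small sample of the BiWi Kinect head pose dataset. It contains pictures of people, and the task is to predict where the center of their head is. We have saved this small dataset together with a dictionary that maps each filename to its center:
biwi_source = untar_data(URLs.BIWI_SAMPLE)
fn2ctr = load_pickle(biwi_source/'centers.pkl')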
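- What are the types of our inputs and targets? Images and points.
- Where is the data? In subfolders.
- How do we know if a sample is in the training or the validation set? We will take a random split.
- How do we know the label of an image? By using the fn2ctr dictionary.
- Do we want to apply a function to a batch after it is created? Yes, we want data augmentation.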
biwi = DataBlock(blocks=(ImageBlock, PointBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=lambda o:fn2ctr[o.name].flip(0),
                 batch_tfms=aug_transforms())
We can use it to create a DataLoaders:
dls = biwi.dataloaders(biwi_source)
dls.show_batch(max_n=9)
coco_source = untar_data(URLs.COCO_TINY)
images, lbl_bbox = get_annotations(coco_source/'train.json')
img2bbox = dict(zip(images, lbl_bbox))
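- What are the types of our inputs and targets? Images and bounding boxes.
- Where is the data? In subfolders.
- How do we know if a sample is in the training or the validation set? We will take a random split.
- How do we know the label of an image? By using the img2bbox dictionary.
- Do we want to apply a function to a given sample? Yes, we need to resize everything to a given size.
- Do we want to apply a function to a batch after it is created? Yes, we want data augmentation.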
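coco = DataBlock(blocks=(ImageBlock, BBoxBlock, BBoxLblBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(),
                 get_y=[lambda o: img2bbox[o.name][0], lambda o: img2bbox[o.name][1]],
                 item_tfms=Resize(128),  # the target size is an illustrative choice
                 batch_tfms=aug_transforms(),
                 n_inp=1)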
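Note that we provide three types, because we have two targets: the bounding boxes and the labels. That is why we pass n_inp=1 at the end, to tell the library where the inputs end and the targets begin.
This is also why we pass a list to get_y: since we have two targets, we must tell the library how to label each of them (you can use noop if you do not want to do anything for one of them).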
dls = coco.dataloaders(coco_source)
dls.show_batch(max_n=9)
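The two getters passed to get_y simply pick the first or the second element of each annotation. The sketch below builds the same kind of lookup dictionary from made-up annotations. The file names, boxes, and labels are invented for illustration and do not come from the dataset; get_boxes and get_labels are hypothetical names for the two lambdas.

```python
# Invented annotations: one (boxes, labels) pair per image
images = ["000001.jpg", "000002.jpg"]
lbl_bbox = [
    ([[10, 20, 50, 60]], ["chair"]),
    ([[5, 5, 40, 30], [60, 10, 90, 45]], ["tv", "vase"]),
]
img2bbox = dict(zip(images, lbl_bbox))

def get_boxes(name):
    """First target: the list of bounding boxes for an image."""
    return img2bbox[name][0]

def get_labels(name):
    """Second target: one label per bounding box."""
    return img2bbox[name][1]

print(get_boxes("000002.jpg"))   # [[5, 5, 40, 30], [60, 10, 90, 45]]
print(get_labels("000002.jpg"))  # ['tv', 'vase']
```

Keeping boxes and labels in separate getters lets each target go through its own block while staying aligned by position.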
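We will show two examples: language modeling and text classification. Note that with the data block API, you can adapt the earlier multi-label example to a problem where the inputs are texts.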
from fastai.text.all import *
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
df.head()
  | label | text | is_valid |
0 | negative | Un-bleeping-believable! Meg Ryan doesn't even look her usual pert lovable self in this, which normally makes me forgive her shallow ticky acting schtick. Hard to believe she was the producer on this dog. Plus Kevin Kline: what kind of suicide trip has his career been on? Whoosh... Banzai!!! Finally this was directed by the guy who did Big Chill? Must be a replay of Jonestown - hollywood style. Wooofff! | False |
1 | positive | This is a extremely well-made film. The acting, script and camera-work are all first-rate. The music is good, too, though it is mostly early in the film, when things are still relatively cheery. There are no really superstars in the cast, though several faces will be familiar. The entire cast does an excellent job with the script.<br /><br />But it is hard to watch, because there is no good end to a situation like the one presented. It is now fashionable to blame the British for setting Hindus and Muslims against each other, and then cruelly separating them into two countries. There is som... | False |
2 | negative | Every once in a long while a movie will come along that will be so awful that I feel compelled to warn people. If I labor all my days and I can save but one soul from watching this movie, how great will be my joy.<br /><br />Where to begin my discussion of pain. For starters, there was a musical montage every five minutes. There was no character development. Every character was a stereotype. We had swearing guy, fat guy who eats donuts, goofy foreign guy, etc. The script felt as if it were being written as the movie was being shot. The production value was so incredibly low that it felt li... | False |
3 | positive | Name just says it all. I watched this movie with my dad when it came out and having served in Korea he had great admiration for the man. The disappointing thing about this film is that it only concentrate on a short period of the man's life - interestingly enough the man's entire life would have made such an epic bio-pic that it is staggering to imagine the cost for production.<br /><br />Some posters elude to the flawed characteristics about the man, which are cheap shots. The theme of the movie "Duty, Honor, Country" are not just mere words blathered from the lips of a high-brassed offic... | False |
4 | negative | This movie succeeds at being one of the most unique movies you've seen. However this comes from the fact that you can't make heads or tails of this mess. It almost seems as a series of challenges set up to determine whether or not you are willing to walk out of the movie and give up the money you just paid. If you don't want to feel slighted you'll sit through this horrible film and develop a real sense of pity for the actors involved, they've all seen better days, but then you realize they actually got paid quite a bit of money to do this and you'll lose pity for them just like you've alr... | False |
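- What are the types of our inputs and targets? Texts; we do not really have targets, since the targets are derived from the inputs.
- Where is the data? In a dataframe.
- How do we know if a sample is in the training or the validation set? We have an is_valid column.
- How do we get our inputs? In the text column.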
imdb_lm = DataBlock(blocks=TextBlock.from_df('text', is_lm=True),
                    get_x=ColReader('text'),
                    splitter=ColSplitter())
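TextBlock is a little different from the other TransformBlocks: to be able to tokenize all the texts efficiently during setup, you need to use the class methods from_folder or from_df.
The tokenization process of TextBlock puts the tokenized inputs into a column called text. This is why the ColReader passed as get_x always refers to text.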
We can then get our data into DataLoaders by passing the dataframe to the dataloaders method:
dls = imdb_lm.dataloaders(df, bs=64, seq_len=72)
dls.show_batch(max_n=6)
  | text | text_ |
0 | xxbos xxmaj not sure if it was right or wrong , but i read thru the other comments before watching the xxunk have to say i disagree with most of the negative comments or problems people have had with it . \n\n xxmaj as a first time " lone xxmaj wolf " director / producer , i like to see things that i can xxunk to , not necessarily from the pro | xxmaj not sure if it was right or wrong , but i read thru the other comments before watching the xxunk have to say i disagree with most of the negative comments or problems people have had with it . \n\n xxmaj as a first time " lone xxmaj wolf " director / producer , i like to see things that i can xxunk to , not necessarily from the pro 's |
1 | and each and every actor . xxmaj it 's like they all think they 're the main part of the movie and scream " notice xxup me ! " over and over again . xxmaj the bad guy has his bad - guy music going on and says sinister bad - guy - like things , just in case you did n't quite catch on . xxmaj the good guy does brave | each and every actor . xxmaj it 's like they all think they 're the main part of the movie and scream " notice xxup me ! " over and over again . xxmaj the bad guy has his bad - guy music going on and says sinister bad - guy - like things , just in case you did n't quite catch on . xxmaj the good guy does brave and |
2 | innocently helps the xxmaj confederate hide . xxmaj later , when he returns to kill her father , the little girl 's xxunk is remembered . a sweet , small story from director xxup xxunk . xxmaj griffith . xxmaj location footage and humanity are xxunk displayed . \n\n▁ xxrep 4 * xxmaj in the xxmaj border xxmaj states ( 6 / 13 / 10 ) xxup xxunk . xxmaj griffith ~ | helps the xxmaj confederate hide . xxmaj later , when he returns to kill her father , the little girl 's xxunk is remembered . a sweet , small story from director xxup xxunk . xxmaj griffith . xxmaj location footage and humanity are xxunk displayed . \n\n▁ xxrep 4 * xxmaj in the xxmaj border xxmaj states ( 6 / 13 / 10 ) xxup xxunk . xxmaj griffith ~ xxmaj |
3 | when they real winner should of been xxmaj xxunk xxmaj fiennes for " sunshine " . xxmaj if you have n't seen this movie yet , watch it and you 'll agree . " eyes xxmaj wide xxmaj shut " when released xxunk no nominations . xxmaj and as far as this year goes , well , the bad choices were all over the place ! xxmaj xxunk xxmaj xxunk gets no | they real winner should of been xxmaj xxunk xxmaj fiennes for " sunshine " . xxmaj if you have n't seen this movie yet , watch it and you 'll agree . " eyes xxmaj wide xxmaj shut " when released xxunk no nominations . xxmaj and as far as this year goes , well , the bad choices were all over the place ! xxmaj xxunk xxmaj xxunk gets no " |
4 | xxmaj in this case , however , when combined with the moody atmosphere , and the fact that the small town of xxmaj red xxmaj rock seems almost empty of normal daily life , the coincidences and unlikely timing suggest a story that , beyond " xxunk " , is … surreal . xxmaj it 's almost as if fate deliberately xxunk with improbable events so as to force xxmaj michael to | in this case , however , when combined with the moody atmosphere , and the fact that the small town of xxmaj red xxmaj rock seems almost empty of normal daily life , the coincidences and unlikely timing suggest a story that , beyond " xxunk " , is … surreal . xxmaj it 's almost as if fate deliberately xxunk with improbable events so as to force xxmaj michael to come |
5 | is not over the top and enough twists and turns to keep you interested until the end . \n\n xxmaj well directed , well acted and a good story . xxbos xxmaj not the worst movie xxmaj i 've seen but definitely not very good either . i myself am a paintball player , used to play airball a lot and going from woods to airball is quite a large change . | not over the top and enough twists and turns to keep you interested until the end . \n\n xxmaj well directed , well acted and a good story . xxbos xxmaj not the worst movie xxmaj i 've seen but definitely not very good either . i myself am a paintball player , used to play airball a lot and going from woods to airball is quite a large change . xxmaj |
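In the batch shown here, the text_ column is the text column shifted by one token: at every position the model has to predict the next token. A minimal sketch of that construction on a short token stream follows. It is a simplification for illustration, not fastai's language-model data loader, and lm_pairs is a hypothetical helper.

```python
def lm_pairs(tokens, seq_len):
    """Cut a token stream into (input, target) windows.

    Each target is its input shifted one token to the right.
    """
    pairs = []
    for start in range(0, len(tokens) - 1, seq_len):
        x = tokens[start:start + seq_len]
        y = tokens[start + 1:start + seq_len + 1]
        if len(x) == len(y):  # drop a trailing incomplete window
            pairs.append((x, y))
    return pairs

tokens = "xxbos xxmaj not sure if it was right or wrong".split()
pairs = lm_pairs(tokens, seq_len=4)
for x, y in pairs:
    print(x, "->", y)
```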
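- What are the types of our inputs and targets? Texts and categories.
- Where is the data? In a dataframe.
- How do we know if a sample is in the training or the validation set? We have an is_valid column.
- How do we get our inputs? In the text column.
- How do we get our targets? In the label column.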
imdb_clas = DataBlock(blocks=(TextBlock.from_df('text', seq_len=72, vocab=dls.vocab), CategoryBlock),
                      get_x=ColReader('text'),
                      get_y=ColReader('label'),
                      splitter=ColSplitter())
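As in the previous example, we use a class method to build a TextBlock. We can pass it the vocab of our language model (very useful for the ULMFit approach). We also show the seq_len argument (which defaults to 72), only because you need to make sure you use the same seq_len here, in the TextBlock, and in the text_classifier_learner you define later.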
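Sharing the vocab matters because tokens are turned into integers by looking them up in it, so a classifier that reuses a language model must assign the same index to the same token. The sketch below shows such a lookup with a fallback to the unknown token. The vocabulary is made up for the example, and numericalize is a hypothetical helper rather than the fastai transform.

```python
UNK = "xxunk"  # token used for words that are not in the vocabulary
lm_vocab = [UNK, "xxbos", "xxmaj", "the", "movie", "was", "good"]  # made-up vocabulary
tok2idx = {tok: i for i, tok in enumerate(lm_vocab)}

def numericalize(tokens):
    """Map tokens to indices, falling back to the unknown token."""
    return [tok2idx.get(tok, tok2idx[UNK]) for tok in tokens]

ids = numericalize(["xxbos", "the", "movie", "was", "terrible"])
print(ids)  # [1, 3, 4, 5, 0]
```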
dls = imdb_clas.dataloaders(df, bs=64)
dls.show_batch()
  | text | category |
0 | xxbos xxmaj raising xxmaj victor xxmaj vargas : a xxmaj review \n\n xxmaj you know , xxmaj raising xxmaj victor xxmaj vargas is like sticking your hands into a big , xxunk bowl of xxunk . xxmaj it 's warm and gooey , but you 're not sure if it feels right . xxmaj try as i might , no matter how warm and gooey xxmaj raising xxmaj victor xxmaj vargas became i was always aware that something did n't quite feel right . xxmaj victor xxmaj vargas suffers from a certain xxunk on the director 's part . xxmaj apparently , the director thought that the ethnic backdrop of a xxmaj latino family on the lower east side , and an xxunk storyline would make the film critic proof . xxmaj he was right , but it did n't fool me . xxmaj raising xxmaj victor xxmaj vargas is | negative |
1 | xxbos xxup the xxup shop xxup around xxup the xxup corner is one of the xxunk and most feel - good romantic comedies ever made . xxmaj there 's just no getting around that , and it 's hard to actually put one 's feeling for this film into words . xxmaj it 's not one of those films that tries too hard , nor does it come up with the xxunk possible scenarios to get the two protagonists together in the end . xxmaj in fact , all its charm is xxunk , contained within the characters and the setting and the plot … which is highly believable to xxunk . xxmaj it 's easy to think that such a love story , as beautiful as any other ever told , * could * happen to you … a feeling you do n't often get from other romantic comedies | positive |
2 | xxbos xxmaj now that xxmaj che(2008 ) has finished its relatively short xxmaj australian cinema run ( extremely limited xxunk screen in xxmaj xxunk , after xxunk ) , i can xxunk join both xxunk of " at xxmaj the xxmaj movies " in taking xxmaj steven xxmaj soderbergh to task . \n\n xxmaj it 's usually satisfying to watch a film director change his style / subject , but xxmaj soderbergh 's most recent stinker , xxmaj the xxmaj girlfriend xxmaj xxunk ) , was also missing a story , so narrative ( and editing ? ) seem to suddenly be xxmaj soderbergh 's main challenge . xxmaj strange , after 20 - odd years in the business . xxmaj he was probably never much good at narrative , just xxunk it well inside " edgy " projects . \n\n xxmaj none of this excuses him this present , | negative |
3 | xxbos xxmaj this film sat on my xxmaj xxunk for weeks before i watched it . i xxunk a self - indulgent xxunk flick about relationships gone bad . i was wrong ; this was an xxunk xxunk into the screwed - up xxunk of xxmaj new xxmaj xxunk . \n\n xxmaj the format is the same as xxmaj max xxmaj xxunk ' " la xxmaj xxunk , " based on a play by xxmaj arthur xxmaj xxunk , who is given an " inspired by " credit . xxmaj it starts from one person , a prostitute , standing on a street corner in xxmaj brooklyn . xxmaj she is picked up by a home contractor , who has sex with her on the hood of a car , but ca n't come . xxmaj he refuses to pay her . xxmaj when he 's off xxunk , she | positive |
4 | xxbos i really wanted to love this show . i truly , honestly did . \n\n xxmaj for the first time , gay viewers get their own version of the " the xxmaj bachelor " . xxmaj with the help of his obligatory " hag " xxmaj xxunk , xxmaj james , a good looking , well - to - do thirty - something has the chance of love with 15 suitors ( or " mates " as they are referred to in the show ) . xxmaj the only problem is half of them are straight and xxmaj james does n't know this . xxmaj if xxmaj james picks a gay one , they get a trip to xxmaj new xxmaj zealand , and xxmaj if he picks a straight one , straight guy gets $ 25 , xxrep 3 0 . xxmaj how can this not be fun | negative |
5 | xxbos xxmaj many neglect that this is n't just a classic due to the fact that it 's the first 3d game , or even the first xxunk - up . xxmaj it 's also one of the first xxunk games , one of the xxunk definitely the first ) truly claustrophobic games , and just a pretty well - xxunk gaming experience in general . xxmaj with graphics that are terribly dated today , the game xxunk you into the role of xxunk even * think * xxmaj i 'm going to attempt spelling his last name ! ) , an xxmaj american xxup xxunk . caught in an underground bunker . xxmaj you fight and search your way through xxunk in order to achieve different xxunk for the six xxunk , let 's face it , most of them are just an excuse to hand you a weapon | positive |
6 | xxbos xxmaj the xxmaj blob starts with one of the most bizarre theme songs ever , xxunk by an uncredited xxmaj burt xxmaj xxunk of all people ! xxmaj you really have to hear it to believe it , xxmaj the xxmaj blob may be worth watching just for this song alone & my user comment summary is just a little taste of the classy lyrics … xxmaj after this xxunk opening credits sequence xxmaj the xxmaj blob introduces us , the viewer that is , to xxmaj steve xxmaj xxunk ( steve mcqueen as xxmaj steven mcqueen ) & his girlfriend xxmaj jane xxmaj martin ( xxunk xxmaj xxunk ) who are xxunk on their own somewhere & witness what looks like a meteorite falling to xxmaj earth in nearby woods . xxmaj an old man ( xxunk xxmaj xxunk as xxmaj xxunk xxmaj xxunk ) who lives in | negative |
7 | xxbos xxmaj the year 2005 saw no xxunk than 3 filmed productions of xxup h. xxup g. xxmaj wells ' great novel , " war of the xxmaj worlds " . xxmaj this is perhaps the least well - known and very probably the best of them . xxmaj no other version of xxunk has ever attempted not only to present the story very much as xxmaj wells wrote it , but also to create the atmosphere of the time in which it was supposed to take place : the last year of the 19th xxmaj century , 1900 …▁ using xxmaj wells ' original setting , in and near xxmaj xxunk , xxmaj england . \n\n imdb seems xxunk to what they regard as " spoilers " . xxmaj that might apply with some films , where the ending might actually be a surprise , but with regard to | positive |
8 | xxbos xxmaj well , what can i say . \n\n " what the xxmaj xxunk do we xxmaj know " has achieved the nearly impossible - leaving behind such masterpieces of the genre as " the xxmaj xxunk " , " the xxmaj xxunk xxmaj master " , " xxunk " , and so fourth , it will go down in history as the single worst movie i have ever seen in its xxunk . xxmaj and that , ladies and gentlemen , is impressive indeed , for i have seen many a bad movie . \n\n xxmaj this masterpiece of modern cinema consists of two xxunk parts , xxunk between a silly and contrived plot about an extremely annoying photographer , abandoned by her husband and forced to take anti - xxunk to survive , and a bunch of talking heads going on about how quantum physics supposedly xxunk | negative |
from fastai.tabular.core import *
adult_source = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(adult_source/'adult.csv')
df.head()
  | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | salary |
0 | 49 | Private | 101320 | Assoc-acdm | 12.0 | Married-civ-spouse | NaN | Wife | White | Female | 0 | 1902 | 40 | United-States | >=50k |
1 | 44 | Private | 236746 | Masters | 14.0 | Divorced | Exec-managerial | Not-in-family | White | Male | 10520 | 0 | 45 | United-States | >=50k |
2 | 38 | Private | 96185 | HS-grad | NaN | Divorced | NaN | Unmarried | Black | Female | 0 | 0 | 32 | United-States | <50k |
3 | 38 | Self-emp-inc | 112847 | Prof-school | 15.0 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | United-States | >=50k |
4 | 42 | Self-emp-not-inc | 82297 | 7th-8th | NaN | Married-civ-spouse | Other-service | Wife | Black | Female | 0 | 0 | 50 | United-States | <50k |
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
splits = RandomSplitter()(range_of(df))
Everything then goes into a TabularPandas object:
to = TabularPandas(df, procs, cat_names, cont_names, y_names="salary", splits=splits, y_block=CategoryBlock)
We pass y_block=CategoryBlock to state explicitly which block is used for the targets.
dls = to.dataloaders()
dls.show_batch()
  | workclass | education | marital-status | occupation | relationship | race | education-num_na | age | fnlwgt | education-num | salary |
0 | Self-emp-not-inc | Prof-school | Never-married | Prof-specialty | Not-in-family | White | False | 34.0 | 204374.999924 | 15.0 | >=50k |
1 | Private | Some-college | Never-married | Adm-clerical | Not-in-family | White | False | 62.0 | 141307.999756 | 10.0 | <50k |
2 | Private | Assoc-acdm | Never-married | Other-service | Not-in-family | White | False | 23.0 | 152188.999004 | 12.0 | <50k |
3 | Private | HS-grad | Divorced | Craft-repair | Unmarried | White | False | 38.0 | 27407.999090 | 9.0 | <50k |
4 | Private | Bachelors | Never-married | Prof-specialty | Not-in-family | White | False | 32.0 | 340917.004812 | 13.0 | >=50k |
5 | Private | Bachelors | Never-married | Prof-specialty | Not-in-family | White | False | 22.0 | 153515.999598 | 13.0 | <50k |
6 | Self-emp-not-inc | Doctorate | Never-married | Prof-specialty | Not-in-family | White | False | 46.0 | 165754.000335 | 16.0 | <50k |
7 | Private | Masters | Married-civ-spouse | Prof-specialty | Husband | White | False | 33.0 | 202050.999896 | 14.0 | <50k |
8 | Private | Assoc-acdm | Divorced | Sales | Unmarried | White | False | 40.0 | 197919.000079 | 12.0 | <50k |
9 | ? | Some-college | Never-married | ? | Own-child | White | False | 18.0 | 264924.000434 | 10.0 | <50k |
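The education-num_na column in this batch was not in the original dataframe: it records which rows had a missing education-num before the gap was filled. The sketch below illustrates, on tiny columns, the kind of work the three procs do. It is plain Python under stated assumptions (median fill, population standard deviation, index 0 kept for missing categories) and the helper names are hypothetical; fastai's own procs may differ in detail.

```python
from statistics import mean, median, pstdev

def fill_missing(values):
    """Replace None with the median; also return the was-missing flags."""
    med = median(v for v in values if v is not None)
    filled = [med if v is None else v for v in values]
    return filled, [v is None for v in values]

def normalize(values):
    """Shift to zero mean and scale to unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def categorify(values):
    """Map each category to an integer, keeping 0 for missing entries."""
    vocab = sorted({v for v in values if v is not None})
    lookup = {v: i + 1 for i, v in enumerate(vocab)}
    return [lookup.get(v, 0) for v in values]

education_num = [12.0, 14.0, None, 15.0, None]  # first rows of the raw dataframe
filled, was_missing = fill_missing(education_num)
codes = categorify(["Private", "Self-emp-inc", None, "Private"])
print(filled)       # [12.0, 14.0, 14.0, 15.0, 14.0]
print(was_missing)  # [False, False, True, False, True]
print(codes)        # [1, 2, 0, 1]
```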
